Overture Instead of OSM – Advanced Pipeline¶

Can Overture data be used instead of OSM? In a nutshell, yes, 100%! Could it be integrated much better? Of course, always!

This notebook uses the UrbanMapper library to compute and visualise building counts, together with several other enrichments, along road segments in Manhattan, NYC, using data sourced entirely from Overture. It follows a structured pipeline approach: data loading, filtering, enrichment, and visualisation.

https://overturemaps.org/

Setup¶

First of all, let's initialise an UrbanMapper instance, which sets the foundation for the pipeline.

import urban_mapper as um
import geopandas as gpd
from urban_mapper.pipeline import UrbanPipeline

mapper = um.UrbanMapper()

Prerequisites – Data Preparation¶

Make sure you have gone through the easy-mode overture_pipeline.py first, to understand how to obtain the data the right way.

Cheers!

Component Instantiation: `Loader`¶

The loader component reads the preprocessed building data from the shapefile. This only prepares the primitive so that it is ready to be used in the pipeline later on.

loader = (
    mapper.loader
    .from_file("./manhattan_buildings.shp")
    .build()
)

Component Instantiation: `Urban Layer`, `Imputer`, and `Filter`¶

Urban Layer: Loads road segments from manhattan_roads.geojson and sets up the mapping configuration to associate building data with the nearest road.
Imputer: Uses SimpleGeoImputer to handle missing longitude and latitude values, ensuring all data points can be mapped.
Filter: Applies a BoundingBoxFilter to retain only the data (buildings) within Manhattan’s spatial bounds.

As before, this only prepares the primitives so that they are ready to be used in the pipeline later on.

urban_layer = (
    mapper.urban_layer
    .with_type("custom_urban_layer")
    .from_file("./manhattan_roads.geojson")
    .with_mapping(
        longitude_column="temporary_longitude",
        latitude_column="temporary_latitude",
        output_column="nearest_road"
    )
    .build()
)

imputer = (
    mapper.imputer
    .with_type("SimpleGeoImputer")
    .on_columns("temporary_longitude", "temporary_latitude")
    .build()
)

filter_step = (
    mapper.filter
    .with_type("BoundingBoxFilter")
    .build()
)
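
To make the roles of the filter and of the urban-layer mapping more concrete, here is a minimal pure-Python sketch of the two ideas: keeping only points inside a bounding box, and assigning each point to its nearest road segment. It is not UrbanMapper's implementation. The function names, the sample coordinates, and the planar distance (which ignores projections and Earth curvature) are all assumptions made for illustration.

```python
import math

def in_bbox(lon, lat, bbox):
    """Return True if (lon, lat) lies inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line, then clamp to the segment's extent.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_road(point, roads):
    """Return the id of the road segment closest to point."""
    return min(roads, key=lambda rid: point_segment_distance(point, *roads[rid]))

# Hypothetical sample data: two road segments and three building centroids.
roads = {
    "road_a": ((-74.00, 40.70), (-74.00, 40.80)),
    "road_b": ((-73.95, 40.70), (-73.95, 40.80)),
}
buildings = [(-73.99, 40.75), (-73.96, 40.72), (-73.50, 41.50)]
bbox = (-74.05, 40.68, -73.90, 40.88)  # illustrative bounds, not an official extent

kept = [b for b in buildings if in_bbox(*b, bbox)]
assignments = {b: nearest_road(b, roads) for b in kept}
```

In this toy data the third building falls outside the box and is dropped, while the other two are attached to `road_a` and `road_b` respectively.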

Component Instantiation: `Enrichers`¶

Multiple enrichers are defined to compute various building characteristics per road segment:

Building count
Proportion of multi-floor buildings
Average building height
Average facade color (as a name)
Predominant building class
Average number of floors
Proportion of underground buildings
Height variety (standard deviation)
Proportion of named buildings

These enrichers provide a comprehensive analysis of the building landscape along each road segment.

–––

We first install a library needed by one of the enrichers, then define the helper functions that the enrichers below use to summarise the buildings dataset.

Lastly, recall that this only prepares the primitives so that they are ready to be used in the pipeline later on.

!uv pip install colory  # drop "uv" if your environment does not use it

import pandas as pd
import numpy as np
from colory.color import Color

def proportion_multi_floor(series):
    if series.empty or series.isna().all():
        return 0.0
    multi_floor_count = (series > 1).sum()
    total_count = series.notna().sum()
    return multi_floor_count / total_count if total_count > 0 else 0.0

def most_common_value(series):
    if series.empty or series.isna().all():
        return None
    mode = series.mode()
    return mode.iloc[0] if not mode.empty else None

def proportion_underground(series):
    if series.empty or series.isna().all():
        return 0.0
    underground_count = series.sum()
    total_count = series.notna().sum()
    return underground_count / total_count if total_count > 0 else 0.0

def proportion_named(series):
    if series.empty or series.isna().all():
        return 0.0
    named_count = series.notna().sum()
    # Divide by every building in the group, not only the named ones;
    # otherwise the ratio is always 1.0.
    total_count = len(series)
    return named_count / total_count if total_count > 0 else 0.0

def hex_to_rgb(hex_color):
    hex_color = hex_color.lstrip('#')
    return np.array([int(hex_color[i:i+2], 16) for i in (0, 2, 4)])

def rgb_to_hex(rgb):
    return '#{:02x}{:02x}{:02x}'.format(*rgb)

def avg_color_name(series):
    valid_colors = [color for color in series if color and isinstance(color, str) and color.startswith('#')]
    if not valid_colors:
        return 'Unknown'
    rgb_values = [hex_to_rgb(color) for color in valid_colors]
    avg_rgb = np.mean(rgb_values, axis=0).astype(int)
    avg_hex = rgb_to_hex(avg_rgb)
    return Color(avg_hex, 'xkcd').name
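
The colour-averaging step inside avg_color_name can be illustrated without NumPy or colory. The sketch below uses a hypothetical helper, `average_hex`, that averages the RGB channels of a list of `#rrggbb` strings and truncates to integers, mirroring the `.astype(int)` cast. The final name lookup is left out, and the length check is an extra safeguard that the original function does not have.

```python
def average_hex(colors):
    """Average a list of '#rrggbb' strings channel by channel."""
    valid = [
        c for c in colors
        if isinstance(c, str) and c.startswith("#") and len(c) == 7
    ]
    if not valid:
        return None
    channels = [
        [int(c[i:i + 2], 16) for c in valid]  # red, then green, then blue values
        for i in (1, 3, 5)
    ]
    # Integer division truncates, like casting the float mean to int.
    r, g, b = (sum(ch) // len(ch) for ch in channels)
    return f"#{r:02x}{g:02x}{b:02x}"

print(average_hex(["#ff0000", "#0000ff", None, "red"]))  # → #7f007f
```

Note that an average is not a mode: a street with only red and blue facades yields a purple that no building actually has.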

building_count = (
    mapper.enricher
    .with_data(group_by="nearest_road")
    .count_by(output_column="building_count")
    .build()
)

multi_floor = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="num_floors")
    .aggregate_by(method=proportion_multi_floor, output_column="prop_multi_floor")
    .build()
)

avg_height = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="height")
    .aggregate_by(method="mean", output_column="avg_height")
    .build()
)

predom_color_name = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="facade_col")
    .aggregate_by(method=avg_color_name, output_column="avg_facade_color_name")
    .build()
)

predom_class = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="class")
    .aggregate_by(method=most_common_value, output_column="predom_building_class")
    .build()
)

avg_floors = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="num_floors")
    .aggregate_by(method="mean", output_column="avg_floors")
    .build()
)

undergr_prop = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="is_undergr")
    .aggregate_by(method=proportion_underground, output_column="prop_underground")
    .build()
)

height_variety = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="height")
    .aggregate_by(method=lambda x: x.std(), output_column="height_std_dev")
    .build()
)

named_prop = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="names")
    .aggregate_by(method=proportion_named, output_column="prop_named_buildings")
    .build()
)
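
Conceptually, each enricher groups the buildings by their nearest_road value and reduces one column to a single value per road. The following standard-library sketch shows that idea for the count, the mean height, and the height standard deviation. The sample records are made up, and this is not how UrbanMapper implements enrichment internally.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: each building already carries its nearest road id.
buildings = [
    {"nearest_road": 1, "height": 10.0},
    {"nearest_road": 1, "height": 20.0},
    {"nearest_road": 1, "height": None},  # missing height
    {"nearest_road": 2, "height": 30.0},
]

groups = defaultdict(list)
for b in buildings:
    groups[b["nearest_road"]].append(b["height"])

summary = {}
for road, heights in groups.items():
    known = [h for h in heights if h is not None]  # skip missing values
    summary[road] = {
        "building_count": len(heights),
        "avg_height": mean(known) if known else None,
        # The sample standard deviation needs at least two values.
        "height_std_dev": stdev(known) if len(known) >= 2 else None,
    }
```

pandas' `Series.std` also defaults to the sample standard deviation (`ddof=1`), so a road with a single measured building has no defined `height_std_dev`.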

Component Instantiation: `Visualiser`¶

The visualiser is configured for an interactive map with a dark theme, displaying the following enriched columns in tooltips:

Building count
Proportion of multi-floor buildings
Average building height
Average facade color name
Predominant building class
Average number of floors
Proportion of underground buildings
Height variety (standard deviation)
Proportion of named buildings

visualiser = (
    mapper.visual
    .with_type("Interactive")
    .with_style({
            "tiles": "CartoDB dark_matter",
            "tooltip": [
                "building_count",
                "prop_multi_floor",
                "avg_height",
                "avg_facade_color_name",
                "predom_building_class",
                "avg_floors",
                "prop_underground",
                "height_std_dev",
                "prop_named_buildings"
            ],
            "colorbar_text_color": "white",
    })
    .build()
)

Pipeline Assembly¶

The pipeline combines all pre-instantiated components in a logical sequence for processing.

pipeline = UrbanPipeline([
    ("loader", loader),
    ("urban_layer", urban_layer),
    ("impute", imputer),
    ("filter", filter_step),
    ("enrich_building_count", building_count),
    ("enrich_multi_floor", multi_floor),
    ("enrich_avg_height", avg_height),
    ("enrich_predom_color", predom_color_name),
    ("enrich_predom_class", predom_class),
    ("enrich_avg_floors", avg_floors),
    ("enrich_undergr_prop", undergr_prop),
    ("enrich_height_variety", height_variety),
    ("enrich_named_prop", named_prop),
    ("visualiser", visualiser),
])
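
The list of (name, component) pairs resembles the convention used by scikit-learn's Pipeline: each step gets a unique label, and steps run in the order given. The toy class below illustrates only that pattern. `ToyPipeline` and its `run` method are hypothetical and say nothing about UrbanPipeline's real internals.

```python
class ToyPipeline:
    """Run named steps in order, passing each result to the next step."""

    def __init__(self, steps):
        names = [name for name, _ in steps]
        if len(names) != len(set(names)):
            raise ValueError("step names must be unique")
        self.steps = list(steps)

    def run(self, data):
        for _name, step in self.steps:
            data = step(data)  # the output of one step feeds the next
        return data

toy = ToyPipeline([
    ("drop_missing", lambda rows: [r for r in rows if r is not None]),
    ("double", lambda rows: [r * 2 for r in rows]),
    ("total", sum),
])
result = toy.run([1, None, 3])
```

Naming the steps makes it possible to refer to a specific stage later, which is why duplicate labels are rejected in this sketch.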

Pipeline Execution¶

This step runs the pipeline, transforming the data and generating the enriched layer. Note that an animation is shown during execution so you can follow what is going on!

mapped_data, enriched_layer = pipeline.compose_transform()

Visualisation¶

The enriched layer is visualised interactively, displaying multiple building characteristics along road segments, including building counts, height metrics, and facade color names. This allows for an in-depth exploration of the data.

Feel free to use the small widget that appears above the map to focus on a specific enriched column.

fig = pipeline.visualise([
    "building_count",
    "prop_multi_floor",
    "avg_height",
    "avg_facade_color_name",
    "predom_building_class",
    "avg_floors",
    "prop_underground",
    "height_std_dev",
    "prop_named_buildings"
])
fig

Export Results¶

Finally, the processed data is saved to a JupyterGIS file for future analysis and real-time collaboration.

https://jupytergis.readthedocs.io/

pipeline.to_jgis(
    filepath="new_york_city_overture_advanced_pipeline.JGIS",
    urban_layer_name="NYC Overture Roads & Buildings –– Advanced Pipeline"
)
