Overture Instead of OSM – Advanced Pipeline
Can Overture data be used instead of OSM? In a nutshell, yes, 100%! Could it be much better integrated? Of course, always!
The following notebook showcases the UrbanMapper library to process and visualise building counts, among multiple other enrichments, along road segments in Manhattan, NYC, using data coming entirely from Overture. It follows a structured pipeline approach, including data loading, filtering, enrichment, and visualisation.
Setup
First of all, let's initialise an UrbanMapper instance, setting the foundation for the pipeline.
import urban_mapper as um
import geopandas as gpd
from urban_mapper.pipeline import UrbanPipeline
mapper = um.UrbanMapper()
Prerequisites – Data Preparation
Make sure you have gone through the overture_pipeline.py easy mode first to understand how to get the data the right way.
Cheers!
Component Instantiation: Loader
The loader component is defined to read the preprocessed building data from the shapefile. This makes the primitive ready to be used in the pipeline later on.
loader = (
    mapper.loader
    .from_file("./manhattan_buildings.shp")
    .build()
)
Component Instantiation: Urban Layer, Imputer, and Filter
- Urban Layer: Loads road segments from manhattan_roads.geojson and sets up the mapping configuration to associate building data with the nearest road.
- Imputer: Uses SimpleGeoImputer to handle missing longitude and latitude values, ensuring all data points can be mapped.
- Filter: Applies a BoundingBoxFilter to retain only the data (buildings) within Manhattan’s spatial bounds.
As before, this makes the primitives ready to be used in the pipeline later on.
urban_layer = (
    mapper.urban_layer
    .with_type("custom_urban_layer")
    .from_file("./manhattan_roads.geojson")
    .with_mapping(
        longitude_column="temporary_longitude",
        latitude_column="temporary_latitude",
        output_column="nearest_road"
    )
    .build()
)
imputer = (
    mapper.imputer
    .with_type("SimpleGeoImputer")
    .on_columns("temporary_longitude", "temporary_latitude")
    .build()
)
filter_step = (
    mapper.filter
    .with_type("BoundingBoxFilter")
    .build()
)
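To build some intuition for what these three components do conceptually, here is a minimal plain-Python sketch. It is not UrbanMapper's implementation: the helper names (`within_bbox`, `point_segment_distance`, `nearest_road`), the toy coordinates, and the choice to simply skip rows with missing coordinates are all assumptions made for illustration, and distances are planar rather than geodesic.

```python
import math

def within_bbox(lon, lat, bbox):
    """True when (lon, lat) lies inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def point_segment_distance(px, py, ax, ay, bx, by):
    """Planar distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_road(lon, lat, roads):
    """Return the id of the road segment closest to (lon, lat)."""
    return min(roads, key=lambda rid: point_segment_distance(lon, lat, *roads[rid]))

# Hypothetical toy data: two straight road segments and four buildings.
roads = {
    "road_a": (0.0, 0.0, 10.0, 0.0),
    "road_b": (0.0, 5.0, 10.0, 5.0),
}
buildings = [
    {"id": 1, "temporary_longitude": 2.0, "temporary_latitude": 1.0},
    {"id": 2, "temporary_longitude": 8.0, "temporary_latitude": 4.5},
    {"id": 3, "temporary_longitude": None, "temporary_latitude": 3.0},
    {"id": 4, "temporary_longitude": 50.0, "temporary_latitude": 1.0},
]
bbox = (0.0, 0.0, 10.0, 5.0)

mapped = []
for b in buildings:
    lon, lat = b["temporary_longitude"], b["temporary_latitude"]
    if lon is None or lat is None:
        continue  # toy "imputation": skip rows that cannot be placed on the map
    if not within_bbox(lon, lat, bbox):
        continue  # toy bounding-box filter
    mapped.append({**b, "nearest_road": nearest_road(lon, lat, roads)})

for row in mapped:
    print(row["id"], row["nearest_road"])
```

Only buildings 1 and 2 survive in this toy run: building 3 has no longitude and building 4 falls outside the box. The `nearest_road` key mirrors the `output_column` configured on the urban layer above.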
Component Instantiation: Enrichers
Multiple enrichers are defined to compute various building characteristics per road segment:
- Building count
- Proportion of multi-floor buildings
- Average building height
- Predominant facade color (as a name)
- Predominant building class
- Average number of floors
- Proportion of underground buildings
- Height variety (standard deviation)
- Proportion of named buildings
These enrichers provide a comprehensive analysis of the building landscape along each road segment.
–––
We first install a library needed by one of the enrichers, then define the helper functions that the enrichers below use to explore the buildings dataset.
Lastly, recall that this only makes the primitives ready to be used in the pipeline later on.
!uv pip install colory  # drop `uv` depending on your environment
import pandas as pd
import numpy as np
from colory.color import Color
def proportion_multi_floor(series):
    # Share of buildings with more than one floor, among those with a known floor count.
    if series.empty or series.isna().all():
        return 0.0
    multi_floor_count = (series > 1).sum()
    total_count = series.notna().sum()
    return multi_floor_count / total_count if total_count > 0 else 0.0


def most_common_value(series):
    # Most frequent non-missing value, or None when there is none.
    if series.empty or series.isna().all():
        return None
    mode = series.mode()
    return mode.iloc[0] if not mode.empty else None


def proportion_underground(series):
    # Share of buildings flagged as underground, among those with a known flag.
    if series.empty or series.isna().all():
        return 0.0
    underground_count = series.sum()
    total_count = series.notna().sum()
    return underground_count / total_count if total_count > 0 else 0.0


def proportion_named(series):
    # Share of buildings that have a name, among all buildings in the group.
    if series.empty or series.isna().all():
        return 0.0
    named_count = series.notna().sum()
    total_count = len(series)  # count every building, named or not
    return named_count / total_count if total_count > 0 else 0.0


def hex_to_rgb(hex_color):
    hex_color = hex_color.lstrip('#')
    return np.array([int(hex_color[i:i+2], 16) for i in (0, 2, 4)])


def rgb_to_hex(rgb):
    return '#{:02x}{:02x}{:02x}'.format(*rgb)


def avg_color_name(series):
    # Average the valid hex colours channel by channel, then look up the closest xkcd colour name.
    valid_colors = [color for color in series if color and isinstance(color, str) and color.startswith('#')]
    if not valid_colors:
        return 'Unknown'
    rgb_values = [hex_to_rgb(color) for color in valid_colors]
    avg_rgb = np.mean(rgb_values, axis=0).astype(int)
    avg_hex = rgb_to_hex(avg_rgb)
    return Color(avg_hex, 'xkcd').name
building_count = (
    mapper.enricher
    .with_data(group_by="nearest_road")
    .count_by(output_column="building_count")
    .build()
)
multi_floor = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="num_floors")
    .aggregate_by(method=proportion_multi_floor, output_column="prop_multi_floor")
    .build()
)
avg_height = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="height")
    .aggregate_by(method="mean", output_column="avg_height")
    .build()
)
predom_color_name = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="facade_col")
    .aggregate_by(method=avg_color_name, output_column="avg_facade_color_name")
    .build()
)
predom_class = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="class")
    .aggregate_by(method=most_common_value, output_column="predom_building_class")
    .build()
)
avg_floors = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="num_floors")
    .aggregate_by(method="mean", output_column="avg_floors")
    .build()
)
undergr_prop = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="is_undergr")
    .aggregate_by(method=proportion_underground, output_column="prop_underground")
    .build()
)
height_variety = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="height")
    .aggregate_by(method=lambda x: x.std(), output_column="height_std_dev")
    .build()
)
named_prop = (
    mapper.enricher
    .with_data(group_by="nearest_road", values_from="names")
    .aggregate_by(method=proportion_named, output_column="prop_named_buildings")
    .build()
)
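Each enricher above boils down to "group the buildings by `nearest_road`, then reduce one column to a single value per road". The following is a rough pandas-only sketch of that idea on a hypothetical toy table; it does not use UrbanMapper, and the data values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical toy buildings table, already mapped to a nearest road.
buildings = pd.DataFrame({
    "nearest_road": ["r1", "r1", "r1", "r2", "r2"],
    "num_floors": [1, 3, None, 5, 2],
    "height": [4.0, 12.0, 6.0, 20.0, 10.0],
    "names": ["Town Hall", None, None, None, "Library"],
})

grouped = buildings.groupby("nearest_road")

# One row per road, one column per aggregation.
summary = pd.DataFrame({
    "building_count": grouped.size(),
    "avg_height": grouped["height"].mean(),
    "height_std_dev": grouped["height"].std(),
    # Multi-floor buildings among those with a known floor count.
    "prop_multi_floor": grouped["num_floors"].agg(
        lambda s: (s > 1).sum() / s.notna().sum()
    ),
    # Named buildings among all buildings on the road.
    "prop_named_buildings": grouped["names"].agg(
        lambda s: s.notna().sum() / len(s)
    ),
})

print(summary)
```

Note how the choice of denominator matters: `prop_multi_floor` ignores buildings with an unknown floor count, whereas `prop_named_buildings` divides by every building in the group, since a missing name is itself the signal being measured.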
Component Instantiation: Visualiser
The visualiser is configured for an interactive map with a dark theme, displaying the following enriched columns in tooltips:
- Building count
- Proportion of multi-floor buildings
- Average building height
- Predominant facade color name
- Predominant building class
- Average number of floors
- Proportion of underground buildings
- Height variety (standard deviation)
- Proportion of named buildings
visualiser = (
    mapper.visual
    .with_type("Interactive")
    .with_style({
        "tiles": "CartoDB dark_matter",
        "tooltip": [
            "building_count",
            "prop_multi_floor",
            "avg_height",
            "avg_facade_color_name",
            "predom_building_class",
            "avg_floors",
            "prop_underground",
            "height_std_dev",
            "prop_named_buildings"
        ],
        "colorbar_text_color": "white",
    })
    .build()
)
Pipeline Assembly
The pipeline combines all pre-instantiated components in a logical sequence for processing.
pipeline = UrbanPipeline([
    ("loader", loader),
    ("urban_layer", urban_layer),
    ("impute", imputer),
    ("filter", filter_step),
    ("enrich_building_count", building_count),
    ("enrich_multi_floor", multi_floor),
    ("enrich_avg_height", avg_height),
    ("enrich_predom_color", predom_color_name),
    ("enrich_predom_class", predom_class),
    ("enrich_avg_floors", avg_floors),
    ("enrich_undergr_prop", undergr_prop),
    ("enrich_height_variety", height_variety),
    ("enrich_named_prop", named_prop),
    ("visualiser", visualiser),
])
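The list of `(name, component)` tuples follows a familiar "named steps" pattern. As a loose illustration of that pattern only, the sketch below chains plain functions in order. `MiniPipeline` and its methods are hypothetical names invented here; the sketch makes no claim about how `UrbanPipeline` works internally.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]

class MiniPipeline:
    """Hypothetical stand-in: applies named steps in the order they are listed."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps

    def step_names(self) -> List[str]:
        return [name for name, _ in self.steps]

    def run(self, data: Any) -> Any:
        # Feed the output of each step into the next one.
        for _name, func in self.steps:
            data = func(data)
        return data

def drop_missing_height(rows):
    """Toy cleaning step: keep only rows with a known height."""
    return [r for r in rows if r["height"] is not None]

def count_per_road(rows):
    """Toy enrichment step: number of buildings per road."""
    counts = {}
    for r in rows:
        counts[r["road"]] = counts.get(r["road"], 0) + 1
    return counts

rows = [
    {"road": "r1", "height": 4.0},
    {"road": "r1", "height": None},
    {"road": "r2", "height": 9.0},
]

toy_pipeline = MiniPipeline([
    ("clean", drop_missing_height),
    ("enrich_building_count", count_per_road),
])
result = toy_pipeline.run(rows)
print(toy_pipeline.step_names(), result)
```

Naming each step keeps the sequence readable and makes it easy to see at a glance which stage produced which column, which is presumably why the enrichers above each get a distinct `enrich_*` label.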
Pipeline Execution
This step runs the pipeline, transforming the data and generating the enriched layer. Note that a nice animation is shown during the pipeline execution so you can follow what's going on!
mapped_data, enriched_layer = pipeline.compose_transform()
Visualisation
The enriched layer is visualised interactively, displaying multiple building characteristics along road segments, including building counts, height metrics, and facade color names. This allows for an in-depth exploration of the data.
Feel free to use the small widget appearing above the map to focus on a specific enriched column.
fig = pipeline.visualise([
    "building_count",
    "prop_multi_floor",
    "avg_height",
    "avg_facade_color_name",
    "predom_building_class",
    "avg_floors",
    "prop_underground",
    "height_std_dev",
    "prop_named_buildings"
])
fig
Export Results
Finally, the processed data is saved to a JupyterGIS file for future analysis in a real-time collaborative manner.
pipeline.to_jgis(
    filepath="new_york_city_overture_advanced_pipeline.JGIS",
    urban_layer_name="NYC Overture Roads & Buildings –– Advanced Pipeline"
)