Overture Instead of OSM – Easy Pipeline¶
In a nutshell, yes, 100% you can! However, can it be much better integrated? Of course, always!
The following notebook showcases the UrbanMapper library to process and visualise building counts along road segments in Manhattan, NYC, using data coming entirely from Overture. It follows a structured pipeline approach: data loading, filtering, enrichment, and visualisation.
Setup¶
Before anything else, let's simply initialise an UrbanMapper instance, setting the foundation for the pipeline.
import urban_mapper as um
import geopandas as gpd
from urban_mapper.pipeline import UrbanPipeline
mapper = um.UrbanMapper()
Pre-Requisites –– Data Preparation¶
As the goal is to use Overture data, we must make sure we have it first. To do so, either (1) follow https://docs.overturemaps.org/getting-data/overturemaps-py/, or (2), assuming you already have overturemaps installed among your general pip packages, run the following in your CLI:
overturemaps download --bbox=-74.257159,40.495992,-73.699215,40.915568 -f geojson --type=segment -o nyc_segments.geojson
overturemaps download --bbox=-74.016367,40.702726,-73.934212,40.821589 -f geoparquet --type=building -o manhattan_buildings.parquet
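The `--bbox` values must be given in west,south,east,north order. A tiny helper (hypothetical, not part of the overturemaps CLI) can catch swapped coordinates before they reach the command line:

```python
def overture_bbox(west: float, south: float, east: float, north: float) -> str:
    """Format coordinates for `overturemaps download --bbox=...` (west,south,east,north order)."""
    if not (west < east and south < north):
        raise ValueError("expected west < east and south < north")
    return f"{west},{south},{east},{north}"

# The Manhattan box used for the buildings download above:
print(overture_bbox(-74.016367, 40.702726, -73.934212, 40.821589))
# -74.016367,40.702726,-73.934212,40.821589
```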
This does nothing more than download the right information (roads and buildings) at the right location from Overture, so we can proceed with UrbanMapper.
Next, we simply need to clip the segments acquired from Overture to Manhattan for computation's sake, but feel free to explore more!
from shapely.geometry import Polygon
west, south, east, north = -74.016367, 40.702726, -73.934212, 40.821589
bbox = Polygon([(west, south), (east, south), (east, north), (west, north)])
roads_gdf = gpd.read_file("./nyc_segments.geojson")
if roads_gdf.crs != "EPSG:4326":
roads_gdf = roads_gdf.to_crs("EPSG:4326")
road_subtype_gdf = roads_gdf[ # Keeping only the essential!
(roads_gdf['subtype'] == 'road') &
(roads_gdf['class'].isin(['motorway', 'residential', 'living_street', 'primary', 'secondary']))
]
filtered_roads = gpd.clip(road_subtype_gdf, bbox)
filtered_roads.reset_index(drop=True, inplace=True)
filtered_roads.to_file("manhattan_roads.geojson", driver="GeoJSON")
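`gpd.clip` performs a true geometric intersection, cutting segments at the box's edges. As a rough pure-Python analogue (a sketch, not what geopandas does internally), one can keep only segments whose vertices all fall inside the box:

```python
def in_bbox(lon, lat, west, south, east, north):
    """True if a point lies inside the bounding box."""
    return west <= lon <= east and south <= lat <= north

def keep_segment(coords, west, south, east, north):
    """Keep a segment only if every vertex is inside the bbox (cruder than a real clip)."""
    return all(in_bbox(lon, lat, west, south, east, north) for lon, lat in coords)

box = (-74.016367, 40.702726, -73.934212, 40.821589)
inside = [(-74.0, 40.75), (-73.99, 40.76)]    # a street inside the Manhattan box
outside = [(-73.95, 40.65), (-73.94, 40.66)]  # below the box's southern edge
print(keep_segment(inside, *box), keep_segment(outside, *box))  # True False
```

Unlike this sketch, the real clip also keeps the in-box portion of segments that cross the boundary, which is why `gpd.clip` is the right tool above.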
Pre-Requisites –– Transforming the buildings into a Shapefile¶
The following step converts the building data from a parquet file to a shapefile, as the UrbanMapper API currently requires shapefile input for longitude and latitude to be inferred automatically; both are heavily required later on. If the parquet buildings file already had longitude and latitude columns, this step would not be required.
Meanwhile, note that the mechanism behind our ShapefileLoader will need to be replicated in the Parquet loader and others, so that input files without explicit longitude and latitude columns can still have them inferred automatically from the geometry coordinates. The mechanism is already present; it simply needs to be scaled to more primitives.
tmp_nyc_buildings = gpd.read_parquet("./manhattan_buildings.parquet")
tmp_nyc_buildings.to_file("./manhattan_buildings.shp")
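The inference mechanism mentioned above (deriving a longitude/latitude from geometry coordinates) can be sketched in plain Python by taking the vertex average of a building footprint as a representative point. This is a simplification: geopandas would use `geometry.centroid` or `representative_point()` instead.

```python
def representative_point(ring):
    """Rough representative point: average of the ring's vertices (not a true centroid)."""
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return sum(lons) / len(lons), sum(lats) / len(lats)

# A toy rectangular building footprint in Manhattan:
footprint = [(-73.99, 40.75), (-73.98, 40.75), (-73.98, 40.76), (-73.99, 40.76)]
lon, lat = representative_point(footprint)  # roughly (-73.985, 40.755)
```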
Component Instantiation: Loader¶
The loader component reads the preprocessed building data from the shapefile. This makes the primitive ready to be used throughout the pipeline later on.
loader = (
mapper.loader
.from_file("./manhattan_buildings.shp")
.build()
)
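Conceptually, a loader just turns a geodata file into tabular records. A minimal, self-contained sketch using plain `json` on a GeoJSON snippet (not the actual loader internals, which handle shapefiles and CRS):

```python
import json

geojson = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"class": "residential"},
     "geometry": {"type": "LineString", "coordinates": [[-73.99, 40.75], [-73.98, 40.76]]}}
  ]
}"""

def load_features(text):
    """Parse a GeoJSON FeatureCollection into (properties, coordinates) records."""
    data = json.loads(text)
    return [(f["properties"], f["geometry"]["coordinates"]) for f in data["features"]]

records = load_features(geojson)  # one record: a residential road segment
```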
Component Instantiation: Urban Layer¶
The urban layer component uses the filtered road segments, mapping building coordinates to the nearest road. This makes the primitive ready to be used throughout the pipeline later on.
urban_layer = (
mapper.urban_layer
.with_type("custom_urban_layer")
.from_file("./manhattan_roads.geojson")
.with_mapping(
longitude_column="temporary_longitude",
latitude_column="temporary_latitude",
output_column="nearest_road"
)
.build()
)
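The mapping step above snaps each building's point to the nearest road segment. Under the hood this is a nearest-geometry search (a real implementation would use a spatial index); a naive sketch over a handful of straight segments:

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment a-b (fine as a sketch at city scale)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the infinite line to the segment [a, b].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_road(point, roads):
    """roads: {name: (endpoint_a, endpoint_b)} -> name of the closest segment."""
    return min(roads, key=lambda name: point_segment_dist(point, *roads[name]))

roads = {"A": ((-74.00, 40.75), (-73.99, 40.75)),
         "B": ((-74.00, 40.76), (-73.99, 40.76))}
print(nearest_road((-73.995, 40.752), roads))  # A
```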
Component Instantiation: Imputer¶
The imputer fills in missing longitude and latitude values to ensure data integrity. This makes the primitive ready to be used throughout the pipeline later on.
imputer = (
mapper.imputer
.with_type("SimpleGeoImputer")
.on_columns("temporary_longitude", "temporary_latitude")
.build()
)
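What an imputer does depends on its type. One simple strategy (a sketch only, not necessarily SimpleGeoImputer's exact behaviour) is to drop records whose coordinates are missing, so downstream spatial joins never see empty values:

```python
def drop_missing_coords(records, lon_key="temporary_longitude", lat_key="temporary_latitude"):
    """Keep only records whose longitude and latitude are both present."""
    return [r for r in records
            if r.get(lon_key) is not None and r.get(lat_key) is not None]

rows = [
    {"temporary_longitude": -73.99, "temporary_latitude": 40.75},
    {"temporary_longitude": None,   "temporary_latitude": 40.76},  # missing lon -> dropped
]
clean = drop_missing_coords(rows)  # keeps only the first row
```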
Component Instantiation: Filter¶
The filter applies a bounding box to refine the dataset spatially, making sure no buildings from Brooklyn end up attached to a road around Manhattan. This makes the primitive ready to be used throughout the pipeline later on.
filter_step = (
mapper.filter
.with_type("BoundingBoxFilter")
.build()
)
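Conceptually, a bounding-box filter keeps only the points that fall within the urban layer's extent. A sketch using the Manhattan box from earlier (the real filter derives the box from the layer rather than taking it as arguments):

```python
def bbox_filter(points, west, south, east, north):
    """Keep (lon, lat) points that fall within the bounding box."""
    return [(lon, lat) for lon, lat in points
            if west <= lon <= east and south <= lat <= north]

pts = [(-73.99, 40.75),   # inside the Manhattan box -> kept
       (-73.95, 40.65)]   # south of the box -> dropped
kept = bbox_filter(pts, -74.016367, 40.702726, -73.934212, 40.821589)
```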
Component Instantiation: Enricher¶
The following enricher counts buildings per road segment, providing the key analytical output. This makes the primitive ready to be used throughout the pipeline later on.
building_count = (
mapper.enricher
.with_data(group_by="nearest_road")
.count_by(output_column="building_count")
.build()
)
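Behind `with_data(group_by=...)` and `count_by(...)` sits a plain group-and-count. A stdlib sketch of the same idea (the real enricher operates on GeoDataFrames, not dicts):

```python
from collections import Counter

def count_by(records, group_key, output_column="building_count"):
    """Count records per group, mirroring an enricher's group_by/count_by step."""
    counts = Counter(r[group_key] for r in records)
    return [{group_key: road, output_column: n} for road, n in counts.items()]

buildings = [{"nearest_road": "Broadway"}, {"nearest_road": "Broadway"},
             {"nearest_road": "5th Ave"}]
enriched = count_by(buildings, "nearest_road")
# e.g. [{"nearest_road": "Broadway", "building_count": 2},
#       {"nearest_road": "5th Ave", "building_count": 1}]
```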
Component Instantiation: Visualiser¶
The visualiser sets up a basic static matplotlib figure. This makes the primitive ready to be used throughout the pipeline later on.
visualiser = (
mapper.visual
.with_type("Static")
.build()
)
Pipeline Assembly¶
The pipeline combines all pre-instantiated components in a logical sequence for processing.
pipeline = UrbanPipeline([
("loader", loader),
("urban_layer", urban_layer),
("impute", imputer),
("filter", filter_step),
("enrich_building_count", building_count),
("visualiser", visualiser),
])
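The idea behind `UrbanPipeline`, named steps applied in order, can be sketched in a few lines (a toy, not the real implementation, which also wires loaders, layers, and visualisers together):

```python
class ToyPipeline:
    """Apply named transformation steps to data, in order."""
    def __init__(self, steps):
        self.steps = steps  # list of (name, callable) pairs

    def compose_transform(self, data):
        for name, step in self.steps:
            data = step(data)
        return data

pipe = ToyPipeline([
    ("double",   lambda xs: [x * 2 for x in xs]),
    ("keep_big", lambda xs: [x for x in xs if x > 2]),
])
result = pipe.compose_transform([1, 2, 3])  # [4, 6]
```

Naming each step, as in the real `UrbanPipeline`, keeps the sequence readable and lets individual stages be inspected or swapped.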
Pipeline Execution¶
This step runs the pipeline, transforming the data and generating the enriched layer. Note that there is a nice animation during pipeline execution for you to follow along with what's going on!
mapped_data, enriched_layer = pipeline.compose_transform()
Visualisation¶
The enriched layer is visualised, showing building counts along road segments statically.
fig = pipeline.visualise([
"building_count",
])
Export Results¶
Finally, the processed data is saved to a JupyterGIS file for future analysis in a real-time collaborative manner.
pipeline.to_jgis(
filepath="new_york_city_overture_easy_pipeline.JGIS",
urban_layer_name="NYC Overture Roads & Buildings – Easy Pipeline"
)