Step-By-Step¶
This notebook guides you through a complete UrbanMapper workflow, step by step, using the PLUTO dataset in Downtown Brooklyn.
We’ll load the data, create a street-intersections layer, impute missing coordinates, filter the data, map it to intersections, enrich it with average floor counts, and visualise the results interactively. It essentially walks through the Basics/[1-6] examples in a single notebook.
Data source used:
- PLUTO data from NYC Open Data. https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change
import urban_mapper as um
# Initialise UrbanMapper
mapper = um.UrbanMapper()
Step 1: Load Data¶
Goal: Load the PLUTO dataset to begin our analysis.
Input: A CSV dataset from the OSCUR HuggingFace datasets hub containing PLUTO data with columns such as longitude, latitude, and numfloors. Replace this with your own CSV file path if you prefer.
Output: A GeoDataFrame (gdf) with the loaded data, tagged with longitude and latitude columns for geospatial analysis.
Here, we use the loader module to read the CSV and specify the coordinate columns, making the data ready for geospatial operations.
# Note: For the documentation interactive mode, we only query 5000 records from the dataset. Feel free to remove for a more realistic analysis.
data = (
mapper
.loader
.from_huggingface("oscur/pluto", number_of_rows=5000, streaming=True)
.with_columns(longitude_column="longitude", latitude_column="latitude")
.load()
)
data.head(10) # Preview the first ten rows
Step 2: Create Urban Layer¶
Goal: Build a foundational layer of street intersections in Downtown Brooklyn to map our data onto.
Input: A place name (Downtown Brooklyn, New York City, USA) and a mapping configuration (longitude, latitude, output column, and threshold distance).
Output: An UrbanLayer object representing street intersections, ready to associate data points with specific intersections.
We use the urban_layer module with type streets_intersections, fetch the network via OSMnx (using the drive network type), and configure the mapping to assign each data point to the nearest intersection within 50 meters.
layer = (
mapper.urban_layer.with_type("streets_intersections")
.from_place("Downtown Brooklyn, New York City, USA", network_type="drive")
.with_mapping(
longitude_column="longitude",
latitude_column="latitude",
output_column="nearest_intersection",
threshold_distance=50,
)
.build()
)
layer.static_render() # Visualise the plain intersections statically (Optional)
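Under the hood, mapping a point to its nearest intersection within a threshold distance is a nearest-neighbour search over geographic coordinates. The following stdlib-only sketch illustrates the idea with a haversine distance and a 50 m cutoff; the node ids and coordinates are hypothetical, and UrbanMapper's actual implementation may differ (e.g. using spatial indexes):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def nearest_intersection(point, intersections, threshold_m=50):
    """Return the id of the closest intersection within threshold_m, else None."""
    best_id, best_d = None, float("inf")
    for node_id, (lon, lat) in intersections.items():
        d = haversine_m(point[0], point[1], lon, lat)
        if d < best_d:
            best_id, best_d = node_id, d
    return best_id if best_d <= threshold_m else None

# Toy intersections near Downtown Brooklyn (hypothetical coordinates)
nodes = {"A": (-73.9857, 40.6930), "B": (-73.9900, 40.6950)}
print(nearest_intersection((-73.9858, 40.6931), nodes))  # a few metres from "A"
```

A brute-force scan like this is fine for a toy example; for a real street network a k-d tree or ball tree makes the lookup sublinear.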
Step 3: Impute Missing Data¶
Goal: Handle missing longitude and latitude values so that every remaining data point can be mapped and analysed.
Input: The GeoDataFrame from Step 1 (with potential missing coordinates) and the urban layer from Step 2.
Output: A GeoDataFrame with imputed coordinates, reducing missing values.
The SimpleGeoImputer from the imputer module simply drops records with missing coordinates (a naive approach; see the documentation for more sophisticated imputers). We check missing values before and after to see the effect.
print(f"Missing before: {data[['longitude', 'latitude']].isna().sum()}")
imputed_data = (
mapper.imputer.with_type("SimpleGeoImputer")
.on_columns(longitude_column="longitude", latitude_column="latitude")
.transform(data, layer)
)
print(f"Missing after: {imputed_data[['longitude', 'latitude']].isna().sum()}")
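For intuition, the naive behaviour of discarding rows with missing coordinates can be reproduced with plain pandas; this is a sketch on toy data, not UrbanMapper's actual code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "longitude": [-73.99, np.nan, -73.98],
    "latitude": [40.69, 40.70, np.nan],
    "numfloors": [6, 12, 3],
})

# Drop any record missing either coordinate, mirroring the naive imputer
cleaned = df.dropna(subset=["longitude", "latitude"])
print(len(df), "->", len(cleaned))  # 3 -> 1
```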
Step 4: Filter Data¶
Goal: Narrow down the data to only points within Downtown Brooklyn’s bounds.
Input: The imputed GeoDataFrame from Step 3 and the urban layer from Step 2.
Output: A filtered GeoDataFrame containing only data within the layer’s bounding box.
Using the BoundingBoxFilter from the filter module, we trim the dataset to match the spatial extent of our intersections layer, discarding irrelevant data.
print(f"Rows before: {len(imputed_data)}")
filtered_data = mapper.filter.with_type("BoundingBoxFilter").transform(
imputed_data, layer
)
print(f"Rows after: {len(filtered_data)}")
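Conceptually, a bounding-box filter keeps only the rows whose coordinates fall inside the layer's extent; with pandas this is a simple boolean mask. The bounds below are illustrative, not the layer's real ones:

```python
import pandas as pd

df = pd.DataFrame({
    "longitude": [-73.985, -73.900, -73.990],
    "latitude": [40.693, 40.650, 40.695],
})

# Hypothetical bounding box (min_lon, min_lat, max_lon, max_lat) for the layer
min_lon, min_lat, max_lon, max_lat = -74.00, 40.68, -73.97, 40.71

inside = df[
    df["longitude"].between(min_lon, max_lon)
    & df["latitude"].between(min_lat, max_lat)
]
print(f"Rows before: {len(df)}, after: {len(inside)}")  # 3 -> 2
```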
Step 5: Map to Nearest Layer¶
Goal: Link each data point to its nearest street intersection so that, later on, we can enrich the intersections with aggregations or geo-statistics.
Input: The filtered GeoDataFrame from Step 4.
Output: An updated UrbanLayer and a GeoDataFrame with a new nearest_intersection column indicating the closest intersection for each point.
The map_nearest_layer method uses the mapping configuration from Step 2 to associate data points with intersections, enabling spatial aggregation in the next step.
_, mapped_data = layer.map_nearest_layer(filtered_data) # Outputs both the layer (unnecessary here) and the mapped data
mapped_data.head() # Check the new 'nearest_intersection' column
Step 6: Enrich the Layer¶
Goal: Add meaningful insights by calculating the average number of floors per intersection.
Input: The mapped GeoDataFrame from Step 5 and the urban layer from Step 2.
Output: An enriched UrbanLayer with an avg_floors column in its GeoDataFrame.
The enricher module aggregates the numfloors column by nearest_intersection using the mean, adding this statistic to the layer for visualisation or further analysis (e.g. machine learning).
enricher = (
mapper.enricher.with_data(group_by="nearest_intersection", values_from="numfloors")
.aggregate_by(method="mean", output_column="avg_floors")
.build()
)
enriched_layer = enricher.enrich(mapped_data, layer)
enriched_layer.get_layer().head() # Preview the enriched layer's GeoDataFrame content
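The aggregation the enricher performs is essentially a group-by mean. A pandas sketch of the same computation, on toy data with hypothetical intersection ids:

```python
import pandas as pd

mapped = pd.DataFrame({
    "nearest_intersection": ["n1", "n1", "n2"],
    "numfloors": [4, 6, 10],
})

# Mean number of floors per intersection, as the enricher computes it
avg_floors = (
    mapped.groupby("nearest_intersection")["numfloors"]
    .mean()
    .rename("avg_floors")
)
print(avg_floors.to_dict())  # {'n1': 5.0, 'n2': 10.0}
```

Other aggregate_by methods (e.g. sum, count) correspond to swapping the pandas aggregation in the same way.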
Step 7: Visualise Results¶
Goal: Display the enriched data on an interactive map for exploration.
Input: The enriched GeoDataFrame from Step 6.
Output: An interactive Folium map showing average floors per intersection with a dark theme.
The visual module creates an interactive map with the Interactive type and the dark CartoDB dark_matter style, highlighting the avg_floors column.
fig = (
mapper.visual.with_type("Interactive")
.with_style({"tiles": "CartoDB dark_matter", "colorbar_text_color": "white"})
.show(columns=["avg_floors"]) # Show the avg_floors column
.render(enriched_layer.get_layer())
)
fig # Display the map
Conclusion¶
Congratulations! You’ve completed a full UrbanMapper workflow, step-by-step. You’ve transformed raw PLUTO data into a visually rich map of average building floors per intersection in Downtown Brooklyn. For a more streamlined approach, check out the Pipeline End-To-End notebook!