Enricher¶
In this notebook we’ll sprinkle some extra magic onto your urban layers. Let’s add, for example, the average number of building floors to a layer and see it sparkle!
Data source used:
- PLUTO data from NYC Open Data. https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change
Let’s jazz things up! 🏙️
import urban_mapper as um
# Start UrbanMapper
mapper = um.UrbanMapper()
Loading Data and Creating a Layer¶
First, let’s grab some PLUTO data and set up a street intersections layer for Downtown Brooklyn.
Note that:
- A Loader example can be seen in examples/Basics/loader.ipynb
- An Urban Layer example can be seen in examples/Basics/urban_layer.ipynb
- An Imputer example can be seen in examples/Basics/imputer.ipynb
# Load data
# Note: For the interactive documentation, we only query 5000 records from the dataset. Feel free to remove the limit for a more realistic analysis.
data = (
    mapper
    .loader  # From the loader module
    .from_huggingface("oscur/pluto", number_of_rows=5000, streaming=True)  # From the HuggingFace OSCUR datasets hub
    .with_columns("longitude", "latitude")  # With the `longitude` and `latitude` columns
    .load()
)
# Create urban layer
layer = (
    mapper
    .urban_layer  # From the urban_layer module
    .with_type("streets_intersections")  # With the type streets_intersections
    .from_place("Downtown Brooklyn, New York City, USA")  # From place
    .build()
)
# Impute your data if it contains missing values
data = (
    mapper
    .imputer  # From the imputer module
    .with_type("SimpleGeoImputer")  # With the type SimpleGeoImputer
    .on_columns(longitude_column="longitude", latitude_column="latitude")  # On the columns longitude and latitude
    .transform(data, layer)  # All imputers require access to the urban layer in case they need to extract information from it.
)
Enriching the Layer with Debug Enabled¶
Now that we've gathered the ingredients, let's enrich our urban layer with, for example, the average number of floors per intersection. We’ll map the data, set up the enricher with the debug feature enabled, and apply it.
For further reading, feel free to explore our Figma system workflow at: https://www.figma.com/board/0uaU4vJiwyZJSntljJDKWf/Developer-Experience-Flow-Diagram---Snippet-Code?node-id=0-1&t=mESZ52qU1D2lfzvH-1
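To picture what the mapping step does, here is a minimal sketch in plain NumPy and pandas. It is not UrbanMapper's implementation: the intersection ids, the coordinates, and the shortcut of comparing squared distances in degrees are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical intersections (standing in for the layer) and records (the data).
intersections = pd.DataFrame(
    {"longitude": [-73.990, -73.985], "latitude": [40.692, 40.695]},
    index=["A", "B"],
)
records = pd.DataFrame(
    {
        "longitude": [-73.9901, -73.9849, -73.9852],
        "latitude": [40.6921, 40.6951, 40.6948],
        "numfloors": [3.0, 10.0, 20.0],
    }
)

# Pairwise squared distances in degree space. This is good enough for a toy
# example; a real implementation would likely use a projected CRS or a spatial index.
dx = records["longitude"].to_numpy()[:, None] - intersections["longitude"].to_numpy()[None, :]
dy = records["latitude"].to_numpy()[:, None] - intersections["latitude"].to_numpy()[None, :]
nearest = np.argmin(dx**2 + dy**2, axis=1)

# Store the id of the closest intersection next to each record.
records["nearest_intersection"] = intersections.index.to_numpy()[nearest]
print(records)
```

Each record ends up with the id of its closest intersection, which is the role the nearest_intersection column plays in the rest of this notebook.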
# Map data to the nearest layer
# This determines which intersection of the city each record in your data maps to,
# so that the enricher can take it into account.
_, mapped_data = layer.map_nearest_layer(
    data,
    longitude_column="longitude",
    latitude_column="latitude",
    output_column="nearest_intersection",  # Creates this column in the data so that we can reuse it throughout the enriching process below.
)
# Set up and apply enricher with debug enabled
enricher = (
    mapper
    .enricher  # From the enricher module
    .with_data(
        group_by="nearest_intersection", values_from="numfloors"
    )  # Reading: with data grouped by the nearest intersection, and the values from the attribute numfloors
    .aggregate_by(
        method="mean", output_column="avg_floors"
    )  # Reading: aggregate using the mean and write the result to the new avg_floors attribute of the urban layer
    .with_debug()  # Enable debug to add a DEBUG_avg_floors column containing the list of indices from the input data used for each enrichment
    .build()
)
enriched_layer = enricher.enrich(
    mapped_data, layer
)  # Data to use, urban layer to enrich.
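Conceptually, grouping by nearest_intersection and aggregating numfloors with the mean resembles a pandas group-by. The sketch below uses made-up data and a hypothetical three-node layer; it illustrates the idea only and says nothing about how UrbanMapper computes or stores the result internally.

```python
import pandas as pd

# Hypothetical mapped records: each row already carries its nearest intersection.
mapped = pd.DataFrame(
    {
        "nearest_intersection": ["A", "B", "B", "A"],
        "numfloors": [2.0, 10.0, 20.0, 4.0],
    }
)

# Hypothetical layer with three intersections; "C" has no records nearby.
nodes = pd.DataFrame(index=pd.Index(["A", "B", "C"], name="intersection_id"))

# Group by intersection and average the floors.
avg = mapped.groupby("nearest_intersection")["numfloors"].mean()

# Attach the result to the layer; in this sketch, intersections without records get NaN.
nodes["avg_floors"] = avg.reindex(nodes.index).to_numpy()
print(nodes)
```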
Inspecting the Enriched Layer with Debug Information¶
Let’s take a look at the enriched layer, which now includes the avg_floors column and the DEBUG_avg_floors column with the list of indices from the input data used for each enrichment.
# Preview the enriched layer with debug information
print(enriched_layer.layer[['avg_floors', 'DEBUG_avg_floors']].head(50))
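The idea behind a debug column of this kind can be sketched with pandas: for each group, keep the row indices that fed the aggregate, then use them to look the original records back up. The data and the dictionary layout below are assumptions for illustration, not the actual format of DEBUG_avg_floors.

```python
import pandas as pd

# Hypothetical mapped records with a default 0..3 index.
mapped = pd.DataFrame(
    {
        "nearest_intersection": ["A", "B", "B", "A"],
        "numfloors": [2.0, 10.0, 20.0, 4.0],
    }
)

# For every intersection, collect the indices of the rows that belong to it.
debug_indices = {
    key: idx.tolist()
    for key, idx in mapped.groupby("nearest_intersection").groups.items()
}

# Trace an aggregate back to its source rows.
rows_for_a = mapped.loc[debug_indices["A"]]
print(debug_indices)
print(rows_for_a)
```

Recomputing the mean over rows_for_a gives the same value as the group aggregate, which is exactly the kind of sanity check a debug column makes possible.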
Be Able To Preview Your Enricher¶
Fancy a peek at your enricher? Use preview() to see the setup; it's great for when you’re digging into someone else’s work!
# Preview enricher
print(enricher.preview())
Providing Multiple Datasets to the Same Enricher¶
You can load several datasets and feed them to the enricher as a dictionary. All the provided datasets should contain the columns referenced in with_data, aggregate_by, etc.
Use the data_id argument to specify which dataset from the dictionary the enricher should use.
The output is an enriched layer with the requested columns and an additional data_id column that identifies, based on the input dictionary keys, which dataset each row came from.
# Load the first dataset (PLUTO)
data1 = (
    mapper
    .loader  # From the loader module
    .from_huggingface("oscur/pluto", number_of_rows=1000, streaming=True)  # From the HuggingFace OSCUR datasets hub
    .with_columns("longitude", "latitude")  # With the `longitude` and `latitude` columns
    .load()
)
# Load the second dataset (taxi trips)
data2 = (
    mapper
    .loader
    .from_huggingface("oscur/taxisvis1M", number_of_rows=1000, streaming=True)  # Replace with your own dataset
    .with_columns("pickup_longitude", "pickup_latitude")  # Specify your longitude and latitude columns
    .with_map({"pickup_longitude": "longitude", "pickup_latitude": "latitude"})  # Routines like layer.map_nearest_layer need datasets that share the same longitude_column and latitude_column
    .load()
)
data = {
    "pluto_data": data1,
    "taxi_data": data2,
}
# Create a new urban layer for the data
layer = (
    mapper
    .urban_layer  # From the urban_layer module
    .with_type("streets_intersections")  # With the type streets_intersections
    .from_place("Downtown Brooklyn, New York City, USA")  # From place
    .build()
)
# Map datasets to the nearest layer
# This determines which intersection of the city each record in each of your datasets maps to,
# so that the enricher can take it into account.
_, mapped_data = layer.map_nearest_layer(
    data,
    longitude_column="longitude",
    latitude_column="latitude",
    output_column="nearest_intersection",  # Creates this column in the data so that we can reuse it throughout the enriching process below.
)
# Set up and apply enricher with debug enabled
enricher = (
    mapper
    .enricher  # From the enricher module
    .with_data(
        group_by="nearest_intersection", values_from="numfloors", data_id="pluto_data"
    )  # Reading: with data grouped by the nearest intersection, the values from the attribute numfloors, and using the pluto_data dataset
    # Both datasets should have the same group_by and values_from columns
    .aggregate_by(
        method="mean", output_column="avg_floors"
    )  # Reading: aggregate using the mean and write the result to the new avg_floors attribute of the urban layer
    .with_debug()  # Enable debug to add a DEBUG_avg_floors column containing the list of indices from the input data used for each enrichment
    .build()
)
enriched_layer = enricher.enrich(
    mapped_data, layer
)  # Data to use, urban layer to enrich.
# Show only the layer items that have a data_id
layer = enriched_layer.layer[~enriched_layer.layer.data_id.isna()]
layer.head()
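One way to read the data_id behaviour is sketched below with plain pandas: pick one dataset out of a dictionary, aggregate it, and tag the layer rows that received a value with the dictionary key. The dataset names, values, and tagging logic are all hypothetical; this is a mental model, not UrbanMapper's code.

```python
import pandas as pd

# Hypothetical dictionary of already-mapped datasets sharing the same columns.
datasets = {
    "dataset_a": pd.DataFrame(
        {"nearest_intersection": ["A", "A", "B"], "numfloors": [2.0, 4.0, 10.0]}
    ),
    "dataset_b": pd.DataFrame(
        {"nearest_intersection": ["B", "C"], "numfloors": [1.0, 1.0]}
    ),
}
data_id = "dataset_a"  # which dictionary entry to aggregate

# Hypothetical layer with three intersections.
nodes = pd.DataFrame(index=pd.Index(["A", "B", "C"], name="intersection_id"))

# Aggregate only the selected dataset.
avg = datasets[data_id].groupby("nearest_intersection")["numfloors"].mean()
nodes["avg_floors"] = avg.reindex(nodes.index).to_numpy()

# Tag rows that received a value with the key of the dataset they came from.
nodes["data_id"] = pd.Series(data_id, index=nodes.index).where(nodes["avg_floors"].notna())

# Keep only the rows that were actually enriched, as in the cell above.
enriched_only = nodes[nodes["data_id"].notna()]
print(enriched_only)
```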
More Enricher / Aggregator Primitives?¶
Yes! We also offer count_by as an alternative to aggregate_by; it simply counts the number of records rather than aggregating their values. More is shown in later examples outside Basics.
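The difference between counting and aggregating can be seen in a small pandas sketch. The data is made up and the record_count column name is hypothetical; the sketch only contrasts the two operations and does not show the enricher's actual output.

```python
import pandas as pd

# Hypothetical mapped records.
mapped = pd.DataFrame(
    {
        "nearest_intersection": ["A", "B", "B", "A"],
        "numfloors": [2.0, 10.0, 20.0, 4.0],
    }
)

grouped = mapped.groupby("nearest_intersection")["numfloors"]

# Aggregating looks at the values; counting only looks at how many rows there are.
summary = pd.DataFrame(
    {
        "avg_floors": grouped.mean(),
        "record_count": grouped.size(),
    }
)
print(summary)
```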
Want more? Come shout it out at https://github.com/VIDA-NYU/UrbanMapper/issues/11
Wrapping Up¶
Smashing work! 🎉 Your layer’s now enriched with average floors and includes debug information to trace back to the original data. Try visualising it next with visualiser.