Pipeline Generator
The following notebook shows a nifty Large Language Model (LLM) helper that generates UrbanMapper pipeline code from a simple, plain-language description.
Let’s get generating! 🛠️
Prepare the environment
Depending on which of the available LLMs you would like to use, prepare your environment accordingly. For instance, the example below uses gpt-4o, so an OpenAI API key is required. Set it as an environment variable, e.g. in your shell before starting Jupyter:
export OPENAI_API_KEY="your-api-key"
Such an API key can be generated from the OpenAI console.
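If you would rather get an early, readable error when the key is missing (instead of a failure deep inside the generator call), a small check like the one below can run first. It only inspects the OPENAI_API_KEY environment variable named above; the helper name require_api_key is hypothetical and is not part of UrbanMapper.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `var_name`, or raise a clear error.

    Hypothetical helper for illustration; UrbanMapper does not ship it.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Run: export {var_name}=\"your-api-key\""
        )
    return key


# Alternatively, set the key for the current Python process only
# (avoid saving real keys inside notebooks you share):
# os.environ["OPENAI_API_KEY"] = "your-api-key"
```

Calling require_api_key() at the top of the notebook fails fast with the export command to run, rather than leaving the error to surface from the LLM client.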
import urban_mapper as um
# Start UrbanMapper
mapper = um.UrbanMapper()
Defining a User Description
Let’s tell it what we want: map PLUTO data to street intersections and compute the average number of floors per intersection.
# Our urban wish list
user_description = """
I’ve got PLUTO data at './pluto.csv', with 'longitude' and 'latitude' columns.
Map this to street intersections in Downtown Brooklyn, New York City, USA, using a 50-metre threshold.
Work out the average number of floors per intersection from the 'numfloors' column.
The data’s for all of NYC and might have missing coordinates, so (1) fill in the gaps and (2) filter it to the urban layer’s bounding box.
Show the results on an interactive dark-themed map.
"""
We also tried many other descriptions for user_description, such as:
user_description = """
I have Taxi Trips data from New York City at './taxisvis1M.csv', including two pairs of coordinates (latitude, longitude), which are: (1) 'pickup_latitude' and 'pickup_longitude', and (2) 'dropoff_latitude' and 'dropoff_longitude' columns.
Map all of this data to street roads in Downtown Brooklyn, New York City, USA.
Calculate (1) the number of pickups per street segment, and (2) the number of drop-offs per street segment. The data covers all of NYC and contains missing coordinate records, so
(1) handle missing coordinates by excluding those records and (2) filter the data to Downtown Brooklyn’s bounding box.
Display the results on an interactive dark-themed map, showing both output columns of interest.
"""
user_description = """
I have motor vehicle collision data at './NYC_Motor_Vehicle_Collisions_Mar_12_2025.csv', including 'LATITUDE' and 'LONGITUDE' columns.
Map this data to street intersections in Downtown Brooklyn, New York City, USA.
Calculate the number of collisions per street intersection. The data covers all of NYC and contains missing coordinate records, so
(1) handle missing coordinates by excluding those records and (2) filter the data to Downtown Brooklyn’s bounding box.
Display the results on an interactive dark-themed map, with collision counts visualized per intersection.
"""
user_description = """
I have PLUTO data at './pluto.csv', including 'longitude' and 'latitude' columns.
Map this data to street intersections in Downtown Brooklyn, New York City, USA, using a 50-metre threshold.
Calculate the average number of floors per intersection from the 'numfloors' column.
The data covers all of NYC and may have missing coordinates, so (1) impute missing coordinates and (2) filter it to the urban layer’s bounding box.
Display the results on an interactive dark-themed map, please.
"""
Generating the Pipeline
Now, let’s ask the generator to whip up a pipeline for us. We’ll use GPT-4o for the time being.
Note that we use IPython.display.Code to syntax-highlight the generated code. Code is an IPython display object, so the highlighting only renders in Jupyter Notebook or JupyterLab.
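Outside Jupyter, the Code(...) object is not rendered with highlighting (in a plain Python script, a bare expression displays nothing at all). One way to keep the notebook code portable is to fall back to plain printing when no Jupyter kernel is detected, as in the hypothetical show_code helper below. It relies on IPython's get_ipython(), which returns None outside an IPython session, and on the common but informal convention that the Jupyter kernel's shell class is named ZMQInteractiveShell.

```python
def show_code(source: str) -> None:
    """Highlight `source` in Jupyter; print it as plain text anywhere else.

    Hypothetical helper; not part of UrbanMapper.
    """
    try:
        from IPython import get_ipython
        from IPython.display import Code, display
    except ImportError:
        # IPython is not installed: plain text is the only option.
        print(source)
        return
    shell = get_ipython()  # None when not running under IPython
    if shell is None or type(shell).__name__ != "ZMQInteractiveShell":
        print(source)
    else:
        display(Code(source, language="python"))
```

With this helper, show_code(suggestion) could stand in for Code(suggestion, language="python") when the same code may also run as a script.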
The following LLM primitives are available:
gpt-4o: OpenAI’s GPT-4o model.
gpt-4: OpenAI’s GPT-4 model.
gpt-3.5-turbo: OpenAI’s GPT-3.5-turbo model.
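Because with_LLM takes the model name as a plain string, a typo may only surface once the generator is called. A small guard against the three primitives listed above, sketched below, fails earlier with a readable message. The SUPPORTED_LLMS tuple and the validate_llm helper are hypothetical and simply mirror this list; UrbanMapper's own validation may differ.

```python
# Mirrors the LLM primitives listed above (hypothetical constant).
SUPPORTED_LLMS = ("gpt-4o", "gpt-4", "gpt-3.5-turbo")


def validate_llm(name: str) -> str:
    """Return `name` if it is one of the listed primitives, else raise ValueError.

    Hypothetical helper for illustration only.
    """
    if name not in SUPPORTED_LLMS:
        options = ", ".join(SUPPORTED_LLMS)
        raise ValueError(f"Unknown LLM {name!r}; choose one of: {options}")
    return name


# Usage: .with_LLM(validate_llm("gpt-4o"))
```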
from IPython.display import Code
# Generate pipeline suggestion
suggestion = (
    mapper
    .pipeline_generator  # From the pipeline_generator module
    .with_LLM("gpt-4o")  # Use OpenAI's GPT-4o as the LLM
    .generate_urban_pipeline(user_description)  # Generate a pipeline from the description defined above
)
# print(suggestion) # See what it suggests (without highlighting)!
# Display the suggestion while highlighting the code in the cell
Code(suggestion, language="python")
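Since the generator returns code rather than running it, it is worth checking the suggestion before executing it. The sketch below assumes suggestion is a plain string of Python source (as the Code(...) call above implies). It parses the string with the standard ast module to catch syntax errors, then writes it to a file you can review and edit. The helper name save_suggestion and the default file name are hypothetical. A successful parse only means the code is syntactically valid, not that it will run or do what you asked.

```python
import ast
from pathlib import Path


def save_suggestion(source: str, path: str = "generated_pipeline.py") -> Path:
    """Check that `source` is valid Python syntax, then write it to `path`.

    Hypothetical helper; raises SyntaxError (and writes nothing)
    if the suggestion does not parse.
    """
    ast.parse(source)  # Syntax check only; nothing is executed.
    target = Path(path)
    target.write_text(source, encoding="utf-8")
    return target


# Usage after generating the pipeline above:
# script = save_suggestion(suggestion)
# print(f"Saved to {script}; review it before running.")
```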
More LLM primitives, such as open-source ones?
Want more? Let us know by opening a new issue in the GitHub repository. We’re all ears! 👂
https://github.com/VIDA-NYU/UrbanMapper/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen
Wrapping Up
How fab is that? 🌟 You’ve got a pipeline suggestion from a quick description. Use it as is or tweak it!