Imputers
What is the Imputer module?
The imputer module is responsible for handling missing data in geospatial datasets.
In the meantime, we recommend looking through the Imputer example for a more hands-on introduction to the Imputer module and its usage.
Documentation Under Alpha Construction
This documentation is in its early stages and still being developed. The API may therefore change, and some parts might be incomplete or inaccurate.
Use at your own risk, and please report anything you find that seems incorrect or outdated.
GeoImputerBase
Bases: ABC
Abstract base class for geographic data imputers in UrbanMapper.
Provides the interface for imputing missing geographic data or transforming data into coordinates. Subclasses must implement the required methods to handle specific imputation tasks such as geocoding or spatial interpolation.
Attributes:
Name | Type | Description |
---|---|---|
latitude_column | Optional[str] | Column name for latitude values post-imputation. |
longitude_column | Optional[str] | Column name for longitude values post-imputation. |
data_id | Optional[str] | Column name for processing specific values post-imputation. |
**extra_params | Optional[str] | Any other argument used by a child class. |
Note
This class is abstract and cannot be instantiated directly. Use concrete implementations like SimpleGeoImputer or AddressGeoImputer.
Source code in src/urban_mapper/modules/imputer/abc_imputer.py
_transform(input_geodataframe, urban_layer)
abstractmethod
Internal method to impute geographic data.
Called by transform() after validation. Subclasses must implement this method.
To be implemented by subclasses
This method should contain the core logic for imputing geographic data. It should handle the specific imputation task (e.g., geocoding, spatial interpolation) and return the modified GeoDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_geodataframe | GeoDataFrame | GeoDataFrame with data to impute. | required |
urban_layer | UrbanLayerBase | Urban layer providing spatial context. | required |
Returns:
Name | Type | Description |
---|---|---|
GeoDataFrame | GeoDataFrame | Data with imputed geographic information. |
Raises:
Type | Description |
---|---|
ValueError | If imputation fails due to invalid inputs. |
Abstract Method
This method must be overridden in subclasses. Failure to implement will raise a NotImplementedError.
Source code in src/urban_mapper/modules/imputer/abc_imputer.py
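As a rough illustration of how abstract methods behave in Python, the sketch below uses hypothetical stand-in classes (`ImputerSketch`, `Incomplete`, `PassThrough`), none of which are part of UrbanMapper. With the standard `abc` module, a subclass that omits an abstract method is rejected at instantiation with a `TypeError`. What UrbanMapper's own method bodies raise is not shown here.

```python
from abc import ABC, abstractmethod


class ImputerSketch(ABC):
    """Hypothetical stand-in for an abstract imputer base class."""

    @abstractmethod
    def _transform(self, rows, urban_layer):
        """Core imputation logic, supplied by subclasses."""

    @abstractmethod
    def preview(self, format="ascii"):
        """Configuration summary, supplied by subclasses."""


class Incomplete(ImputerSketch):
    # Implements only one of the two abstract methods.
    def preview(self, format="ascii"):
        return "Incomplete"


class PassThrough(ImputerSketch):
    # Implements both abstract methods, so it can be instantiated.
    def _transform(self, rows, urban_layer):
        return list(rows)

    def preview(self, format="ascii"):
        return "PassThrough"


def can_instantiate(cls):
    """Return True if cls() succeeds, False if abc rejects it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here `can_instantiate(PassThrough)` succeeds, while `Incomplete` and the base class itself are both rejected.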
transform(input_geodataframe, urban_layer)
Public method to impute geographic data.
Validates inputs and delegates to _transform() for imputation.
What to keep in mind here?
Every Imputer primitive (e.g. AddressGeoImputer, SimpleGeoImputer) should implement the _transform() method. This method is called by transform() after validating the inputs. The _transform() method is where the actual imputation logic resides. It should handle the specific imputation task (e.g., geocoding, spatial interpolation) and return the modified GeoDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_geodataframe | Union[Dict[str, GeoDataFrame], GeoDataFrame] | one or more | required |
urban_layer | UrbanLayerBase | Urban layer for spatial context. | required |
Returns:
Name | Type | Description |
---|---|---|
GeoDataFrame | Union[Dict[str, GeoDataFrame], GeoDataFrame] | Data with imputed coordinates. |
Raises:
Type | Description |
---|---|
ValueError | If inputs are None or columns are unset. |
Examples:
>>> from urban_mapper.modules.imputer import AddressGeoImputer
>>> from urban_mapper.modules.urban_layer import OSMNXStreets
>>> imputer = AddressGeoImputer(
... address_column="address",
... latitude_column="lat",
... longitude_column="lng"
... )
>>> streets = OSMNXStreets().from_place("London, UK")
>>> gdf = imputer.transform(data_gdf, streets)
Note
Ensure latitude_column and longitude_column are set before calling.
Source code in src/urban_mapper/modules/imputer/abc_imputer.py
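The validate-then-delegate flow described for transform() can be sketched as follows. This is a minimal illustration under stated assumptions, not UrbanMapper's implementation: `TemplateImputerSketch` is a hypothetical class, plain lists of dicts stand in for GeoDataFrames, and the per-entry handling of dictionary inputs is assumed from the documented `Union[Dict[str, GeoDataFrame], GeoDataFrame]` type.

```python
class TemplateImputerSketch:
    """Hypothetical stand-in; plain lists of dicts replace GeoDataFrames."""

    def __init__(self, latitude_column=None, longitude_column=None):
        self.latitude_column = latitude_column
        self.longitude_column = longitude_column

    def _transform(self, rows, urban_layer):
        # Placeholder logic: a real subclass would impute here.
        return list(rows)

    def transform(self, data, urban_layer):
        # Validate first, mirroring the documented ValueError conditions.
        if data is None or urban_layer is None:
            raise ValueError("Both data and urban_layer are required.")
        if not self.latitude_column or not self.longitude_column:
            raise ValueError("latitude_column and longitude_column must be set.")
        # Assumed behaviour: a dict of datasets is processed entry by entry.
        if isinstance(data, dict):
            return {
                key: self._transform(rows, urban_layer)
                for key, rows in data.items()
            }
        return self._transform(data, urban_layer)
```

Keeping validation in the public method means each subclass only has to supply `_transform()` and can assume its inputs were already checked.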
preview(format='ascii')
abstractmethod
Generate a preview of the imputer's configuration.
To be implemented by subclasses
This method should provide a summary of the imputer's settings, including any parameters or configurations that are relevant to the imputation process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | str | Output format ("ascii" or "json"). Defaults to "ascii". | 'ascii' |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | Preview in specified format (e.g., str for "ascii", dict for "json"). |
Raises:
Type | Description |
---|---|
ValueError | If format is unsupported. |
Abstract Method
Subclasses must implement this method to provide configuration insights.
Source code in src/urban_mapper/modules/imputer/abc_imputer.py
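One possible shape for a preview() implementation is sketched below. The class name `PreviewSketch` and the exact text and keys it returns are assumptions made for illustration; only the two format names and the `ValueError` on unsupported formats come from the description above.

```python
class PreviewSketch:
    """Hypothetical stand-in showing one way to serve two preview formats."""

    def __init__(self, latitude_column, longitude_column):
        self.latitude_column = latitude_column
        self.longitude_column = longitude_column

    def preview(self, format="ascii"):
        if format == "ascii":
            # Human-readable, multi-line string.
            return (
                "Imputer: PreviewSketch\n"
                f"  latitude_column: {self.latitude_column}\n"
                f"  longitude_column: {self.longitude_column}"
            )
        if format == "json":
            # JSON-serialisable dictionary.
            return {
                "imputer": "PreviewSketch",
                "latitude_column": self.latitude_column,
                "longitude_column": self.longitude_column,
            }
        raise ValueError(f"Unsupported format: {format!r}")
```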
SimpleGeoImputer
Bases: GeoImputerBase
Imputer that removes (naively) rows with missing coordinates.
Filters out rows with NaN in latitude or longitude columns, cleaning data for spatial operations.
Attributes:
Name | Type | Description |
---|---|---|
latitude_column | str | Column with latitude values. |
longitude_column | str | Column with longitude values. |
Examples:
>>> from urban_mapper.modules.imputer import SimpleGeoImputer
>>> imputer = SimpleGeoImputer(latitude_column="lat", longitude_column="lng")
>>> clean_gdf = imputer.transform(data_gdf, urban_layer)
Note
This imputer does not add coordinates; it only removes incomplete rows.
Source code in src/urban_mapper/modules/imputer/imputers/simple_geo_imputer.py
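The filtering idea can be shown without any geospatial dependencies. The sketch below is an assumption-laden stand-in, not the SimpleGeoImputer source: it uses a list of dicts instead of a GeoDataFrame, and the helper names `is_missing` and `drop_missing_coordinates` are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_coordinates(rows, latitude_column, longitude_column):
    """Keep only rows where both coordinate values are present."""
    return [
        row
        for row in rows
        if not is_missing(row.get(latitude_column))
        and not is_missing(row.get(longitude_column))
    ]


rows = [
    {"id": 1, "lat": 40.71, "lng": -74.00},
    {"id": 2, "lat": float("nan"), "lng": -73.98},  # missing latitude
    {"id": 3, "lat": 40.73, "lng": None},           # missing longitude
]
clean = drop_missing_coordinates(rows, "lat", "lng")
```

Only the row with `id` 1 survives, and the input list is left unmodified. On a pandas or GeoPandas frame, `dropna(subset=[latitude_column, longitude_column])` is the usual equivalent; whether SimpleGeoImputer uses it internally is not confirmed here.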
_transform(input_geodataframe, urban_layer)
Filter rows with missing coordinates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_geodataframe | GeoDataFrame | | required |
urban_layer | UrbanLayerBase | | required |
Returns:
Name | Type | Description |
---|---|---|
GeoDataFrame | GeoDataFrame | Cleaned data without missing coordinates. |
Tip
Use this as a preprocessing step before spatial analysis.
Source code in src/urban_mapper/modules/imputer/imputers/simple_geo_imputer.py
preview(format='ascii')
Preview the imputer configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | str | Output format ("ascii" or "json"). Defaults to "ascii". | 'ascii' |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | Configuration summary. |
Raises:
Type | Description |
---|---|
ValueError | If format is unsupported. |
Source code in src/urban_mapper/modules/imputer/imputers/simple_geo_imputer.py
AddressGeoImputer
Bases: GeoImputerBase
Imputer that geocodes addresses to coordinates.
What is that about?
Uses OpenStreetMap via osmnx to convert address strings into latitude and longitude values.
You have an address (or equivalent name) column in your data, but no coordinates? Or missing coordinates?
This imputer will geocode the addresses to fill in the missing latitude and longitude values.
Understanding the extra parameters
If you look at GeoImputerBase, address_column is not a parameter there.
As a result, the example below is localised around this primitive: when using the factory, you will need to pass your address (or equivalent name) column through the kwargs of .on_columns(.).
Examples:
>>> import urban_mapper as um
>>> factory = (
...     um.UrbanMapper().imputer.with_type("AddressGeoImputer")
...     .on_columns(longitude_column="lng", latitude_column="lat", address_column="address")
...     # or .on_columns("lng", "lat", "address")
... )
>>> gdf = factory.transform(data_gdf, urban_layer)
Attributes:
Name | Type | Description |
---|---|---|
latitude_column | str | Column for latitude values. |
longitude_column | str | Column for longitude values. |
address_column | str | Column with address strings. |
Examples:
>>> from urban_mapper.modules.imputer import AddressGeoImputer
>>> imputer = AddressGeoImputer(
... latitude_column="lat",
... longitude_column="lng",
... address_column="address"
... )
>>> geocoded_gdf = imputer.transform(data_gdf, urban_layer)
Warning
Requires an internet connection for geocoding via OpenStreetMap.
Source code in src/urban_mapper/modules/imputer/imputers/address_geo_imputer.py
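To make the "fill in missing coordinates from an address" idea concrete, here is a minimal offline sketch. It is not the AddressGeoImputer source: `fill_missing_coordinates`, `fake_geocode`, and `FAKE_GAZETTEER` are hypothetical names, rows are plain dicts, and dropping rows whose address cannot be resolved is an assumption. In real use, a network geocoder such as `osmnx.geocoder.geocode`, which returns a `(lat, lon)` tuple, could be passed as the `geocode` argument.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing_coordinates(rows, address_column, latitude_column,
                             longitude_column, geocode):
    """Geocode rows that lack a coordinate; `geocode` maps address -> (lat, lng)."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if is_missing(row.get(latitude_column)) or is_missing(row.get(longitude_column)):
            try:
                lat, lng = geocode(row[address_column])
            except Exception:
                # Assumption: rows whose address cannot be resolved are dropped.
                continue
            row[latitude_column], row[longitude_column] = lat, lng
        filled.append(row)
    return filled


# Offline stand-in for a real geocoder (hypothetical lookup table).
FAKE_GAZETTEER = {"10 Downing Street, London": (51.5034, -0.1276)}


def fake_geocode(address):
    return FAKE_GAZETTEER[address]  # raises KeyError for unknown addresses
```

Passing the geocoder in as a callable keeps the filling logic testable without an internet connection.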
_transform(input_geodataframe, urban_layer)
Geocode addresses into coordinates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_geodataframe | GeoDataFrame | | required |
urban_layer | UrbanLayerBase | | required |
Returns:
Name | Type | Description |
---|---|---|
GeoDataFrame | GeoDataFrame | Data with geocoded coordinates. |
Note
Urban layer is included for interface compatibility but not used.
Source code in src/urban_mapper/modules/imputer/imputers/address_geo_imputer.py
preview(format='ascii')
Preview the imputer configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | str | Output format ("ascii" or "json"). Defaults to "ascii". | 'ascii' |
Returns:
Name | Type | Description |
---|---|---|
Any | Any | Configuration summary. |
Raises:
Type | Description |
---|---|
ValueError | If format is unsupported. |
Source code in src/urban_mapper/modules/imputer/imputers/address_geo_imputer.py
ImputerFactory
Factory for creating and configuring geographic imputers.
Offers a fluent chaining-methods-based API to instantiate imputers, configure settings, and apply them.
Attributes:
Name | Type | Description |
---|---|---|
_imputer_type | str | Type of imputer to create. |
_latitude_column | str | Column for latitude values. |
_longitude_column | str | Column for longitude values. |
Examples:
>>> import urban_mapper as um
>>> factory = (
...     um.UrbanMapper().imputer.with_type("SimpleGeoImputer")
...     .on_columns(longitude_column="lng", latitude_column="lat")
... )
>>> gdf = factory.transform(data_gdf, urban_layer)
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
with_type(primitive_type)
Set the imputer type to instantiate.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
primitive_type | str | Imputer type (e.g., "SimpleGeoImputer"). | required |
Returns:
Name | Type | Description |
---|---|---|
ImputerFactory | ImputerFactory | Self for chaining. |
Raises:
Type | Description |
---|---|
ValueError | If primitive_type is not in IMPUTER_REGISTRY. |
Tip
Check IMPUTER_REGISTRY keys for valid imputer types.
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
on_columns(longitude_column, latitude_column, **extra_params)
Configure latitude and longitude columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
longitude_column | str | Column name for longitude. | required |
latitude_column | str | Column name for latitude. | required |
**extra_params | | Any other argument to be passed to a child class, such as address to | {} |
Returns:
Name | Type | Description |
---|---|---|
ImputerFactory | ImputerFactory | Self for chaining. |
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
transform(input_geodataframe, urban_layer)
Apply the configured imputer to data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_geodataframe | Union[Dict[str, GeoDataFrame], GeoDataFrame] | one or more | required |
urban_layer | UrbanLayerBase | Urban layer for context. | required |
Returns:
Type | Description |
---|---|
Union[Dict[str, GeoDataFrame], GeoDataFrame] | Imputed data. |
Raises:
Type | Description |
---|---|
ValueError | If configuration is incomplete. |
Note
Call with_type() and on_columns() before transform().
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
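The chaining contract described so far (with_type(), then on_columns(), then transform() or build()) can be sketched with a small stand-in. Everything named here (`FactorySketch`, `DropMissingSketch`, `SKETCH_REGISTRY`) is hypothetical and simplified: rows are plain dicts, and the real ImputerFactory may validate and store its state differently.

```python
class DropMissingSketch:
    """Hypothetical imputer: drops rows lacking a latitude or longitude."""

    def __init__(self, latitude_column, longitude_column, **extra_params):
        self.latitude_column = latitude_column
        self.longitude_column = longitude_column

    def transform(self, rows, urban_layer):
        return [
            row for row in rows
            if row.get(self.latitude_column) is not None
            and row.get(self.longitude_column) is not None
        ]


# Hypothetical registry mapping type names to imputer classes.
SKETCH_REGISTRY = {"DropMissingSketch": DropMissingSketch}


class FactorySketch:
    def __init__(self):
        self._imputer_type = None
        self._latitude_column = None
        self._longitude_column = None
        self._extra_params = {}

    def with_type(self, primitive_type):
        if primitive_type not in SKETCH_REGISTRY:
            valid = ", ".join(sorted(SKETCH_REGISTRY))
            raise ValueError(f"Unknown type {primitive_type!r}; valid: {valid}")
        self._imputer_type = primitive_type
        return self  # returning self is what enables chaining

    def on_columns(self, longitude_column, latitude_column, **extra_params):
        self._longitude_column = longitude_column
        self._latitude_column = latitude_column
        self._extra_params = extra_params
        return self

    def build(self):
        if None in (self._imputer_type, self._latitude_column, self._longitude_column):
            raise ValueError("Call with_type() and on_columns() first.")
        imputer_class = SKETCH_REGISTRY[self._imputer_type]
        return imputer_class(
            latitude_column=self._latitude_column,
            longitude_column=self._longitude_column,
            **self._extra_params,
        )

    def transform(self, rows, urban_layer):
        # Build the configured imputer, then apply it immediately.
        return self.build().transform(rows, urban_layer)


rows = [{"lat": 1.0, "lng": 2.0}, {"lat": None, "lng": 3.0}]
result = (
    FactorySketch()
    .with_type("DropMissingSketch")
    .on_columns(longitude_column="lng", latitude_column="lat")
    .transform(rows, urban_layer="layer")
)
```

Each configuration method returns the factory itself, so the order of calls reads top to bottom and an incomplete configuration fails early with a `ValueError`.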
build()
Build and return an imputer instance without applying it.
This method creates and returns an imputer instance without immediately applying it to data. It is primarily intended for use in the UrbanPipeline, where the actual imputation is deferred until pipeline execution.
To Keep In Mind
For most use cases outside of pipelines, using transform() is preferred as it directly applies the imputer and returns the imputed data.
Returns:
Type | Description |
---|---|
GeoImputerBase | A GeoImputerBase instance configured and ready to use. |
Raises:
Type | Description |
---|---|
ValueError | If the imputer type or latitude/longitude columns have not been specified. |
Examples:
>>> # Creating a pipeline component
>>> imputer_component = (
...     um.UrbanMapper().imputer.with_type("SimpleGeoImputer")
...     .on_columns(longitude_column="lng", latitude_column="lat")
...     .build()
... )
>>> pipeline.add_imputer(imputer_component)
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
preview(format='ascii')
Display a preview of the imputer configuration and settings.
This method generates and displays a preview of the imputer, showing its configuration, settings, and other metadata. The preview can be displayed in different formats.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | str | The format to display the preview in (default: "ascii"). | 'ascii' |
Raises:
Type | Description |
---|---|
ValueError | If an unsupported format is specified. |
Note
This method requires an imputer instance to be available. Call build() or transform() first to create an instance.
Examples:
>>> imputer = (
...     um.UrbanMapper().imputer.with_type("SimpleGeoImputer")
...     .on_columns(longitude_column="lng", latitude_column="lat")
... )
>>> # Build the imputer instance
>>> imputer.build()
>>> # Display a preview
>>> imputer.preview()
>>> # Or in JSON format
>>> imputer.preview(format="json")
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
with_preview(format='ascii')
Configure the factory to display a preview after building.
This method configures the factory to automatically display a preview after building an imputer with build(). It's a convenient way to inspect the imputer configuration without having to call preview() separately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | str | The format to display the preview in (default: "ascii"). | 'ascii' |
Returns:
Type | Description |
---|---|
ImputerFactory | The ImputerFactory instance for method chaining. |
Examples:
>>> # Auto-preview after building
>>> imputer_component = (
...     um.UrbanMapper().imputer.with_type("SimpleGeoImputer")
...     .on_columns(longitude_column="lng", latitude_column="lat")
...     .with_preview(format="json")
...     .build()
... )