Imputers

What is the Imputer module?

The imputer module is responsible for handling missing data in geospatial datasets.

In the meantime, we recommend looking through the Imputer section of the Examples for a more hands-on introduction to the Imputer module and its usage.

Documentation Under Alpha Construction

This documentation is in its early stages and still being developed. The API may therefore change, and some parts might be incomplete or inaccurate.

Use at your own risk, and please report anything you find that seems incorrect or outdated.

Open An Issue!

GeoImputerBase

Bases: ABC

Abstract base class for geographic data imputers in UrbanMapper

Provides the interface for imputing missing geographic data or transforming data into coordinates. Subclasses must implement the required methods to handle specific imputation tasks such as geocoding or spatial interpolation.

Attributes:

Name Type Description
latitude_column Optional[str]

Column name for latitude values post-imputation.

longitude_column Optional[str]

Column name for longitude values post-imputation.

data_id Optional[str]

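Key of the dataset to impute when transform() receives a dictionary of GeoDataFrames; entries under other keys are returned unchanged. If None, every dataset is imputed.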

**extra_params Optional[str]

Any other argument used by a child class.

Note

This class is abstract and cannot be instantiated directly. Use concrete implementations like SimpleGeoImputer or AddressGeoImputer.
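As a minimal sketch of what "abstract" means in practice, the stand-in classes below mirror the shape of GeoImputerBase using only the standard library. The names SketchImputerBase and PassThroughImputer are hypothetical and not part of UrbanMapper. Python's abc machinery refuses to instantiate a class that still has unimplemented abstract methods and raises a TypeError at instantiation time.

```python
from abc import ABC, abstractmethod


class SketchImputerBase(ABC):
    """Hypothetical stand-in for GeoImputerBase (illustration only)."""

    def __init__(self, latitude_column=None, longitude_column=None):
        self.latitude_column = latitude_column
        self.longitude_column = longitude_column

    @abstractmethod
    def _transform(self, rows):
        ...

    @abstractmethod
    def preview(self, format="ascii"):
        ...


class PassThroughImputer(SketchImputerBase):
    """Concrete subclass: implements both abstract methods."""

    def _transform(self, rows):
        return list(rows)

    def preview(self, format="ascii"):
        return "Imputer: PassThroughImputer"


def try_instantiate(cls):
    """Return the instance, or the TypeError message if cls is still abstract."""
    try:
        return cls("lat", "lng")
    except TypeError as error:
        return str(error)


base_result = try_instantiate(SketchImputerBase)        # an error message
concrete_result = try_instantiate(PassThroughImputer)   # a usable instance
```

Only the subclass that overrides both _transform() and preview() can be created; the base class itself yields the error message.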

Source code in src/urban_mapper/modules/imputer/abc_imputer.py
@beartype
class GeoImputerBase(ABC):
    """Abstract base class for geographic data imputers in `UrbanMapper`

    Provides the interface for `imputing` missing geographic data or `transforming` data
    into `coordinates`. Subclasses must implement the required methods to handle
    specific imputation tasks such as geocoding or spatial interpolation.

    Attributes:
        latitude_column (Optional[str]): Column name for latitude values post-imputation.
        longitude_column (Optional[str]): Column name for longitude values post-imputation.
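        data_id (Optional[str]): Key of the dataset to impute when `transform()` receives a
            dictionary of GeoDataFrames; other entries are returned unchanged. If `None`,
            every dataset is imputed.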
        **extra_params: Any other argument used by a child class.

    !!! note
        This class is abstract and cannot be instantiated directly. Use concrete
        implementations like `SimpleGeoImputer` or `AddressGeoImputer`.
    """

    def __init__(
        self,
        latitude_column: Optional[str] = None,
        longitude_column: Optional[str] = None,
        data_id: Optional[str] = None,
        **extra_params,
    ) -> None:
        self.data_id = data_id
        self.latitude_column = latitude_column
        self.longitude_column = longitude_column

    @abstractmethod
    def _transform(
        self, input_geodataframe: gpd.GeoDataFrame, urban_layer: UrbanLayerBase
    ) -> gpd.GeoDataFrame:
        """Internal method to impute geographic data.

        Called by `transform()` after validation. Subclasses must implement this method.

        !!! note "To be implemented by subclasses"
            This method should contain the core logic for imputing geographic data.
            It should handle the specific imputation task (e.g., geocoding, spatial
            interpolation) and return the modified `GeoDataFrame`.

        Args:
            input_geodataframe: GeoDataFrame with data to impute.
            urban_layer: Urban layer providing spatial context.

        Returns:
            GeoDataFrame: Data with imputed geographic information.

        Raises:
            ValueError: If imputation fails due to invalid inputs.

        !!! warning "Abstract Method"
            This method must be overridden in subclasses. A subclass that does not
            implement it cannot be instantiated (Python raises a TypeError).
        """
        ...

    @require_arguments_not_none(
        "input_geodataframe", error_msg="Input GeoDataFrame cannot be None."
    )
    @require_arguments_not_none("urban_layer", error_msg="Urban layer cannot be None.")
    @require_attributes(["latitude_column", "longitude_column"])
    def transform(
        self,
        input_geodataframe: Union[Dict[str, gpd.GeoDataFrame], gpd.GeoDataFrame],
        urban_layer: UrbanLayerBase,
    ) -> Union[
        Dict[str, gpd.GeoDataFrame],
        gpd.GeoDataFrame,
    ]:
        """Public method to impute geographic data.

        Validates inputs and delegates to `_transform()` for imputation.

        !!! note "What to keep in mind here?"
            Every imputer primitive (e.g. `AddressGeoImputer`, `SimpleGeoImputer`) should
            implement the `_transform()` method. This method is called by `transform()`
            after validating the inputs. The `_transform()` method is where the actual
            imputation logic resides. It should handle the specific imputation task
            (e.g., geocoding, spatial interpolation) and return the modified
            `GeoDataFrame`.

        Args:
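            input_geodataframe: A single `GeoDataFrame`, or a dictionary mapping
                dataset IDs to `GeoDataFrame`s, with data to process.
            urban_layer: Urban layer for spatial context.

        Returns:
            GeoDataFrame: Data with imputed coordinates. A dictionary input yields
                a dictionary with the same keys.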

        Raises:
            ValueError: If inputs are None or columns are unset.

        Examples:
            >>> from urban_mapper.modules.imputer import AddressGeoImputer
            >>> from urban_mapper.modules.urban_layer import OSMNXStreets
            >>> imputer = AddressGeoImputer(
            ...     address_column="address",
            ...     latitude_column="lat",
            ...     longitude_column="lng"
            ... )
            >>> streets = OSMNXStreets().from_place("London, UK")
            >>> gdf = imputer.transform(data_gdf, streets)

        !!! note
            Ensure latitude_column and longitude_column are set before calling.
        """
        if isinstance(input_geodataframe, gpd.GeoDataFrame):
            return self._transform(input_geodataframe, urban_layer)
        else:
            return {
                key: self._transform(gdf, urban_layer)
                if self.data_id is None or self.data_id == key
                else gdf
                for key, gdf in input_geodataframe.items()
            }

    @abstractmethod
    def preview(self, format: str = "ascii") -> Any:
        """Generate a preview of the imputer's configuration.

        !!! note "To be implemented by subclasses"
            This method should provide a summary of the imputer's settings,
            including any parameters or configurations that are relevant to
            the imputation process.

        Args:
            format: Output format ("ascii" or "json"). Defaults to "ascii".

        Returns:
            Any: Preview in specified format (e.g., str for "ascii", dict for "json").

        Raises:
            ValueError: If format is unsupported.

        !!! warning "Abstract Method"
            Subclasses must implement this method to provide configuration insights.
        """
        pass

_transform(input_geodataframe, urban_layer) abstractmethod

Internal method to impute geographic data.

Called by transform() after validation. Subclasses must implement this method.

To be implemented by subclasses

This method should contain the core logic for imputing geographic data. It should handle the specific imputation task (e.g., geocoding, spatial interpolation) and return the modified GeoDataFrame.

Parameters:

Name Type Description Default
input_geodataframe GeoDataFrame

GeoDataFrame with data to impute.

required
urban_layer UrbanLayerBase

Urban layer providing spatial context.

required

Returns:

Name Type Description
GeoDataFrame GeoDataFrame

Data with imputed geographic information.

Raises:

Type Description
ValueError

If imputation fails due to invalid inputs.

Abstract Method

This method must be overridden in subclasses. A subclass that does not implement it cannot be instantiated (Python raises a TypeError).

Source code in src/urban_mapper/modules/imputer/abc_imputer.py
@abstractmethod
def _transform(
    self, input_geodataframe: gpd.GeoDataFrame, urban_layer: UrbanLayerBase
) -> gpd.GeoDataFrame:
    """Internal method to impute geographic data.

    Called by `transform()` after validation. Subclasses must implement this method.

    !!! note "To be implemented by subclasses"
        This method should contain the core logic for imputing geographic data.
        It should handle the specific imputation task (e.g., geocoding, spatial
        interpolation) and return the modified `GeoDataFrame`.

    Args:
        input_geodataframe: GeoDataFrame with data to impute.
        urban_layer: Urban layer providing spatial context.

    Returns:
        GeoDataFrame: Data with imputed geographic information.

    Raises:
        ValueError: If imputation fails due to invalid inputs.

    !!! warning "Abstract Method"
        This method must be overridden in subclasses. A subclass that does not
        implement it cannot be instantiated (Python raises a TypeError).
    """
    ...

transform(input_geodataframe, urban_layer)

Public method to impute geographic data.

Validates inputs and delegates to _transform() for imputation.

What to keep in mind here?

Every imputer primitive (e.g. AddressGeoImputer, SimpleGeoImputer) should implement the _transform() method, which transform() calls after validating the inputs. The _transform() method holds the actual imputation logic: it handles the specific imputation task (e.g., geocoding, spatial interpolation) and returns the modified GeoDataFrame.

Parameters:

Name Type Description Default
input_geodataframe Union[Dict[str, GeoDataFrame], GeoDataFrame]

A single GeoDataFrame, or a dictionary mapping dataset IDs to GeoDataFrames, with data to process.

required
urban_layer UrbanLayerBase

Urban layer for spatial context.

required

Returns:

Name Type Description
GeoDataFrame Union[Dict[str, GeoDataFrame], GeoDataFrame]

Data with imputed coordinates. A dictionary input yields a dictionary with the same keys.

Raises:

Type Description
ValueError

If inputs are None or columns are unset.

Examples:

>>> from urban_mapper.modules.imputer import AddressGeoImputer
>>> from urban_mapper.modules.urban_layer import OSMNXStreets
>>> imputer = AddressGeoImputer(
...     address_column="address",
...     latitude_column="lat",
...     longitude_column="lng"
... )
>>> streets = OSMNXStreets().from_place("London, UK")
>>> gdf = imputer.transform(data_gdf, streets)

Note

Ensure latitude_column and longitude_column are set before calling.
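The branching that transform() performs on its input can be sketched without any geospatial dependency. The snippet below is an illustration under stated assumptions: plain lists stand in for GeoDataFrames, dispatch_transform and drop_none are hypothetical helper names, and the sketch tests for a dict where the library tests for a GeoDataFrame.

```python
def dispatch_transform(data, transform_one, data_id=None):
    """Apply transform_one to one dataset, or to selected entries of a dict.

    A dict is processed key by key; when data_id is set, only the entry
    whose key equals data_id is transformed and the rest pass through.
    """
    if not isinstance(data, dict):
        return transform_one(data)
    return {
        key: transform_one(value) if data_id is None or data_id == key else value
        for key, value in data.items()
    }


def drop_none(values):
    """Toy transformation standing in for an imputer's _transform()."""
    return [value for value in values if value is not None]


datasets = {"taxi": [1, None, 3], "bikes": [None, 5]}

all_cleaned = dispatch_transform(datasets, drop_none)                # every entry
only_taxi = dispatch_transform(datasets, drop_none, data_id="taxi")  # one entry
single = dispatch_transform([None, 7], drop_none)                    # no dict
```

With data_id set to "taxi", the "bikes" entry is returned exactly as it was passed in.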

Source code in src/urban_mapper/modules/imputer/abc_imputer.py
@require_arguments_not_none(
    "input_geodataframe", error_msg="Input GeoDataFrame cannot be None."
)
@require_arguments_not_none("urban_layer", error_msg="Urban layer cannot be None.")
@require_attributes(["latitude_column", "longitude_column"])
def transform(
    self,
    input_geodataframe: Union[Dict[str, gpd.GeoDataFrame], gpd.GeoDataFrame],
    urban_layer: UrbanLayerBase,
) -> Union[
    Dict[str, gpd.GeoDataFrame],
    gpd.GeoDataFrame,
]:
    """Public method to impute geographic data.

    Validates inputs and delegates to `_transform()` for imputation.

    !!! note "What to keep in mind here?"
        Every imputer primitive (e.g. `AddressGeoImputer`, `SimpleGeoImputer`) should
        implement the `_transform()` method. This method is called by `transform()`
        after validating the inputs. The `_transform()` method is where the actual
        imputation logic resides. It should handle the specific imputation task
        (e.g., geocoding, spatial interpolation) and return the modified
        `GeoDataFrame`.

    Args:
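        input_geodataframe: A single `GeoDataFrame`, or a dictionary mapping
            dataset IDs to `GeoDataFrame`s, with data to process.
        urban_layer: Urban layer for spatial context.

    Returns:
        GeoDataFrame: Data with imputed coordinates. A dictionary input yields
            a dictionary with the same keys.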

    Raises:
        ValueError: If inputs are None or columns are unset.

    Examples:
        >>> from urban_mapper.modules.imputer import AddressGeoImputer
        >>> from urban_mapper.modules.urban_layer import OSMNXStreets
        >>> imputer = AddressGeoImputer(
        ...     address_column="address",
        ...     latitude_column="lat",
        ...     longitude_column="lng"
        ... )
        >>> streets = OSMNXStreets().from_place("London, UK")
        >>> gdf = imputer.transform(data_gdf, streets)

    !!! note
        Ensure latitude_column and longitude_column are set before calling.
    """
    if isinstance(input_geodataframe, gpd.GeoDataFrame):
        return self._transform(input_geodataframe, urban_layer)
    else:
        return {
            key: self._transform(gdf, urban_layer)
            if self.data_id is None or self.data_id == key
            else gdf
            for key, gdf in input_geodataframe.items()
        }

preview(format='ascii') abstractmethod

Generate a preview of the imputer's configuration.

To be implemented by subclasses

This method should provide a summary of the imputer's settings, including any parameters or configurations that are relevant to the imputation process.

Parameters:

Name Type Description Default
format str

Output format ("ascii" or "json"). Defaults to "ascii".

'ascii'

Returns:

Name Type Description
Any Any

Preview in specified format (e.g., str for "ascii", dict for "json").

Raises:

Type Description
ValueError

If format is unsupported.

Abstract Method

Subclasses must implement this method to provide configuration insights.
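A subclass's preview() typically branches on the requested format. The following is a minimal sketch of that contract, not the library's implementation: build_preview is a hypothetical free function, and the strings it produces are illustrative. It returns a str for "ascii", a dict for "json", and raises ValueError otherwise.

```python
import json


def build_preview(imputer_name, latitude_column, longitude_column, format="ascii"):
    """Return a str for "ascii", a dict for "json"; reject anything else."""
    action = f"Drop rows with missing '{latitude_column}' or '{longitude_column}'"
    if format == "ascii":
        return f"Imputer: {imputer_name}\n  Action: {action}"
    if format == "json":
        return {"imputer": imputer_name, "action": action}
    raise ValueError(f"Unsupported format '{format}'")


ascii_preview = build_preview("SimpleGeoImputer", "lat", "lng")
json_preview = build_preview("SimpleGeoImputer", "lat", "lng", format="json")
serialised = json.dumps(json_preview)  # the dict form is JSON-serialisable
```

Returning a plain dict for "json" leaves serialisation to the caller, which is why the return type is annotated as Any.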

Source code in src/urban_mapper/modules/imputer/abc_imputer.py
@abstractmethod
def preview(self, format: str = "ascii") -> Any:
    """Generate a preview of the imputer's configuration.

    !!! note "To be implemented by subclasses"
        This method should provide a summary of the imputer's settings,
        including any parameters or configurations that are relevant to
        the imputation process.

    Args:
        format: Output format ("ascii" or "json"). Defaults to "ascii".

    Returns:
        Any: Preview in specified format (e.g., str for "ascii", dict for "json").

    Raises:
        ValueError: If format is unsupported.

    !!! warning "Abstract Method"
        Subclasses must implement this method to provide configuration insights.
    """
    pass

SimpleGeoImputer

Bases: GeoImputerBase

Imputer that removes (naively) rows with missing coordinates.

Filters out rows with NaN in latitude or longitude columns, cleaning data for spatial operations.

Attributes:

Name Type Description
latitude_column str

Column with latitude values.

longitude_column str

Column with longitude values.

Examples:

>>> from urban_mapper.modules.imputer import SimpleGeoImputer
>>> imputer = SimpleGeoImputer(latitude_column="lat", longitude_column="lng")
>>> clean_gdf = imputer.transform(data_gdf, urban_layer)

Note

This imputer does not add coordinates; it only removes incomplete rows.
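The filtering rule can be illustrated without pandas. The sketch below is an assumption-laden stand-in: rows are plain dictionaries rather than a GeoDataFrame, and is_missing and drop_rows_missing_coordinates are hypothetical helpers that approximate what a dropna over the two coordinate columns does.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing, roughly as pandas does."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_rows_missing_coordinates(rows, latitude_column, longitude_column):
    """Keep only rows where both coordinate columns hold a value."""
    return [
        row
        for row in rows
        if not is_missing(row.get(latitude_column))
        and not is_missing(row.get(longitude_column))
    ]


rows = [
    {"id": 1, "lat": 40.71, "lng": -74.00},
    {"id": 2, "lat": float("nan"), "lng": -73.98},  # latitude missing
    {"id": 3, "lat": 40.73, "lng": None},           # longitude missing
]
clean_rows = drop_rows_missing_coordinates(rows, "lat", "lng")
```

A row is discarded as soon as either coordinate is missing, so only the first row survives here. The input list itself is left untouched.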

Source code in src/urban_mapper/modules/imputer/imputers/simple_geo_imputer.py
@beartype
class SimpleGeoImputer(GeoImputerBase):
    """Imputer that removes (naively) rows with missing coordinates.

    Filters out rows with `NaN` in `latitude` or `longitude` columns, cleaning data for
    spatial operations.

    Attributes:
        latitude_column (str): Column with latitude values.
        longitude_column (str): Column with longitude values.

    Examples:
        >>> from urban_mapper.modules.imputer import SimpleGeoImputer
        >>> imputer = SimpleGeoImputer(latitude_column="lat", longitude_column="lng")
        >>> clean_gdf = imputer.transform(data_gdf, urban_layer)

    !!! note
        This imputer does not add coordinates; it only removes incomplete rows.
    """

    def _transform(
        self, input_geodataframe: gpd.GeoDataFrame, urban_layer: UrbanLayerBase
    ) -> gpd.GeoDataFrame:
        """Filter rows with missing coordinates.

        Args:
            input_geodataframe: `GeoDataFrame` to clean.
            urban_layer: `Urban layer` (unused in this implementation).

        Returns:
            GeoDataFrame: Cleaned data without missing coordinates.

        !!! tip
            Use this as a preprocessing step before spatial analysis.
        """
        _ = urban_layer  # Not used in this implementation
        return input_geodataframe.dropna(
            subset=[self.latitude_column, self.longitude_column]
        )

    def preview(self, format: str = "ascii") -> Any:
        """Preview the imputer configuration.

        Args:
            format: Output format ("ascii" or "json"). Defaults to "ascii".

        Returns:
            Any: Configuration summary.

        Raises:
            ValueError: If format is unsupported.
        """
        if format == "ascii":
            lines = [
                "Imputer: SimpleGeoImputer",
                f"  Action: Drop rows with missing '{self.latitude_column}' or '{self.longitude_column}'",
            ]
            if self.data_id:
                lines.append(f"  Data ID: '{self.data_id}'")

            return "\n".join(lines)
        elif format == "json":
            return {
                "imputer": "SimpleGeoImputer",
                "action": f"Drop rows with missing '{self.latitude_column}' or '{self.longitude_column}'",
                "latitude_column": self.latitude_column,
                "longitude_column": self.longitude_column,
                "data_id": self.data_id,
            }
        else:
            raise ValueError(f"Unsupported format '{format}'")

_transform(input_geodataframe, urban_layer)

Filter rows with missing coordinates.

Parameters:

Name Type Description Default
input_geodataframe GeoDataFrame

GeoDataFrame to clean.

required
urban_layer UrbanLayerBase

Urban layer (unused in this implementation).

required

Returns:

Name Type Description
GeoDataFrame GeoDataFrame

Cleaned data without missing coordinates.

Tip

Use this as a preprocessing step before spatial analysis.

Source code in src/urban_mapper/modules/imputer/imputers/simple_geo_imputer.py
def _transform(
    self, input_geodataframe: gpd.GeoDataFrame, urban_layer: UrbanLayerBase
) -> gpd.GeoDataFrame:
    """Filter rows with missing coordinates.

    Args:
        input_geodataframe: `GeoDataFrame` to clean.
        urban_layer: `Urban layer` (unused in this implementation).

    Returns:
        GeoDataFrame: Cleaned data without missing coordinates.

    !!! tip
        Use this as a preprocessing step before spatial analysis.
    """
    _ = urban_layer  # Not used in this implementation
    return input_geodataframe.dropna(
        subset=[self.latitude_column, self.longitude_column]
    )

preview(format='ascii')

Preview the imputer configuration.

Parameters:

Name Type Description Default
format str

Output format ("ascii" or "json"). Defaults to "ascii".

'ascii'

Returns:

Name Type Description
Any Any

Configuration summary.

Raises:

Type Description
ValueError

If format is unsupported.

Source code in src/urban_mapper/modules/imputer/imputers/simple_geo_imputer.py
def preview(self, format: str = "ascii") -> Any:
    """Preview the imputer configuration.

    Args:
        format: Output format ("ascii" or "json"). Defaults to "ascii".

    Returns:
        Any: Configuration summary.

    Raises:
        ValueError: If format is unsupported.
    """
    if format == "ascii":
        lines = [
            "Imputer: SimpleGeoImputer",
            f"  Action: Drop rows with missing '{self.latitude_column}' or '{self.longitude_column}'",
        ]
        if self.data_id:
            lines.append(f"  Data ID: '{self.data_id}'")

        return "\n".join(lines)
    elif format == "json":
        return {
            "imputer": "SimpleGeoImputer",
            "action": f"Drop rows with missing '{self.latitude_column}' or '{self.longitude_column}'",
            "latitude_column": self.latitude_column,
            "longitude_column": self.longitude_column,
            "data_id": self.data_id,
        }
    else:
        raise ValueError(f"Unsupported format '{format}'")

AddressGeoImputer

Bases: GeoImputerBase

Imputer that geocodes addresses to coordinates.

What is that about?

Uses OpenStreetMap via osmnx to convert address strings into latitude and longitude values.

You have an address / equivalent name column in your data, but no coordinates? Or missing coordinates? This imputer will geocode the addresses to fill in the missing latitude and longitude values.

Understanding the extra parameters

If you look at GeoImputerBase, address_column is not one of its parameters; it is specific to this primitive. When using the factory, you therefore need to pass your address (or equivalent) column name through the keyword arguments of .on_columns(...).

Examples:

>>> import urban_mapper as um
>>> factory = (
...     um.UrbanMapper().imputer.with_type("AddressGeoImputer")
...     .on_columns(longitude_column="lng", latitude_column="lat", address_column="address")
...     # or .on_columns("lng", "lat", "address")
... )
>>> gdf = factory.transform(data_gdf, urban_layer)

Attributes:

Name Type Description
latitude_column str

Column for latitude values.

longitude_column str

Column for longitude values.

address_column str

Column with address strings.

Examples:

>>> from urban_mapper.modules.imputer import AddressGeoImputer
>>> imputer = AddressGeoImputer(
...     latitude_column="lat",
...     longitude_column="lng",
...     address_column="address"
... )
>>> geocoded_gdf = imputer.transform(data_gdf, urban_layer)

Warning

Requires an internet connection for geocoding via OpenStreetMap.
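The overall behaviour (leave complete rows alone, geocode the incomplete ones, drop those that cannot be resolved) can be sketched offline. Everything below is illustrative: rows are plain dictionaries, missing values are represented by None only, and offline_geocode with its KNOWN_ADDRESSES table is a hypothetical stand-in for a network geocoder.

```python
def fill_missing_coordinates(rows, geocode, lat_col="lat", lng_col="lng",
                             address_col="address"):
    """Geocode rows lacking coordinates; drop rows that cannot be resolved."""
    result = []
    for row in rows:
        if row.get(lat_col) is not None and row.get(lng_col) is not None:
            result.append(dict(row))  # already complete: keep as is
            continue
        address = str(row.get(address_col) or "").strip()
        try:
            coordinates = geocode(address) if address else None
        except Exception:
            coordinates = None  # treat any geocoder failure as "not found"
        if coordinates is None:
            continue  # unresolved rows are dropped from the output
        latitude, longitude = coordinates
        result.append({**row, lat_col: latitude, lng_col: longitude})
    return result


# Hypothetical offline lookup standing in for a network geocoder.
KNOWN_ADDRESSES = {"10 Downing Street, London": (51.5034, -0.1276)}


def offline_geocode(address):
    return KNOWN_ADDRESSES[address]  # raises KeyError for unknown addresses


rows = [
    {"address": "somewhere", "lat": 48.85, "lng": 2.35},
    {"address": "10 Downing Street, London", "lat": None, "lng": None},
    {"address": "no such place", "lat": None, "lng": None},
]
filled = fill_missing_coordinates(rows, offline_geocode)
```

Because unresolved rows are removed rather than kept with empty coordinates, the output can be shorter than the input; here the third row disappears.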

Source code in src/urban_mapper/modules/imputer/imputers/address_geo_imputer.py
@beartype
class AddressGeoImputer(GeoImputerBase):
    """Imputer that geocodes addresses to coordinates.

    !!! tip "What is that about?"
        Uses OpenStreetMap via `osmnx` to convert address strings into latitude and
        longitude values.

        You have an `address` / equivalent name column in your data, but no coordinates? Or missing coordinates?
        This imputer will geocode the addresses to fill in the missing latitude and longitude values.

    !!! tip "Understanding the extra parameters"
        If you look at `GeoImputerBase`, `address_column` is not one of its parameters;
        it is specific to this primitive. When using the factory, you therefore need to
        pass your `address` (or equivalent) column name through the kwargs of `.on_columns(...)`.

        Examples:
        >>> import urban_mapper as um
        >>> factory = (
        ...     um.UrbanMapper().imputer.with_type("AddressGeoImputer")
        ...     .on_columns(longitude_column="lng", latitude_column="lat", address_column="address")
        ...     # or .on_columns("lng", "lat", "address")
        ... )
        >>> gdf = factory.transform(data_gdf, urban_layer)

    Attributes:
        latitude_column (str): Column for latitude values.
        longitude_column (str): Column for longitude values.
        address_column (str): Column with address strings.

    Examples:
        >>> from urban_mapper.modules.imputer import AddressGeoImputer
        >>> imputer = AddressGeoImputer(
        ...     latitude_column="lat",
        ...     longitude_column="lng",
        ...     address_column="address"
        ... )
        >>> geocoded_gdf = imputer.transform(data_gdf, urban_layer)

    !!! warning
        Requires an internet connection for geocoding via OpenStreetMap.
    """

    def __init__(
        self,
        latitude_column: Optional[str] = None,
        longitude_column: Optional[str] = None,
        data_id: Optional[str] = None,
        address_column: Optional[str] = None,
    ):
        super().__init__(latitude_column, longitude_column, data_id)
        self.address_column = address_column

    def _transform(
        self, input_geodataframe: gpd.GeoDataFrame, urban_layer: UrbanLayerBase
    ) -> gpd.GeoDataFrame:
        """Geocode addresses into coordinates.

        Args:
            input_geodataframe: `GeoDataFrame` with address data.
            urban_layer: `Urban layer`.

        Returns:
            GeoDataFrame: Data with geocoded coordinates.

        !!! note
            Urban layer is included for interface compatibility but not used.
        """
        _ = urban_layer
        dataframe = input_geodataframe.copy()
        mask_missing = (
            dataframe[self.latitude_column].isna()
            | dataframe[self.longitude_column].isna()
        )
        missing_records = dataframe[mask_missing].copy()

        def geocode_address(row):
            address = str(row.get(self.address_column, "")).strip()
            if not address:
                return None
            try:
                latitude_longitude = osmnx.geocode(address)
                if not latitude_longitude:
                    return None
                latitude_value, longitude_value = latitude_longitude
                return pd.Series(
                    {
                        self.latitude_column: latitude_value,
                        self.longitude_column: longitude_value,
                        "geometry": Point(longitude_value, latitude_value),
                    }
                )
            except Exception:
                return None

        geocoded_data = missing_records.apply(geocode_address, axis=1)
        valid_indices = geocoded_data.dropna().index

        if not valid_indices.empty:
            dataframe.loc[valid_indices] = geocoded_data.loc[valid_indices]

        dataframe = dataframe.loc[~mask_missing | dataframe.index.isin(valid_indices)]
        return dataframe

    def preview(self, format: str = "ascii") -> Any:
        """Preview the imputer configuration.

        Args:
            format: Output format ("ascii" or "json"). Defaults to "ascii".

        Returns:
            Any: Configuration summary.

        Raises:
            ValueError: If format is unsupported.
        """
        if format == "ascii":
            lines = [
                "Imputer: AddressGeoImputer",
                f"  Action: Impute '{self.latitude_column}' and '{self.longitude_column}' "
                f"using addresses from '{self.address_column}'",
            ]
            if self.data_id:
                lines.append(f"  Data ID: '{self.data_id}'")

            return "\n".join(lines)
        elif format == "json":
            return {
                "imputer": "AddressGeoImputer",
                "action": f"Impute '{self.latitude_column}' and '{self.longitude_column}' "
                f"using addresses from '{self.address_column}'",
                "data_id": self.data_id,
            }
        else:
            raise ValueError(f"Unsupported format '{format}'")

_transform(input_geodataframe, urban_layer)

Geocode addresses into coordinates.

Parameters:

Name Type Description Default
input_geodataframe GeoDataFrame

GeoDataFrame with address data.

required
urban_layer UrbanLayerBase

Urban layer.

required

Returns:

Name Type Description
GeoDataFrame GeoDataFrame

Data with geocoded coordinates.

Note

Urban layer is included for interface compatibility but not used.

Source code in src/urban_mapper/modules/imputer/imputers/address_geo_imputer.py
def _transform(
    self, input_geodataframe: gpd.GeoDataFrame, urban_layer: UrbanLayerBase
) -> gpd.GeoDataFrame:
    """Geocode addresses into coordinates.

    Args:
        input_geodataframe: `GeoDataFrame` with address data.
        urban_layer: `Urban layer`.

    Returns:
        GeoDataFrame: Data with geocoded coordinates.

    !!! note
        Urban layer is included for interface compatibility but not used.
    """
    _ = urban_layer
    dataframe = input_geodataframe.copy()
    mask_missing = (
        dataframe[self.latitude_column].isna()
        | dataframe[self.longitude_column].isna()
    )
    missing_records = dataframe[mask_missing].copy()

    def geocode_address(row):
        address = str(row.get(self.address_column, "")).strip()
        if not address:
            return None
        try:
            latitude_longitude = osmnx.geocode(address)
            if not latitude_longitude:
                return None
            latitude_value, longitude_value = latitude_longitude
            return pd.Series(
                {
                    self.latitude_column: latitude_value,
                    self.longitude_column: longitude_value,
                    "geometry": Point(longitude_value, latitude_value),
                }
            )
        except Exception:
            return None

    geocoded_data = missing_records.apply(geocode_address, axis=1)
    valid_indices = geocoded_data.dropna().index

    if not valid_indices.empty:
        dataframe.loc[valid_indices] = geocoded_data.loc[valid_indices]

    dataframe = dataframe.loc[~mask_missing | dataframe.index.isin(valid_indices)]
    return dataframe

preview(format='ascii')

Preview the imputer configuration.

Parameters:

Name Type Description Default
format str

Output format ("ascii" or "json"). Defaults to "ascii".

'ascii'

Returns:

Name Type Description
Any Any

Configuration summary.

Raises:

Type Description
ValueError

If format is unsupported.

Source code in src/urban_mapper/modules/imputer/imputers/address_geo_imputer.py
def preview(self, format: str = "ascii") -> Any:
    """Preview the imputer configuration.

    Args:
        format: Output format ("ascii" or "json"). Defaults to "ascii".

    Returns:
        Any: Configuration summary.

    Raises:
        ValueError: If format is unsupported.
    """
    if format == "ascii":
        lines = [
            "Imputer: AddressGeoImputer",
            f"  Action: Impute '{self.latitude_column}' and '{self.longitude_column}' "
            f"using addresses from '{self.address_column}'",
        ]
        if self.data_id:
            lines.append(f"  Data ID: '{self.data_id}'")

        return "\n".join(lines)
    elif format == "json":
        return {
            "imputer": "AddressGeoImputer",
            "action": f"Impute '{self.latitude_column}' and '{self.longitude_column}' "
            f"using addresses from '{self.address_column}'",
            "data_id": self.data_id,
        }
    else:
        raise ValueError(f"Unsupported format '{format}'")

ImputerFactory

Factory for creating and configuring geographic imputers.

Offers a fluent chaining-methods-based API to instantiate imputers, configure settings, and apply them.

Attributes:

Name Type Description
_imputer_type str

Type of imputer to create.

_latitude_column str

Column for latitude values.

_longitude_column str

Column for longitude values.

Examples:

>>> import urban_mapper as um
>>> factory = (
...     um.UrbanMapper().imputer.with_type("SimpleGeoImputer")
...     .on_columns(longitude_column="lng", latitude_column="lat")
... )
>>> gdf = factory.transform(data_gdf, urban_layer)
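
The fluent style relies on each configuration method returning the factory itself. The sketch below illustrates that pattern together with a "did you mean" hint for unknown type names. It is a simplified illustration, not the library's code: SketchImputerFactory and SKETCH_REGISTRY are hypothetical, and the hint uses the standard library's difflib in place of the fuzzy-matching helper the real factory calls.

```python
import difflib

# Hypothetical registry: maps type names to classes (placeholders here).
SKETCH_REGISTRY = {"SimpleGeoImputer": object, "AddressGeoImputer": object}


class SketchImputerFactory:
    """Minimal fluent builder: each setter returns self so calls chain."""

    def __init__(self):
        self.imputer_type = None
        self.latitude_column = None
        self.longitude_column = None

    def with_type(self, primitive_type):
        if primitive_type not in SKETCH_REGISTRY:
            available = list(SKETCH_REGISTRY)
            # Suggest the closest registered name, if any is similar enough.
            close = difflib.get_close_matches(primitive_type, available, n=1, cutoff=0.8)
            suggestion = f" Maybe you meant '{close[0]}'?" if close else ""
            raise ValueError(
                f"Unknown imputer method '{primitive_type}'. "
                f"Available: {', '.join(available)}.{suggestion}"
            )
        self.imputer_type = primitive_type
        return self

    def on_columns(self, longitude_column, latitude_column):
        self.longitude_column = longitude_column
        self.latitude_column = latitude_column
        return self


factory = SketchImputerFactory().with_type("SimpleGeoImputer").on_columns("lng", "lat")

try:
    SketchImputerFactory().with_type("SimpleGeoImputr")  # deliberate typo
    error_message = ""
except ValueError as error:
    error_message = str(error)
```

Validating the type name inside with_type() means a typo fails early, at configuration time, rather than when transform() is finally called.
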
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
@beartype
class ImputerFactory:
    """Factory for creating and configuring geographic imputers.

    Offers a fluent chaining-methods-based API to instantiate imputers, configure settings, and apply them.

    Attributes:
        _imputer_type (str): Type of imputer to create.
        _latitude_column (str): Column for latitude values.
        _longitude_column (str): Column for longitude values.

    Examples:
        >>> import urban_mapper as um
        >>> factory = (
        ...     um.UrbanMapper().imputer.with_type("SimpleGeoImputer")
        ...     .on_columns(longitude_column="lng", latitude_column="lat")
        ... )
        >>> gdf = factory.transform(data_gdf, urban_layer)
    """

    def __init__(self):
        self._data_id: Optional[str] = None
        self._imputer_type: Optional[str] = None
        self._latitude_column: Optional[str] = None
        self._longitude_column: Optional[str] = None
        self._extra_params: Dict[str, Any] = {}
        self._instance: Optional[GeoImputerBase] = None
        self._preview: Optional[dict] = None

    @reset_attributes_before(["_imputer_type", "_latitude_column", "_longitude_column"])
    def with_type(self, primitive_type: str) -> "ImputerFactory":
        """Set the imputer type to instantiate.

        Args:
            primitive_type: Imputer type (e.g., "SimpleGeoImputer").

        Returns:
            ImputerFactory: Self for chaining.

        Raises:
            ValueError: If primitive_type is not in IMPUTER_REGISTRY.

        !!! tip
            Check IMPUTER_REGISTRY keys for valid imputer types.
        """
        if self._imputer_type is not None:
            logger.log(
                "DEBUG_MID",
                f"WARNING: Imputer method already set to '{self._imputer_type}'. Overwriting.",
            )
            self._imputer_type = None

        if primitive_type not in IMPUTER_REGISTRY:
            available = list(IMPUTER_REGISTRY.keys())
            match, score = process.extractOne(primitive_type, available)
            if score > 80:
                suggestion = f" Maybe you meant '{match}'?"
            else:
                suggestion = ""
            raise ValueError(
                f"Unknown imputer method '{primitive_type}'. Available: {', '.join(available)}.{suggestion}"
            )
        self._imputer_type = primitive_type
        logger.log(
            "DEBUG_LOW",
            f"WITH_TYPE: Initialised ImputerFactory with imputer_type={primitive_type}",
        )
        return self

    def with_data(self, data_id: str) -> "ImputerFactory":
        """Set the ID of the dataset to impute.

        Args:
            data_id: ID of the dataset to be transformed.

        Returns:
            ImputerFactory: Self for chaining.

        !!! tip
            When `transform()` receives a dictionary of `GeoDataFrame`s, `data_id`
            should match one of its keys; otherwise a warning is printed.
        """
        if self._data_id is not None:
            logger.log(
                "DEBUG_MID",
                f"WARNING: Data ID already set to '{self._data_id}'. Overwriting.",
            )
            self._data_id = None

        self._data_id = data_id
        logger.log(
            "DEBUG_LOW",
            f"WITH_DATA: Initialised ImputerFactory with data_id={data_id}",
        )
        return self

    def on_columns(
        self,
        longitude_column: str,
        latitude_column: str,
        **extra_params,
    ) -> "ImputerFactory":
        """Configure latitude and longitude columns.

        Args:
            longitude_column: Column name for longitude.
            latitude_column: Column name for latitude.
            **extra_params: Additional keyword arguments forwarded to the imputer's constructor, such as the address column for `AddressGeoImputer`.

        Returns:
            ImputerFactory: Self for chaining.
        """
        self._longitude_column = longitude_column
        self._latitude_column = latitude_column
        self._extra_params = extra_params
        logger.log(
            "DEBUG_LOW",
            f"ON_COLUMNS: Initialised ImputerFactory with "
            f"longitude_column={longitude_column}, latitude_column={latitude_column}, "
            f"extra_params={self._extra_params}",
        )
        return self

    @require_attributes_not_none(
        "_imputer_type",
        "_latitude_column",
        "_longitude_column",
    )
    def transform(
        self,
        input_geodataframe: Union[Dict[str, gpd.GeoDataFrame], gpd.GeoDataFrame],
        urban_layer: UrbanLayerBase,
    ) -> Union[
        Dict[str, gpd.GeoDataFrame],
        gpd.GeoDataFrame,
    ]:
        """Apply the configured imputer to data.

        Args:
            input_geodataframe: A single `GeoDataFrame`, or a dictionary of them keyed by data ID.
            urban_layer: Urban layer for context.

        Returns:
            Union[Dict[str, GeoDataFrame], GeoDataFrame]: Imputed data.

        Raises:
            ValueError: If configuration is incomplete.

        !!! note
            Call with_type() and on_columns() before transform().
        """
        imputer_class = IMPUTER_REGISTRY[self._imputer_type]
        self._instance = imputer_class(
            latitude_column=self._latitude_column,
            longitude_column=self._longitude_column,
            data_id=self._data_id,
            **self._extra_params,
        )

        if (
            isinstance(input_geodataframe, Dict)
            and self._data_id is not None
            and self._data_id not in input_geodataframe
        ):
            print(
                "WARNING: ",
                f"Data ID {self._data_id} was not found in the list of dataframes ",
                "No input transformation will be executed ",
            )

        return self._instance.transform(input_geodataframe, urban_layer)

    def build(self) -> GeoImputerBase:
        """Build and return an imputer instance without applying it.

        This method creates and returns an imputer instance without immediately applying
        it to data. It is primarily intended for use in the `UrbanPipeline`, where the
        actual imputation is deferred until pipeline execution.

        !!! note "To Keep In Mind"
            For most use cases outside of pipelines, using `transform()` is preferred as it
            directly applies the imputer and returns the imputed data.

        Returns:
            A GeoImputerBase instance configured and ready to use.

        Raises:
            ValueError: If the imputer type or latitude/longitude columns have not been specified.

        Examples:
            >>> # Creating a pipeline component
            >>> imputer_component = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
            ...     .on_columns(longitude_column="lng", latitude_column="lat")\
            ...     .build()
            >>> pipeline.add_imputer(imputer_component)
        """
        logger.log(
            "DEBUG_MID",
            "WARNING: build() should only be used in UrbanPipeline. In other cases, "
            "using transform() is a better choice.",
        )
        if self._imputer_type is None:
            raise ValueError("Imputer type must be specified. Call with_type() first.")
        if self._latitude_column is None or self._longitude_column is None:
            raise ValueError(
                "Latitude and longitude columns must be specified. Call on_columns() first."
            )
        imputer_class = IMPUTER_REGISTRY[self._imputer_type]
        self._instance = imputer_class(
            latitude_column=self._latitude_column,
            longitude_column=self._longitude_column,
            data_id=self._data_id,
            **self._extra_params,
        )
        if self._preview is not None:
            self.preview(format=self._preview["format"])
        return self._instance

    def preview(self, format: str = "ascii") -> None:
        """Display a preview of the imputer configuration and settings.

        This method generates and displays a preview of the imputer, showing its
        configuration, settings, and other metadata. The preview can be displayed
        in different formats.

        Args:
            format: The format to display the preview in (default: "ascii").

                - [x] "ascii": Text-based format for terminal display
                - [x] "json": JSON-formatted data for programmatic use

        Raises:
            ValueError: If an unsupported format is specified.

        Note:
            This method requires an imputer instance to be available. Call build()
            or transform() first to create an instance.

        Examples:
            >>> imputer = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
            ...     .on_columns(longitude_column="lng", latitude_column="lat")
            >>> # Build the imputer instance
            >>> imputer.build()
            >>> # Display a preview
            >>> imputer.preview()
            >>> # Or in JSON format
            >>> imputer.preview(format="json")
        """
        if self._instance is None:
            print("No imputer instance available to preview. Call build() first.")
            return
        if hasattr(self._instance, "preview"):
            preview_data = self._instance.preview(format=format)
            if format == "ascii":
                print(preview_data)
            elif format == "json":
                print(json.dumps(preview_data, indent=2))
            else:
                raise ValueError(f"Unsupported format '{format}'")
        else:
            print("Preview not supported for this imputer instance.")

    def with_preview(self, format: str = "ascii") -> "ImputerFactory":
        """Configure the factory to display a preview after building.

        This method configures the factory to automatically display a preview after
        building an imputer with `build()`. It's a convenient way to inspect the imputer
        configuration without having to call `preview()` separately.

        Args:
            format: The format to display the preview in (default: "ascii").

                - [x] "ascii": Text-based format for terminal display
                - [x] "json": JSON-formatted data for programmatic use

        Returns:
            The ImputerFactory instance for method chaining.

        Examples:
            >>> # Auto-preview after building
            >>> imputer_component = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
            ...     .on_columns(longitude_column="lng", latitude_column="lat")\
            ...     .with_preview(format="json")\
            ...     .build()
        """
        self._preview = {"format": format}
        return self

with_type(primitive_type)

Set the imputer type to instantiate.

Parameters:

Name Type Description Default
primitive_type str

Imputer type (e.g., "SimpleGeoImputer").

required

Returns:

Name Type Description
ImputerFactory ImputerFactory

Self for chaining.

Raises:

Type Description
ValueError

If primitive_type is not in IMPUTER_REGISTRY.

Tip

Check IMPUTER_REGISTRY keys for valid imputer types.
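
The validation described above can be tried without UrbanMapper installed. The following is a minimal sketch, not the library's implementation: `REGISTRY` is a hypothetical stand-in for `IMPUTER_REGISTRY`, `validate_imputer_type` is an illustrative name, and the standard library's `difflib.get_close_matches` is swapped in for the `process.extractOne` call that appears in the source (the `0.8` cutoff is an assumption).

```python
from difflib import get_close_matches

# Hypothetical stand-in for IMPUTER_REGISTRY (the real one maps names to classes).
REGISTRY = {"SimpleGeoImputer": object, "AddressGeoImputer": object}


def validate_imputer_type(name: str) -> str:
    """Return `name` if it is registered, otherwise raise ValueError with a hint."""
    if name in REGISTRY:
        return name
    available = list(REGISTRY)
    # difflib replaces the fuzzy matcher used in the source; the cutoff is assumed.
    close = get_close_matches(name, available, n=1, cutoff=0.8)
    suggestion = f" Maybe you meant '{close[0]}'?" if close else ""
    raise ValueError(
        f"Unknown imputer method '{name}'. "
        f"Available: {', '.join(available)}.{suggestion}"
    )


try:
    validate_imputer_type("SimpleGeoImputr")  # deliberate typo
except ValueError as err:
    message = str(err)
    print(message)
```

Listing the available keys in the error message means a caller can usually correct a typo without opening the registry.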

Source code in src/urban_mapper/modules/imputer/imputer_factory.py
@reset_attributes_before(["_imputer_type", "_latitude_column", "_longitude_column"])
def with_type(self, primitive_type: str) -> "ImputerFactory":
    """Set the imputer type to instantiate.

    Args:
        primitive_type: Imputer type (e.g., "SimpleGeoImputer").

    Returns:
        ImputerFactory: Self for chaining.

    Raises:
        ValueError: If primitive_type is not in IMPUTER_REGISTRY.

    !!! tip
        Check IMPUTER_REGISTRY keys for valid imputer types.
    """
    if self._imputer_type is not None:
        logger.log(
            "DEBUG_MID",
            f"WARNING: Imputer method already set to '{self._imputer_type}'. Overwriting.",
        )
        self._imputer_type = None

    if primitive_type not in IMPUTER_REGISTRY:
        available = list(IMPUTER_REGISTRY.keys())
        match, score = process.extractOne(primitive_type, available)
        if score > 80:
            suggestion = f" Maybe you meant '{match}'?"
        else:
            suggestion = ""
        raise ValueError(
            f"Unknown imputer method '{primitive_type}'. Available: {', '.join(available)}.{suggestion}"
        )
    self._imputer_type = primitive_type
    logger.log(
        "DEBUG_LOW",
        f"WITH_TYPE: Initialised ImputerFactory with imputer_type={primitive_type}",
    )
    return self

on_columns(longitude_column, latitude_column, **extra_params)

Configure latitude and longitude columns.

Parameters:

Name Type Description Default
longitude_column str

Column name for longitude.

required
latitude_column str

Column name for latitude.

required
**extra_params

Additional keyword arguments forwarded to the imputer's constructor, such as the address column for AddressGeoImputer.

{}

Returns:

Name Type Description
ImputerFactory ImputerFactory

Self for chaining.
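
How the extra keyword arguments travel from `on_columns()` to the imputer's constructor can be sketched with stand-in classes. Everything here is hypothetical: `FakeAddressImputer`, `MiniFactory`, and the `address_column` keyword are illustrative names, not the library's API. Only the bookkeeping (store the extras, then unpack them at construction time) follows the source listing.

```python
from typing import Any, Dict, Optional


class FakeAddressImputer:
    """Hypothetical imputer; `address_column` is the extra keyword it needs."""

    def __init__(self, latitude_column, longitude_column, address_column=None):
        self.latitude_column = latitude_column
        self.longitude_column = longitude_column
        self.address_column = address_column


class MiniFactory:
    """Stand-in showing only the on_columns() bookkeeping."""

    def __init__(self) -> None:
        self._latitude_column: Optional[str] = None
        self._longitude_column: Optional[str] = None
        self._extra_params: Dict[str, Any] = {}

    def on_columns(self, longitude_column: str, latitude_column: str, **extra_params):
        self._longitude_column = longitude_column
        self._latitude_column = latitude_column
        # Replaces (does not merge with) extras from an earlier call.
        self._extra_params = extra_params
        return self

    def build(self) -> FakeAddressImputer:
        # Extras are unpacked as keyword arguments for the constructor.
        return FakeAddressImputer(
            latitude_column=self._latitude_column,
            longitude_column=self._longitude_column,
            **self._extra_params,
        )


imputer = MiniFactory().on_columns("lng", "lat", address_column="street").build()
print(imputer.address_column)
```

Because the stored extras are overwritten on every call, a second `on_columns()` call has to repeat any extra argument that should be kept.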

Source code in src/urban_mapper/modules/imputer/imputer_factory.py
def on_columns(
    self,
    longitude_column: str,
    latitude_column: str,
    **extra_params,
) -> "ImputerFactory":
    """Configure latitude and longitude columns.

    Args:
        longitude_column: Column name for longitude.
        latitude_column: Column name for latitude.
        **extra_params: Additional keyword arguments forwarded to the imputer's constructor, such as the address column for `AddressGeoImputer`.

    Returns:
        ImputerFactory: Self for chaining.
    """
    self._longitude_column = longitude_column
    self._latitude_column = latitude_column
    self._extra_params = extra_params
    logger.log(
        "DEBUG_LOW",
        f"ON_COLUMNS: Initialised ImputerFactory with "
        f"longitude_column={longitude_column}, latitude_column={latitude_column}, "
        f"extra_params={self._extra_params}",
    )
    return self

transform(input_geodataframe, urban_layer)

Apply the configured imputer to data.

Parameters:

Name Type Description Default
input_geodataframe Union[Dict[str, GeoDataFrame], GeoDataFrame]

A single GeoDataFrame, or a dictionary of them keyed by data ID.

required
urban_layer UrbanLayerBase

Urban layer for context.

required

Returns:

Type Description
Union[Dict[str, GeoDataFrame], GeoDataFrame]

Imputed data.

Raises:

Type Description
ValueError

If configuration is incomplete.

Note

Call with_type() and on_columns() before transform().
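
The two checks involved in `transform()` (required settings present, and the data ID present in a dictionary input) can be sketched as follows. This is an assumption-laden illustration: `require_not_none` is a hypothetical re-creation of what a guard such as `require_attributes_not_none` might do, `MiniFactory` is a stand-in, and plain lists replace `GeoDataFrame` objects so the example runs without dependencies.

```python
import functools


def require_not_none(*names):
    """Hypothetical guard: raise ValueError if any named attribute is None."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            missing = [n for n in names if getattr(self, n, None) is None]
            if missing:
                raise ValueError(f"Missing configuration: {', '.join(missing)}")
            return method(self, *args, **kwargs)

        return wrapper

    return decorator


class MiniFactory:
    """Stand-in carrying only the attributes the checks need."""

    def __init__(self):
        self._imputer_type = None
        self._data_id = None

    @require_not_none("_imputer_type")
    def transform(self, data):
        # Mirrors the source's check: warn when a dict input lacks the data ID.
        if (
            isinstance(data, dict)
            and self._data_id is not None
            and self._data_id not in data
        ):
            print(f"WARNING: Data ID {self._data_id} was not found")
        return data  # a real imputer would return imputed data here


factory = MiniFactory()
try:
    factory.transform({"taxi": []})  # with_type() equivalent not called yet
except ValueError as err:
    error_text = str(err)

factory._imputer_type = "SimpleGeoImputer"
result = factory.transform({"taxi": []})
```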

Source code in src/urban_mapper/modules/imputer/imputer_factory.py
@require_attributes_not_none(
    "_imputer_type",
    "_latitude_column",
    "_longitude_column",
)
def transform(
    self,
    input_geodataframe: Union[Dict[str, gpd.GeoDataFrame], gpd.GeoDataFrame],
    urban_layer: UrbanLayerBase,
) -> Union[
    Dict[str, gpd.GeoDataFrame],
    gpd.GeoDataFrame,
]:
    """Apply the configured imputer to data.

    Args:
        input_geodataframe: A single `GeoDataFrame`, or a dictionary of them keyed by data ID.
        urban_layer: Urban layer for context.

    Returns:
        Union[Dict[str, GeoDataFrame], GeoDataFrame]: Imputed data.

    Raises:
        ValueError: If configuration is incomplete.

    !!! note
        Call with_type() and on_columns() before transform().
    """
    imputer_class = IMPUTER_REGISTRY[self._imputer_type]
    self._instance = imputer_class(
        latitude_column=self._latitude_column,
        longitude_column=self._longitude_column,
        data_id=self._data_id,
        **self._extra_params,
    )

    if (
        isinstance(input_geodataframe, Dict)
        and self._data_id is not None
        and self._data_id not in input_geodataframe
    ):
        print(
            "WARNING: ",
            f"Data ID {self._data_id} was not found in the list of dataframes ",
            "No input transformation will be executed ",
        )

    return self._instance.transform(input_geodataframe, urban_layer)

build()

Build and return an imputer instance without applying it.

This method creates and returns an imputer instance without immediately applying it to data. It is primarily intended for use in the UrbanPipeline, where the actual imputation is deferred until pipeline execution.

To Keep In Mind

For most use cases outside of pipelines, using transform() is preferred as it directly applies the imputer and returns the imputed data.

Returns:

Type Description
GeoImputerBase

A GeoImputerBase instance configured and ready to use.

Raises:

Type Description
ValueError

If the imputer type or latitude/longitude columns have not been specified.

Examples:

>>> # Creating a pipeline component
>>> imputer_component = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
...     .on_columns(longitude_column="lng", latitude_column="lat")\
...     .build()
>>> pipeline.add_imputer(imputer_component)
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
def build(self) -> GeoImputerBase:
    """Build and return an imputer instance without applying it.

    This method creates and returns an imputer instance without immediately applying
    it to data. It is primarily intended for use in the `UrbanPipeline`, where the
    actual imputation is deferred until pipeline execution.

    !!! note "To Keep In Mind"
        For most use cases outside of pipelines, using `transform()` is preferred as it
        directly applies the imputer and returns the imputed data.

    Returns:
        A GeoImputerBase instance configured and ready to use.

    Raises:
        ValueError: If the imputer type or latitude/longitude columns have not been specified.

    Examples:
        >>> # Creating a pipeline component
        >>> imputer_component = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
        ...     .on_columns(longitude_column="lng", latitude_column="lat")\
        ...     .build()
        >>> pipeline.add_imputer(imputer_component)
    """
    logger.log(
        "DEBUG_MID",
        "WARNING: build() should only be used in UrbanPipeline. In other cases, "
        "using transform() is a better choice.",
    )
    if self._imputer_type is None:
        raise ValueError("Imputer type must be specified. Call with_type() first.")
    if self._latitude_column is None or self._longitude_column is None:
        raise ValueError(
            "Latitude and longitude columns must be specified. Call on_columns() first."
        )
    imputer_class = IMPUTER_REGISTRY[self._imputer_type]
    self._instance = imputer_class(
        latitude_column=self._latitude_column,
        longitude_column=self._longitude_column,
        data_id=self._data_id,
        **self._extra_params,
    )
    if self._preview is not None:
        self.preview(format=self._preview["format"])
    return self._instance

preview(format='ascii')

Display a preview of the imputer configuration and settings.

This method generates and displays a preview of the imputer, showing its configuration, settings, and other metadata. The preview can be displayed in different formats.

Parameters:

Name Type Description Default
format str

The format to display the preview in (default: "ascii").

  • "ascii": Text-based format for terminal display
  • "json": JSON-formatted data for programmatic use
'ascii'

Raises:

Type Description
ValueError

If an unsupported format is specified.

Note

This method requires an imputer instance to be available. Call build() or transform() first to create an instance.

Examples:

>>> imputer = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
...     .on_columns(longitude_column="lng", latitude_column="lat")
>>> # Build the imputer instance
>>> imputer.build()
>>> # Display a preview
>>> imputer.preview()
>>> # Or in JSON format
>>> imputer.preview(format="json")
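
The format dispatch can be illustrated in isolation. In this sketch, `FakeImputer` and `render_preview` are hypothetical names, and the preview contents are invented placeholders. It assumes, as the source listing suggests, that the instance returns a string for `"ascii"` and a JSON-serialisable dictionary for `"json"`.

```python
import json


class FakeImputer:
    """Hypothetical imputer whose preview() returns a str or a dict."""

    def preview(self, format: str = "ascii"):
        if format == "ascii":
            return "Imputer: FakeImputer\n  lat=lat, lng=lng"
        if format == "json":
            return {"imputer": "FakeImputer", "latitude_column": "lat"}
        raise ValueError(f"Unsupported format '{format}'")


def render_preview(instance, format: str = "ascii") -> str:
    """Return the text that would be printed for the given format."""
    data = instance.preview(format=format)  # raises for unknown formats
    if format == "ascii":
        return data
    # Only "json" reaches this point; serialise the dictionary for display.
    return json.dumps(data, indent=2)


ascii_text = render_preview(FakeImputer())
json_text = render_preview(FakeImputer(), format="json")
print(json_text)
```

Returning the text instead of printing it, as done here, makes the dispatch easy to test; the factory method itself prints and returns `None`.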
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
def preview(self, format: str = "ascii") -> None:
    """Display a preview of the imputer configuration and settings.

    This method generates and displays a preview of the imputer, showing its
    configuration, settings, and other metadata. The preview can be displayed
    in different formats.

    Args:
        format: The format to display the preview in (default: "ascii").

            - [x] "ascii": Text-based format for terminal display
            - [x] "json": JSON-formatted data for programmatic use

    Raises:
        ValueError: If an unsupported format is specified.

    Note:
        This method requires an imputer instance to be available. Call build()
        or transform() first to create an instance.

    Examples:
        >>> imputer = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
        ...     .on_columns(longitude_column="lng", latitude_column="lat")
        >>> # Build the imputer instance
        >>> imputer.build()
        >>> # Display a preview
        >>> imputer.preview()
        >>> # Or in JSON format
        >>> imputer.preview(format="json")
    """
    if self._instance is None:
        print("No imputer instance available to preview. Call build() first.")
        return
    if hasattr(self._instance, "preview"):
        preview_data = self._instance.preview(format=format)
        if format == "ascii":
            print(preview_data)
        elif format == "json":
            print(json.dumps(preview_data, indent=2))
        else:
            raise ValueError(f"Unsupported format '{format}'")
    else:
        print("Preview not supported for this imputer instance.")

with_preview(format='ascii')

Configure the factory to display a preview after building.

This method configures the factory to automatically display a preview after building an imputer with build(). It's a convenient way to inspect the imputer configuration without having to call preview() separately.

Parameters:

Name Type Description Default
format str

The format to display the preview in (default: "ascii").

  • "ascii": Text-based format for terminal display
  • "json": JSON-formatted data for programmatic use
'ascii'

Returns:

Type Description
ImputerFactory

The ImputerFactory instance for method chaining.

Examples:

>>> # Auto-preview after building
>>> imputer_component = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
...     .on_columns(longitude_column="lng", latitude_column="lat")\
...     .with_preview(format="json")\
...     .build()
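
The interplay between `with_preview()` and `build()` amounts to storing a setting and consulting it later. A minimal sketch under stated assumptions: `MiniFactory` is a hypothetical stand-in, and its `preview()` records the requested format in a list instead of printing, so the effect can be inspected.

```python
class MiniFactory:
    """Stand-in reproducing only the with_preview()/build() interplay."""

    def __init__(self):
        self._preview = None
        self.shown = []  # records preview calls instead of printing

    def with_preview(self, format: str = "ascii"):
        # Only remembers the request; nothing is displayed yet.
        self._preview = {"format": format}
        return self

    def preview(self, format: str = "ascii"):
        self.shown.append(format)

    def build(self):
        instance = object()  # placeholder for the real imputer instance
        if self._preview is not None:
            self.preview(format=self._preview["format"])
        return instance


quiet = MiniFactory()
quiet.build()  # no preview requested, so nothing is recorded

loud = MiniFactory().with_preview(format="json")
loud.build()  # the stored format is used here
print(loud.shown)
```

In the source listing, only `build()` consults the stored setting, so a preview configured this way is not shown when `transform()` is called instead.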
Source code in src/urban_mapper/modules/imputer/imputer_factory.py
def with_preview(self, format: str = "ascii") -> "ImputerFactory":
    """Configure the factory to display a preview after building.

    This method configures the factory to automatically display a preview after
    building an imputer with `build()`. It's a convenient way to inspect the imputer
    configuration without having to call `preview()` separately.

    Args:
        format: The format to display the preview in (default: "ascii").

            - [x] "ascii": Text-based format for terminal display
            - [x] "json": JSON-formatted data for programmatic use

    Returns:
        The ImputerFactory instance for method chaining.

    Examples:
        >>> # Auto-preview after building
        >>> imputer_component = um.UrbanMapper().imputer.with_type("SimpleGeoImputer")\
        ...     .on_columns(longitude_column="lng", latitude_column="lat")\
        ...     .with_preview(format="json")\
        ...     .build()
    """
    self._preview = {"format": format}
    return self
Provost Simon