
Enrichers

What is the enricher module?

The enricher module is the heart of UrbanMapper’s analysis: it takes your urban layer and your loaded urban data and turns them into meaningful statistics, such as the number of taxi pickups at each intersection or the average building height per neighborhood. For a more hands-on introduction to the module and its usage, we recommend looking through the Enricher examples.

Documentation Under Alpha Construction

This documentation is in its early stages and still being developed. The API may therefore change, and some parts might be incomplete or inaccurate.

Use at your own risk, and please report anything you find that seems incorrect or outdated.


EnricherBase

Bases: ABC

Base class for all data enrichers in UrbanMapper

This abstract class defines the common interface that all enricher implementations must implement. Enrichers add data or derived information to urban layers, enhancing them with additional attributes, statistics, or related data.

Enrichers typically perform operations like:

  • Aggregating data values (sum, mean, median, etc.)
  • Counting features within areas or near points
  • Computing statistics on related data
  • Joining external information to the urban layer

Attributes:

Name Type Description
config

Configuration object for the enricher, containing parameters that control the enrichment process.
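
As a rough illustration of the contract described above, the sketch below shows the abstract-base-class pattern with plain dictionaries instead of GeoDataFrames and urban layers. `MiniEnricherBase` and `CountPerStreet` are hypothetical stand-ins invented for this example; they are not part of UrbanMapper, and the real base class also declares an abstract `preview` method.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class MiniEnricherBase(ABC):
    """Hypothetical stand-in for EnricherBase (not the UrbanMapper class)."""

    def __init__(self, config: Any = None) -> None:
        self.config = config or {}

    @abstractmethod
    def _enrich(self, records: List[dict], layer: Dict[str, dict]) -> Dict[str, dict]:
        """Subclasses put the actual enrichment logic here."""
        raise NotImplementedError("_enrich method not implemented.")

    def enrich(self, records: List[dict], layer: Dict[str, dict]) -> Dict[str, dict]:
        # Public entry point: delegates to the subclass implementation.
        return self._enrich(records, layer)


class CountPerStreet(MiniEnricherBase):
    """Counts how many records fall on each street of the layer."""

    def _enrich(self, records, layer):
        # Every feature of the layer gets a count, even if no record matches it.
        for street in layer.values():
            street["trip_count"] = 0
        for record in records:
            street_id = record["nearest_street"]
            if street_id in layer:  # records pointing outside the layer are ignored
                layer[street_id]["trip_count"] += 1
        return layer


layer = {"s1": {}, "s2": {}}
trips = [{"nearest_street": "s1"}, {"nearest_street": "s1"}, {"nearest_street": "s9"}]
enriched = CountPerStreet().enrich(trips, layer)
print(enriched)  # {'s1': {'trip_count': 2}, 's2': {'trip_count': 0}}
```

Because `_enrich` is abstract, instantiating `MiniEnricherBase` directly raises a `TypeError`; only concrete subclasses can be created.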

Source code in src/urban_mapper/modules/enricher/abc_enricher.py
@beartype
class EnricherBase(ABC):
    """Base class for all data enrichers in `UrbanMapper`

    This abstract class defines the common interface that all enricher implementations
    must implement. Enrichers add data or derived information to urban layers,
    enhancing them with additional attributes, statistics, or related data.

    !!! note "Enrichers typically perform operations like:"

        - [x] Aggregating data values (sum, mean, median, etc.)
        - [x] Counting features within areas or near points
        - [x] Computing statistics on related data
        - [x] Joining external information to the urban layer

    Attributes:
        config: Configuration object for the enricher, containing parameters
            that control the enrichment process.
    """

    def __init__(self, config: Optional[Any] = None) -> None:
        from urban_mapper.modules.enricher.factory.config import EnricherConfig

        self.config = config or EnricherConfig()

    @abstractmethod
    def _enrich(
        self,
        input_geodataframe: gpd.GeoDataFrame,
        urban_layer: UrbanLayerBase,
        **kwargs,
    ) -> UrbanLayerBase:
        """Internal method to carry out the enrichment.

        This method must be fleshed out by subclasses to define the nitty-gritty
        of how enrichment happens.

        !!! warning "Method Not Implemented"
            Subclasses must implement this. It’s where the logic of enrichment takes place.

        Args:
            input_geodataframe: The GeoDataFrame with data for enrichment.
            urban_layer: The urban layer to be enriched.
            **kwargs: Extra parameters to tweak the enrichment.

        Returns:
            The enriched urban layer.
        """
        raise NotImplementedError("_enrich method not implemented.")

    @abstractmethod
    def preview(self, format: str = "ascii") -> Any:
        """Generate a preview of the enricher instance.

        Produces a summary of the enricher for a quick peek during `UrbanMapper`’s workflow.

        !!! warning "Method Not Implemented"
            Subclasses must implement this to offer a preview of the enricher’s setup and data.

        Args:
            format: Output format for the preview. Options include:

                - [x] `ascii`: Text-based format for terminal display
                - [x] `json`: JSON-formatted data for programmatic use

        Returns:
            A representation of the enricher in the requested format. Type varies by format.

        Raises:
            ValueError: If an unsupported format is requested.
        """
        raise NotImplementedError("Preview method not implemented.")

    def set_layer_data_source(
        self, urban_layer: UrbanLayerBase, index: Index
    ) -> UrbanLayerBase:
        """Set the urban layer's `data_id` column to the configured source name for the rows in `index`.

        Args:
            urban_layer: Urban layer to change.
            index: Index list of the Urban layer to change.

        Returns:
            Urban layer with new column data_id.
        """
        if self.config.data_id:
            if "data_id" not in urban_layer.layer:
                urban_layer.layer["data_id"] = pd.Series(np.nan, dtype="object")

            urban_layer.layer.loc[index, "data_id"] = self.config.data_id

        return urban_layer

    def enrich(
        self,
        input_geodataframe: Union[Dict[str, gpd.GeoDataFrame], gpd.GeoDataFrame],
        urban_layer: UrbanLayerBase,
        **kwargs,
    ) -> UrbanLayerBase:
        """Enrich an `urban layer` with data from the input `GeoDataFrame`.

        The main public method for wielding enrichers. It hands off to the
        implementation-specific `_enrich` method after any needed validation.

        Args:
            input_geodataframe: A single `GeoDataFrame`, or a dict mapping data IDs to
                `GeoDataFrame`s, holding the data to enrich with.
            urban_layer: Urban layer to beef up with data from input_geodataframe.
            **kwargs: Additional bespoke parameters to customise enrichment.

        Returns:
            The enriched urban layer sporting new columns or attributes.

        Raises:
            ValueError: If the enrichment can’t be done.

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> streets = mapper.urban_layer.OSMNXStreets().from_place("London, UK")
            >>> taxi_trips = mapper.loader.from_file("taxi_trips.csv")\
            ...     .with_columns(longitude_column="pickup_lng", latitude_column="pickup_lat")\
            ...     .load()
            >>> enricher = mapper.enricher\
            ...     .with_type("SingleAggregatorEnricher")\
            ...     .with_data(group_by="nearest_street")\
            ...     .count_by(output_column="trip_count")\
            ...     .build()
            >>> enriched_streets = enricher.enrich(taxi_trips, streets)
        """
        if isinstance(input_geodataframe, gpd.GeoDataFrame):
            return self._enrich(input_geodataframe, urban_layer, **kwargs)
        else:
            enriched_layer = urban_layer

            for key, gdf in input_geodataframe.items():
                if self.config.data_id is None or self.config.data_id == key:
                    enriched_layer = self._enrich(gdf, enriched_layer, **kwargs)

            return enriched_layer

_enrich(input_geodataframe, urban_layer, **kwargs) abstractmethod

Internal method to carry out the enrichment.

This method must be fleshed out by subclasses to define the nitty-gritty of how enrichment happens.

Method Not Implemented

Subclasses must implement this. It’s where the logic of enrichment takes place.

Parameters:

Name Type Description Default
input_geodataframe GeoDataFrame

The GeoDataFrame with data for enrichment.

required
urban_layer UrbanLayerBase

The urban layer to be enriched.

required
**kwargs

Extra parameters to tweak the enrichment.

{}

Returns:

Type Description
UrbanLayerBase

The enriched urban layer.

Source code in src/urban_mapper/modules/enricher/abc_enricher.py
@abstractmethod
def _enrich(
    self,
    input_geodataframe: gpd.GeoDataFrame,
    urban_layer: UrbanLayerBase,
    **kwargs,
) -> UrbanLayerBase:
    """Internal method to carry out the enrichment.

    This method must be fleshed out by subclasses to define the nitty-gritty
    of how enrichment happens.

    !!! warning "Method Not Implemented"
        Subclasses must implement this. It’s where the logic of enrichment takes place.

    Args:
        input_geodataframe: The GeoDataFrame with data for enrichment.
        urban_layer: The urban layer to be enriched.
        **kwargs: Extra parameters to tweak the enrichment.

    Returns:
        The enriched urban layer.
    """
    raise NotImplementedError("_enrich method not implemented.")

enrich(input_geodataframe, urban_layer, **kwargs)

Enrich an urban layer with data from the input GeoDataFrame.

The main public method for wielding enrichers. It hands off to the implementation-specific _enrich method after any needed validation.
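
The source shown on this page accepts either a single dataset or a dictionary of datasets, and in the dictionary case only enriches with the entries whose key matches the configured `data_id` (or with all of them when no `data_id` is set). The sketch below mirrors that dispatch with plain lists and dicts; `dispatch_enrich` and `add_row_count` are hypothetical helpers written for this illustration, not UrbanMapper functions.

```python
from typing import Callable, Dict, List, Optional, Union

Rows = List[dict]


def dispatch_enrich(
    data: Union[Rows, Dict[str, Rows]],
    layer: dict,
    enrich_one: Callable[[Rows, dict], dict],
    data_id: Optional[str] = None,
) -> dict:
    """Apply enrich_one to one dataset, or to each matching dataset of a dict."""
    if not isinstance(data, dict):
        # Single dataset: enrich once and return.
        return enrich_one(data, layer)
    for key, rows in data.items():
        # No data_id means "use every dataset"; otherwise only the matching key.
        if data_id is None or data_id == key:
            layer = enrich_one(rows, layer)
    return layer


def add_row_count(rows: Rows, layer: dict) -> dict:
    # Hypothetical enrichment step: accumulate the number of rows seen so far.
    return {**layer, "rows_seen": layer.get("rows_seen", 0) + len(rows)}


sources = {"taxi": [{"id": 1}, {"id": 2}], "bike": [{"id": 3}]}
all_sources = dispatch_enrich(sources, {}, add_row_count)
only_taxi = dispatch_enrich(sources, {}, add_row_count, data_id="taxi")
single = dispatch_enrich([{"id": 9}], {}, add_row_count)
print(all_sources, only_taxi, single)
# {'rows_seen': 3} {'rows_seen': 2} {'rows_seen': 1}
```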

Parameters:

Name Type Description Default
input_geodataframe Union[Dict[str, GeoDataFrame], GeoDataFrame]

A single GeoDataFrame, or a dict mapping data IDs to GeoDataFrames, holding the data to enrich with.

required
urban_layer UrbanLayerBase

Urban layer to beef up with data from input_geodataframe.

required
**kwargs

Additional bespoke parameters to customise enrichment.

{}

Returns:

Type Description
UrbanLayerBase

The enriched urban layer sporting new columns or attributes.

Raises:

Type Description
ValueError

If the enrichment can’t be done.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> streets = mapper.urban_layer.OSMNXStreets().from_place("London, UK")
>>> taxi_trips = mapper.loader.from_file("taxi_trips.csv")\
...     .with_columns(longitude_column="pickup_lng", latitude_column="pickup_lat")\
...     .load()
>>> enricher = mapper.enricher\
...     .with_type("SingleAggregatorEnricher")\
...     .with_data(group_by="nearest_street")\
...     .count_by(output_column="trip_count")\
...     .build()
>>> enriched_streets = enricher.enrich(taxi_trips, streets)
Source code in src/urban_mapper/modules/enricher/abc_enricher.py
def enrich(
    self,
    input_geodataframe: Union[Dict[str, gpd.GeoDataFrame], gpd.GeoDataFrame],
    urban_layer: UrbanLayerBase,
    **kwargs,
) -> UrbanLayerBase:
    """Enrich an `urban layer` with data from the input `GeoDataFrame`.

    The main public method for wielding enrichers. It hands off to the
    implementation-specific `_enrich` method after any needed validation.

    Args:
        input_geodataframe: A single `GeoDataFrame`, or a dict mapping data IDs to
            `GeoDataFrame`s, holding the data to enrich with.
        urban_layer: Urban layer to beef up with data from input_geodataframe.
        **kwargs: Additional bespoke parameters to customise enrichment.

    Returns:
        The enriched urban layer sporting new columns or attributes.

    Raises:
        ValueError: If the enrichment can’t be done.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> streets = mapper.urban_layer.OSMNXStreets().from_place("London, UK")
        >>> taxi_trips = mapper.loader.from_file("taxi_trips.csv")\
        ...     .with_columns(longitude_column="pickup_lng", latitude_column="pickup_lat")\
        ...     .load()
        >>> enricher = mapper.enricher\
        ...     .with_type("SingleAggregatorEnricher")\
        ...     .with_data(group_by="nearest_street")\
        ...     .count_by(output_column="trip_count")\
        ...     .build()
        >>> enriched_streets = enricher.enrich(taxi_trips, streets)
    """
    if isinstance(input_geodataframe, gpd.GeoDataFrame):
        return self._enrich(input_geodataframe, urban_layer, **kwargs)
    else:
        enriched_layer = urban_layer

        for key, gdf in input_geodataframe.items():
            if self.config.data_id is None or self.config.data_id == key:
                enriched_layer = self._enrich(gdf, enriched_layer, **kwargs)

        return enriched_layer

preview(format='ascii') abstractmethod

Generate a preview of the enricher instance.

Produces a summary of the enricher for a quick peek during UrbanMapper’s workflow.

Method Not Implemented

Subclasses must implement this to offer a preview of the enricher’s setup and data.

Parameters:

Name Type Description Default
format str

Output format for the preview. Options include:

  • ascii: Text-based format for terminal display
  • json: JSON-formatted data for programmatic use
'ascii'

Returns:

Type Description
Any

A representation of the enricher in the requested format. Type varies by format.

Raises:

Type Description
ValueError

If an unsupported format is requested.
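
One way a subclass might honour this contract is sketched below: text for `ascii`, a dictionary for `json`, and a `ValueError` for anything else. The `build_preview` function and the shape of its `settings` argument are assumptions made for this example; the actual output of UrbanMapper's preview is not shown on this page.

```python
from typing import Any, Dict


def build_preview(settings: Dict[str, Any], format: str = "ascii") -> Any:
    """Return a text or dict summary of some enricher settings (hypothetical helper)."""
    if format == "ascii":
        # Human-readable, one setting per line, suitable for a terminal.
        lines = ["Enricher preview"]
        lines += [f"  {key}: {value}" for key, value in settings.items()]
        return "\n".join(lines)
    if format == "json":
        # A plain dict that can be serialised or inspected programmatically.
        return dict(settings)
    raise ValueError(f"Unsupported format '{format}'.")


settings = {"group_by": "nearest_street", "action": "count"}
text_preview = build_preview(settings)
json_preview = build_preview(settings, format="json")
print(text_preview)
```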

Source code in src/urban_mapper/modules/enricher/abc_enricher.py
@abstractmethod
def preview(self, format: str = "ascii") -> Any:
    """Generate a preview of the enricher instance.

    Produces a summary of the enricher for a quick peek during `UrbanMapper`’s workflow.

    !!! warning "Method Not Implemented"
        Subclasses must implement this to offer a preview of the enricher’s setup and data.

    Args:
        format: Output format for the preview. Options include:

            - [x] `ascii`: Text-based format for terminal display
            - [x] `json`: JSON-formatted data for programmatic use

    Returns:
        A representation of the enricher in the requested format. Type varies by format.

    Raises:
        ValueError: If an unsupported format is requested.
    """
    raise NotImplementedError("Preview method not implemented.")

SingleAggregatorEnricher

Bases: EnricherBase

Enricher Using a Single Aggregator For Urban Layers.

Uses one aggregator to enrich urban layers, adding results as a new column. The aggregator decides how input data is processed (e.g., counted, averaged).

Attributes:

Name Type Description
config

Config object for the enricher.

aggregator

Aggregator computing stats or counts.

output_column

Column name for aggregated results.

debug

Whether to include debug info.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> streets = mapper.urban_layer.OSMNXStreets().from_place("London, UK")
>>> trips = mapper.loader.from_file("trips.csv")\
...     .with_columns(longitude_column="lng", latitude_column="lat")\
...     .load()
>>> enricher = mapper.enricher\
...     .with_data(group_by="nearest_street")\
...     .count_by(output_column="trip_count")\
...     .build()
>>> enriched_streets = enricher.enrich(trips, streets)
Source code in src/urban_mapper/modules/enricher/enrichers/single_aggregator_enricher.py
@beartype
class SingleAggregatorEnricher(EnricherBase):
    """Enricher Using a `Single Aggregator` For `Urban Layers`.

    Uses one aggregator to enrich `urban layers`, adding `results as a new column`.
    The aggregator decides how input data is processed (e.g., `counted`, `averaged`).

    Attributes:
        config: Config object for the enricher.
        aggregator: Aggregator computing stats or counts.
        output_column: Column name for aggregated results.
        debug: Whether to include debug info.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> streets = mapper.urban_layer.OSMNXStreets().from_place("London, UK")
        >>> trips = mapper.loader.from_file("trips.csv")\
        ...     .with_columns(longitude_column="lng", latitude_column="lat")\
        ...     .load()
        >>> enricher = mapper.enricher\
        ...     .with_data(group_by="nearest_street")\
        ...     .count_by(output_column="trip_count")\
        ...     .build()
        >>> enriched_streets = enricher.enrich(trips, streets)
    """

    def __init__(
        self,
        aggregator: BaseAggregator,
        output_column: str = "aggregated_value",
        config: EnricherConfig = None,
    ) -> None:
        super().__init__(config)
        self.aggregator = aggregator
        self.output_column = output_column
        # Read from self.config: the base class falls back to a default config when None is passed.
        self.debug = self.config.debug

    def _enrich(
        self,
        input_geodataframe: gpd.GeoDataFrame,
        urban_layer: UrbanLayerBase,
        **kwargs,
    ) -> UrbanLayerBase:
        """Enrich an `urban layer` with an `aggregator`.

        Aggregates data from the input `GeoDataFrame` and adds it to the urban layer.

        Args:
            input_geodataframe: `GeoDataFrame` with enrichment data.
            urban_layer: Urban layer to enrich.
            **kwargs: Extra params for customisation.

        Returns:
            Enriched urban layer with new columns.

        Raises:
            ValueError: If aggregation fails.
        """
        aggregated_df = self.aggregator.aggregate(input_geodataframe)
        enriched_values = (
            aggregated_df["value"].reindex(urban_layer.layer.index).fillna(0)
        )
        urban_layer = self.set_layer_data_source(urban_layer, aggregated_df.index)
        urban_layer.layer[self.output_column] = enriched_values
        if self.debug:
            indices_values = (
                aggregated_df["indices"]
                .reindex(urban_layer.layer.index)
                .apply(lambda x: x if isinstance(x, list) else [])
            )
            urban_layer.layer[f"DEBUG_{self.output_column}"] = indices_values
        return urban_layer

    def preview(self, format: str = "ascii") -> Any:
        """Generate a preview of this enricher.

        Creates a summary for quick inspection.

        Args:
            format: Output format—"ascii" (text) or "json" (dict).

        Returns:
            Preview in the requested format.
        """
        preview_builder = PreviewBuilder(self.config, ENRICHER_REGISTRY)
        return preview_builder.build_preview(format=format)

_enrich(input_geodataframe, urban_layer, **kwargs)

Enrich an urban layer with an aggregator.

Aggregates data from the input GeoDataFrame and adds it to the urban layer.
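
In the source listed on this page, the aggregated values are reindexed onto the urban layer's index and missing entries are filled with 0, so every layer feature ends up with a value. The sketch below reproduces that alignment step with plain dictionaries rather than pandas; `align_to_layer` is a hypothetical helper for illustration only.

```python
from typing import Dict, Hashable, List


def align_to_layer(
    aggregated: Dict[Hashable, float], layer_index: List[Hashable]
) -> Dict[Hashable, float]:
    """Return one value per layer feature, using 0 where nothing was aggregated."""
    # Groups that are not part of the layer are dropped, as a reindex would do.
    return {feature_id: aggregated.get(feature_id, 0) for feature_id in layer_index}


layer_index = ["street_a", "street_b", "street_c"]
trip_counts = {"street_a": 12, "street_c": 3, "street_z": 5}
aligned = align_to_layer(trip_counts, layer_index)
print(aligned)  # {'street_a': 12, 'street_b': 0, 'street_c': 3}
```

Filling with 0 is natural for counts; for other aggregations, such as a mean, a 0 for "no data" can be misleading, which is worth keeping in mind when reading enriched columns.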

Parameters:

Name Type Description Default
input_geodataframe GeoDataFrame

GeoDataFrame with enrichment data.

required
urban_layer UrbanLayerBase

Urban layer to enrich.

required
**kwargs

Extra params for customisation.

{}

Returns:

Type Description
UrbanLayerBase

Enriched urban layer with new columns.

Raises:

Type Description
ValueError

If aggregation fails.

Source code in src/urban_mapper/modules/enricher/enrichers/single_aggregator_enricher.py
def _enrich(
    self,
    input_geodataframe: gpd.GeoDataFrame,
    urban_layer: UrbanLayerBase,
    **kwargs,
) -> UrbanLayerBase:
    """Enrich an `urban layer` with an `aggregator`.

    Aggregates data from the input `GeoDataFrame` and adds it to the urban layer.

    Args:
        input_geodataframe: `GeoDataFrame` with enrichment data.
        urban_layer: Urban layer to enrich.
        **kwargs: Extra params for customisation.

    Returns:
        Enriched urban layer with new columns.

    Raises:
        ValueError: If aggregation fails.
    """
    aggregated_df = self.aggregator.aggregate(input_geodataframe)
    enriched_values = (
        aggregated_df["value"].reindex(urban_layer.layer.index).fillna(0)
    )
    urban_layer = self.set_layer_data_source(urban_layer, aggregated_df.index)
    urban_layer.layer[self.output_column] = enriched_values
    if self.debug:
        indices_values = (
            aggregated_df["indices"]
            .reindex(urban_layer.layer.index)
            .apply(lambda x: x if isinstance(x, list) else [])
        )
        urban_layer.layer[f"DEBUG_{self.output_column}"] = indices_values
    return urban_layer

preview(format='ascii')

Generate a preview of this enricher.

Creates a summary for quick inspection.

Parameters:

Name Type Description Default
format str

Output format—"ascii" (text) or "json" (dict).

'ascii'

Returns:

Type Description
Any

Preview in the requested format.

Source code in src/urban_mapper/modules/enricher/enrichers/single_aggregator_enricher.py
def preview(self, format: str = "ascii") -> Any:
    """Generate a preview of this enricher.

    Creates a summary for quick inspection.

    Args:
        format: Output format—"ascii" (text) or "json" (dict).

    Returns:
        Preview in the requested format.
    """
    preview_builder = PreviewBuilder(self.config, ENRICHER_REGISTRY)
    return preview_builder.build_preview(format=format)

EnricherFactory

Factory Class For Creating and Configuring Data Enrichers.

This class offers a fluent, chaining-methods interface for crafting and setting up data enrichers in the UrbanMapper workflow. Enrichers empower spatial aggregation and analysis on geographic data—like counting points in polygons or tallying stats for regions.

The factory handles the nitty-gritty of enricher instantiation, configuration, and application, ensuring a uniform workflow no matter the enricher type.
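
The chaining style described above relies on each configuration method returning the factory itself. The following is a deliberately simplified sketch of that pattern; `MiniEnricherFactory` is a hypothetical class that only records settings and returns them as a dict from `build()`, whereas the real factory builds an enricher instance.

```python
from typing import Optional


class MiniEnricherFactory:
    """Hypothetical, simplified stand-in for a fluent factory."""

    def __init__(self) -> None:
        self.group_by: Optional[str] = None
        self.action: Optional[str] = None
        self.output_column: Optional[str] = None

    def with_data(self, group_by: str) -> "MiniEnricherFactory":
        self.group_by = group_by
        return self  # returning self is what makes chaining possible

    def count_by(self, output_column: str = "count") -> "MiniEnricherFactory":
        self.action = "count"
        self.output_column = output_column
        return self

    def build(self) -> dict:
        # Validate before building, so misconfiguration fails early.
        if self.group_by is None:
            raise ValueError("with_data(group_by=...) must be called before build().")
        if self.action is None:
            raise ValueError("An action such as count_by() must be set before build().")
        return {
            "group_by": self.group_by,
            "action": self.action,
            "output_column": self.output_column,
        }


settings = (
    MiniEnricherFactory()
    .with_data(group_by="neighbourhood")
    .count_by(output_column="point_count")
    .build()
)
print(settings)
# {'group_by': 'neighbourhood', 'action': 'count', 'output_column': 'point_count'}
```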

Attributes:

Name Type Description
config

Configuration settings steering the enricher.

_instance Optional[EnricherBase]

The underlying enricher instance (internal use only).

_preview Optional[dict]

Preview configuration (internal use only).

Examples:

>>> import urban_mapper as um
>>> import geopandas as gpd
>>> mapper = um.UrbanMapper()
>>> hoods = mapper.urban_layer.region_neighborhoods().from_place("London, UK")
>>> points = gpd.read_file("points.geojson")
>>> # Count points per neighbourhood
>>> # with_type is optional here: SingleAggregatorEnricher is the default (and currently only) type.
>>> enriched_hoods = mapper.enricher\
...     .with_type("SingleAggregatorEnricher")\
...     .with_data(group_by="neighbourhood")\
...     .count_by(output_column="point_count")\
...     .build()\
...     .enrich(points, hoods)
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
@beartype
class EnricherFactory:
    """Factory Class For Creating and Configuring Data `Enrichers`.

    This class offers a fluent, chaining-methods interface for crafting and setting up
    data `enrichers` in the `UrbanMapper` workflow. `Enrichers` empower spatial aggregation
    and analysis on geographic data—like counting points in polygons or tallying stats
    for regions.

    The factory handles the nitty-gritty of `enricher` instantiation, `configuration`,
    and `application`, ensuring a uniform workflow no matter the enricher type.

    Attributes:
        config: Configuration settings steering the enricher.
        _instance: The underlying enricher instance (internal use only).
        _preview: Preview configuration (internal use only).

    Examples:
        >>> import urban_mapper as um
        >>> import geopandas as gpd
        >>> mapper = um.UrbanMapper()
        >>> hoods = mapper.urban_layer.region_neighborhoods().from_place("London, UK")
        >>> points = gpd.read_file("points.geojson")
        >>> # Count points per neighbourhood
        >>> # with_type is optional here: SingleAggregatorEnricher is the default (and currently only) type.
        >>> enriched_hoods = mapper.enricher\
        ...     .with_type("SingleAggregatorEnricher")\
        ...     .with_data(group_by="neighbourhood")\
        ...     .count_by(output_column="point_count")\
        ...     .build()\
        ...     .enrich(points, hoods)
    """

    def __init__(self):
        self.config = EnricherConfig()
        self._instance: Optional[EnricherBase] = None
        self._preview: Optional[dict] = None

    def with_data(self, *args, **kwargs) -> "EnricherFactory":
        """Specify columns to group by and values to aggregate.

        Sets up which columns to group data by and, optionally, which to pull
        values from for aggregation during enrichment.

        Args:
            group_by: Column name(s) to group by. Can be a string or list of strings.
            values_from: Column name to pull values from for aggregation. Optional; if provided, it must be a string.

        Returns:
            The EnricherFactory instance for chaining.

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> enricher = mapper.enricher.with_data(group_by="neighbourhood")
        """
        self.config.with_data(*args, **kwargs)
        return self

    def with_debug(self, debug: bool = True) -> "EnricherFactory":
        """Toggle debug mode for the enricher.

        Enables or disables debug mode, which can spill extra info during enrichment.

        !!! note "What Extra Info?"
            For instance, debug mode adds an extra column per enrichment that lists which indices
            of the original data contributed to each enriched value. This helps to understand
            how the enrichment was computed and to debug any issues that arise. It can also
            serve machine-learning tasks that need this row-level provenance.

        Args:
            debug: Whether to turn on debug mode (default: True). Passing `False` switches debug
                off without having to delete the `.with_debug()` call from a chain.

        Returns:
            The EnricherFactory instance for chaining.

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> enricher = mapper.enricher.with_debug(True)
        """
        self.config.debug = debug
        return self

    def aggregate_by(self, *args, **kwargs) -> "EnricherFactory":
        """Set the enricher to perform aggregation operations.

        Configures the enricher to aggregate data (e.g., `sum`, `mean`) using provided args.

        !!! tip "Available Methods"

            - [x] `sum`
            - [x] `mean`
            - [x] `median`
            - [x] `min`
            - [x] `max`

        Args:
            *args: Positional args for EnricherConfig.aggregate_by.
            **kwargs: Keyword args like `group_by`, `values_from`, `method` (e.g., "sum").

        Returns:
            The EnricherFactory instance for chaining.

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> enricher = mapper.enricher\
            ...     .with_data(group_by="neighbourhood", values_from="temp")\
            ...     .aggregate_by(method="mean", output_column="avg_temp")
        """
        self.config.aggregate_by(*args, **kwargs)
        return self

    def count_by(self, *args, **kwargs) -> "EnricherFactory":
        """Set the enricher to count features.

        Configures the enricher to count items per group—great for tallying points in areas.

        Args:
            *args: Positional args for EnricherConfig.count_by.
            **kwargs: Keyword args like `group_by`, `output_column`.

        Returns:
            The EnricherFactory instance for chaining.

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> enricher = mapper.enricher\
            ...     .with_data(group_by="pickup")\
            ...     .count_by(output_column="pickup_count")
        """
        self.config.count_by(*args, **kwargs)
        return self

    def with_type(self, primitive_type: str) -> "EnricherFactory":
        """Choose the enricher type to create.

        Sets the type of enricher, dictating the enrichment approach, from the registry.

        !!! note "At the moment only one exists"

            - [x] `SingleAggregatorEnricher` (default)

            Hence, there is no need to use `with_type` until other enricher types are added.
            It is kept for compatibility with other modules.

        Args:
            primitive_type: Name of the enricher type (e.g., "SingleAggregatorEnricher").

        Returns:
            The EnricherFactory instance for chaining.

        Raises:
            ValueError: If the type isn’t in the registry.

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> enricher = mapper.enricher.with_type("SingleAggregatorEnricher")
        """
        if primitive_type not in ENRICHER_REGISTRY:
            available = list(ENRICHER_REGISTRY.keys())
            match, score = process.extractOne(primitive_type, available)
            if score > 80:
                suggestion = f" Maybe you meant '{match}'?"
            else:
                suggestion = ""
            raise ValueError(
                f"Unknown enricher type '{primitive_type}'. Available: {', '.join(available)}.{suggestion}"
            )
        self.config.with_type(primitive_type)
        return self

    def preview(self, format: str = "ascii") -> Union[None, str, dict]:
        """Show a preview of the configured enricher.

        Displays a sneak peek of the enricher setup in the chosen format.

        Args:
            format: Preview format—"ascii" (text) or "json" (dict).

        Returns:
            None for "ascii" (prints to console), dict for "json".

        Raises:
            ValueError: If format isn’t supported.

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> enricher = mapper.enricher\
            ...     .with_data(group_by="pickup")\
            ...     .count_by()\
            ...     .build()
            >>> enricher.preview()
        """
        if self._instance is None:
            print("No Enricher instance available to preview.")
            return None
        if hasattr(self._instance, "preview"):
            preview_data = self._instance.preview(format=format)
            if format == "ascii":
                print(preview_data)
            elif format == "json":
                return preview_data
            else:
                raise ValueError(f"Unsupported format '{format}'.")
        else:
            print("Preview not supported for this Enricher instance.")
        return None

    def with_preview(self, format: str = "ascii") -> "EnricherFactory":
        """Set the factory to show a preview after building.

        Configures an automatic preview post-build—handy for a quick check.

        Args:
            format: Preview format—"ascii" (default, text) or "json" (dict).

        Returns:
            The EnricherFactory instance for chaining.

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> enricher = mapper.enricher\
            ...     .with_data(group_by="pickup")\
            ...     .count_by()\
            ...     .with_preview()
        """
        self._preview = {"format": format}
        return self

    def build(self) -> EnricherBase:
        """Build and return the configured enricher instance.

        Finalises the setup, validates it, and creates the enricher with its aggregator.

        Returns:
            An EnricherBase-derived instance tailored to the factory’s settings.

        Raises:
            ValueError: If config is invalid (e.g., missing params).

        Examples:
            >>> import urban_mapper as um
            >>> mapper = um.UrbanMapper()
            >>> enricher = mapper.enricher\
            ...     .with_type("SingleAggregatorEnricher")\
            ...     .with_data(group_by="pickup")\
            ...     .count_by(output_column="pickup_count")\
            ...     .build()
        """
        validate_group_by(self.config)
        validate_action(self.config)

        if self.config.action == "aggregate":
            method = self.config.aggregator_config["method"]
            if isinstance(method, str):
                if method not in AGGREGATION_FUNCTIONS:
                    raise ValueError(f"Unknown aggregation method '{method}'")
                aggregation_function = AGGREGATION_FUNCTIONS[method]
            elif callable(method):
                aggregation_function = method
            else:
                raise ValueError("Aggregation method must be a string or a callable")
            aggregator = SimpleAggregator(
                group_by_column=self.config.group_by[0],
                value_column=self.config.values_from[0],
                aggregation_function=aggregation_function,
            )
        elif self.config.action == "count":
            aggregator = CountAggregator(
                group_by_column=self.config.group_by[0],
                count_function=len,
            )
        else:
            raise ValueError(
                "Unknown action. Please open an issue on GitHub to request such feature."
            )

        enricher_class = ENRICHER_REGISTRY[self.config.enricher_type]
        self._instance = enricher_class(
            aggregator=aggregator,
            output_column=self.config.enricher_config["output_column"],
            config=self.config,
        )
        if self._preview:
            self.preview(format=self._preview["format"])
        return self._instance

with_data(*args, **kwargs)

Specify columns to group by and values to aggregate.

Sets up which columns to group data by and, optionally, which to pull values from for aggregation during enrichment.

Parameters:

Name Type Description Default
group_by

Column name(s) to group by. Can be a string or list of strings.

required
values_from

Column name(s) to aggregate. Optional; if provided, must be a string.

required

Returns:

Type Description
EnricherFactory

The EnricherFactory instance for chaining.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> enricher = mapper.enricher.with_data(group_by="neighbourhood")
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
def with_data(self, *args, **kwargs) -> "EnricherFactory":
    """Specify columns to group by and values to aggregate.

    Sets up which columns to group data by and, optionally, which to pull
    values from for aggregation during enrichment.

    Args:
        group_by: Column name(s) to group by. Can be a string or list of strings.
        values_from: Column name(s) to aggregate. Optional; if wanted, must be a string.

    Returns:
        The EnricherFactory instance for chaining.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> enricher = mapper.enricher.with_data(group_by="neighbourhood")
    """
    self.config.with_data(*args, **kwargs)
    return self

with_debug(debug=True)

Toggle debug mode for the enricher.

Enables or disables debug mode, which can spill extra info during enrichment.

What Extra Info?

For instance, debug mode can add an extra column per enrichment showing which indices of the original data were used to compute it. This helps you understand how the enrichment was done and debug any issues that arise. It can also serve machine-learning tasks that need those indices.

Parameters:

Name Type Description Default
debug bool

Whether to turn on debug mode (default: True). The parameter lets you switch debug off with .with_debug(False) in a chained call rather than deleting the line.

True

Returns: The EnricherFactory instance for chaining.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> enricher = mapper.enricher.with_debug(True)
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
def with_debug(self, debug: bool = True) -> "EnricherFactory":
    """Toggle debug mode for the enricher.

    Enables or disables debug mode, which can spill extra info during enrichment.

    !!! note "What Extra Info?"
        For instance, debug mode can add an extra column per enrichment showing which indices
        of the original data were used to compute it. This helps you understand how the
        enrichment was done and debug any issues that arise. It can also serve
        machine-learning tasks that need those indices.

    Args:
        debug: Whether to turn on debug mode (default: True). The parameter lets you switch debug off with `.with_debug(False)` in a chained call rather than deleting the line.
    Returns:
        The EnricherFactory instance for chaining.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> enricher = mapper.enricher.with_debug(True)
    """
    self.config.debug = debug
    return self

with_preview(format='ascii')

Set the factory to show a preview after building.

Configures an automatic preview post-build—handy for a quick check.

Parameters:

Name Type Description Default
format str

Preview format—"ascii" (default, text) or "json" (dict).

'ascii'

Returns:

Type Description
EnricherFactory

The EnricherFactory instance for chaining.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> enricher = mapper.enricher\
...     .with_data(group_by="pickup")\
...     .count_by()\
...     .with_preview()
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
def with_preview(self, format: str = "ascii") -> "EnricherFactory":
    """Set the factory to show a preview after building.

    Configures an automatic preview post-build—handy for a quick check.

    Args:
        format: Preview format—"ascii" (default, text) or "json" (dict).

    Returns:
        The EnricherFactory instance for chaining.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> enricher = mapper.enricher\
        ...     .with_data(group_by="pickup")\
        ...     .count_by()\
        ...     .with_preview()
    """
    self._preview = {"format": format}
    return self
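The deferred preview described above can be sketched without the library. This is a minimal stand-in under stated assumptions, not UrbanMapper's code: `SketchFactory` and its `shown` list are hypothetical, and the "instance" is a placeholder string. It only shows that `with_preview` records the request and returns `self`, while the preview itself fires inside `build`.

```python
from typing import Optional


class SketchFactory:
    """Hypothetical stand-in showing chaining and a deferred preview."""

    def __init__(self) -> None:
        self._preview: Optional[dict] = None
        self.shown: list = []  # records previews instead of printing them

    def with_preview(self, format: str = "ascii") -> "SketchFactory":
        # Only remember the request; nothing is displayed yet.
        self._preview = {"format": format}
        return self  # returning self is what allows chaining

    def build(self) -> str:
        instance = "enricher"  # placeholder for the real enricher object
        if self._preview:
            # The preview fires here, once the instance exists.
            self.shown.append(self._preview["format"])
        return instance


factory = SketchFactory().with_preview(format="json")
before = list(factory.shown)  # nothing previewed before build()
factory.build()
after = list(factory.shown)   # the requested format, recorded at build time
```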

aggregate_by(*args, **kwargs)

Set the enricher to perform aggregation operations.

Configures the enricher to aggregate data (e.g., sum, mean) using provided args.

Available Methods

  • sum
  • mean
  • median
  • min
  • max

Parameters:

Name Type Description Default
*args

Positional args for EnricherConfig.aggregate_by.

()
**kwargs

Keyword args like group_by, values_from, method (e.g., "sum").

{}

Returns:

Type Description
EnricherFactory

The EnricherFactory instance for chaining.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> enricher = mapper.enricher\
...     .with_data(group_by="neighbourhood", values_from="temp")\
...     .aggregate_by(method="mean", output_column="avg_temp")
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
def aggregate_by(self, *args, **kwargs) -> "EnricherFactory":
    """Set the enricher to perform aggregation operations.

    Configures the enricher to aggregate data (e.g., `sum`, `mean`) using provided args.

    !!! tip "Available Methods"

        - [x] `sum`
        - [x] `mean`
        - [x] `median`
        - [x] `min`
        - [x] `max`

    Args:
        *args: Positional args for EnricherConfig.aggregate_by.
        **kwargs: Keyword args like `group_by`, `values_from`, `method` (e.g., "sum").

    Returns:
        The EnricherFactory instance for chaining.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> enricher = mapper.enricher\
        ...     .with_data(group_by="neighbourhood", values_from="temp")\
        ...     .aggregate_by(method="mean", output_column="avg_temp")
    """
    self.config.aggregate_by(*args, **kwargs)
    return self

count_by(*args, **kwargs)

Set the enricher to count features.

Configures the enricher to count items per group—great for tallying points in areas.

Parameters:

Name Type Description Default
*args

Positional args for EnricherConfig.count_by.

()
**kwargs

Keyword args like group_by, output_column.

{}

Returns:

Type Description
EnricherFactory

The EnricherFactory instance for chaining.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> enricher = mapper.enricher\
...     .with_data(group_by="pickup")\
...     .count_by(output_column="pickup_count")
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
def count_by(self, *args, **kwargs) -> "EnricherFactory":
    """Set the enricher to count features.

    Configures the enricher to count items per group—great for tallying points in areas.

    Args:
        *args: Positional args for EnricherConfig.count_by.
        **kwargs: Keyword args like `group_by`, `output_column`.

    Returns:
        The EnricherFactory instance for chaining.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> enricher = mapper.enricher\
        ...     .with_data(group_by="pickup")\
        ...     .count_by(output_column="pickup_count")
    """
    self.config.count_by(*args, **kwargs)
    return self

with_type(primitive_type)

Choose the enricher type to create.

Sets the type of enricher, dictating the enrichment approach, from the registry.

At the moment only one exists

  • SingleAggregatorEnricher (default)

Hence, there is no need to use with_type unless a different type becomes available in the future. It is kept for compatibility with other modules.

Parameters:

Name Type Description Default
primitive_type str

Name of the enricher type (e.g., "SingleAggregatorEnricher").

required

Returns:

Type Description
EnricherFactory

The EnricherFactory instance for chaining.

Raises:

Type Description
ValueError

If the type isn’t in the registry.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> enricher = mapper.enricher.with_type("SingleAggregatorEnricher")
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
def with_type(self, primitive_type: str) -> "EnricherFactory":
    """Choose the enricher type to create.

    Sets the type of enricher, dictating the enrichment approach, from the registry.

    !!! note "At the moment only one exists"

        - [x] `SingleAggregatorEnricher` (default)

        Hence, there is no need to use `with_type` unless a different type becomes available in the future.
        It is kept for compatibility with other modules.

    Args:
        primitive_type: Name of the enricher type (e.g., "SingleAggregatorEnricher").

    Returns:
        The EnricherFactory instance for chaining.

    Raises:
        ValueError: If the type isn’t in the registry.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> enricher = mapper.enricher.with_type("SingleAggregatorEnricher")
    """
    if primitive_type not in ENRICHER_REGISTRY:
        available = list(ENRICHER_REGISTRY.keys())
        match, score = process.extractOne(primitive_type, available)
        if score > 80:
            suggestion = f" Maybe you meant '{match}'?"
        else:
            suggestion = ""
        raise ValueError(
            f"Unknown enricher type '{primitive_type}'. Available: {', '.join(available)}.{suggestion}"
        )
    self.config.with_type(primitive_type)
    return self
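The lookup-and-suggest behaviour of `with_type` can be approximated with the standard library alone. The sketch below is an illustration, not the library's implementation: it swaps `process.extractOne` for `difflib.get_close_matches`, and `SKETCH_REGISTRY` and `check_type` are hypothetical names.

```python
import difflib

# Hypothetical stand-in for ENRICHER_REGISTRY; only the keys matter here.
SKETCH_REGISTRY = {"SingleAggregatorEnricher": object}


def check_type(primitive_type: str) -> str:
    """Return the type name if registered, else raise with a suggestion."""
    if primitive_type in SKETCH_REGISTRY:
        return primitive_type
    available = list(SKETCH_REGISTRY)
    # difflib replaces the fuzzy matcher used in the source; cutoff=0.8 loosely
    # mirrors the "score > 80" threshold, but the two scores are not identical.
    close = difflib.get_close_matches(primitive_type, available, n=1, cutoff=0.8)
    suggestion = f" Maybe you meant '{close[0]}'?" if close else ""
    raise ValueError(
        f"Unknown enricher type '{primitive_type}'. "
        f"Available: {', '.join(available)}.{suggestion}"
    )
```

A near miss such as `"SingleAgregatorEnricher"` raises a `ValueError` whose message ends with the suggestion, while an unrelated name raises without one.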

build()

Build and return the configured enricher instance.

Finalises the setup, validates it, and creates the enricher with its aggregator.

Returns:

Type Description
EnricherBase

An EnricherBase-derived instance tailored to the factory’s settings.

Raises:

Type Description
ValueError

If config is invalid (e.g., missing params).

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> enricher = mapper.enricher\
...     .with_type("SingleAggregatorEnricher")\
...     .with_data(group_by="pickup")\
...     .count_by(output_column="pickup_count")\
...     .build()
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
def build(self) -> EnricherBase:
    """Build and return the configured enricher instance.

    Finalises the setup, validates it, and creates the enricher with its aggregator.

    Returns:
        An EnricherBase-derived instance tailored to the factory’s settings.

    Raises:
        ValueError: If config is invalid (e.g., missing params).

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> enricher = mapper.enricher\
        ...     .with_type("SingleAggregatorEnricher")\
        ...     .with_data(group_by="pickup")\
        ...     .count_by(output_column="pickup_count")\
        ...     .build()
    """
    validate_group_by(self.config)
    validate_action(self.config)

    if self.config.action == "aggregate":
        method = self.config.aggregator_config["method"]
        if isinstance(method, str):
            if method not in AGGREGATION_FUNCTIONS:
                raise ValueError(f"Unknown aggregation method '{method}'")
            aggregation_function = AGGREGATION_FUNCTIONS[method]
        elif callable(method):
            aggregation_function = method
        else:
            raise ValueError("Aggregation method must be a string or a callable")
        aggregator = SimpleAggregator(
            group_by_column=self.config.group_by[0],
            value_column=self.config.values_from[0],
            aggregation_function=aggregation_function,
        )
    elif self.config.action == "count":
        aggregator = CountAggregator(
            group_by_column=self.config.group_by[0],
            count_function=len,
        )
    else:
        raise ValueError(
            "Unknown action. Please open an issue on GitHub to request such feature."
        )

    enricher_class = ENRICHER_REGISTRY[self.config.enricher_type]
    self._instance = enricher_class(
        aggregator=aggregator,
        output_column=self.config.enricher_config["output_column"],
        config=self.config,
    )
    if self._preview:
        self.preview(format=self._preview["format"])
    return self._instance
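The `method` handling inside `build` follows a small pattern: a string is looked up in a table, a callable is used as-is, and anything else is rejected. Here is a minimal sketch of that branch, assuming plain-Python functions from `statistics` in place of the pandas Series methods the library registers. `SKETCH_FUNCTIONS` and `resolve_method` are hypothetical names.

```python
import statistics
from typing import Callable, Sequence, Union

# Stand-in for AGGREGATION_FUNCTIONS: plain-Python functions instead of the
# pandas Series methods the library registers.
SKETCH_FUNCTIONS = {
    "mean": statistics.mean,
    "sum": sum,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def resolve_method(method: Union[str, Callable]) -> Callable[[Sequence[float]], float]:
    """Mirror the branch in build(): a name is looked up, a callable is used as-is."""
    if isinstance(method, str):
        if method not in SKETCH_FUNCTIONS:
            raise ValueError(f"Unknown aggregation method '{method}'")
        return SKETCH_FUNCTIONS[method]
    if callable(method):
        return method
    raise ValueError("Aggregation method must be a string or a callable")


# A registered name and a custom callable both resolve to something callable.
by_name = resolve_method("mean")([10, 20])
by_callable = resolve_method(lambda values: max(values) - min(values))([10, 20])
print(by_name, by_callable)
```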

preview(format='ascii')

Show a preview of the configured enricher.

Displays a sneak peek of the enricher setup in the chosen format.

Parameters:

Name Type Description Default
format str

Preview format—"ascii" (text) or "json" (dict).

'ascii'

Returns:

Type Description
Union[None, str, dict]

None for "ascii" (prints to console), dict for "json".

Raises:

Type Description
ValueError

If format isn’t supported.

Examples:

>>> import urban_mapper as um
>>> mapper = um.UrbanMapper()
>>> enricher = mapper.enricher\
...     .with_data(group_by="pickup")\
...     .count_by()\
...     .build()
>>> enricher.preview()
Source code in src/urban_mapper/modules/enricher/enricher_factory.py
def preview(self, format: str = "ascii") -> Union[None, str, dict]:
    """Show a preview of the configured enricher.

    Displays a sneak peek of the enricher setup in the chosen format.

    Args:
        format: Preview format—"ascii" (text) or "json" (dict).

    Returns:
        None for "ascii" (prints to console), dict for "json".

    Raises:
        ValueError: If format isn’t supported.

    Examples:
        >>> import urban_mapper as um
        >>> mapper = um.UrbanMapper()
        >>> enricher = mapper.enricher\
        ...     .with_data(group_by="pickup")\
        ...     .count_by()\
        ...     .build()
        >>> enricher.preview()
    """
    if self._instance is None:
        print("No Enricher instance available to preview.")
        return None
    if hasattr(self._instance, "preview"):
        preview_data = self._instance.preview(format=format)
        if format == "ascii":
            print(preview_data)
        elif format == "json":
            return preview_data
        else:
            raise ValueError(f"Unsupported format '{format}'.")
    else:
        print("Preview not supported for this Enricher instance.")
    return None

BaseAggregator

Bases: ABC

Base Class For Data Aggregators.

Where is that used?

Aggregators are used inside enrichers such as SingleAggregatorEnricher. You are not meant to use them directly; explore them only when you need advanced configuration of the chosen enricher primitive.

Defines the interface for aggregator implementations, which crunch stats on grouped data. Aggregators take input data, group it by a column, apply a function, and yield the results.

To Implement

All concrete aggregators must inherit from this and implement _aggregate.
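The split between a validating public `aggregate` and an abstract `_aggregate` can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `SketchAggregator` and `SketchCount` are hypothetical, and a list of row dicts stands in for the pandas DataFrame.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List


class SketchAggregator(ABC):
    """Hypothetical stand-in for BaseAggregator working on a list of row dicts."""

    @abstractmethod
    def _aggregate(self, rows: List[dict]) -> Dict[str, dict]:
        ...

    def aggregate(self, rows: List[dict]) -> Dict[str, dict]:
        # Validation lives in the public method, as the decorator does in the source.
        if not rows:
            raise ValueError("No input dataframe provided.")
        return self._aggregate(rows)


class SketchCount(SketchAggregator):
    def __init__(self, group_by_column: str) -> None:
        self.group_by_column = group_by_column

    def _aggregate(self, rows: List[dict]) -> Dict[str, dict]:
        indices = defaultdict(list)
        for position, row in enumerate(rows):
            indices[row[self.group_by_column]].append(position)
        # One entry per group: the count and the contributing row positions.
        return {key: {"value": len(idx), "indices": idx} for key, idx in indices.items()}


rows = [{"hood": "A"}, {"hood": "A"}, {"hood": "B"}]
result = SketchCount("hood").aggregate(rows)
print(result)
```

Subclasses only supply `_aggregate`; callers always go through `aggregate`, so the empty-input check cannot be bypassed by accident.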

Examples:

>>> import urban_mapper as um
>>> import pandas as pd
>>> mapper = um.UrbanMapper()
>>> data = pd.DataFrame({
...     "hood": ["A", "A", "B", "B"],
...     "value": [10, 20, 15, 25]
... })
>>> enricher = mapper.enricher\
...     .with_data(group_by="hood", values_from="value")\
...     .aggregate_by(method="mean", output_column="avg_value")\
...     .build()
Source code in src/urban_mapper/modules/enricher/aggregator/abc_aggregator.py
@beartype
class BaseAggregator(ABC):
    """Base Class For Data Aggregators.

    !!! question "Where is that used?"
        Aggregators are used inside enrichers such as
        `SingleAggregatorEnricher`. You are not meant to use them directly;
        explore them only when you need advanced configuration of the
        chosen enricher primitive.

    Defines the interface for aggregator implementations, which crunch stats on
    grouped data. Aggregators take `input data`, `group it` by a `column`, `apply` a `function`,
    and `yield the results`.

    !!! note "To Implement"
        All concrete aggregators must inherit from this and
        implement `_aggregate`.

    Examples:
        >>> import urban_mapper as um
        >>> import pandas as pd
        >>> mapper = um.UrbanMapper()
        >>> data = pd.DataFrame({
        ...     "hood": ["A", "A", "B", "B"],
        ...     "value": [10, 20, 15, 25]
        ... })
        >>> enricher = mapper.enricher\
        ...     .with_data(group_by="hood", values_from="value")\
        ...     .aggregate_by(method="mean", output_column="avg_value")\
        ...     .build()
    """

    @abstractmethod
    def _aggregate(self, input_dataframe: pd.DataFrame) -> pd.DataFrame:
        """Perform the aggregation on the input DataFrame.

        Core method for subclasses to override with specific aggregation logic.

        Args:
            input_dataframe: DataFrame to aggregate.

        Returns:
            DataFrame with at least a 'value' column of aggregated results and
            an 'indices' column of original row indices per group.
        """
        ...

    @require_arguments_not_none(
        "input_dataframe", error_msg="No input dataframe provided.", check_empty=True
    )
    def aggregate(self, input_dataframe: pd.DataFrame) -> pd.DataFrame:
        """Aggregate the input DataFrame.

        Public method to kick off aggregation, validating input before delegating
        to `_aggregate`.

        Args:
            input_dataframe: DataFrame to aggregate. Mustn’t be None or empty.

        Returns:
            DataFrame with aggregation results.

        Raises:
            ValueError: If input_dataframe is None or empty.
        """
        return self._aggregate(input_dataframe)

_aggregate(input_dataframe) abstractmethod

Perform the aggregation on the input DataFrame.

Core method for subclasses to override with specific aggregation logic.

Parameters:

Name Type Description Default
input_dataframe DataFrame

DataFrame to aggregate.

required

Returns:

Type Description
DataFrame

DataFrame with at least a 'value' column of aggregated results and an 'indices' column of original row indices per group.

Source code in src/urban_mapper/modules/enricher/aggregator/abc_aggregator.py
@abstractmethod
def _aggregate(self, input_dataframe: pd.DataFrame) -> pd.DataFrame:
    """Perform the aggregation on the input DataFrame.

    Core method for subclasses to override with specific aggregation logic.

    Args:
        input_dataframe: DataFrame to aggregate.

    Returns:
        DataFrame with at least a 'value' column of aggregated results and
        an 'indices' column of original row indices per group.
    """
    ...

aggregate(input_dataframe)

Aggregate the input DataFrame.

Public method to kick off aggregation, validating input before delegating to _aggregate.

Parameters:

Name Type Description Default
input_dataframe DataFrame

DataFrame to aggregate. Mustn’t be None or empty.

required

Returns:

Type Description
DataFrame

DataFrame with aggregation results.

Raises:

Type Description
ValueError

If input_dataframe is None or empty.

Source code in src/urban_mapper/modules/enricher/aggregator/abc_aggregator.py
@require_arguments_not_none(
    "input_dataframe", error_msg="No input dataframe provided.", check_empty=True
)
def aggregate(self, input_dataframe: pd.DataFrame) -> pd.DataFrame:
    """Aggregate the input DataFrame.

    Public method to kick off aggregation, validating input before delegating
    to `_aggregate`.

    Args:
        input_dataframe: DataFrame to aggregate. Mustn’t be None or empty.

    Returns:
        DataFrame with aggregation results.

    Raises:
        ValueError: If input_dataframe is None or empty.
    """
    return self._aggregate(input_dataframe)

Enricher Aggregators Functions For Faster Perusal

In a Nutshell, How To Read That

Each entry pairs a name with a function that takes a series of values and returns a single value. The ones below are the common ones we ship, relying mainly on pandas.

AGGREGATION_FUNCTIONS = {'mean': pd.Series.mean, 'sum': pd.Series.sum, 'median': pd.Series.median, 'min': pd.Series.min, 'max': pd.Series.max} module-attribute

SimpleAggregator

Bases: BaseAggregator

Aggregator For Standard Stats On Numeric Data.

Applies stats functions (e.g., mean, sum) to values in a column, grouped by another.

Useful for

Useful for scenarios like average height per district or total population per area.

Supports predefined functions in AGGREGATION_FUNCTIONS or custom ones.

How to Use Custom Functions

Simply pass your own function, which receives a series as its parameter, as the aggregation_function argument. Within the factory, pass it through the method argument of aggregate_by.
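As a rough illustration of what such a custom function looks like, the sketch below uses plain pandas rather than UrbanMapper itself. `height_range` is a hypothetical function, and the `groupby(...).agg(...)` call only approximates what the aggregator does with its value column.

```python
import pandas as pd


def height_range(series: pd.Series) -> float:
    """Custom aggregation: spread between the tallest and shortest value."""
    return float(series.max() - series.min())


data = pd.DataFrame({
    "district": ["A", "A", "B"],
    "height": [10, 15, 20],
})

# Plain pandas equivalent of grouping by one column and aggregating another.
spread = data.groupby("district")["height"].agg(height_range)
print(spread.to_dict())
```

With the factory, the same function would presumably be passed as `aggregate_by(method=height_range, output_column="height_range")`, since `build` accepts any callable for `method`.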

Attributes:

Name Type Description
group_by_column

Column to group by.

value_column

Column with values to aggregate.

aggregation_function

Function to apply to grouped values.

Examples:

>>> import urban_mapper as um
>>> import pandas as pd
>>> mapper = um.UrbanMapper()
>>> data = pd.DataFrame({
...     "district": ["A", "A", "B"],
...     "height": [10, 15, 20]
... })
>>> enricher = mapper.enricher\
...     .with_data(group_by="district", values_from="height")\
...     .aggregate_by(method="mean", output_column="avg_height")\
...     .build()
Source code in src/urban_mapper/modules/enricher/aggregator/aggregators/simple_aggregator.py
@beartype
class SimpleAggregator(BaseAggregator):
    """Aggregator For Standard Stats On Numeric Data.

    Applies stats functions (e.g., `mean`, `sum`) to `values` in a `column`, grouped by another.

    !!! tip "Useful for"
        Useful for scenarios like `average height` per district or `total population` per area.

    Supports predefined functions in `AGGREGATION_FUNCTIONS` or custom ones.

    !!! question "How to Use Custom Functions"
        Simply pass your own function, which receives a series as its parameter, as the `aggregation_function` argument.
        Within the factory, pass it through the `method` argument of `aggregate_by`.

    Attributes:
        group_by_column: Column to group by.
        value_column: Column with values to aggregate.
        aggregation_function: Function to apply to grouped values.

    Examples:
        >>> import urban_mapper as um
        >>> import pandas as pd
        >>> mapper = um.UrbanMapper()
        >>> data = pd.DataFrame({
        ...     "district": ["A", "A", "B"],
        ...     "height": [10, 15, 20]
        ... })
        >>> enricher = mapper.enricher\
        ...     .with_data(group_by="district", values_from="height")\
        ...     .aggregate_by(method="mean", output_column="avg_height")\
        ...     .build()
    """

    def __init__(
        self,
        group_by_column: str,
        value_column: str,
        aggregation_function: Callable[[pd.Series], float],
    ) -> None:
        self.group_by_column = group_by_column
        self.value_column = value_column
        self.aggregation_function = aggregation_function

    def _aggregate(self, input_dataframe: pd.DataFrame) -> pd.DataFrame:
        """Aggregate data with the aggregation function.

        `Groups the DataFrame`, applies the function to `value_column`, and returns results.

        Args:
            input_dataframe: DataFrame with `group_by_column` and `value_column`.

        Returns:
            DataFrame with 'value' (aggregated values) and 'indices' (row indices).

        Raises:
            KeyError: If required columns are missing.
        """
        grouped = input_dataframe.groupby(self.group_by_column)
        aggregated = grouped[self.value_column].agg(self.aggregation_function)
        indices = grouped.apply(lambda g: list(g.index))
        return pd.DataFrame({"value": aggregated, "indices": indices})

_aggregate(input_dataframe)

Aggregate data with the aggregation function.

Groups the DataFrame, applies the function to value_column, and returns results.

Parameters:

Name Type Description Default
input_dataframe DataFrame

DataFrame with group_by_column and value_column.

required

Returns:

Type Description
DataFrame

DataFrame with 'value' (aggregated values) and 'indices' (row indices).

Raises:

Type Description
KeyError

If required columns are missing.

Source code in src/urban_mapper/modules/enricher/aggregator/aggregators/simple_aggregator.py
def _aggregate(self, input_dataframe: pd.DataFrame) -> pd.DataFrame:
    """Aggregate data with the aggregation function.

    `Groups the DataFrame`, applies the function to `value_column`, and returns results.

    Args:
        input_dataframe: DataFrame with `group_by_column` and `value_column`.

    Returns:
        DataFrame with 'value' (aggregated values) and 'indices' (row indices).

    Raises:
        KeyError: If required columns are missing.
    """
    grouped = input_dataframe.groupby(self.group_by_column)
    aggregated = grouped[self.value_column].agg(self.aggregation_function)
    indices = grouped.apply(lambda g: list(g.index))
    return pd.DataFrame({"value": aggregated, "indices": indices})

CountAggregator

Bases: BaseAggregator

Aggregator For Counting Records In Groups.

Counts records per group, with an optional custom counting function. By default, it uses len() to count all records, but you can supply your own function to count specific cases.

Useful for

  • Counting taxi pickups per area
  • Tallying incidents per junction
  • Totting up points of interest per district
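A custom counting function receives each group as a DataFrame and returns a number. The sketch below shows the idea with plain pandas and an explicit loop over groups; it is not the aggregator's own code, and `count_major` is a hypothetical function.

```python
import pandas as pd


def count_major(group: pd.DataFrame) -> int:
    """Custom count: only rows whose 'type' is 'major'."""
    return int((group["type"] == "major").sum())


data = pd.DataFrame({
    "junction": ["A", "A", "B", "B", "C"],
    "type": ["minor", "major", "minor", "major", "minor"],
})

# Explicit loop over groups; each group is a DataFrame, like the argument
# a count function receives.
default_counts = {key: len(group) for key, group in data.groupby("junction")}
major_counts = {key: count_major(group) for key, group in data.groupby("junction")}
print(default_counts)
print(major_counts)
```

Note that the factory's `build` passes `count_function=len`, so a custom function like this applies when constructing `CountAggregator` directly.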

Attributes:

Name Type Description
group_by_column

Column to group data by.

count_function

Function to count records in each group (defaults to len).

Examples:

>>> import urban_mapper as um
>>> import pandas as pd
>>> mapper = um.UrbanMapper()
>>> data = pd.DataFrame({
...     "junction": ["A", "A", "B", "B", "C"],
...     "type": ["minor", "major", "minor", "major", "minor"]
... })
>>> enricher = mapper.enricher\
...     .with_data(group_by="junction")\
...     .count_by(output_column="incident_count")\
...     .build()
Source code in src/urban_mapper/modules/enricher/aggregator/aggregators/count_aggregator.py
@beartype
class CountAggregator(BaseAggregator):
    """Aggregator For Counting Records In Groups.

    Counts records per group, with an optional custom counting function. By default,
    it uses `len()` to count all records, but you can supply your own function to count specific cases.

    !!! tip "Useful for"

        - [x] Counting taxi pickups per area
        - [x] Tallying incidents per junction
        - [x] Totting up points of interest per district

    Attributes:
        group_by_column: Column to group data by.
        count_function: Function to count records in each group (defaults to len).

    Examples:
        >>> import urban_mapper as um
        >>> import pandas as pd
        >>> mapper = um.UrbanMapper()
        >>> data = pd.DataFrame({
        ...     "junction": ["A", "A", "B", "B", "C"],
        ...     "type": ["minor", "major", "minor", "major", "minor"]
        ... })
        >>> enricher = mapper.enricher\
        ...     .with_data(group_by="junction")\
        ...     .count_by(output_column="incident_count")\
        ...     .build()
    """

    def __init__(
        self,
        group_by_column: str,
        count_function: Callable[[pd.DataFrame], Any] = len,
    ) -> None:
        self.group_by_column = group_by_column
        self.count_function = count_function

    @require_attribute_columns("input_dataframe", ["group_by_column"])
    def _aggregate(self, input_dataframe: pd.DataFrame) -> pd.DataFrame:
        """Count records per group using the count function.

        Groups the DataFrame by `group_by_column`, applies the count function,
        and returns a DataFrame with counts and indices.

        Args:
            input_dataframe: DataFrame to aggregate, must have `group_by_column`.

        Returns:
            DataFrame with 'value' (counts) and 'indices' (original row indices).

        Raises:
            ValueError: If required column is missing.
        """
        grouped = input_dataframe.groupby(self.group_by_column)
        values = grouped.apply(self.count_function)
        indices = grouped.apply(lambda g: list(g.index))
        return pd.DataFrame({"value": values, "indices": indices})

_aggregate(input_dataframe)

Count records per group using the count function.

Groups the DataFrame by group_by_column, applies the count function, and returns a DataFrame with counts and indices.

Parameters:

Name Type Description Default
input_dataframe DataFrame

DataFrame to aggregate, must have group_by_column.

required

Returns:

Type Description
DataFrame

DataFrame with 'value' (counts) and 'indices' (original row indices).

Raises:

Type Description
ValueError

If required column is missing.

Source code in src/urban_mapper/modules/enricher/aggregator/aggregators/count_aggregator.py
@require_attribute_columns("input_dataframe", ["group_by_column"])
def _aggregate(self, input_dataframe: pd.DataFrame) -> pd.DataFrame:
    """Count records per group using the count function.

    Groups the DataFrame by `group_by_column`, applies the count function,
    and returns a DataFrame with counts and indices.

    Args:
        input_dataframe: DataFrame to aggregate, must have `group_by_column`.

    Returns:
        DataFrame with 'value' (counts) and 'indices' (original row indices).

    Raises:
        ValueError: If required column is missing.
    """
    grouped = input_dataframe.groupby(self.group_by_column)
    values = grouped.apply(self.count_function)
    indices = grouped.apply(lambda g: list(g.index))
    return pd.DataFrame({"value": values, "indices": indices})
Provost Simon