Auctus Mixin
What is Auctus Mixin?
The Auctus mixin provides access to the Auctus dataset search API from within the UrbanMapper workflow. It offers a set of methods to search for datasets, get dataset details, and download datasets.
A mixin, in this context, is simply a class that wraps an external library and adapts it for direct use in the UrbanMapper workflow.
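The pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: `ExternalSearch`, `SearchMixin`, `Workflow`, and the `catalog` attribute are hypothetical stand-ins, not UrbanMapper or Auctus Search classes, and the search itself is faked with a fixed in-memory list.

```python
class ExternalSearch:
    """Hypothetical stand-in for a third-party search client."""

    def __init__(self):
        self.selected = None  # name of the dataset chosen by the user, if any

    def search(self, query):
        # A real client would call a remote API; here we filter a fixed list.
        catalogue = ["nyc taxi trips", "nyc crashes", "chicago crime"]
        return [name for name in catalogue if query.lower() in name]

    def select(self, name):
        self.selected = name


class SearchMixin(ExternalSearch):
    """Re-exposes the external client under workflow-specific method names."""

    def explore(self, query):
        return self.search(query)

    def load(self):
        # Loading only makes sense once a dataset has been selected.
        if self.selected is None:
            raise ValueError("select a dataset before loading it")
        return {"name": self.selected, "rows": []}


class Workflow:
    """Hypothetical entry point that exposes the mixin as an attribute."""

    def __init__(self):
        self.catalog = SearchMixin()


workflow = Workflow()
matches = workflow.catalog.explore("nyc")
workflow.catalog.select(matches[0])
dataset = workflow.catalog.load()
```

The workflow object never talks to the external client directly; it only sees the adapted method names, which is the role the mixin plays here.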
Documentation Under Alpha Construction
This documentation is in its early stages and still under development. The API may therefore change, and some parts may be incomplete or inaccurate.
Use at your own risk, and please report anything that seems incorrect or outdated.
auctus
AuctusSearchMixin
Bases: AuctusSearch
Mixin for searching, exploring, and loading datasets from the Auctus data discovery service.
This mixin extends AuctusSearch to provide a simplified interface for discovering and working with datasets from the Auctus data discovery service. It allows users to search for relevant datasets, explore their metadata, and load them directly into their urban data analysis workflows.
What is Auctus? What is Auctus Search?
Auctus is a web crawler and search engine for datasets, designed specifically for data augmentation tasks in machine learning. It finds datasets in different repositories and indexes them for later retrieval.
Citation for the Auctus paper:
Sonia Castelo, Rémi Rampin, Aécio Santos, Aline Bessa, Fernando Chirigati, and Juliana Freire. 2021. Auctus: a dataset search engine for data discovery and augmentation. Proc. VLDB Endow. 14, 12 (July 2021), 2791–2794. https://doi.org/10.14778/3476311.3476346
Official Auctus website: https://auctus.vida-nyu.org/
Find more in the Auctus GitHub repository.
–––
Auctus Search, on the other hand, is a wrapper around the Auctus API that can be used directly from a Jupyter notebook cell.
Find more in the Auctus Search GitHub repository.
What is a mixin?
A mixin is a class that provides methods to other classes, including those from other libraries, but is not considered a base class itself. Think of it as a set of helpers from external sources.
Examples:
>>> from urban_mapper import UrbanMapper
>>>
>>> # Initialise UrbanMapper
>>> mapper = UrbanMapper()
>>>
>>> # Search for datasets about NYC taxi trips
>>> results = mapper.auctus.explore_datasets_from_auctus(
...     search_query="NYC taxi trips",
...     display_initial_results=True
... )
>>>
>>> # Select a dataset from the results (interactive)
>>> # (This would be done through the UI that appears)
>>>
>>> # Load the selected dataset
>>> taxi_trips = mapper.auctus.load_dataset_from_auctus()
>>>
>>> # Profile the dataset to understand its characteristics
>>> mapper.auctus.profile_dataset_from_auctus()
Source code in src/urban_mapper/mixins/auctus.py
explore_datasets_from_auctus(search_query, page=1, size=10, display_initial_results=False)
Search for datasets in the Auctus data discovery service.
This method queries the Auctus data discovery service for datasets matching the provided search query. Results can be paginated and optionally displayed immediately for quick inspection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_query | Union[str, List[str]] | Search query string or list of strings to find datasets. | required |
page | int | Page number for paginated results. Defaults to 1. | 1 |
size | int | Number of results per page. Defaults to 10. | 10 |
display_initial_results | bool | Whether to automatically display search results. Defaults to False. | False |
Returns:
Name | Type | Description |
---|---|---|
AuctusDatasetCollection | AuctusDatasetCollection | An object containing the search results, which can be further explored or used to select a dataset. |
Examples:
>>> results = mapper.auctus.explore_datasets_from_auctus(
...     search_query="NYC crashes",
...     size=20,
...     display_initial_results=True
... )
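To make the `page` and `size` parameters concrete, here is a minimal sketch of how a page number and page size select a window of results. It assumes 1-based page numbering, which the default of 1 suggests but the documentation does not state, and the `paginate` helper is hypothetical: the real service presumably paginates on the server side rather than slicing a local list.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate(results: Sequence[T], page: int = 1, size: int = 10) -> List[T]:
    """Return the slice of `results` for a 1-based page number (assumed semantics)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive integers")
    start = (page - 1) * size
    return list(results[start:start + size])


# 25 fake hits: pages 1 and 2 are full, page 3 holds the remaining five.
hits = [f"dataset-{i}" for i in range(1, 26)]
first_page = paginate(hits)
third_page = paginate(hits, page=3, size=10)
```

Requesting a page past the end of the results yields an empty list in this sketch; how the actual service behaves in that case is not covered here.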
load_dataset_from_auctus(display_table=True)
Load the selected dataset from Auctus search results.
This method loads the dataset that was selected after calling explore_datasets_from_auctus(). It can handle both tabular and geographic data, returning a pandas DataFrame or geopandas GeoDataFrame accordingly.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
display_table | bool | Whether to display a preview of the loaded data. Defaults to True. | True |
Returns:
Type | Description |
---|---|
Union[DataFrame, GeoDataFrame] | The loaded dataset, either as a pandas DataFrame or geopandas GeoDataFrame. |
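Examples:
>>> # Load the dataset selected after exploring the search results
>>> dataset = mapper.auctus.load_dataset_from_auctus(display_table=False)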
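Because the return value may be either a pandas DataFrame or a geopandas GeoDataFrame, downstream code may want to branch on which one it received. The sketch below shows one way to do that; `is_geodataframe` and `describe_loaded` are hypothetical helpers, not part of UrbanMapper. The geopandas import is done lazily so the check still works when geopandas is not installed.

```python
def is_geodataframe(obj) -> bool:
    """Return True only if geopandas is available and `obj` is a GeoDataFrame."""
    try:
        import geopandas as gpd
    except ImportError:
        return False
    return isinstance(obj, gpd.GeoDataFrame)


def describe_loaded(dataset) -> str:
    """Summarise a loaded dataset without assuming which type came back."""
    kind = "geographic" if is_geodataframe(dataset) else "tabular"
    return f"{kind} dataset with {len(dataset)} rows"


# A plain list of records stands in for a loaded table in this sketch.
rows = [{"zone": "A"}, {"zone": "B"}]
summary = describe_loaded(rows)
```

In a real session the argument would be the object returned by `load_dataset_from_auctus()` rather than a list.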
profile_dataset_from_auctus()
Generate and display a profile report for the selected Auctus dataset.
This method creates a comprehensive profile of the dataset loaded using load_dataset_from_auctus(). The profile includes statistics, distributions, and insights into the dataset's characteristics, aiding in data understanding and preparation for analysis.
Returns:
Name | Type | Description |
---|---|---|
None | None | This method does not return anything but displays the profile report. |
Examples:
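As a rough illustration of the kind of per-column summary a dataset profile contains, the sketch below computes counts, missing values, and simple statistics over a list of records using only the standard library. It is not the profiler this method uses; `profile_rows` and the sample `trips` data are made up for the example.

```python
import statistics
from collections import Counter


def profile_rows(rows):
    """Summarise each column of a list of dict records."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    report = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        summary = {"count": len(values), "missing": len(values) - len(present)}
        numeric = present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if numeric:
            # Numeric columns get range and mean.
            summary.update(
                min=min(present), max=max(present), mean=statistics.mean(present)
            )
        else:
            # Other columns get their most frequent values.
            summary["top_values"] = Counter(present).most_common(3)
        report[name] = summary
    return report


trips = [
    {"borough": "Manhattan", "fare": 12.5},
    {"borough": "Brooklyn", "fare": 20.0},
    {"borough": "Manhattan", "fare": None},
]
report = profile_rows(trips)
```

Here `report["fare"]` records one missing value and a mean over the two fares that are present, while `report["borough"]` lists the most frequent boroughs.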