datasets

Busan Beach data

Loads the antibiotic resistance genes (ARG) data from a recreational beach in Busan, South Korea, along with environmental variables.

The data is in the form of multivariate time series and was collected over a period of 2 years during several precipitation events. The frequency of the environmental data is 30 minutes, while the ARG data is discontinuous. The data and its pre-processing are described in detail in Jang et al., 2021.

param inputs

features to use as input. By default all environmental data is used, which consists of the following parameters

  • tide_cm

  • wat_temp_c

  • sal_psu

  • air_temp_c

  • pcp_mm

  • pcp3_mm

  • pcp6_mm

  • pcp12_mm

  • wind_dir_deg

  • wind_speed_mps

  • air_p_hpa

  • mslp_hpa

  • rel_hum

type inputs

Optional[list]

param target

feature/features to use as target/output. By default tetx_coppml is used as target. Logically, one or more of the following can be considered as target

  • ecoli

  • 16s

  • inti1

  • Total_args

  • tetx_coppml

  • sul1_coppml

  • blaTEM_coppml

  • aac_coppml

  • Total_otus

  • otu_5575

  • otu_273

  • otu_94

type target

Union[list, str]

rtype

a pandas dataframe with inputs and target, indexed with a pandas.DatetimeIndex

Example

>>> from ai4water.datasets import busan_beach
>>> dataframe = busan_beach()
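The layout of the returned frame can be pictured with plain pandas. The sketch below uses made-up values (not the real Busan data) to show 30-minute environmental inputs alongside a discontinuous target column on a DatetimeIndex:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame busan_beach() returns: environmental
# inputs at 30-minute frequency plus a sparser ARG target column, indexed
# by a pandas DatetimeIndex. All values are invented for illustration.
index = pd.date_range("2018-06-01", periods=6, freq="30min")
df = pd.DataFrame(
    {
        "tide_cm": np.linspace(10.0, 60.0, 6),
        "wat_temp_c": np.full(6, 21.5),
        "tetx_coppml": [1.2e5, np.nan, np.nan, 3.4e5, np.nan, np.nan],
    },
    index=index,
)

inputs = df[["tide_cm", "wat_temp_c"]]  # analogous to the inputs argument
target = df[["tetx_coppml"]]            # analogous to the target argument

assert isinstance(df.index, pd.DatetimeIndex)
assert target["tetx_coppml"].count() == 2  # target is discontinuous (NaNs)
```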

Datasets

class ai4water.datasets.datasets.Datasets(name=None, units=None)[source]

Bases: object

Base class for datasets

Note

We don’t host datasets. Each dataset is downloaded from the target remote server and saved to local disk.

__init__(name=None, units=None)[source]
Parameters
  • name

  • units

property base_ds_dir

Base datasets directory

download_from_pangaea(overwrite=False)[source]
property ds_dir
property url

MtropicsLaos

class ai4water.datasets.datasets.MtropicsLaos(**kwargs)[source]

Bases: ai4water.datasets.datasets.Datasets

Downloads and prepares hydrological, climate and land use data for Laos from Mtropics website and ird data servers.

- fetch_lu
- fetch_ecoli
- fetch_rain_gauges
- fetch_weather_station_data
- fetch_pcp
- fetch_hydro
- make_regression
__init__(**kwargs)[source]
Parameters
  • name

  • units

fetch_ecoli(features: Union[list, str] = 'Ecoli_mpn100', st: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20110525 10:00:00', en: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20210406 15:05:00', remove_duplicates: bool = True) pandas.core.frame.DataFrame[source]

Fetches E. coli data collected at the outlet. See Ribolzi et al., 2021 and Boithias et al., 2021 for reference. NaNs represent missing values. The data was randomly sampled between 2011 and 2021 during rainfall events. A total of 368 E. coli observations are currently available.

Parameters
  • st – start of data. By default the data is fetched from the point it is available.

  • en – end of data. By default the data is fetched till the point it is available.

  • features

    E. coli concentration data. The following features are available

    • Ecoli_LL_mpn100: Lower limit of the confidence interval

    • Ecoli_mpn100: Stream water Escherichia coli concentration

    • Ecoli_UL_mpn100: Upper limit of the confidence interval

  • remove_duplicates – whether to remove duplicates or not. This is needed because some values were recorded within the same minute.

Return type

a pandas dataframe consisting of features as columns.
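The effect of remove_duplicates can be pictured with plain pandas. This is an illustration of the idea (samples logged within the same minute collapse to one), not the library's internals:

```python
import pandas as pd

# Three hypothetical E. coli samples: the first two were logged within the
# same minute. Flooring the index to minutes and dropping duplicated minutes
# keeps one observation per minute.
idx = pd.to_datetime(
    ["2011-05-25 10:00:10", "2011-05-25 10:00:40", "2011-05-25 10:05:00"]
)
ecoli = pd.DataFrame({"Ecoli_mpn100": [1200.0, 1250.0, 900.0]}, index=idx)

deduped = ecoli[~ecoli.index.floor("min").duplicated(keep="first")]

assert len(deduped) == 2  # the two same-minute samples collapse to one
```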

fetch_hydro(st: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20010101 00:06:00', en: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20200101 00:06:00') Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Fetches water level (cm) and suspended particulate matter (g L-1). Both data span 2001 to 2019 but are randomly sampled.

Parameters
  • st (optional) – starting point of data to be fetched.

  • en (optional) – end point of data to be fetched.

Returns

a tuple of pandas dataframes of water level and suspended particulate matter.

fetch_lu(processed=False)[source]

returns landuse data as list of shapefiles.

fetch_pcp(st: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20010101 00:06:00', en: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20200101 00:06:00', freq: str = '6min') pandas.core.frame.DataFrame[source]

Fetches the precipitation data which is collected at 6 minutes time-step from 2001 to 2020.

Parameters
  • st – starting point of data to be fetched.

  • en – end point of data to be fetched.

  • freq – frequency at which the data is to be returned.

Return type

pandas dataframe of precipitation data
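What the freq argument implies can be sketched with pandas resampling (an illustration with synthetic values, not the library's code): 6-minute precipitation depths aggregate to a coarser step by summing within each bin, which preserves the rainfall total.

```python
import numpy as np
import pandas as pd

# Ten 6-minute precipitation readings of 1 mm each, then aggregated
# to a 30-minute step by summing the depths inside each bin.
idx = pd.date_range("2001-01-01 00:06:00", periods=10, freq="6min")
pcp = pd.DataFrame({"pcp_mm": np.ones(10)}, index=idx)

half_hourly = pcp.resample("30min").sum()

assert pcp["pcp_mm"].sum() == half_hourly["pcp_mm"].sum()  # total preserved
```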

fetch_physiochem(features: Union[list, str] = 'all', st: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20110525 10:00:00', en: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20210406 15:05:00') pandas.core.frame.DataFrame[source]

Fetches physio-chemical features of Huoy Pano catchment Laos.

Parameters
  • st – start of data.

  • en – end of data.

  • features

    The physio-chemical features to fetch. Following features are available

    • ’T’,

    • ’EC’,

    • ’DOpercent’,

    • ’DO’,

    • ’pH’,

    • ’ORP’,

    • ’Turbidity’,

    ’TSS’

Return type

a pandas dataframe

fetch_rain_gauges(st: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20010101', en: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20191231') pandas.core.frame.DataFrame[source]

Fetches data from 7 rain gauges, collected at a daily time step from 2001 to 2019.

Parameters
  • st – start of data. By default the data is fetched from the point it is available.

  • en – end of data. By default the data is fetched till the point it is available.

Returns

a dataframe of 7 columns, where each column represents a rain gauge's observations. The length of the dataframe depends upon the range defined by the st and en arguments.

fetch_weather_station_data(st: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20010101 01:00:00', en: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = '20200101 00:00:00', freq: str = 'H') pandas.core.frame.DataFrame[source]

Fetches hourly weather station data, which consists of air temperature, humidity, wind speed and solar radiation.

Parameters
  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • freq – frequency at which the data is to be fetched.

Return type

a pandas dataframe consisting of 4 columns

inputs = ['air_temp', 'rel_hum', 'wind_speed', 'sol_rad', 'water_level', 'pcp', 'susp_pm']
make_classification(input_features: Union[None, list] = None, output_features: Optional[Union[str, list]] = None, st: Union[None, str] = '20110525 14:00:00', en: Union[None, str] = '20181027 00:00:00', freq: str = '6min', threshold: Union[int, dict] = 400, lookback_steps: Optional[int] = None) pandas.core.frame.DataFrame[source]

Makes a classification problem.

Parameters
  • input_features – names of inputs to use.

  • output_features – feature/features to consider as target/output/label

  • st – starting date of data

  • en – end date of data

  • freq – frequency of data

  • threshold – threshold used to determine the classes. Values greater than or equal to the threshold are set to 1, while values smaller than the threshold are set to 0. The default value of 400 is chosen for E. coli to balance the number of 0s and 1s. It should be noted that US-EPA recommends a threshold value of 400 cfu/ml.

  • lookback_steps – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.

Return type

a dataframe of shape (inputs+target, st - en)

Example

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> df = laos.make_classification()
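The documented threshold rule reduces to a simple comparison; a minimal pandas sketch with made-up concentrations (not the library's implementation):

```python
import pandas as pd

# Values >= threshold become class 1, values below it become class 0.
# 400 is the documented default for E. coli.
ecoli = pd.Series([150.0, 400.0, 2400.0, 35.0], name="Ecoli_mpn100")
threshold = 400

label = (ecoli >= threshold).astype(int)

assert label.tolist() == [0, 1, 1, 0]
```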
make_regression(input_features: Union[None, list] = None, output_features: Union[str, list] = 'Ecoli_mpn100', st: Union[None, str] = '20110525 14:00:00', en: Union[None, str] = '20181027 00:00:00', freq: str = '6min', lookback_steps: Optional[int] = None) pandas.core.frame.DataFrame[source]

Makes a regression problem using hydrological, environmental, and water quality data of Huoay pano.

Parameters
  • input_features

    names of inputs to use. By default the following features are used as input

    • air_temp

    • rel_hum

    • wind_speed

    • sol_rad

    • water_level

    • pcp

    • susp_pm

  • output_features (feature/features to consider as target/output/label) –

  • st – starting date of data

  • en – end date of data

  • freq (frequency of data) –

  • lookback_steps – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.

Return type

a dataframe of shape (inputs+target, st - en)

Example

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> ins = ['pcp', 'air_temp']
>>> out = ['Ecoli_mpn100']
>>> reg_data = laos.make_regression(ins, out, '20110101', '20181231')
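The lookback_steps bookkeeping described above can be sketched with plain pandas (synthetic data and hypothetical observation times, not the library's implementation): for each observation, the preceding lookback_steps rows of 6-minute inputs are kept, so the result has (observations * lookback_steps) rows and a non-continuous index.

```python
import numpy as np
import pandas as pd

# 100 rows of 6-minute input data and two hypothetical observation times.
inputs = pd.DataFrame(
    {"pcp": np.arange(100.0)},
    index=pd.date_range("2011-05-25 14:00:00", periods=100, freq="6min"),
)
obs_times = inputs.index[[30, 70]]
lookback_steps = 4

# For each observation, keep the lookback_steps rows ending at it.
chunks = [inputs.loc[:t].tail(lookback_steps) for t in obs_times]
result = pd.concat(chunks)

assert len(result) == len(obs_times) * lookback_steps  # 2 * 4 == 8
```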

todo add HRU definition

physio_chem_features = {'DO_mgl': 'DO', 'DO_percent': 'DOpercent', 'EC_s/cm': 'EC', 'ORP_mV': 'ORP', 'TSS_gL': 'TSS', 'T_deg': 'T', 'Turbidity_NTU': 'Turbidity', 'pH': 'pH'}
target = ['Ecoli_mpn100']
url = {'ecoli_data.csv': 'https://dataverse.ird.fr/api/access/datafile/5435', 'ecoli_dict.csv': 'https://dataverse.ird.fr/api/access/datafile/5436', 'hydro.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=389bbea0-7279-12c1-63d0-cfc4a77ded87', 'lu.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=0f1aea48-2a51-9b42-7688-a774a8f75e7a', 'pcp.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=3c870a03-324b-140d-7d98-d3585a63e6ec', 'rain_guage.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=7bc45591-5b9f-a13d-90dc-f2a75b0a15cc', 'soilmap.zip': 'https://dataverse.ird.fr/api/access/datafile/5430', 'subs1.zip': 'https://dataverse.ird.fr/api/access/datafile/5432', 'weather_station.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=353d7f00-8d6a-2a34-c0a2-5903c64e800b'}
weather_station_data = ['air_temp', 'rel_hum', 'wind_speed', 'sol_rad']

Camels

class ai4water.datasets.camels.Camels(name=None, units=None)[source]

Bases: ai4water.datasets.datasets.Datasets

Get the CAMELS dataset. This class first downloads the CAMELS dataset if it is not already downloaded. Then the selected attributes for a selected station id are fetched and provided to the user using the method fetch.

- ds_dir str/path
Type

directory of the dataset

- dynamic_features list
Type

tells which dynamic attributes are available in this dataset

- static_features list
Type

a list of static attributes.

- static_attribute_categories list
Type

tells which kinds of static attributes are present in this category.

- stations : returns name/id of stations for which the data (dynamic attributes) exists, as a list of strings.

- fetch : fetches all attributes (both static and dynamic) of all station/gauge_ids or of a specified station. It can also be used to fetch all attributes of a number of stations, either by providing their gauge_ids or by simply saying that we need data of 20 stations, which will then be chosen randomly.

- fetch_dynamic_features : fetches specified dynamic attributes of one specified station. If the dynamic attributes are not specified, all dynamic attributes will be fetched for the specified station. If the station is not specified, the specified dynamic attributes will be fetched for all stations.

- fetch_static_features : works the same as fetch_dynamic_features but for static attributes. Here, if the category is not specified, then static attributes of the specified station for all categories are returned.

stations : returns list of stations

__init__(name=None, units=None)
Parameters
  • name

  • units

DATASETS = {'CAMELS-BR': {'url': 'https://zenodo.org/record/3964745#.YA6rUxZS-Uk'}, 'CAMELS-GB': {'url': <function gb_message>}}
property camels_dir

Directory where all camels datasets will be saved. This will be under the datasets directory.

property ds_dir

Directory where a particular dataset will be saved.

property dynamic_features: list
property end
fetch(stations: Optional[Union[str, list, int, float]] = None, dynamic_features: Optional[Union[list, str]] = 'all', static_features: Optional[Union[str, list]] = None, st: Union[None, str] = None, en: Union[None, str] = None, as_dataframe: bool = False, **kwargs) Union[dict, pandas.core.frame.DataFrame][source]

Fetches the attributes of one or more stations.

Parameters
  • stations – if a string, it is supposed to be a station name/gauge_id. If a list, it should be a list of station/gauge_ids. If an int, data of that many stations/gauge_ids will be fetched. If None (default), attributes of all available stations are fetched. If a float, data of that fraction of stations is fetched.

  • dynamic_features – If not None, then it is the attributes to be fetched. If None, then all available attributes are fetched

  • static_features – list of static attributes to be fetched. None means no static attribute will be fetched.

  • st – starting date of data to be returned. If None, the data will be returned from where it is available.

  • en – end date of data to be returned. If None, then the data will be returned till the date data is available.

  • as_dataframe – whether to return dynamic attributes as pandas dataframe or as xarray dataset.

  • kwargs – keyword arguments to read the files

Returns

If both static and dynamic features are obtained then it returns a dictionary whose keys are station/gauge_ids and values are the attributes and dataframes. Otherwise either dynamic or static features are returned.

Examples

>>> aus = CAMELS_AUS()
>>> # get data of 10% of stations
>>> df = aus.fetch(stations=0.1, static_features=None, as_dataframe=True)
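The int/float semantics of the stations argument can be illustrated in plain Python. resolve_stations below is a hypothetical helper, not the library's code; it also picks the first N stations where the library chooses randomly:

```python
# CAMELS-US has 671 catchments; station ids here are invented placeholders.
all_stations = [f"stn_{i:03d}" for i in range(671)]

def resolve_stations(stations, available):
    """Return the station ids implied by an int count or a float fraction."""
    if isinstance(stations, float):
        count = int(len(available) * stations)   # fraction of all stations
    elif isinstance(stations, int):
        count = stations                         # exact number of stations
    else:
        return list(available)                   # None -> all stations
    return list(available)[:count]

assert len(resolve_stations(0.1, all_stations)) == 67  # 10% of 671
assert len(resolve_stations(10, all_stations)) == 10
```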
fetch_dynamic_features(stn_id, attributes='all', st=None, en=None, as_dataframe=False)[source]

Fetches all or selected dynamic attributes of one station.

fetch_static_features(station, features)[source]
fetch_station_attributes(station: str, dynamic_features: Optional[Union[str, list]] = 'all', static_features: Optional[Union[str, list]] = None, as_ts: bool = False, st: Union[None, str] = None, en: Union[None, str] = None, **kwargs) pandas.core.frame.DataFrame[source]

Fetches attributes for one station.

Parameters
  • station – station id/gauge id for which the data is to be fetched.

  • dynamic_features

  • static_features

  • as_ts – whether static attributes are to be converted into a time series or not. If yes, the returned time series will be of the same length as that of the dynamic attributes.

  • st – starting point from which the data to be fetched. By default the data will be fetched from where it is available.

  • en – end point of data to be fetched. By default the data will be fetched till the point it is available.

Returns

a dataframe if as_ts is True, else a dictionary of static and dynamic attributes for a station/gauge_id

fetch_stations_attributes(stations: list, dynamic_features='all', static_features=None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

Reads attributes of more than one station.

Parameters
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic attributes to be fetched. If 'all', then all dynamic attributes will be fetched.

  • static_features – list of static attributes to be fetched. If 'all', then all static attributes will be fetched. If None, then no static attribute will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the data as a pandas dataframe. By default an xr.Dataset object is returned.

  • kwargs – additional keyword arguments

Returns

Dynamic and static features of multiple stations. Dynamic features are by default returned as an xr.Dataset unless as_dataframe is True, in which case a pandas dataframe with a multiindex is returned. If an xr.Dataset, it consists of data_vars equal to the number of stations; for each station the DataArray has dimensions (time, dynamic_features), where time is defined by st and en, i.e. the length of the DataArray. When the returned object is a pandas DataFrame, the first index is time and the second index is dynamic_features. Static attributes are always returned as a pandas DataFrame of shape (stations, static_features). If dynamic_features is None, they are not returned and the returned value consists only of static features. The same holds for static_features. If both are not None, the returned type is a dictionary with static and dynamic keys.

Raises

ValueError, if both dynamic_features and static_features are None
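The documented as_dataframe layout is a frame whose first index level is time and second is dynamic_features, with one column per station. A tiny synthetic sketch of that shape (invented station and feature names):

```python
import numpy as np
import pandas as pd

# Two dynamic features for three time steps across two stations,
# laid out as a (time, dynamic_features) multi-indexed dataframe.
times = pd.date_range("2000-01-01", periods=3, freq="D")
features = ["precipitation", "discharge"]
index = pd.MultiIndex.from_product(
    [times, features], names=["time", "dynamic_features"]
)
df = pd.DataFrame(np.zeros((6, 2)), index=index, columns=["stn_a", "stn_b"])

assert df.index.names == ["time", "dynamic_features"]
assert df.shape == (len(times) * len(features), 2)
```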

property start
stations()[source]
to_ts(static, st, en, as_ts=False, freq='D')[source]

CAMELS_AUS

class ai4water.datasets.camels.CAMELS_AUS(path: Optional[str] = None)[source]

Bases: ai4water.datasets.camels.Camels

Inherits from the Camels class. Reads the CAMELS-AUS dataset of [Fowler et al., 2020](https://doi.org/10.5194/essd-13-3847-2021).

Examples

>>> dataset = CAMELS_AUS()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
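Why the example calls unstack() can be shown on a synthetic frame (the gauge id and values are made up): with the (time, dynamic_features) multi-index, unstacking moves the feature level into the columns, so each dynamic feature becomes its own column for the fetched station.

```python
import numpy as np
import pandas as pd

# A single-station frame in the multi-indexed layout fetch() returns,
# then unstacked into the familiar wide (time x feature) layout.
times = pd.date_range("2000-01-01", periods=2, freq="D")
features = ["precipitation_AWAP", "streamflow_MLd"]
index = pd.MultiIndex.from_product([times, features])
df = pd.DataFrame({"912101A": np.arange(4.0)}, index=index)

wide = df.unstack()  # columns become (station, feature) pairs

assert wide.shape == (2, 2)
```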
__init__(path: Optional[str] = None)[source]
Parameters

path – path where the CAMELS-AUS dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.

property dynamic_features: list
property end
fetch_static_features(station, features='all', **kwargs) pandas.core.frame.DataFrame[source]

Fetches static attributes of one station as a dataframe.

folders = {'et_morton_actual_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_actual_SILO', 'et_morton_point_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_point_SILO', 'et_morton_wet_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_wet_SILO', 'et_short_crop_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_short_crop_SILO', 'et_tall_crop_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_tall_crop_SILO', 'evap_morton_lake_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_morton_lake_SILO', 'evap_pan_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_pan_SILO', 'evap_syn_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_syn_SILO', 'mslp_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/mslp_SILO', 'precipitation_AWAP': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_AWAP', 'precipitation_SILO': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_SILO', 'precipitation_var_SWAP': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_var_AWAP', 'radiation_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/radiation_SILO', 'rh_tmax_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/rh_tmax_SILO', 'rh_tmin_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/rh_tmin_SILO', 'solarrad_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/solarrad_AWAP', 'streamflow_MLd': '03_streamflow/03_streamflow/streamflow_MLd', 'streamflow_MLd_inclInfilled': '03_streamflow/03_streamflow/streamflow_MLd_inclInfilled', 'streamflow_mmd': '03_streamflow/03_streamflow/streamflow_mmd', 'tmax_AWAP': 
'05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/tmax_AWAP', 'tmax_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/tmax_SILO', 'tmin_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/tmin_AWAP', 'tmin_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/tmin_SILO', 'vp_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/vp_SILO', 'vp_deficit_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/vp_deficit_SILO', 'vprp_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/vprp_AWAP'}
property location
plot(what, stations=None, **kwargs)[source]
property start
property static_attribute_categories
property static_features: list
stations(as_list=True) list[source]
url = 'https://doi.pangaea.de/10.1594/PANGAEA.921850'
urls = {'01_id_name_metadata.zip': 'https://download.pangaea.de/dataset/921850/files/', '02_location_boundary_area.zip': 'https://download.pangaea.de/dataset/921850/files/', '03_streamflow.zip': 'https://download.pangaea.de/dataset/921850/files/', '04_attributes.zip': 'https://download.pangaea.de/dataset/921850/files/', '05_hydrometeorology.zip': 'https://download.pangaea.de/dataset/921850/files/', 'CAMELS_AUS_Attributes-Indices_MasterTable.csv': 'https://download.pangaea.de/dataset/921850/files/', 'Units_01_TimeseriesData.pdf': 'https://download.pangaea.de/dataset/921850/files/', 'Units_02_AttributeMasterTable.pdf': 'https://download.pangaea.de/dataset/921850/files/'}

LamaH

class ai4water.datasets.camels.LamaH(*, time_step: str, data_type: str, **kwargs)[source]

Bases: ai4water.datasets.camels.Camels

Large-Sample Data for Hydrology and Environmental Sciences for Central Europe. url: https://zenodo.org/record/4609826#.YFNp59zt02w paper: https://essd.copernicus.org/preprints/essd-2021-72/

__init__(*, time_step: str, data_type: str, **kwargs)[source]
Parameters
  • time_step – possible values are daily or hourly

  • data_type – possible values are total_upstrm, diff_upstrm_all or diff_upstrm_lowimp

property data_type_dir
property ds_dir

Directory where a particular dataset will be saved.

property dynamic_features
property end
fetch_static_features(station: Union[str, list], features=None) pandas.core.frame.DataFrame[source]
read_ts_of_station(station) pandas.core.frame.DataFrame[source]
property start
static_attribute_categories = ['']
property static_features: list
stations() list[source]
time_steps = ['daily', 'hourly']
url = 'https://zenodo.org/record/4609826#.YFNp59zt02w'

CAMELS_GB

class ai4water.datasets.camels.CAMELS_GB(path=None)[source]

Bases: ai4water.datasets.camels.Camels

This dataset must be manually downloaded by the user. The path of the downloaded folder must be provided while initiating this class.

__init__(path=None)[source]
Parameters
  • path – path of the manually downloaded CAMELS-GB folder.

property ds_dir

Directory where a particular dataset will be saved.

dynamic_features = ['precipitation', 'pet', 'temperature', 'discharge_spec', 'discharge_vol', 'peti', 'humidity', 'shortwave_rad', 'longwave_rad', 'windspeed']
property end
fetch_static_features(station: str, features='all') pandas.core.frame.DataFrame[source]

Fetches static attributes of one station for one or more categories as a dataframe.

property start
property static_attribute_categories: list
property static_features
stations(to_exclude=None)[source]

CAMELS_BR

class ai4water.datasets.camels.CAMELS_BR[source]

Bases: ai4water.datasets.camels.Camels

Downloads and processes CAMELS dataset of Brazil

__init__()[source]
Parameters
  • name

  • units

all_stations(attribute) list[source]

Tells all station ids for which data of a specific attribute is available.

property ds_dir

Directory where a particular dataset will be saved.

property dynamic_features: list
property end
fetch_static_features(station, features=None) pandas.core.frame.DataFrame[source]

Arguments:

stn_id int/list:
    station id whose attributes are to be fetched

attributes str/list:
    name of the attribute to fetch. Default is None, which will return all the attributes for a particular station of the specified category.

index_col_name str:
    name of the column containing station names

as_ts bool:

Example

>>> dataset = Camels('CAMELS-BR')
>>> df = dataset.fetch_static_features(11500000, 'climate')
folders = {'evapotransp_gleam': '08_CAMELS_BR_evapotransp_gleam', 'evapotransp_mgb': '09_CAMELS_BR_evapotransp_mgb', 'potential_evapotransp_gleam': '10_CAMELS_BR_potential_evapotransp_gleam', 'precipitation_chirps': '05_CAMELS_BR_precipitation_chirps', 'precipitation_cpc': '07_CAMELS_BR_precipitation_cpc', 'precipitation_mswep': '06_CAMELS_BR_precipitation_mswep', 'simulated_streamflow_m3s': '04_CAMELS_BR_streamflow_simulated', 'streamflow_m3s': '02_CAMELS_BR_streamflow_m3s', 'streamflow_mm': '03_CAMELS_BR_streamflow_mm_selected_catchments', 'temperature_max': '13_CAMELS_BR_temperature_max_cpc', 'temperature_mean': '12_CAMELS_BR_temperature_mean_cpc', 'temperature_min': '11_CAMELS_BR_temperature_min_cpc'}
property start
property static_attribute_categories
property static_dir
property static_features
property static_files
stations(to_exclude=None) list[source]

Returns a list of station ids which are common among all dynamic attributes.

Example

>>> dataset = CAMELS_BR()
>>> stations = dataset.stations()
url = 'https://zenodo.org/record/3964745#.YA6rUxZS-Uk'

CAMELS_US

class ai4water.datasets.camels.CAMELS_US(data_source='basin_mean_daymet')[source]

Bases: ai4water.datasets.camels.Camels

Downloads and processes the CAMELS dataset of 671 catchments (named CAMELS) from https://ral.ucar.edu/solutions/products/camels https://doi.org/10.5194/hess-19-209-2015

__init__(data_source='basin_mean_daymet')[source]
Parameters
  • name

  • units

DATASETS = ['CAMELS_US']
catchment_attr_url = 'https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/camels_attributes_v2.0.zip'
property ds_dir

Directory where a particular dataset will be saved.

dynamic_features = ['dayl(s)', 'prcp(mm/day)', 'srad(W/m2)', 'swe(mm)', 'tmax(C)', 'tmin(C)', 'vp(Pa)', 'Flow']
property end
fetch_static_features(station, features)[source]
folders = {'basin_mean_daymet': 'basin_mean_forcing/daymet', 'basin_mean_maurer': 'basin_mean_forcing/maurer', 'basin_mean_nldas': 'basin_mean_forcing/nldas', 'basin_mean_v1p15_daymet': 'basin_mean_forcing/v1p15/daymet', 'basin_mean_v1p15_nldas': 'basin_mean_forcing/v1p15/nldas', 'elev_bands': 'elev/daymet', 'hru': 'hru_forcing/daymet'}
property start
property static_features
stations() list[source]
url = 'https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip'

CAMELS_CL

class ai4water.datasets.camels.CAMELS_CL(path: Optional[str] = None)[source]

Bases: ai4water.datasets.camels.Camels

Downloads and processes CAMELS dataset of Chile https://doi.org/10.5194/hess-22-5817-2018

__init__(path: Optional[str] = None)[source]
Parameters
  • name

  • units

dynamic_features = ['streamflow_m3s', 'streamflow_mm', 'precip_cr2met', 'precip_chirps', 'precip_mswep', 'precip_tmpa', 'tmin_cr2met', 'tmax_cr2met', 'tmean_cr2met', 'pet_8d_modis', 'pet_hargreaves', 'swe']

Arguments: path: path where the CAMELS-CL dataset has been downloaded. This path must contain the downloaded zip files.

property end
fetch_static_features(station, features=None, st=None, en=None)[source]
property start
property static_features: list
stations() list[source]

Tells all station ids for which data of a specific attribute is available.

urls = {'10_CAMELScl_tmean_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '11_CAMELScl_pet_8d_modis.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '12_CAMELScl_pet_hargreaves.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '13_CAMELScl_swe.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '14_CAMELScl_catch_hierarchy.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '1_CAMELScl_attributes.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '2_CAMELScl_streamflow_m3s.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '3_CAMELScl_streamflow_mm.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '4_CAMELScl_precip_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '5_CAMELScl_precip_chirps.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '6_CAMELScl_precip_mswep.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '7_CAMELScl_precip_tmpa.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '8_CAMELScl_tmin_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '9_CAMELScl_tmax_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', 'CAMELScl_catchment_boundaries.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/'}

HYSETS

class ai4water.datasets.camels.HYSETS(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]

Bases: ai4water.datasets.camels.Camels

Database for hydrometeorological modeling of 14,425 North American watersheds from 1950-2018, following the work of Arsenault et al., 2020. The user must manually download the files, unpack them and provide the path where these files are saved.

This data comes with multiple sources, each source having one or more dynamic_features. The following data sources are available.

SNODAS_SWE

discharge, swe

SCDNA

discharge, pr, tasmin, tasmax

nonQC_stations

discharge, pr, tasmin, tasmax

Livneh

discharge, pr, tasmin, tasmax

ERA5

discharge, pr, tasmax, tasmin

ERA5Land_SWE

discharge, swe

ERA5Land

discharge, pr, tasmax, tasmin

All sources contain one or more of the following dynamic_features, with the following shapes:

time : (25202,)

watershedID : (14425,)

drainage_area : (14425,)

drainage_area_GSIM : (14425,)

flag_GSIM_boundaries : (14425,)

flag_artificial_boundaries : (14425,)

centroid_lat : (14425,)

centroid_lon : (14425,)

elevation : (14425,)

slope : (14425,)

discharge : (14425, 25202)

pr : (14425, 25202)

tasmax : (14425, 25202)

tasmin : (14425, 25202)

Examples

>>> dataset = HYSETS(path="path/to/HYSETS")
>>> df = dataset.fetch(0.01, as_dataframe=True) # 1% of stations
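The per-source array layout listed above, (watersheds, time), can be sketched with a toy numpy array (the real arrays are (14425, 25202); this one is tiny and its values are invented): one station's record is a single row turned into a time-indexed series.

```python
import numpy as np
import pandas as pd

# Toy discharge array shaped (watersheds, time), mimicking the HYSETS layout.
n_watersheds, n_days = 5, 4
discharge = np.arange(n_watersheds * n_days, dtype=float).reshape(
    n_watersheds, n_days
)
time = pd.date_range("1950-01-01", periods=n_days, freq="D")

# Selecting one watershed's row yields its daily discharge series.
station_idx = 2
series = pd.Series(discharge[station_idx], index=time, name="discharge")

assert series.shape == (n_days,)
```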
__init__(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]
Parameters
  • path – path where all the data files are saved.

  • swe_source – source of swe data.

  • discharge_source – source of discharge data

  • tasmin_source – source of tasmin data

  • tasmax_source – source of tasmax data

  • pr_source – source of pr data

  • kwargs – arguments for Camels base class

OTHER_SRC = ['ERA5', 'ERA5Land', 'Livneh', 'nonQC_stations', 'SCDNA']
Q_SRC = ['ERA5', 'ERA5Land', 'ERA5Land_SWE', 'Livneh', 'nonQC_stations', 'SCDNA', 'SNODAS_SWE']
SWE_SRC = ['ERA5Land_SWE', 'SNODAS_SWE']
doi = 'https://doi.org/10.1038/s41597-020-00583-2'
property ds_dir

Directory where a particular dataset will be saved.

dynamic_features = ['discharge', 'swe', 'tasmin', 'tasmax', 'pr']
property end
fetch_dynamic_features(station, dynamic_features='all', st=None, en=None, as_dataframe=False)[source]

Fetches dynamic attributes of one station.

fetch_static_features(station, features='all', st=None, en=None, as_ts=False) pandas.core.frame.DataFrame[source]
fetch_stations_attributes(stations: list, dynamic_features: Optional[Union[str, list]] = 'all', static_features: Optional[Union[str, list]] = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

Reads attributes of more than one station.

Parameters
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic attributes to be fetched. If 'all', then all dynamic attributes will be fetched.

  • static_features – list of static attributes to be fetched. If all, then all static attributes will be fetched. If None, then no static attribute will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the data as a pandas dataframe. By default an xr.Dataset object is returned.

  • kwargs (dict) – additional keyword arguments

Returns

Dynamic and static features of multiple stations. Dynamic features are returned as an xr.Dataset by default, unless as_dataframe is True, in which case a pandas DataFrame with a multiindex is returned. The xr.Dataset contains one data_var per station; each station's DataArray has dimensions (time, dynamic_features), where the length of time is defined by st and en. When a pandas DataFrame is returned, the first index is time and the second index is dynamic_features. Static attributes are always returned as a pandas DataFrame of shape (stations, static_features). If dynamic_features is None, dynamic features are not returned and the result consists only of static features; the same holds for static_features. If both are not None, the returned type is a dictionary with static and dynamic keys.

Raises

ValueError, if both dynamic_features and static_features are None
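A minimal sketch of the multiindex layout described above when as_dataframe is True, built from synthetic data rather than the real dataset (station names, feature names and values here are illustrative only):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the (time, dynamic_features) layout described above.
time = pd.date_range("2000-01-01", periods=3, freq="D")
features = ["discharge", "pr"]
index = pd.MultiIndex.from_product([time, features],
                                   names=["time", "dynamic_features"])

# One column per station, as in the DataFrame returned when as_dataframe=True.
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index,
                  columns=["station_1", "station_2"])

# Select one feature across all time steps for a single station.
pr_series = df.xs("pr", level="dynamic_features")["station_1"]
```

The cross-section has one row per time step, which is the natural way to pull a single dynamic feature out of the multiindexed frame.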

read_static_data()[source]
property start
property static_features
stations() list[source]
url = 'https://osf.io/rpc3w/'

HYPE

class ai4water.datasets.camels.HYPE(time_step: str = 'daily', **kwargs)[source]

Bases: ai4water.datasets.camels.Camels

Downloads and preprocesses the HYPE dataset from https://zenodo.org/record/4029572. This is a rainfall-runoff dataset of 564 stations from 1985 to 2019 at daily, monthly and yearly time steps. Paper: https://doi.org/10.2166/nh.2010.007

__init__(time_step: str = 'daily', **kwargs)[source]
Parameters
  • time_step – time step of the data; one of daily, monthly or yearly.

  • kwargs – keyword arguments for the Camels base class

dynamic_features = ['AET_mm', 'Baseflow_mm', 'Infiltration_mm', 'SM_mm', 'Streamflow_mm', 'Runoff_mm', 'Qsim_m3-s', 'Prec_mm', 'PET_mm']
property end
fetch_static_features(station, features)[source]
property start
property static_features
stations() list[source]
url = ['https://zenodo.org/record/581435', 'https://zenodo.org/record/4029572']

Weisssee

class ai4water.datasets.datasets.Weisssee(name=None, units=None)[source]

Bases: ai4water.datasets.datasets.Datasets

__init__(name=None, units=None)
Parameters
  • name

  • units

dynamic_attributes = ['Precipitation_measurements', 'long_wave_upward_radiation', 'snow_density_at_30cm', 'long_wave_downward_radiation']
fetch(**kwargs)[source]
url = '10.1594/PANGAEA.898217'

WeatherJena

class ai4water.datasets.datasets.WeatherJena(obs_loc='roof')[source]

Bases: ai4water.datasets.datasets.Datasets

10-minute weather dataset of Jena, Germany, hosted at https://www.bgc-jena.mpg.de/wetter/index.html, available from 2002 onwards.

__init__(obs_loc='roof')[source]

The ETP data is collected at three different locations, i.e. roof, soil and saale (hall).

Parameters
  • obs_loc (str) – location of observation.

fetch(st: Optional[str] = None, en: Optional[str] = None) pandas.core.frame.DataFrame[source]

Fetches the time series data between the given period as a pandas dataframe.

Parameters
  • st – start of data to be fetched. If None, data from the start will be returned.

  • en – end of data to be fetched. If None, data till the end will be returned.

Returns

a pandas dataframe.

url = 'https://www.bgc-jena.mpg.de/wetter/weather_data.html'
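The st/en arguments behave like label-based datetime slicing in pandas. An illustrative sketch with synthetic 10-minute data (not downloaded from the Jena server; column name and values are made up):

```python
import numpy as np
import pandas as pd

# Synthetic 10-minute series standing in for the Jena weather data.
idx = pd.date_range("2002-01-01", periods=12, freq="10min")
data = pd.DataFrame({"T (degC)": np.linspace(-5, 0, 12)}, index=idx)

# Equivalent of fetch(st=..., en=...): label-based slicing on the index,
# inclusive of both endpoints.
subset = data.loc["2002-01-01 00:30":"2002-01-01 01:00"]
```

With 10-minute spacing, the slice above keeps the four rows at 00:30, 00:40, 00:50 and 01:00.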

SWECanada

class ai4water.datasets.datasets.SWECanada(**kwargs)[source]

Bases: ai4water.datasets.datasets.Datasets

Daily Canadian historical Snow Water Equivalent dataset from 1928 to 2020 https://doi.org/10.1080/07055900.2019.1598843

__init__(**kwargs)[source]
Parameters
  • kwargs – keyword arguments such as name and units for the Datasets base class

property end
feaures = ['snw', 'snd', 'den']
fetch(station_id: Optional[Union[str, list, int, float]] = None, features: Optional[Union[str, list]] = None, q_flags: Optional[Union[str, list]] = None, st=None, en=None) dict[source]

Fetches time series data from selected stations.

Parameters
  • station_id – station/stations to be retrieved. If None, then data from all stations will be returned.

  • features

    Names of features to be retrieved. Following features are allowed:

    • 'snw' – snow water equivalent kg/m3

    • 'snd' – snow depth m

    • 'den' – snowpack bulk density kg/m3

    If None, then all three features will be retrieved.

  • q_flags

    If None, then no qflags will be returned. Following q_flag values are available.

    • 'data_flag_snw'

    • 'data_flag_snd'

    • 'qc_flag_snw'

    • 'qc_flag_snd'

  • st – start of data to be retrieved

  • en – end of data to be retrieved.

Returns

time series data of the requested stations.

Return type

a dictionary of dataframes of shape (st:en, features + q_flags) whose length is equal to the number of stations being considered.
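A sketch of consuming the dictionary of per-station dataframes described above, again with synthetic data (station IDs and values are made up for illustration):

```python
import pandas as pd

# Synthetic stand-in for the dict returned by SWECanada.fetch():
# one dataframe per station, columns = features (+ any requested q_flags).
idx = pd.date_range("2000-01-01", periods=4, freq="D")
data = {
    "SCD-P1": pd.DataFrame({"snw": [10.0, 12.0, 11.0, 9.0],
                            "snd": [0.2, 0.25, 0.22, 0.18]}, index=idx),
    "SCD-P2": pd.DataFrame({"snw": [5.0, 6.0, 7.0, 6.5],
                            "snd": [0.1, 0.12, 0.14, 0.13]}, index=idx),
}

# Mean snow water equivalent per station.
mean_snw = {stn: df["snw"].mean() for stn, df in data.items()}
```

Because each value in the dictionary is an ordinary dataframe, per-station aggregation is a plain dict comprehension over its items.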

fetch_station_attributes(stn, features_to_fetch, st=None, en=None) pandas.core.frame.DataFrame[source]

fetches attributes of one station

q_flags = ['data_flag_snw', 'data_flag_snd', 'qc_flag_snw', 'qc_flag_snd']
property start
stations() list[source]
url = 'https://doi.org/10.5194/essd-2021-160'