datasets

Busan Beach data

ai4water.datasets.busan_beach(inputs: Optional[list] = None, target: Union[list, str] = 'tetx_coppml') DataFrame[source]

Loads the antibiotic resistance genes (ARG) data from a recreational beach in Busan, South Korea, along with environmental variables.

The data is in the form of multivariate time series and was collected over a period of 2 years during several precipitation events. The environmental data has a frequency of 30 minutes, while the ARG data is discontinuous. The data and its pre-processing are described in detail in Jang et al., 2021.

Parameters
  • inputs

features to use as input. By default all environmental data is used, which consists of the following parameters

    • tide_cm

    • wat_temp_c

    • sal_psu

    • air_temp_c

    • pcp_mm

    • pcp3_mm

    • pcp6_mm

    • pcp12_mm

    • wind_dir_deg

    • wind_speed_mps

    • air_p_hpa

    • mslp_hpa

    • rel_hum

  • target

feature/features to use as target/output. By default tetx_coppml is used as the target. Logically, one or more of the following can be considered as target

    • ecoli

    • 16s

    • inti1

    • Total_args

    • tetx_coppml

    • sul1_coppml

    • blaTEM_coppml

    • aac_coppml

    • Total_otus

    • otu_5575

    • otu_273

    • otu_94

Returns

a pandas dataframe with inputs and target, indexed with a pandas.DatetimeIndex

Return type

pd.DataFrame

Example

>>> from ai4water.datasets import busan_beach
>>> dataframe = busan_beach()
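
Specific input and target columns can also be requested; a sketch, assuming the column names listed above are present in the data:

>>> df = busan_beach(inputs=['tide_cm', 'pcp_mm'], target='ecoli')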

Ecoli Mekong River

E. coli data from the Mekong river (Houay Pano) area from 2011 to 2021 (Boithias et al., 2022 [1]_).

Parameters
  • st (optional) – starting time. The default starting point is 2011-05-25 10:00:00.

  • en (optional) – end time. The default end point is 2021-05-25 15:41:00.

  • features (str, optional) – names of features to use. Use 'all' to get all features. By default the following input features are selected

    • station_name name of station/catchment where the observation was made

    • T temperature

    • EC electrical conductance

    • DOpercent dissolved oxygen saturation

    • DO dissolved oxygen concentration

    • pH pH

    • ORP oxidation-reduction potential

    • Turbidity turbidity

    • TSS total suspended sediment concentration

    • E-coli_4dilutions Escherichia coli concentration

  • overwrite (bool) – whether to overwrite the downloaded file or not

Returns

with default parameters, the shape is (1602, 10)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import ecoli_mekong
>>> ecoli = ecoli_mekong()
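
The period can be restricted with the st and en arguments; a sketch using dates within the documented 2011-2021 record:

>>> ecoli_2016 = ecoli_mekong(st='20160101', en='20161231')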
[1] https://essd.copernicus.org/preprints/essd-2021-440/

Ecoli Mekong River (Laos)

E. coli data from the Mekong river (Northern Laos).

Parameters
  • st (str, pandas.Timestamp, or int) – starting time

  • en (str, pandas.Timestamp, or int) – end time

  • station_name (str) – name of the station

  • features (str, optional) – names of features to use

  • overwrite (bool) – whether to overwrite the downloaded file or not

Returns

with default parameters, the shape is (1131, 10)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import ecoli_mekong_laos
>>> ecoli = ecoli_mekong_laos()
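
A sketch of narrowing the time span via st and en; the dates are illustrative and must lie within the available record:

>>> ecoli_sub = ecoli_mekong_laos(st='20150101', en='20151231')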

Ecoli Houay Pano (Laos)

E. coli data from the Mekong river (Houay Pano) area.

Parameters
  • st (optional) – starting time. The default starting point is 2011-05-25 10:00:00.

  • en (optional) – end time. The default end point is 2021-05-25 15:41:00.

  • features (str, optional) – names of features to use. Use 'all' to get all features. By default the following input features are selected

    • station_name name of station/catchment where the observation was made

    • T temperature

    • EC electrical conductance

    • DOpercent dissolved oxygen saturation

    • DO dissolved oxygen concentration

    • pH pH

    • ORP oxidation-reduction potential

    • Turbidity turbidity

    • TSS total suspended sediment concentration

    • E-coli_4dilutions Escherichia coli concentration

  • overwrite (bool) – whether to overwrite the downloaded file or not

Returns

with default parameters, the shape is (413, 10)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import ecoli_houay_pano
>>> ecoli = ecoli_houay_pano()
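
To retrieve every available feature rather than the default selection, features='all' can be passed, as documented above:

>>> ecoli_all = ecoli_houay_pano(features='all')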

Ecoli data from Mekong river (2016)

ai4water.datasets.mtropics.ecoli_mekong_2016(st: Union[str, Timestamp, int] = '20160101', en: Union[str, Timestamp, int] = '20161231', features: Optional[Union[str, list]] = None, overwrite=False) DataFrame[source]
E. coli data from the Mekong river from 2016, from 29 catchments.

Parameters
  • st – starting time

  • en – end time

  • features (str, optional) – names of features to use. Use 'all' to get all features.

  • overwrite (bool) – whether to overwrite the downloaded file or not

Returns

with default parameters, the shape is (58, 10)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import ecoli_mekong_2016
>>> ecoli = ecoli_mekong_2016()
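
A sketch of fetching all features for a narrower window within 2016; the exact feature names depend on the downloaded file:

>>> ecoli_june = ecoli_mekong_2016(st='20160601', en='20160630', features='all')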

Datasets

class ai4water.datasets.mtropics.Datasets(name=None, units=None)[source]

Bases: object

Base class for datasets

Note

We don’t host datasets. Each dataset is downloaded from the target remote server and saved to local disk.

__init__(name=None, units=None)[source]
Parameters
  • name

  • units

property base_ds_dir

Base datasets directory

download_from_pangaea(overwrite=False)[source]
property ds_dir
property url

MtropicsLaos

class ai4water.datasets.mtropics.MtropicsLaos(**kwargs)[source]

Bases: Datasets

Downloads and prepares hydrological, climate and land use data for Laos from the Mtropics website and IRD data servers.

- fetch_lu
- fetch_ecoli
- fetch_rain_gauges
- fetch_weather_station_data
- fetch_pcp
- fetch_hydro
- make_regression
__init__(**kwargs)[source]
Parameters
  • name

  • units

fetch_ecoli(features: Union[list, str] = 'Ecoli_mpn100', st: Union[str, Timestamp] = '20110525 10:00:00', en: Union[str, Timestamp] = '20210406 15:05:00', remove_duplicates: bool = True) DataFrame[source]

Fetches E. coli data collected at the outlet. See Ribolzi et al., 2021 and Boithias et al., 2021 for reference. NaNs represent missing values. The data was sampled randomly between 2011 and 2021 during rainfall events. In total, 368 E. coli observations are available.

Parameters
  • st – start of data. By default the data is fetched from the point it is available.

  • en – end of data. By default the data is fetched till the point it is available.

  • features

    E. coli concentration data. The following data are available

    • Ecoli_LL_mpn100: Lower limit of the confidence interval

    • Ecoli_mpn100: Stream water Escherichia coli concentration

    • Ecoli_UL_mpn100: Upper limit of the confidence interval

  • remove_duplicates – whether to remove duplicates or not. This is needed because some values were recorded within a minute of each other.

Return type

a pandas dataframe consisting of features as columns.
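
Examples

A sketch based on the signature above; the feature names are those listed under the features parameter:

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> ecoli = laos.fetch_ecoli()
... # fetch the concentration along with its confidence interval
>>> ci = laos.fetch_ecoli(features=['Ecoli_LL_mpn100', 'Ecoli_mpn100', 'Ecoli_UL_mpn100'])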

fetch_hydro(st: Union[str, Timestamp] = '20010101 00:06:00', en: Union[str, Timestamp] = '20200101 00:06:00') Tuple[DataFrame, DataFrame][source]

Fetches water level (cm) and suspended particulate matter (g L-1). Both datasets span 2001 to 2019 but are randomly sampled.

Parameters
  • st (optional) – starting point of data to be fetched.

  • en (optional) – end point of data to be fetched.

Returns

a tuple of pandas dataframes of water level and suspended particulate matter.
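
Examples

A minimal sketch; fetch_hydro returns the two dataframes as a tuple:

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> wl, spm = laos.fetch_hydro()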

fetch_lu(processed=False)[source]

returns landuse data as list of shapefiles.

fetch_pcp(st: Union[str, Timestamp] = '20010101 00:06:00', en: Union[str, Timestamp] = '20200101 00:06:00', freq: str = '6min') DataFrame[source]

Fetches the precipitation data which is collected at 6 minutes time-step from 2001 to 2020.

Parameters
  • st – starting point of data to be fetched.

  • en – end point of data to be fetched.

  • freq – frequency at which the data is to be returned.

Return type

pandas dataframe of precipitation data
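
Examples

A sketch; the freq argument is assumed to accept pandas-style frequency strings, as the '6min' default suggests:

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> pcp = laos.fetch_pcp()
... # return the data at a 30-minute time step
>>> pcp_30min = laos.fetch_pcp(freq='30min')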

fetch_physiochem(features: Union[list, str] = 'all', st: Union[str, Timestamp] = '20110525 10:00:00', en: Union[str, Timestamp] = '20210406 15:05:00') DataFrame[source]

Fetches physio-chemical features of the Houay Pano catchment, Laos.

Parameters
  • st – start of data.

  • en – end of data.

  • features

    The physio-chemical features to fetch. Following features are available

    • ’T’,

    • ’EC’,

    • ’DOpercent’,

    • ’DO’,

    • ’pH’,

    • ’ORP’,

    • ’Turbidity’,

    • ’TSS’

Return type

a pandas dataframe
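
Examples

A sketch using feature names from the list above:

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> wq = laos.fetch_physiochem(features=['T', 'pH'])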

fetch_rain_gauges(st: Union[str, Timestamp] = '20010101', en: Union[str, Timestamp] = '20191231') DataFrame[source]

fetches data from 7 rain gauges, collected at daily time step from 2001 to 2019.

Parameters
  • st – start of data. By default the data is fetched from the point it is available.

  • en – end of data. By default the data is fetched till the point it is available.

Returns

a dataframe of 7 columns, where each column represents a rain gauge's observations. The length of the dataframe depends upon the range defined by the st and en arguments.

Examples

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> rg = laos.fetch_rain_gauges()
fetch_suro() DataFrame[source]

returns surface runoff and soil detachment data from Houay pano, Laos PDR.

Returns

a dataframe of shape (293, 13)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> suro = laos.fetch_suro()
fetch_weather_station_data(st: Union[str, Timestamp] = '20010101 01:00:00', en: Union[str, Timestamp] = '20200101 00:00:00', freq: str = 'H') DataFrame[source]

fetches hourly weather station data which consists of air temperature, humidity, wind speed and solar radiation.

Parameters
  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • freq – frequency at which the data is to be fetched.

Return type

a pandas dataframe consisting of 4 columns
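
Examples

A minimal sketch using the defaults documented above:

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> w = laos.fetch_weather_station_data()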

inputs = ['air_temp', 'rel_hum', 'wind_speed', 'sol_rad', 'water_level', 'pcp', 'susp_pm']
make_classification(input_features: Union[None, list] = None, output_features: Optional[Union[str, list]] = None, st: Union[None, str] = '20110525 14:00:00', en: Union[None, str] = '20181027 00:00:00', freq: str = '6min', threshold: Union[int, dict] = 400, lookback_steps: Optional[int] = None) DataFrame[source]

Makes a classification problem.

Parameters
  • input_features – names of inputs to use.

  • output_features – feature/features to consider as target/output/label

  • st – starting date of data

  • en – end date of data

  • freq – frequency of data

  • threshold – threshold to use to determine classes. Values greater than or equal to the threshold are set to 1, while values smaller than the threshold are set to 0. The value of 400 is chosen for E. coli to balance the number of 0s and 1s. It should be noted that US-EPA recommends a threshold value of 400 cfu/ml.

  • lookback_steps – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.

Returns

a dataframe with the inputs plus the target as columns, spanning st to en

Return type

pd.DataFrame

Example

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> df = laos.make_classification()
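Continuing the example, the inputs and threshold can also be set explicitly; a sketch, with input names taken from the default input list of make_regression:

>>> df = laos.make_classification(input_features=['pcp', 'air_temp'],
...     threshold=400)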
make_regression(input_features: Union[None, list] = None, output_features: Union[str, list] = 'Ecoli_mpn100', st: Union[None, str] = '20110525 14:00:00', en: Union[None, str] = '20181027 00:00:00', freq: str = '6min', lookback_steps: Optional[int] = None) DataFrame[source]

Makes a regression problem using hydrological, environmental, and water quality data of Houay Pano.

Parameters
  • input_features

    names of inputs to use. By default following features are used as input

    • air_temp

    • rel_hum

    • wind_speed

    • sol_rad

    • water_level

    • pcp

    • susp_pm

  • output_features – feature/features to consider as target/output/label

  • st – starting date of data

  • en – end date of data

  • freq – frequency of data

  • lookback_steps – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.

Returns

a dataframe with the inputs plus the target as columns, spanning st to en

Return type

pd.DataFrame

Example

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> ins = ['pcp', 'air_temp']
>>> out = ['Ecoli_mpn100']
>>> reg_data = laos.make_regression(ins, out, '20110101', '20181231')

TODO: add HRU definition

physio_chem_features = {'DO_mgl': 'DO', 'DO_percent': 'DOpercent', 'EC_s/cm': 'EC', 'ORP_mV': 'ORP', 'TSS_gL': 'TSS', 'T_deg': 'T', 'Turbidity_NTU': 'Turbidity', 'pH': 'pH'}
surface_features(st: Union[str, int, Timestamp] = '2000-10-14', en: Union[str, int, Timestamp] = '2016-11-12') DataFrame[source]

soil surface features data
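
Examples

A minimal sketch, using the default period given in the signature:

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> sf = laos.surface_features()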

target = ['Ecoli_mpn100']
url = {'ecoli_data.csv': 'https://dataverse.ird.fr/api/access/datafile/5435', 'ecoli_dict.csv': 'https://dataverse.ird.fr/api/access/datafile/5436', 'hydro.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=389bbea0-7279-12c1-63d0-cfc4a77ded87', 'lu.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=0f1aea48-2a51-9b42-7688-a774a8f75e7a', 'pcp.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=3c870a03-324b-140d-7d98-d3585a63e6ec', 'rain_guage.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=7bc45591-5b9f-a13d-90dc-f2a75b0a15cc', 'soilmap.zip': 'https://dataverse.ird.fr/api/access/datafile/5430', 'subs1.zip': 'https://dataverse.ird.fr/api/access/datafile/5432', 'surf_feat.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=72d9e532-8910-48d2-b9a2-6c8b0241825b', 'suro.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=f06cb605-7e59-4ba4-8faf-1beee35d2162', 'weather_station.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=353d7f00-8d6a-2a34-c0a2-5903c64e800b'}
weather_station_data = ['air_temp', 'rel_hum', 'wind_speed', 'sol_rad']

Camels

class ai4water.datasets.camels.Camels(name=None, units=None)[source]

Bases: Datasets

Get CAMELS dataset. This class first downloads the CAMELS dataset if it is not already downloaded. Then the selected attributes for the selected station id(s) are fetched and provided to the user using the method fetch.

- ds_dir (str/path) – directory of the dataset

- dynamic_features (list) – tells which dynamic attributes are available in this dataset

- static_features (list) – a list of static attributes

- static_attribute_categories (list) – tells which categories of static attributes are present in this dataset

- stations – returns name/id of stations for which the data (dynamic attributes) exists, as a list of strings.

- fetch – fetches all attributes (both static and dynamic) of all stations/gauge_ids or of a specified station. It can also be used to fetch all attributes of a number of stations, either by providing their gauge_ids or by just saying that we need data of 20 stations, which will then be chosen randomly.

- fetch_dynamic_features – fetches specified dynamic attributes of one specified station. If the dynamic attributes are not specified, all dynamic attributes will be fetched for the specified station. If the station is not specified, the specified dynamic attributes will be fetched for all stations.

- fetch_static_features – works the same as fetch_dynamic_features but for static attributes. Here, if the category is not specified, then static attributes of the specified station for all categories are returned.

__init__(name=None, units=None)
Parameters
  • name

  • units

DATASETS = {'CAMELS-BR': {'url': 'https://zenodo.org/record/3964745#.YA6rUxZS-Uk'}, 'CAMELS-GB': {'url': <function gb_message>}}
property camels_dir

Directory where all camels datasets will be saved. This will under datasets directory

property ds_dir

Directory where a particular dataset will be saved.

property dynamic_features: list
property end
fetch(stations: Optional[Union[str, list, int, float]] = None, dynamic_features: Optional[Union[list, str]] = 'all', static_features: Optional[Union[str, list]] = None, st: Union[None, str] = None, en: Union[None, str] = None, as_dataframe: bool = False, **kwargs) Union[dict, DataFrame][source]

Fetches the attributes of one or more stations.

Parameters
  • stations – if string, it is supposed to be the name/gauge_id of a station. If list, it is a list of station/gauge_ids. If int, it is the number of stations/gauge_ids for which data is required. If None (default), attributes of all available stations are fetched. If float, it is the fraction of stations whose data is to be fetched.

  • dynamic_features – If not None, then it is the attributes to be fetched. If None, then all available attributes are fetched

  • static_features – list of static attributes to be fetched. None means no static attribute will be fetched.

  • st – starting date of data to be returned. If None, the data will be returned from where it is available.

  • en – end date of data to be returned. If None, then the data will be returned till the date data is available.

  • as_dataframe – whether to return dynamic attributes as pandas dataframe or as xarray dataset.

  • kwargs – keyword arguments to read the files

Returns

If both static and dynamic features are obtained then it returns a dictionary whose keys are station/gauge_ids and values are the attributes and dataframes. Otherwise either dynamic or static features are returned.

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get data of 10% of stations
>>> df = dataset.fetch(stations=0.1, as_dataframe=True)  # returns a multiindex dataframe
...  # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['912101A','912105A','915011A'], as_dataframe=True)
... # fetch data of a single stations
>>> df = dataset.fetch(stations='318076', as_dataframe=True)
... # get both static and dynamic features as dictionary
>>> data = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> data['dynamic']
... # get only selected dynamic features
>>> df = dataset.fetch(stations='318076',
...     dynamic_features=['streamflow_MLd', 'solarrad_AWAP'], as_dataframe=True)
... # fetch data between selected periods
>>> df = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
fetch_dynamic_features(stn_id: str, features='all', st=None, en=None, as_dataframe=False)[source]

Fetches all or selected dynamic attributes of one station.

Parameters
  • stn_id (str) – name/id of station of which to extract the data

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from where to fetch the data.

  • en (Optional (default=None)) – end time untill where to fetch the data

  • as_dataframe (bool, optional (default=False)) – if true, the returned data is pandas DataFrame otherwise it is xarray dataset

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_dynamic_features('224214A', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('224214A',
... features=['tmax_AWAP', 'vprp_AWAP', 'streamflow_mmd'],
... as_dataframe=True).unstack()
fetch_static_features(stn_id: Union[str, list], features: Optional[Union[str, list]] = None)[source]

Fetches all or selected static attributes of one station.

Parameters
  • stn_id (str) – name/id of station of which to extract the data

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_static_features('224214A')
>>> camels.static_features
>>> camels.fetch_static_features('224214A',
... features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
fetch_station_attributes(station: str, dynamic_features: Optional[Union[str, list]] = 'all', static_features: Optional[Union[str, list]] = None, as_ts: bool = False, st: Union[None, str] = None, en: Union[None, str] = None, **kwargs) DataFrame[source]

Fetches attributes for one station.

Parameters
  • station – station id/gauge id for which the data is to be fetched.

  • dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch

  • static_features – names of static features/attributes to be fetched

  • as_ts (bool) – whether static attributes are to be converted into a time series or not. If yes, then the returned time series will be of the same length as that of the dynamic attributes.

  • st (str,optional) – starting point from which the data to be fetched. By default the data will be fetched from where it is available.

  • en (str, optional) – end point of data to be fetched. By default the data will be fetched till the point it is available.

Returns

dataframe if as_ts is True else it returns a dictionary of static and dynamic attributes for a station/gauge_id

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.fetch_station_attributes('912101A')
fetch_stations_attributes(stations: list, dynamic_features='all', static_features=None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

Reads attributes of more than one stations.

Parameters
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic attributes to be fetched. if ‘all’, then all dynamic attributes will be fetched.

  • static_features – list of static attributes to be fetched. If all, then all static attributes will be fetched. If None, then no static attribute will be fetched.

  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • as_dataframe – whether to return the data as pandas dataframe. default is xr.dataset object

  • kwargs (dict) – additional keyword arguments

Returns

Dynamic and static features of multiple stations. Dynamic features are by default returned as an xr.Dataset, unless as_dataframe is True, in which case a pandas dataframe with a multiindex is returned. If xr.Dataset, it consists of data_vars equal to the number of stations and, for each station, the DataArray has dimensions (time, dynamic_features), where time is defined by st and en, i.e. the length of the DataArray. When the returned object is a pandas DataFrame, the first index is time and the second index is dynamic_features. Static attributes are always returned as a pandas DataFrame of shape (stations, static_features). If dynamic_features is None, then they are not returned and the returned value only consists of static features. The same holds true for static_features. If both are not None, then the returned type is a dictionary with static and dynamic keys.

Raises

ValueError, if both dynamic_features and static_features are None

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
... # find out station ids
>>> dataset.stations()
... # get data of selected stations
>>> dataset.fetch_stations_attributes(['912101A', '912105A', '915011A'], as_dataframe=True)
property start
stations()[source]
to_ts(static, st, en, as_ts=False, freq='D')[source]

CAMELS_AUS

class ai4water.datasets.camels.CAMELS_AUS(path: Optional[str] = None)[source]

Bases: Camels

Inherits from the Camels class. Reads the CAMELS-AUS dataset of Fowler et al., 2020 [1]_.

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
... # get name of all stations as list
>>> dataset.stations()
... # get data by station id
>>> df = dataset.fetch(stations='224214A', as_dataframe=True).unstack()
... # get names of available dynamic features
>>> dataset.dynamic_features
... # get only selected dynamic features
>>> dataset.fetch(1, as_dataframe=True,
...  dynamic_features=['tmax_AWAP', 'precipitation_AWAP', 'et_morton_actual_SILO', 'streamflow_MLd']).unstack()
... # get names of available static features
>>> dataset.static_features
... # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
[1] https://doi.org/10.5194/essd-13-3847-2021

__init__(path: Optional[str] = None)[source]
Parameters

path – path where the CAMELS-AUS dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, then the data will be downloaded.

property dynamic_features: list
property end
fetch_static_features(stn_id, features='all', **kwargs) DataFrame[source]

Fetches static attributes of one station as dataframe.

folders = {'et_morton_actual_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_actual_SILO', 'et_morton_point_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_point_SILO', 'et_morton_wet_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_wet_SILO', 'et_short_crop_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_short_crop_SILO', 'et_tall_crop_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_tall_crop_SILO', 'evap_morton_lake_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_morton_lake_SILO', 'evap_pan_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_pan_SILO', 'evap_syn_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_syn_SILO', 'mslp_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/mslp_SILO', 'precipitation_AWAP': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_AWAP', 'precipitation_SILO': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_SILO', 'precipitation_var_SWAP': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_var_AWAP', 'radiation_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/radiation_SILO', 'rh_tmax_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/rh_tmax_SILO', 'rh_tmin_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/rh_tmin_SILO', 'solarrad_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/solarrad_AWAP', 'streamflow_MLd': '03_streamflow/03_streamflow/streamflow_MLd', 'streamflow_MLd_inclInfilled': '03_streamflow/03_streamflow/streamflow_MLd_inclInfilled', 'streamflow_mmd': '03_streamflow/03_streamflow/streamflow_mmd', 'tmax_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/tmax_AWAP', 'tmax_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/tmax_SILO', 'tmin_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/tmin_AWAP', 'tmin_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/tmin_SILO', 'vp_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/vp_SILO', 'vp_deficit_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/vp_deficit_SILO', 'vprp_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/vprp_AWAP'}
property location
plot(what, stations=None, **kwargs)[source]
property start
property static_attribute_categories
property static_features: list
stations(as_list=True) list[source]
url = 'https://doi.pangaea.de/10.1594/PANGAEA.921850'
urls = {'01_id_name_metadata.zip': 'https://download.pangaea.de/dataset/921850/files/', '02_location_boundary_area.zip': 'https://download.pangaea.de/dataset/921850/files/', '03_streamflow.zip': 'https://download.pangaea.de/dataset/921850/files/', '04_attributes.zip': 'https://download.pangaea.de/dataset/921850/files/', '05_hydrometeorology.zip': 'https://download.pangaea.de/dataset/921850/files/', 'CAMELS_AUS_Attributes-Indices_MasterTable.csv': 'https://download.pangaea.de/dataset/921850/files/', 'Units_01_TimeseriesData.pdf': 'https://download.pangaea.de/dataset/921850/files/', 'Units_02_AttributeMasterTable.pdf': 'https://download.pangaea.de/dataset/921850/files/'}

LamaH

class ai4water.datasets.camels.LamaH(*, time_step: str, data_type: str, **kwargs)[source]

Bases: Camels

Large-Sample Data for Hydrology and Environmental Sciences for Central Europe. url: https://zenodo.org/record/4609826#.YFNp59zt02w, paper: https://essd.copernicus.org/preprints/essd-2021-72/

__init__(*, time_step: str, data_type: str, **kwargs)[source]
Parameters
  • time_step – possible values are daily or hourly

  • data_type – possible values are total_upstrm, diff_upstrm_all or diff_upstrm_lowimp

Examples

>>> from ai4water.datasets import LamaH
>>> dataset = LamaH(time_step='daily', data_type='total_upstrm')
>>> df = dataset.fetch(3, as_dataframe=True)
property data_type_dir
property ds_dir

Directory where a particular dataset will be saved.

property dynamic_features
property end
fetch_static_features(stn_id: Union[str, list], features=None) DataFrame[source]

static features of LamaH

Examples

>>> from ai4water.datasets import LamaH
>>> dataset = LamaH(time_step='daily', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99')  # (1, 61)
...  # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
...     features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra'])  # (1, 4)
read_ts_of_station(station) DataFrame[source]
property start
static_attribute_categories = ['']
property static_features: list
stations() list[source]
time_steps = ['daily', 'hourly']
url = 'https://zenodo.org/record/4609826#.YFNp59zt02w'

CAMELS_GB

class ai4water.datasets.camels.CAMELS_GB(path=None)[source]

Bases: Camels

This dataset must be manually downloaded by the user. The path of the downloaded folder must be provided while initiating this class.

__init__(path=None)[source]
Parameters

path – path where the CAMELS-GB dataset has been downloaded.

property ds_dir

Directory where a particular dataset will be saved.

dynamic_features = ['precipitation', 'pet', 'temperature', 'discharge_spec', 'discharge_vol', 'peti', 'humidity', 'shortwave_rad', 'longwave_rad', 'windspeed']
property end
fetch_static_features(stn_id: str, features='all') DataFrame[source]

Fetches static attributes of one station for one or more category as dataframe.

property start
property static_attribute_categories: list
property static_features
stations(to_exclude=None)[source]
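
Examples

A usage sketch: the path is hypothetical and must point to the manually downloaded dataset, and CAMELS_GB is assumed to be importable from ai4water.datasets like its sibling classes:

>>> from ai4water.datasets import CAMELS_GB
>>> dataset = CAMELS_GB(path="path/to/CAMELS_GB")
... # get names of all stations
>>> stns = dataset.stations()
>>> df = dataset.fetch(stations=stns[0], as_dataframe=True)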

CAMELS_BR

class ai4water.datasets.camels.CAMELS_BR[source]

Bases: Camels

Downloads and processes CAMELS dataset of Brazil

__init__()[source]
Parameters
  • name

  • units

all_stations(attribute) list[source]

Tells all station ids for which data of a specific attribute is available.

property ds_dir

Directory where a particular dataset will be saved.

property dynamic_features: list
property end
fetch_static_features(stn_id, features=None) DataFrame[source]
Parameters
  • stn_id (int/list) – station id whose attribute to fetch

  • features (str/list) – name of attribute to fetch. Default is None, which will return all the attributes for a particular station of the specified category.

Example

>>> dataset = Camels('CAMELS-BR')
>>> df = dataset.fetch_static_features('11500000', 'climate')
folders = {'evapotransp_gleam': '08_CAMELS_BR_evapotransp_gleam', 'evapotransp_mgb': '09_CAMELS_BR_evapotransp_mgb', 'potential_evapotransp_gleam': '10_CAMELS_BR_potential_evapotransp_gleam', 'precipitation_chirps': '05_CAMELS_BR_precipitation_chirps', 'precipitation_cpc': '07_CAMELS_BR_precipitation_cpc', 'precipitation_mswep': '06_CAMELS_BR_precipitation_mswep', 'simulated_streamflow_m3s': '04_CAMELS_BR_streamflow_simulated', 'streamflow_m3s': '02_CAMELS_BR_streamflow_m3s', 'streamflow_mm': '03_CAMELS_BR_streamflow_mm_selected_catchments', 'temperature_max': '13_CAMELS_BR_temperature_max_cpc', 'temperature_mean': '12_CAMELS_BR_temperature_mean_cpc', 'temperature_min': '11_CAMELS_BR_temperature_min_cpc'}
property start
property static_attribute_categories
property static_dir
property static_features
property static_files
stations(to_exclude=None) list[source]

Returns a list of station ids which are common among all dynamic attributes.

Example

>>> dataset = CAMELS_BR()
>>> stations = dataset.stations()
url = 'https://zenodo.org/record/3964745#.YA6rUxZS-Uk'

CAMELS_US

class ai4water.datasets.camels.CAMELS_US(data_source='basin_mean_daymet')[source]

Bases: Camels

Downloads and processes the CAMELS dataset of 671 catchments, named CAMELS, from https://ral.ucar.edu/solutions/products/camels following Newman et al., 2015.

__init__(data_source='basin_mean_daymet')[source]
Parameters
  • name

  • units

DATASETS = ['CAMELS_US']
catchment_attr_url = 'https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/camels_attributes_v2.0.zip'
property ds_dir

Directory where a particular dataset will be saved.

dynamic_features = ['dayl(s)', 'prcp(mm/day)', 'srad(W/m2)', 'swe(mm)', 'tmax(C)', 'tmin(C)', 'vp(Pa)', 'Flow']
property end
fetch_static_features(stn_id: Union[str, list], features: Optional[Union[str, list]] = None)[source]

Examples

>>> from ai4water.datasets import CAMELS_US
>>> camels = CAMELS_US()
>>> camels.fetch_static_features('11532500')
>>> camels.static_features
>>> camels.fetch_static_features('11528700',
...     features=['area_gages2', 'geol_porostiy', 'soil_conductivity', 'elev_mean'])
folders = {'basin_mean_daymet': 'basin_mean_forcing/daymet', 'basin_mean_maurer': 'basin_mean_forcing/maurer', 'basin_mean_nldas': 'basin_mean_forcing/nldas', 'basin_mean_v1p15_daymet': 'basin_mean_forcing/v1p15/daymet', 'basin_mean_v1p15_nldas': 'basin_mean_forcing/v1p15/nldas', 'elev_bands': 'elev/daymet', 'hru': 'hru_forcing/daymet'}
property start
property static_features
stations() list[source]
url = 'https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip'

CAMELS_CL

class ai4water.datasets.camels.CAMELS_CL(path: Optional[str] = None)[source]

Bases: Camels

Downloads and processes CAMELS dataset of Chile https://doi.org/10.5194/hess-22-5817-2018

__init__(path: Optional[str] = None)[source]
Parameters

path – path where the CAMELS-CL dataset has been downloaded. This path must contain the downloaded zip files.

dynamic_features = ['streamflow_m3s', 'streamflow_mm', 'precip_cr2met', 'precip_chirps', 'precip_mswep', 'precip_tmpa', 'tmin_cr2met', 'tmax_cr2met', 'tmean_cr2met', 'pet_8d_modis', 'pet_hargreaves', 'swe']
property end
fetch_static_features(stn_id, features=None)[source]

Examples

>>> from ai4water.datasets import CAMELS_CL
>>> camels = CAMELS_CL()
>>> camels.fetch_static_features('11315001')
>>> camels.static_features
>>> camels.fetch_static_features('2110002',
...     features=['slope_mean', 'q_mean', 'elev_med', 'area'])
property start
property static_features: list
stations() list[source]

Tells all station ids for which data of a specific attribute is available.

urls = {'10_CAMELScl_tmean_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '11_CAMELScl_pet_8d_modis.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '12_CAMELScl_pet_hargreaves.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '13_CAMELScl_swe.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '14_CAMELScl_catch_hierarchy.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '1_CAMELScl_attributes.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '2_CAMELScl_streamflow_m3s.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '3_CAMELScl_streamflow_mm.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '4_CAMELScl_precip_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '5_CAMELScl_precip_chirps.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '6_CAMELScl_precip_mswep.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '7_CAMELScl_precip_tmpa.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '8_CAMELScl_tmin_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '9_CAMELScl_tmax_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', 'CAMELScl_catchment_boundaries.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/'}

HYSETS

class ai4water.datasets.camels.HYSETS(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]

Bases: Camels

Database for hydrometeorological modeling of 14,425 North American watersheds from 1950 to 2018, following the work of Arsenault et al., 2020. The user must manually download the files, unpack them, and provide the path where these files are saved.

This data comes with multiple sources; each source has one or more dynamic features. The following data sources are available:

  • SNODAS_SWE – discharge, swe

  • SCDNA – discharge, pr, tasmin, tasmax

  • nonQC_stations – discharge, pr, tasmin, tasmax

  • Livneh – discharge, pr, tasmin, tasmax

  • ERA5 – discharge, pr, tasmax, tasmin

  • ERA5Land_SWE – discharge, swe

  • ERA5Land – discharge, pr, tasmax, tasmin

All sources contain one or more of the following dynamic features, with the following shapes:

  • time – (25202,)

  • watershedID – (14425,)

  • drainage_area – (14425,)

  • drainage_area_GSIM – (14425,)

  • flag_GSIM_boundaries – (14425,)

  • flag_artificial_boundaries – (14425,)

  • centroid_lat – (14425,)

  • centroid_lon – (14425,)

  • elevation – (14425,)

  • slope – (14425,)

  • discharge – (14425, 25202)

  • pr – (14425, 25202)

  • tasmax – (14425, 25202)

  • tasmin – (14425, 25202)

Examples

>>> from ai4water.datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> df = dataset.fetch(0.01, as_dataframe=True) # 1% of stations
__init__(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]
Parameters
  • path – path where all the data files are saved.

  • swe_source – source of swe data.

  • discharge_source – source of discharge data

  • tasmin_source – source of tasmin data

  • tasmax_source – source of tasmax data

  • pr_source – source of pr data

  • kwargs – arguments for Camels base class

OTHER_SRC = ['ERA5', 'ERA5Land', 'Livneh', 'nonQC_stations', 'SCDNA']
Q_SRC = ['ERA5', 'ERA5Land', 'ERA5Land_SWE', 'Livneh', 'nonQC_stations', 'SCDNA', 'SNODAS_SWE']
SWE_SRC = ['ERA5Land_SWE', 'SNODAS_SWE']
doi = 'https://doi.org/10.1038/s41597-020-00583-2'
property ds_dir

Directory where a particular dataset will be saved.

dynamic_features = ['discharge', 'swe', 'tasmin', 'tasmax', 'pr']
property end
fetch_dynamic_features(stn_id, features='all', st=None, en=None, as_dataframe=False)[source]

Fetches dynamic attributes of one station.

fetch_static_features(stn_id, features='all', st=None, en=None, as_ts=False) DataFrame[source]

returns static attributes of a station

fetch_stations_attributes(stations: list, dynamic_features: Optional[Union[str, list]] = 'all', static_features: Optional[Union[str, list]] = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

returns attributes of multiple stations

read_static_data()[source]
property start
property static_features
stations() list[source]
Returns

a list of ids of stations

Return type

list

Examples

>>> dataset = HYSETS(path="path/to/HYSETS")
... # get name of all stations as list
>>> dataset.stations()
url = 'https://osf.io/rpc3w/'

HYPE

class ai4water.datasets.camels.HYPE(time_step: str = 'daily', **kwargs)[source]

Bases: Camels

Downloads and preprocesses the HYPE [1]_ dataset of Lindstroem et al., 2010 [2]_. This is a rainfall-runoff dataset of 564 stations from 1985 to 2019 at daily, monthly and yearly time steps.

Examples

>>> from ai4water.datasets import HYPE
>>> dataset = HYPE()
... # get data of 5% of stations
>>> df = dataset.fetch(stations=0.05, as_dataframe=True)  # returns a multiindex dataframe
... # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
... # fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['564','563','562'], as_dataframe=True)
... # fetch data of a single stations
>>> df = dataset.fetch(stations='500', as_dataframe=True)
... # get only selected dynamic features
>>> df = dataset.fetch(stations='501',
...    dynamic_features=['AET_mm', 'Prec_mm',  'Streamflow_mm'], as_dataframe=True)
... # fetch data between selected periods
>>> df = dataset.fetch(stations='225', st="20010101", en="20101231", as_dataframe=True)
... # get data at monthly time step
>>> dataset = HYPE(time_step="month")
>>> df = dataset.fetch(stations='500', as_dataframe=True)
[1] https://zenodo.org/record/4029572

[2] https://doi.org/10.2166/nh.2010.007

__init__(time_step: str = 'daily', **kwargs)[source]
Parameters
  • time_step (str) – one of daily, month or year

  • **kwargs – key word arguments

dynamic_features = ['AET_mm', 'Baseflow_mm', 'Infiltration_mm', 'SM_mm', 'Streamflow_mm', 'Runoff_mm', 'Qsim_m3-s', 'Prec_mm', 'PET_mm']
property end
fetch_static_features(stn_id, features=None)[source]

static data for HYPE is not available.

property start
property static_features
stations() list[source]
url = ['https://zenodo.org/record/581435', 'https://zenodo.org/record/4029572']

Weisssee

class ai4water.datasets.datasets.Weisssee(name=None, units=None)[source]

Bases: Datasets

__init__(name=None, units=None)
Parameters
  • name

  • units

dynamic_attributes = ['Precipitation_measurements', 'long_wave_upward_radiation', 'snow_density_at_30cm', 'long_wave_downward_radiation']
fetch(**kwargs)[source]

Examples

>>> from ai4water.datasets import Weisssee
>>> dataset = Weisssee()
>>> data = dataset.fetch()
url = '10.1594/PANGAEA.898217'

WeatherJena

class ai4water.datasets.datasets.WeatherJena(obs_loc='roof')[source]

Bases: Datasets

10 minute weather dataset of Jena, Germany hosted at https://www.bgc-jena.mpg.de/wetter/index.html from 2002 onwards.

__init__(obs_loc='roof')[source]

The ETP data is collected at three different locations, i.e. roof, soil and saale (hall).

Parameters

obs_loc (str, optional (default=roof)) – location of observation.

property dynamic_features: list

returns names of available features

fetch(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

Fetches the time series data between given period as pandas dataframe.

Parameters
  • st (Optional) – start of data to be fetched. If None, the data will be returned from the start (2003-01-01).

  • en (Optional) – end of data to be fetched. If None, the data will be returned till the end (2021-12-31).

Returns

a pandas dataframe of shape (972111, 21)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import WeatherJena
>>> dataset = WeatherJena()
>>> df = dataset.fetch()
... # get data between specific period
>>> df = dataset.fetch("20110101", "20201231")
url = 'https://www.bgc-jena.mpg.de/wetter/weather_data.html'

Quadica

class ai4water.datasets.datasets.Quadica(**kwargs)[source]

Bases: Datasets

Water quality dataset following Ebeling et al., 2022 [1]_.

[1] https://doi.org/10.5194/essd-2022-6

__init__(**kwargs)[source]
Parameters
  • name

  • units

annual_medians() DataFrame[source]

Annual medians over the whole time series of water quality variables and discharge

Returns

a dataframe of shape (24393, 18)

Return type

pd.DataFrame

fetch_annual()[source]
fetch_metadata() DataFrame[source]

fetches the metadata about the stations as dataframe. Each row represents metadata about one station and each column represents one feature.

Returns

a dataframe of shape (1386, 60)

Return type

pd.DataFrame

fetch_pet(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

average monthly potential evapotranspiration from 1950-01 to 2018-09

Examples

>>> from ai4water.datasets import Quadica
>>> dataset = Quadica()
>>> df = dataset.fetch_pet() # -> (828, 1388)
fetch_precip(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

monthly sums of precipitation from 1950-01 to 2018-09

Returns

a dataframe of shape (828, 1388)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import Quadica
>>> dataset = Quadica()
>>> df = dataset.fetch_precip() # -> (828, 1388)
fetch_tavg(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

monthly median average temperatures from 1950-01 to 2018-09

Examples

>>> from ai4water.datasets import Quadica
>>> dataset = Quadica()
>>> df = dataset.fetch_tavg() # -> (828, 1388)
fetch_wrtds_annual(features: Optional[Union[str, list]] = None, st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

Annual median concentrations, flow-normalized concentrations, and mean fluxes estimated using Weighted Regressions on Time, Discharge, and Season (WRTDS) for stations with enough data availability.

Parameters
  • features (optional) –

  • st (optional) – starting point of data. By default, the data starts from 1992

  • en (optional) – end point of data. By default, the data ends at 2013

Returns

a dataframe of shape (4213, 46)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import Quadica
>>> dataset = Quadica()
>>> df = dataset.fetch_wrtds_annual()
fetch_wrtds_monthly(features: Optional[Union[str, list]] = None, st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

Monthly median concentrations, flow-normalized concentrations, and mean fluxes estimated using Weighted Regressions on Time, Discharge, and Season (WRTDS) for stations with enough data availability.

Parameters
  • features (optional) –

  • st (optional) – starting point of data. By default, the data starts from 1992-09

  • en (optional) – end point of data. By default, the data ends at 2013-12

Returns

a dataframe of shape (50186, 47)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import Quadica
>>> dataset = Quadica()
>>> df = dataset.fetch_wrtds_monthly()
monthly_medians() DataFrame[source]

Monthly medians over the whole time series of water quality variables and discharge

Returns

a dataframe of shape (16629, 18)

Return type

pd.DataFrame
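
Examples

A sketch following the pattern of the other Quadica methods:

>>> from ai4water.datasets import Quadica
>>> dataset = Quadica()
>>> medians = dataset.monthly_medians()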

url = {'metadata.pdf': 'https://www.hydroshare.org/resource/26e8238f0be14fa1a49641cd8a455e29/data/contents/Metadata_QUADICA.pdf', 'quadica.zip': 'https://www.hydroshare.org/resource/26e8238f0be14fa1a49641cd8a455e29/data/contents/QUADICA.zip'}

SWECanada

class ai4water.datasets.datasets.SWECanada(**kwargs)[source]

Bases: Datasets

Daily Canadian historical Snow Water Equivalent dataset from 1928 to 2020, from Brown et al., 2019 [1]_.

Examples

>>> from ai4water.datasets import SWECanada
>>> swe = SWECanada()
... # get names of all available stations
>>> stns = swe.stations()
... # get data of one station
>>> df1 = swe.fetch('SCD-NS010')
... # get data of 10 stations
>>> df10 = swe.fetch(10, st='20110101')
... # get data of 0.1% of stations
>>> df2 = swe.fetch(0.001, st='20110101')
... # get data of one station starting from 2011
>>> df3 = swe.fetch('ALE-05AE810', st='20110101')
...
>>> df4 = swe.fetch(stns[0:10], st='20110101')
[1] https://doi.org/10.1080/07055900.2019.1598843

__init__(**kwargs)[source]
Parameters
  • name

  • units

property end
feaures = ['snw', 'snd', 'den']
fetch(station_id: Optional[Union[str, list, int, float]] = None, features: Optional[Union[str, list]] = None, q_flags: Optional[Union[str, list]] = None, st=None, en=None) dict[source]

Fetches time series data from selected stations.

Parameters
  • station_id – station/stations to be retrieved. If None, then data from all stations will be returned.

  • features

    Names of features to be retrieved. Following features are allowed:

    • snw snow water equivalent (kg/m2)

    • snd snow depth (m)

    • den snowpack bulk density (kg/m3)

    If None, then all three features will be retrieved.

  • q_flags

    If None, then no qflags will be returned. Following q_flag values are available.

    • data_flag_snw

    • data_flag_snd

    • qc_flag_snw

    • qc_flag_snd

  • st – start of data to be retrieved

  • en – end of data to be retrieved.

Returns

a dictionary of dataframes of shape (st:en, features + q_flags) whose length is equal to length of stations being considered.

Return type

dict
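
Examples

A sketch combining features and q_flags; the station id comes from the class example above and the flag names from the q_flags list:

>>> from ai4water.datasets import SWECanada
>>> swe = SWECanada()
>>> data = swe.fetch('SCD-NS010', features=['snw'], q_flags=['qc_flag_snw'])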

fetch_station_attributes(stn, features_to_fetch, st=None, en=None) DataFrame[source]

fetches attributes of one station

q_flags = ['data_flag_snw', 'data_flag_snd', 'qc_flag_snw', 'qc_flag_snd']
property start
stations() list[source]
url = 'https://doi.org/10.5194/essd-2021-160'

RRLuleaSweden

class ai4water.datasets.datasets.RRLuleaSweden(**kwargs)[source]

Bases: Datasets

Rainfall runoff data for an urban catchment from 2016 to 2019, following the work of Broekhuizen et al., 2020 [1]_.

[1] https://doi.org/10.5194/hess-24-869-2020

__init__(**kwargs)[source]
Parameters
  • name

  • units

fetch(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None)[source]

fetches rainfall runoff data

Parameters
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00

  • en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:41
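
Examples

A minimal sketch using the defaults documented above:

>>> from ai4water.datasets import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> data = dataset.fetch()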

fetch_flow(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

fetches flow data

Parameters
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00

  • en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:35:00

Returns

a dataframe of shape (37_618, 3)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> flow = dataset.fetch_flow()
fetch_pcp(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

fetches precipitation data

Parameters
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 19:48:00

  • en (optional) – end of data to be fetched. By default the end is 2019-10-26 23:59:00

Returns

a dataframe of shape (967_080, 1)

Return type

pd.DataFrame

Examples

>>> from ai4water.datasets import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> pcp = dataset.fetch_pcp()
url = 'https://zenodo.org/record/3931582'