RainfallRunoff

Camels

class ai4water.datasets.camels.Camels(path=None, **kwargs)[source]

Bases: Datasets

Gets a CAMELS dataset. This class first downloads the CAMELS dataset if it has not already been downloaded. The selected attributes of the selected station ids are then fetched and provided to the user through the fetch method.

- ds_dir (str/path): directory of the dataset

- dynamic_features (list): tells which dynamic attributes are available in this dataset

- static_features (list): a list of static attributes

- static_attribute_categories (list): tells which kinds of static attributes are present in this category

- stations : returns the names/ids of the stations for which data (dynamic attributes) exists, as a list of strings.

- fetch : fetches all attributes (both static and dynamic) of all station/gauge_ids or of a specified station. It can also fetch all attributes of a number of stations, either by providing their gauge_ids or by just asking for, say, 20 stations, which will then be chosen randomly.

- fetch_dynamic_features : fetches the specified dynamic attributes of one specified station. If no dynamic attribute is specified, all dynamic attributes are fetched for the specified station. If no station is specified, the specified dynamic attributes are fetched for all stations.

- fetch_static_features : works the same as fetch_dynamic_features but for static attributes. If no category is specified, the static attributes of the specified station for all categories are returned.

__init__(path=None, **kwargs)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

DATASETS = {'CAMELS-BR': {'url': 'https://zenodo.org/record/3964745#.YA6rUxZS-Uk'}, 'CAMELS-GB': {'url': <function gb_message>}}
property camels_dir

Directory where all CAMELS datasets will be saved. This will be under the datasets directory.

property ds_dir

Directory where a particular dataset will be saved.

property dynamic_features: list
property end
fetch(stations: Union[None, str, float, int, list] = None, dynamic_features: Optional[Union[list, str]] = 'all', static_features: Union[None, str, list] = None, st: Union[None, str] = None, en: Union[None, str] = None, as_dataframe: bool = False, **kwargs) Union[dict, DataFrame][source]

Fetches the attributes of one or more stations.

Parameters:
  • stations – if a string, it is assumed to be a station name/gauge_id. If a list, it is a list of station/gauge_ids. If an int, it is assumed that the user wants data for this number of stations/gauge_ids. If None (default), the attributes of all available stations are fetched. If a float, it is assumed that the user wants data for this fraction of stations.

  • dynamic_features – if not None, the dynamic attributes to be fetched. If None, all available attributes are fetched.

  • static_features – list of static attributes to be fetched. None means no static attribute will be fetched.

  • st – starting date of the data to be returned. If None, the data is returned from the first date on which it is available.

  • en – end date of the data to be returned. If None, the data is returned up to the last date on which it is available.

  • as_dataframe – whether to return dynamic attributes as a pandas DataFrame or as an xarray Dataset.

  • kwargs – keyword arguments used to read the files

Returns:

If both static and dynamic features are fetched, a dictionary with the keys 'static' and 'dynamic' is returned. Otherwise, either the dynamic or the static features are returned.

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get data of 10% of stations
>>> df = dataset.fetch(stations=0.1, as_dataframe=True)  # returns a multi-indexed dataframe
>>> # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
>>> # fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['912101A', '912105A', '915011A'], as_dataframe=True)
>>> # fetch data of a single station
>>> df = dataset.fetch(stations='318076', as_dataframe=True)
>>> # get both static and dynamic features as a dictionary
>>> data = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> data['dynamic']
>>> # get only selected dynamic features
>>> df = dataset.fetch(stations='318076',
...     dynamic_features=['streamflow_MLd', 'solarrad_AWAP'], as_dataframe=True)
>>> # fetch data between selected periods
>>> df = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
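The rules for the stations argument of fetch can be sketched in plain Python. resolve_stations below is a hypothetical helper, not part of ai4water; rounding a float fraction down with int() and drawing stations with random.Random.sample are assumptions about how the selection might work.

```python
import random
from typing import List, Optional, Sequence, Union


def resolve_stations(
    stations: Union[None, str, float, int, list],
    all_stations: Sequence[str],
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: map the `stations` argument to a list of station ids."""
    rng = random.Random(seed)
    if stations is None:
        # None -> every available station
        return list(all_stations)
    if isinstance(stations, str):
        # a single station/gauge id
        return [stations]
    if isinstance(stations, list):
        # an explicit list of station/gauge ids
        return list(stations)
    if isinstance(stations, bool):
        # bool is a subclass of int, so reject it before the int branch
        raise TypeError("stations must not be a bool")
    if isinstance(stations, int):
        # an int -> that many randomly chosen stations
        return rng.sample(list(all_stations), stations)
    if isinstance(stations, float):
        # a float -> that fraction of all stations (rounded down; an assumption)
        n = int(len(all_stations) * stations)
        return rng.sample(list(all_stations), n)
    raise TypeError(f"unsupported type for stations: {type(stations).__name__}")


ids = ["912101A", "912105A", "915011A", "224214A"]
print(resolve_stations("224214A", ids))         # ['224214A']
print(len(resolve_stations(2, ids, seed=0)))    # 2
print(len(resolve_stations(0.5, ids, seed=0)))  # 2
```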
fetch_dynamic_features(stn_id: str, features='all', st=None, en=None, as_dataframe=False)[source]

Fetches all or selected dynamic attributes of one station.

Parameters:
  • stn_id (str) – name/id of the station whose data is to be extracted

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available dynamic features are returned.

  • st (Optional (default=None)) – start time from which to fetch the data.

  • en (Optional (default=None)) – end time until which to fetch the data.

  • as_dataframe (bool, optional (default=False)) – if True, the returned data is a pandas DataFrame, otherwise an xarray Dataset.

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_dynamic_features('224214A', as_dataframe=True).unstack()
>>> camels.dynamic_features
>>> camels.fetch_dynamic_features('224214A',
... features=['tmax_AWAP', 'vprp_AWAP', 'streamflow_mmd'],
... as_dataframe=True).unstack()
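A minimal sketch of how st and en might restrict a daily series, using only the standard library. It assumes the "YYYYMMDD" string format seen in the fetch examples and treats both ends as inclusive; slice_period is a hypothetical name, and the real method works on DataFrames or xarray objects rather than plain dicts.

```python
from datetime import date, datetime
from typing import Dict, Optional


def _parse(day: Optional[str]) -> Optional[date]:
    # "20010101"-style strings; None keeps that side of the window open
    return datetime.strptime(day, "%Y%m%d").date() if day else None


def slice_period(series: Dict[date, float], st: Optional[str] = None,
                 en: Optional[str] = None) -> Dict[date, float]:
    """Keep the values whose date lies between st and en (both inclusive)."""
    start, end = _parse(st), _parse(en)
    return {
        day: value
        for day, value in sorted(series.items())
        if (start is None or day >= start) and (end is None or day <= end)
    }


flows = {date(2000, 12, 31): 1.0, date(2001, 1, 1): 2.0,
         date(2010, 12, 31): 3.0, date(2011, 1, 1): 4.0}
# keeps only the 2001-01-01 and 2010-12-31 values
print(slice_period(flows, st="20010101", en="20101231"))
```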
fetch_static_features(stn_id: Union[str, list], features: Optional[Union[str, list]] = None)[source]

Fetches all or selected static attributes of one or more stations.

Parameters:
  • stn_id (str/list) – name/id of the station(s) whose data is to be extracted

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> camels = CAMELS_AUS()
>>> camels.fetch_static_features('224214A')
>>> camels.static_features
>>> camels.fetch_static_features('224214A',
... features=['elev_mean', 'relief', 'ksat', 'pop_mean'])
fetch_station_attributes(station: str, dynamic_features: Optional[Union[str, list]] = 'all', static_features: Union[None, str, list] = None, as_ts: bool = False, st: Union[None, str] = None, en: Union[None, str] = None, **kwargs) DataFrame[source]

Fetches attributes for one station.

Parameters:
  • station – station id/gauge id for which the data is to be fetched.

  • dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch

  • static_features – names of static features/attributes to be fetched

  • as_ts (bool) – whether static attributes are to be converted into a time series or not. If yes, the returned time series will be of the same length as that of the dynamic attributes.

  • st (str, optional) – starting point from which the data is to be fetched. By default, the data will be fetched from where it is available.

  • en (str, optional) – end point of the data to be fetched. By default, the data will be fetched until where it is available.

Returns:

A DataFrame if as_ts is True, otherwise a dictionary of static and dynamic attributes for the station/gauge_id.

Return type:

pd.DataFrame

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> dataset.fetch_station_attributes('912101A')
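Conceptually, as_ts=True repeats each constant static value once per time step so that it has the same length as the dynamic attributes. The sketch below shows that idea with a hypothetical static_as_ts helper and made-up attribute values; it assumes a daily index (the freq='D' default of to_ts hints at daily data, but that is an assumption here).

```python
from datetime import date, timedelta
from typing import Dict, List


def static_as_ts(static: Dict[str, float], start: date, periods: int) -> List[Dict[str, object]]:
    """Repeat one station's static attributes on every day of a daily index
    so they line up row by row with the dynamic time series."""
    rows: List[Dict[str, object]] = []
    for offset in range(periods):
        row: Dict[str, object] = {"time": start + timedelta(days=offset)}
        row.update(static)  # the same constant values on every row
        rows.append(row)
    return rows


rows = static_as_ts({"elev_mean": 250.0, "area": 12.5}, date(2001, 1, 1), periods=3)
for row in rows:
    print(row)
```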
fetch_stations_attributes(stations: list, dynamic_features='all', static_features=None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

Reads attributes of more than one station.

Parameters:
  • stations – list of stations for which data is to be fetched.

  • dynamic_features – list of dynamic attributes to be fetched. If ‘all’, then all dynamic attributes will be fetched.

  • static_features – list of static attributes to be fetched. If ‘all’, then all static attributes will be fetched. If None, then no static attribute will be fetched.

  • st – start of the data to be fetched.

  • en – end of the data to be fetched.

  • as_dataframe – whether to return the data as a pandas DataFrame. By default, an xr.Dataset object is returned.

  • kwargs (dict) – additional keyword arguments

Returns:

Dynamic and static features of multiple stations. Dynamic features are returned as an xr.Dataset by default; if as_dataframe is True, they are returned as a pandas DataFrame with a MultiIndex. An xr.Dataset has one data_var per station, and for each station the DataArray has dimensions (time, dynamic_features), where the length of the time dimension is defined by st and en. When a pandas DataFrame is returned, the first index level is time and the second is dynamic_features. Static attributes are always returned as a pandas DataFrame of shape (stations, static_features). If dynamic_features is None, dynamic features are not returned and the returned value consists only of static features; the same holds for static_features. If both are not None, a dictionary with static and dynamic keys is returned.

Raises:

ValueError, if both dynamic_features and static_features are None

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # find out station ids
>>> dataset.stations()
>>> # get data of selected stations
>>> dataset.fetch_stations_attributes(['912101A', '912105A', '915011A'],
...     as_dataframe=True)
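The branching described under Returns and Raises for fetch_stations_attributes can be summarised in a few lines. This is a hypothetical sketch of the documented behaviour, not ai4water's code; plain strings stand in for the xr.Dataset and DataFrame objects, and the key names 'dynamic' and 'static' are taken from the WaterBenchIowa example in this document.

```python
from typing import Any, Optional


def assemble_output(dynamic: Optional[Any], static: Optional[Any]) -> Any:
    """Mimic the documented return rules of fetch_stations_attributes."""
    if dynamic is None and static is None:
        # documented: ValueError if both dynamic_features and static_features are None
        raise ValueError("dynamic_features and static_features cannot both be None")
    if dynamic is not None and static is not None:
        # both requested -> dictionary with 'dynamic' and 'static' keys
        return {"dynamic": dynamic, "static": static}
    # only one kind requested -> return it directly
    return dynamic if dynamic is not None else static


print(assemble_output("dynamic data", None))
print(sorted(assemble_output("dynamic data", "static data")))  # ['dynamic', 'static']
```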
property start
stations()[source]
to_ts(static, st, en, as_ts=False, freq='D')[source]

CAMELS_AUS

class ai4water.datasets.camels.CAMELS_AUS(path: Optional[str] = None)[source]

Bases: Camels

Inherits from the Camels class. Reads the CAMELS-AUS dataset of Fowler et al., 2020.

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(21184, 26)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
222
>>> # get data by station id
>>> df = dataset.fetch(stations='224214A', as_dataframe=True).unstack()
>>> df.shape
(21184, 26)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...  dynamic_features=['tmax_AWAP', 'precipitation_AWAP', 'et_morton_actual_SILO', 'streamflow_MLd']).unstack()
>>> data.shape
(21184, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed DataFrame
(21184, 260)
__init__(path: Optional[str] = None)[source]
Parameters:

path – path where the CAMELS-AUS dataset has been downloaded. This path must contain five zip files and one xlsx file. If None, the data will be downloaded.

property dynamic_features: list
property end
fetch_static_features(stn_id: Union[str, List[str]], features: Union[str, List[str]] = 'all', **kwargs) DataFrame[source]

Fetches static attributes of one or more stations as a DataFrame.

Parameters:
  • stn_id (str/list) – name/id of the station(s) whose data is to be extracted

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
222
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(222, 110)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('305202')
>>> static_data.shape
(1, 110)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['catchment_di', 'elev_mean'])
>>> static_data.shape
(222, 2)
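The shapes in these examples follow a simple rule: one row per requested station and one column per requested feature. The dict-based sketch below illustrates that selection; select_static and the attribute values are hypothetical, and the real method returns a pandas DataFrame.

```python
from typing import Dict, List, Sequence, Tuple, Union

# station id -> {static feature name: value}
Table = Dict[str, Dict[str, float]]


def select_static(table: Table, stn_id: Union[str, Sequence[str]],
                  features: Union[str, Sequence[str], None] = "all",
                  ) -> Tuple[List[str], List[List[float]]]:
    """Return (column names, rows) with one row per station and one column per feature."""
    stations = [stn_id] if isinstance(stn_id, str) else list(stn_id)
    if features is None or features == "all":
        # every feature present in the table, in a stable order
        columns = sorted({name for attrs in table.values() for name in attrs})
    elif isinstance(features, str):
        columns = [features]
    else:
        columns = list(features)
    rows = [[table[stn][col] for col in columns] for stn in stations]
    return columns, rows


table = {
    "305202": {"catchment_di": 0.4, "elev_mean": 120.0, "relief": 35.0},
    "912101A": {"catchment_di": 0.7, "elev_mean": 310.0, "relief": 80.0},
}
cols, rows = select_static(table, "305202")
print(len(rows), len(cols))  # 1 3
cols, rows = select_static(table, list(table), ["catchment_di", "elev_mean"])
print(len(rows), len(cols))  # 2 2
```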
folders = {'et_morton_actual_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_actual_SILO', 'et_morton_point_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_point_SILO', 'et_morton_wet_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_morton_wet_SILO', 'et_short_crop_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_short_crop_SILO', 'et_tall_crop_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/et_tall_crop_SILO', 'evap_morton_lake_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_morton_lake_SILO', 'evap_pan_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_pan_SILO', 'evap_syn_SILO': '05_hydrometeorology/05_hydrometeorology/02_EvaporativeDemand_timeseries/evap_syn_SILO', 'mslp_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/mslp_SILO', 'precipitation_AWAP': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_AWAP', 'precipitation_SILO': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_SILO', 'precipitation_var_SWAP': '05_hydrometeorology/05_hydrometeorology/01_precipitation_timeseries/precipitation_var_AWAP', 'radiation_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/radiation_SILO', 'rh_tmax_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/rh_tmax_SILO', 'rh_tmin_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/rh_tmin_SILO', 'solarrad_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/solarrad_AWAP', 'streamflow_MLd': '03_streamflow/03_streamflow/streamflow_MLd', 'streamflow_MLd_inclInfilled': '03_streamflow/03_streamflow/streamflow_MLd_inclInfilled', 'streamflow_mmd': '03_streamflow/03_streamflow/streamflow_mmd', 'tmax_AWAP': 
'05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/tmax_AWAP', 'tmax_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/tmax_SILO', 'tmin_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/tmin_AWAP', 'tmin_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/tmin_SILO', 'vp_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/vp_SILO', 'vp_deficit_SILO': '05_hydrometeorology/05_hydrometeorology/03_Other/SILO/vp_deficit_SILO', 'vprp_AWAP': '05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/vprp_AWAP'}
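The folders attribute maps each dynamic feature to a sub-directory. A minimal sketch of resolving a feature to its directory, assuming the relative paths are joined onto the dataset directory (ds_dir); feature_dir is a hypothetical helper, and only two entries of the mapping are copied here.

```python
from pathlib import Path

# Two entries copied from the `folders` attribute above.
FOLDERS = {
    "streamflow_MLd": "03_streamflow/03_streamflow/streamflow_MLd",
    "tmax_AWAP": "05_hydrometeorology/05_hydrometeorology/03_Other/AWAP/tmax_AWAP",
}


def feature_dir(ds_dir: str, feature: str) -> Path:
    """Return the directory holding one dynamic feature (assumed to be relative to ds_dir)."""
    try:
        relative = FOLDERS[feature]
    except KeyError:
        raise ValueError(f"unknown dynamic feature: {feature!r}") from None
    return Path(ds_dir) / relative


print(feature_dir("/data/CAMELS_AUS", "streamflow_MLd").as_posix())
```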
property location
plot(what, stations=None, **kwargs)[source]
property start
property static_attribute_categories
property static_features: list
stations(as_list=True) list[source]
url = 'https://doi.pangaea.de/10.1594/PANGAEA.921850'
urls = {'01_id_name_metadata.zip': 'https://download.pangaea.de/dataset/921850/files/', '02_location_boundary_area.zip': 'https://download.pangaea.de/dataset/921850/files/', '03_streamflow.zip': 'https://download.pangaea.de/dataset/921850/files/', '04_attributes.zip': 'https://download.pangaea.de/dataset/921850/files/', '05_hydrometeorology.zip': 'https://download.pangaea.de/dataset/921850/files/', 'CAMELS_AUS_Attributes-Indices_MasterTable.csv': 'https://download.pangaea.de/dataset/921850/files/', 'Units_01_TimeseriesData.pdf': 'https://download.pangaea.de/dataset/921850/files/', 'Units_02_AttributeMasterTable.pdf': 'https://download.pangaea.de/dataset/921850/files/'}
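In urls, each key is a file name and each value is the directory URL it is served from. Assuming the full download link is simply the directory URL followed by the (URL-quoted) file name, the links could be assembled as below; download_links is hypothetical and no download is performed.

```python
from typing import Dict
from urllib.parse import quote

# Two entries copied from the `urls` attribute above.
URLS = {
    "03_streamflow.zip": "https://download.pangaea.de/dataset/921850/files/",
    "CAMELS_AUS_Attributes-Indices_MasterTable.csv": "https://download.pangaea.de/dataset/921850/files/",
}


def download_links(urls: Dict[str, str]) -> Dict[str, str]:
    """Join each base URL with its percent-encoded file name."""
    return {name: base + quote(name) for name, base in urls.items()}


for name, link in download_links(URLS).items():
    print(name, "->", link)
```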

CAMELS_GB

class ai4water.datasets.camels.CAMELS_GB(path=None)[source]

Bases: Camels

This dataset must be manually downloaded by the user. The path of the downloaded folder must be provided while initiating this class.

__init__(path=None)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

property ds_dir

Directory where a particular dataset will be saved.

dynamic_features = ['precipitation', 'pet', 'temperature', 'discharge_spec', 'discharge_vol', 'peti', 'humidity', 'shortwave_rad', 'longwave_rad', 'windspeed']
property end
fetch_static_features(stn_id: Union[str, List[str]], features: Union[str, List[str]] = 'all') DataFrame[source]

Fetches static attributes of one or more stations for one or more categories as a DataFrame.

Parameters:
  • stn_id (str/list) – name/id of the station(s) whose data is to be extracted

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from ai4water.datasets import CAMELS_GB
>>> dataset = CAMELS_GB()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
671
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(671, 290)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('85004')
>>> static_data.shape
(1, 290)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['area', 'elev_mean'])
>>> static_data.shape
(671, 2)
property start
property static_attribute_categories: list
property static_features
stations(to_exclude=None)[source]

CAMELS_BR

class ai4water.datasets.camels.CAMELS_BR(path=None)[source]

Bases: Camels

Downloads and processes the CAMELS dataset of Brazil.

Examples

>>> from ai4water.datasets import CAMELS_BR
>>> dataset = CAMELS_BR(path=r'F:\data\CAMELS\CAMELS_BR')
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(14245, 12)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
593
>>> # get data by station id
>>> df = dataset.fetch(stations='46035000', as_dataframe=True).unstack()
>>> df.shape
(14245, 12)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['precipitation_cpc', 'evapotransp_mgb', 'temperature_mean', 'streamflow_m3s']).unstack()
>>> df.shape
(14245, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed DataFrame
(170940, 10)
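With as_dataframe=True, the rows are indexed by (time, dynamic feature), so the row count is the number of time steps times the number of features; for the example above, 14245 × 12 = 170940, with one column per station. unstack() moves the feature level into the columns. The stdlib sketch below mirrors that pivot for one station; unstack_features is hypothetical, and pandas' DataFrame.unstack is what the examples actually use.

```python
import math
from typing import Dict, List, Tuple


def unstack_features(
    stacked: Dict[Tuple[str, str], float],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Pivot {(time, feature): value} for one station into a time x feature table."""
    times = sorted({t for t, _ in stacked})
    features = sorted({f for _, f in stacked})
    # missing (time, feature) pairs become NaN, as pandas would produce
    table = [[stacked.get((t, f), math.nan) for f in features] for t in times]
    return times, features, table


stacked = {
    ("2001-01-01", "precipitation_cpc"): 1.0, ("2001-01-01", "streamflow_m3s"): 2.0,
    ("2001-01-02", "precipitation_cpc"): 3.0, ("2001-01-02", "streamflow_m3s"): 4.0,
}
times, features, table = unstack_features(stacked)
print(features)                   # ['precipitation_cpc', 'streamflow_m3s']
print(len(table), len(table[0]))  # 2 2
```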
__init__(path=None)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

all_stations(attribute) list[source]

Returns all station ids for which data of a specific attribute is available.

property dynamic_features: list
property end
fetch_static_features(stn_id: Union[str, List[str]], features: Optional[Union[str, List[str]]] = None) DataFrame[source]
Parameters:
  • stn_id (str/list) – station id(s) whose attributes to fetch

  • features (str/list) – name of attribute to fetch. Default is None, which will return all the attributes for a particular station of the specified category.

Example

>>> from ai4water.datasets import CAMELS_BR
>>> dataset = CAMELS_BR()
>>> df = dataset.fetch_static_features('11500000', 'climate')
>>> # read all static features of all stations
>>> data = dataset.fetch_static_features(dataset.stations(), dataset.static_features)
>>> data.shape
(597, 67)
folders = {'evapotransp_gleam': '08_CAMELS_BR_evapotransp_gleam', 'evapotransp_mgb': '09_CAMELS_BR_evapotransp_mgb', 'potential_evapotransp_gleam': '10_CAMELS_BR_potential_evapotransp_gleam', 'precipitation_chirps': '05_CAMELS_BR_precipitation_chirps', 'precipitation_cpc': '07_CAMELS_BR_precipitation_cpc', 'precipitation_mswep': '06_CAMELS_BR_precipitation_mswep', 'simulated_streamflow_m3s': '04_CAMELS_BR_streamflow_simulated', 'streamflow_m3s': '02_CAMELS_BR_streamflow_m3s', 'streamflow_mm': '03_CAMELS_BR_streamflow_mm_selected_catchments', 'temperature_max': '13_CAMELS_BR_temperature_max_cpc', 'temperature_mean': '12_CAMELS_BR_temperature_mean_cpc', 'temperature_min': '11_CAMELS_BR_temperature_min_cpc'}
property start
property static_attribute_categories
property static_dir
property static_features
property static_files
stations(to_exclude=None) list[source]

Returns a list of station ids which are common among all dynamic attributes.

Example

>>> dataset = CAMELS_BR()
>>> stations = dataset.stations()
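stations() is described as returning the ids common to all dynamic attributes, which amounts to a set intersection. A sketch with made-up station ids; common_stations is hypothetical, and how ai4water collects the per-attribute id lists is not shown here.

```python
from typing import Dict, Iterable, List


def common_stations(per_attribute: Dict[str, Iterable[str]]) -> List[str]:
    """Return the sorted station ids present for every attribute."""
    id_sets = [set(ids) for ids in per_attribute.values()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


# hypothetical ids available for each dynamic attribute
available = {
    "streamflow_m3s": ["10100000", "10200000", "10500000"],
    "precipitation_cpc": ["10200000", "10500000", "11400000"],
    "temperature_mean": ["10500000", "10200000"],
}
print(common_stations(available))  # ['10200000', '10500000']
```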
url = 'https://zenodo.org/record/3964745#.YA6rUxZS-Uk'

CAMELS_US

class ai4water.datasets.camels.CAMELS_US(data_source='basin_mean_daymet', path=None)[source]

Bases: Camels

Downloads and processes the CAMELS dataset of 671 catchments from https://ral.ucar.edu/solutions/products/camels, following Newman et al., 2015 [1].

Examples

>>> from ai4water.datasets import CAMELS_US
>>> dataset = CAMELS_US(path=r'F:\data\CAMELS\CAMELS_US')
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
(12784, 8)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
671
>>> # get data by station id
>>> df = dataset.fetch(stations='11478500', as_dataframe=True).unstack()
>>> df.shape
(12784, 8)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['prcp(mm/day)', 'srad(W/m2)', 'tmax(C)', 'tmin(C)', 'Flow']).unstack()
>>> df.shape
(12784, 5)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed DataFrame
(102272, 10)
__init__(data_source='basin_mean_daymet', path=None)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

DATASETS = ['CAMELS_US']
catchment_attr_url = 'https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/camels_attributes_v2.0.zip'
dynamic_features = ['dayl(s)', 'prcp(mm/day)', 'srad(W/m2)', 'swe(mm)', 'tmax(C)', 'tmin(C)', 'vp(Pa)', 'Flow']
property end
fetch_static_features(stn_id: Union[str, List[str]], features: Optional[Union[str, List[str]]] = None)[source]

Gets one or more static features of one or more stations.

Parameters:
  • stn_id (str/list) – name/id of the station(s) whose data is to be extracted

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from ai4water.datasets import CAMELS_US
>>> camels = CAMELS_US()
>>> st_data = camels.fetch_static_features('11532500')
>>> st_data.shape
   (1, 59)
>>> # get names of available static features
>>> camels.static_features
>>> # get specific features of one station
>>> static_data = camels.fetch_static_features('11528700',
... features=['area_gages2', 'geol_porostiy', 'soil_conductivity', 'elev_mean'])
>>> static_data.shape
   (1, 4)
>>> # get names of all stations
>>> all_stns = camels.stations()
>>> len(all_stns)
   671
>>> all_static_data = camels.fetch_static_features(all_stns)
>>> all_static_data.shape
   (671, 59)
folders = {'basin_mean_daymet': 'basin_mean_forcing/daymet', 'basin_mean_maurer': 'basin_mean_forcing/maurer', 'basin_mean_nldas': 'basin_mean_forcing/nldas', 'basin_mean_v1p15_daymet': 'basin_mean_forcing/v1p15/daymet', 'basin_mean_v1p15_nldas': 'basin_mean_forcing/v1p15/nldas', 'elev_bands': 'elev/daymet', 'hru': 'hru_forcing/daymet'}
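The default data_source, 'basin_mean_daymet', is one of the keys of folders, which suggests that the constructor argument selects the forcing sub-folder. Under that assumption, a validation step might look like this; forcing_dir is a hypothetical helper, not ai4water's code.

```python
# Copied from the `folders` attribute documented above.
FOLDERS = {
    "basin_mean_daymet": "basin_mean_forcing/daymet",
    "basin_mean_maurer": "basin_mean_forcing/maurer",
    "basin_mean_nldas": "basin_mean_forcing/nldas",
    "basin_mean_v1p15_daymet": "basin_mean_forcing/v1p15/daymet",
    "basin_mean_v1p15_nldas": "basin_mean_forcing/v1p15/nldas",
    "elev_bands": "elev/daymet",
    "hru": "hru_forcing/daymet",
}


def forcing_dir(data_source: str = "basin_mean_daymet") -> str:
    """Return the relative forcing folder for a data source, or raise ValueError."""
    if data_source not in FOLDERS:
        raise ValueError(f"data_source must be one of {sorted(FOLDERS)}, got {data_source!r}")
    return FOLDERS[data_source]


print(forcing_dir())                    # basin_mean_forcing/daymet
print(forcing_dir("basin_mean_nldas"))  # basin_mean_forcing/nldas
```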
property start
property static_features
stations() list[source]
url = 'https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip'

CAMELS_CL

class ai4water.datasets.camels.CAMELS_CL(path: Optional[str] = None)[source]

Bases: Camels

Downloads and processes the CAMELS dataset of Chile, following the work of Alvarez-Garreton et al., 2018.

Examples

>>> from ai4water.datasets import CAMELS_CL
>>> dataset = CAMELS_CL()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack() # the returned dataframe is a multi-indexed dataframe so we have to unstack it
>>> df.shape
    (38374, 12)
>>> # get name of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
516
>>> # get data by station id
>>> df = dataset.fetch(stations='11130001', as_dataframe=True).unstack()
>>> df.shape
(38374, 12)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
... dynamic_features=['pet_hargreaves', 'precip_tmpa', 'tmean_cr2met', 'streamflow_m3s']).unstack()
>>> df.shape
(38374, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed DataFrame
(460488, 10)
__init__(path: Optional[str] = None)[source]
Parameters:

path – path where the CAMELS-CL dataset has been downloaded. This path must contain five zip files and one xlsx file.

dynamic_features = ['streamflow_m3s', 'streamflow_mm', 'precip_cr2met', 'precip_chirps', 'precip_mswep', 'precip_tmpa', 'tmin_cr2met', 'tmax_cr2met', 'tmean_cr2met', 'pet_8d_modis', 'pet_hargreaves', 'swe']
property end
fetch_static_features(stn_id: Union[str, List[str]], features: Optional[Union[str, List[str]]] = None)[source]

Returns static features of one or more stations.

Parameters:
  • stn_id (str/list) – name/id of the station(s) whose data is to be extracted

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from ai4water.datasets import CAMELS_CL
>>> dataset = CAMELS_CL()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    516
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (516, 104)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('11315001')
>>> static_data.shape
   (1, 104)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'area'])
>>> static_data.shape
   (516, 2)
>>> data = dataset.fetch_static_features('2110002', features=['slope_mean', 'area'])
>>> data.shape
   (1, 2)
property start
property static_features: list
stations() list[source]

Returns all station ids for which data of a specific attribute is available.

urls = {'10_CAMELScl_tmean_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '11_CAMELScl_pet_8d_modis.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '12_CAMELScl_pet_hargreaves.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '13_CAMELScl_swe.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '14_CAMELScl_catch_hierarchy.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '1_CAMELScl_attributes.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '2_CAMELScl_streamflow_m3s.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '3_CAMELScl_streamflow_mm.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '4_CAMELScl_precip_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '5_CAMELScl_precip_chirps.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '6_CAMELScl_precip_mswep.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '7_CAMELScl_precip_tmpa.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '8_CAMELScl_tmin_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', '9_CAMELScl_tmax_cr2met.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/', 'CAMELScl_catchment_boundaries.zip': 'https://store.pangaea.de/Publications/Alvarez-Garreton-etal_2018/'}

WaterBenchIowa

class ai4water.datasets.camels.WaterBenchIowa(path=None)[source]

Bases: Camels

Rainfall-runoff dataset for Iowa (US), following the work of Demir et al., 2022.

Examples

>>> from ai4water.datasets import WaterBenchIowa
>>> ds = WaterBenchIowa()
>>> # fetch dynamic features of 5 stations
>>> data = ds.fetch(5, as_dataframe=True)
>>> data.shape  # it is a multi-indexed DataFrame
(184032, 5)
>>> # fetch both static and dynamic features of 5 stations
>>> data = ds.fetch(5, static_features="all", as_dataframe=True)
>>> data.keys()
dict_keys(['dynamic', 'static'])
>>> data['static'].shape
(5, 7)
>>> data['dynamic']  # returns an xarray Dataset
>>> # using another method
>>> data = ds.fetch_dynamic_features('644', as_dataframe=True)
>>> data.unstack().shape
(61344, 3)
__init__(path=None)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

property dynamic_features: List[str]
property end
fetch_static_features(stn_id: Union[str, List[str]], features: Optional[Union[str, List[str]]] = None) DataFrame[source]
Parameters:
  • stn_id (str/list) – name/id of the station(s) whose data is to be extracted

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from ai4water.datasets import WaterBenchIowa
>>> dataset = WaterBenchIowa()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
    125
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
   (125, 7)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('592')
>>> static_data.shape
   (1, 7)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope', 'area'])
>>> static_data.shape
   (125, 2)
>>> data = dataset.fetch_static_features('592', features=['slope', 'area'])
>>> data.shape
   (1, 2)
fetch_station_attributes(station: str, dynamic_features: Optional[Union[str, list]] = 'all', static_features: Union[None, str, list] = None, as_ts: bool = False, st: Union[None, str] = None, en: Union[None, str] = None, **kwargs) DataFrame[source]

Examples

>>> from ai4water.datasets import WaterBenchIowa
>>> dataset = WaterBenchIowa()
>>> data = dataset.fetch_station_attributes('666')
property start
property static_features: List[str]
stations() List[str][source]
property ts_path: str
url = 'https://zenodo.org/record/7087806#.Y6rW-BVByUk'

LamaH

class ai4water.datasets.camels.LamaH(*, time_step: str, data_type: str, **kwargs)[source]

Bases: Camels

Large-Sample Data for Hydrology and Environmental Sciences for Central Europe from Zenodo, following the work of Klingler et al., 2021.

__init__(*, time_step: str, data_type: str, **kwargs)[source]
Parameters:
  • time_step – possible values are 'daily' or 'hourly'

  • data_type – possible values are 'total_upstrm', 'diff_upstrm_all' or 'diff_upstrm_lowimp'

Examples

>>> from ai4water.datasets import LamaH
>>> dataset = LamaH(time_step='daily', data_type='total_upstrm')
>>> df = dataset.fetch(3, as_dataframe=True)
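time_step and data_type are keyword-only, and each accepts a fixed set of values. A hypothetical up-front check is sketched below; check_lamah_args is not part of ai4water, and whether LamaH itself validates its arguments this way is not stated in this documentation.

```python
TIME_STEPS = ("daily", "hourly")  # matches the documented time_steps attribute
DATA_TYPES = ("total_upstrm", "diff_upstrm_all", "diff_upstrm_lowimp")


def check_lamah_args(*, time_step: str, data_type: str) -> None:
    """Raise ValueError unless both keyword arguments take a documented value."""
    if time_step not in TIME_STEPS:
        raise ValueError(f"time_step must be one of {TIME_STEPS}, got {time_step!r}")
    if data_type not in DATA_TYPES:
        raise ValueError(f"data_type must be one of {DATA_TYPES}, got {data_type!r}")


check_lamah_args(time_step="daily", data_type="total_upstrm")  # valid: no exception
```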
property data_type_dir
property dynamic_features
property end
fetch_static_features(stn_id: Union[str, List[str]], features: Optional[Union[str, List[str]]] = None) DataFrame[source]

Returns static features of one or more LamaH stations.

Parameters:
  • stn_id (str or list of str) – name/id of the station(s) whose data is to be extracted

  • features (list/str, optional (default=None)) – The name/names of features to fetch. By default (None), all available static features are returned.

Examples

>>> from ai4water.datasets import LamaH
>>> dataset = LamaH(time_step='daily', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99')  # (1, 61)
...  # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
...     features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra'])  # (1, 4)
read_ts_of_station(station) DataFrame[source]
property start
static_attribute_categories = ['']
property static_features: list
stations() list[source]
time_steps = ['daily', 'hourly']
url = 'https://zenodo.org/record/4609826#.YFNp59zt02w'

HYSETS

class ai4water.datasets.camels.HYSETS(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]

Bases: Camels

Database for hydrometeorological modeling of 14,425 North American watersheds from 1950-2018, following the work of Arsenault et al., 2020. The user must manually download the files, unpack them and provide the path where these files are saved.

This data comes from multiple sources, each providing one or more dynamic_features. The following data sources are available:

data_source       dynamic_features
SNODAS_SWE        discharge, swe
SCDNA             discharge, pr, tasmin, tasmax
nonQC_stations    discharge, pr, tasmin, tasmax
Livneh            discharge, pr, tasmin, tasmax
ERA5              discharge, pr, tasmax, tasmin
ERA5Land_SWE      discharge, swe
ERA5Land          discharge, pr, tasmax, tasmin
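Each dynamic feature can only be taken from a source that provides it. The sketch below checks the HYSETS source arguments against the source lists documented with the class attributes. It assumes, as an inference from the table above, that SWE_SRC, Q_SRC and OTHER_SRC are the valid choices for the swe, discharge and pr/tasmin/tasmax sources respectively; `check_sources` is a hypothetical helper, not the library's own validation:

```python
# Source lists copied from the HYSETS class attributes.
SWE_SRC = ['ERA5Land_SWE', 'SNODAS_SWE']
Q_SRC = ['ERA5', 'ERA5Land', 'ERA5Land_SWE', 'Livneh', 'nonQC_stations', 'SCDNA', 'SNODAS_SWE']
OTHER_SRC = ['ERA5', 'ERA5Land', 'Livneh', 'nonQC_stations', 'SCDNA']


def check_sources(swe_source="SNODAS_SWE", discharge_source="ERA5",
                  tasmin_source="ERA5", tasmax_source="ERA5", pr_source="ERA5"):
    """Hypothetical check: raise ValueError if a source cannot supply its variable."""
    allowed = {
        "swe_source": (swe_source, SWE_SRC),
        "discharge_source": (discharge_source, Q_SRC),
        "tasmin_source": (tasmin_source, OTHER_SRC),
        "tasmax_source": (tasmax_source, OTHER_SRC),
        "pr_source": (pr_source, OTHER_SRC),
    }
    for name, (value, valid) in allowed.items():
        if value not in valid:
            raise ValueError(f"{name}={value!r} must be one of {valid}")
    return True


check_sources()                  # defaults are valid
check_sources(pr_source="Livneh")
```

For example, `check_sources(swe_source="ERA5")` raises a ValueError, because ERA5 does not provide swe in the table above.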

All sources contain one or more of the following variables, with the following shapes:

variable                      shape
time                          (25202,)
watershedID                   (14425,)
drainage_area                 (14425,)
drainage_area_GSIM            (14425,)
flag_GSIM_boundaries          (14425,)
flag_artificial_boundaries    (14425,)
centroid_lat                  (14425,)
centroid_lon                  (14425,)
elevation                     (14425,)
slope                         (14425,)
discharge                     (14425, 25202)
pr                            (14425, 25202)
tasmax                        (14425, 25202)
tasmin                        (14425, 25202)
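Because the per-watershed arrays share their first dimension (14425) with the 2-D time series arrays, a watershed's static values and its time series can be read with the same row index. A small numpy sketch with toy sizes, assuming row i of `discharge` belongs to `watershedID[i]` and its columns follow the `time` axis; all arrays here are made-up stand-ins:

```python
import numpy as np

n_watersheds, n_time = 4, 6  # toy stand-ins for 14425 and 25202

watershed_id = np.arange(1, n_watersheds + 1)           # shape (n_watersheds,)
drainage_area = np.array([120.0, 55.5, 980.0, 12.3])    # shape (n_watersheds,)
discharge = np.arange(n_watersheds * n_time, dtype=float).reshape(n_watersheds, n_time)

# Find the row of the watershed with id 3, then use that row for both
# the 1-D static array and the 2-D (watershed, time) array.
row = int(np.where(watershed_id == 3)[0][0])
q_series = discharge[row]   # one watershed's series, shape (n_time,)
area = drainage_area[row]   # its drainage area
```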

Examples

>>> from ai4water.datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
... # fetch data of a random station
>>> df = dataset.fetch(1, as_dataframe=True)
>>> df.shape
(25202, 5)
>>> stations = dataset.stations()
>>> len(stations)
14425
>>> df = dataset.fetch('999', as_dataframe=True)
>>> df.unstack().shape
(25202, 5)
__init__(path: str, swe_source: str = 'SNODAS_SWE', discharge_source: str = 'ERA5', tasmin_source: str = 'ERA5', tasmax_source: str = 'ERA5', pr_source: str = 'ERA5', **kwargs)[source]
Parameters:
  • path – path where all the data files are saved.

  • swe_source – source of swe data; one of SWE_SRC.

  • discharge_source – source of discharge data; one of Q_SRC.

  • tasmin_source – source of tasmin data; one of OTHER_SRC.

  • tasmax_source – source of tasmax data; one of OTHER_SRC.

  • pr_source – source of pr data; one of OTHER_SRC.

  • kwargs – arguments for the Camels base class.

OTHER_SRC = ['ERA5', 'ERA5Land', 'Livneh', 'nonQC_stations', 'SCDNA']
Q_SRC = ['ERA5', 'ERA5Land', 'ERA5Land_SWE', 'Livneh', 'nonQC_stations', 'SCDNA', 'SNODAS_SWE']
SWE_SRC = ['ERA5Land_SWE', 'SNODAS_SWE']
doi = 'https://doi.org/10.1038/s41597-020-00583-2'
property ds_dir

Directory where a particular dataset will be saved.

dynamic_features = ['discharge', 'swe', 'tasmin', 'tasmax', 'pr']
property end: str
fetch_dynamic_features(stn_id, features='all', st=None, en=None, as_dataframe=False)[source]

Fetches dynamic attributes of one station.

Examples

>>> from ai4water.datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> dyn_features = dataset.fetch_dynamic_features('station_name')
fetch_static_features(stn_id: Union[str, List[str]], features: Union[str, List[str]] = 'all', st=None, en=None, as_ts=False) DataFrame[source]

Returns static attributes of one or multiple stations.

Parameters:
  • stn_id (str or list of str) – name/id of the station(s) whose data is to be extracted

  • features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available static features are returned.

Examples

>>> from ai4water.datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
... # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
14425
... # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(14425, 28)
... # get static data of one station only
>>> static_data = dataset.fetch_static_features('991')
>>> static_data.shape
(1, 28)
... # get the names of static features
>>> dataset.static_features
... # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
(14425, 2)
fetch_stations_attributes(stations: list, dynamic_features: Optional[Union[str, list]] = 'all', static_features: Union[None, str, list] = None, st=None, en=None, as_dataframe: bool = False, **kwargs)[source]

Returns attributes of multiple stations.

Examples

>>> from ai4water.datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> stations = dataset.stations()[0:3]
>>> attributes = dataset.fetch_stations_attributes(stations)
read_static_data()[source]
property start: str
property static_features: list
stations() List[str][source]
Returns:

a list of ids of stations

Return type:

list

Examples

>>> from ai4water.datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
... # get name of all stations as list
>>> dataset.stations()
url = 'https://osf.io/rpc3w/'

HYPE

class ai4water.datasets.camels.HYPE(time_step: str = 'daily', path=None, **kwargs)[source]

Bases: Camels

Downloads and preprocesses the HYPE [1] dataset from Lindstroem et al., 2010 [2]. This is a rainfall-runoff dataset of 564 stations in Sweden from 1985 to 2019 at daily, monthly and yearly time steps.

Examples

>>> from ai4water.datasets import HYPE
>>> dataset = HYPE()
... # get data of 5% of stations
>>> df = dataset.fetch(stations=0.05, as_dataframe=True)  # returns a multiindex dataframe
>>> df.shape
(115047, 28)
... # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
>>> df.shape
(115047, 5)
... # fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['564','563','562'], as_dataframe=True)
>>> df.shape
(115047, 3)
... # fetch data of a single station
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
(115047, 1)
... # get only selected dynamic features
>>> df = dataset.fetch(stations='501',
...    dynamic_features=['AET_mm', 'Prec_mm', 'Streamflow_mm'], as_dataframe=True)
... # fetch data between selected periods
>>> df = dataset.fetch(stations='225', st="20010101", en="20101231", as_dataframe=True)
>>> df.shape
(32868, 1)
... # get data at monthly time step
>>> dataset = HYPE(time_step="month")
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
(3780, 1)
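The examples pass `stations` as a fraction, a count, a list or a single id. One way such an argument could be resolved is sketched below; `resolve_stations` is hypothetical and not ai4water's implementation, and the assumption that HYPE station ids run from '1' to '564' is only for illustration. With a fraction of 0.05, int(0.05 * 564) gives the 28 columns seen in the first example:

```python
import random


def resolve_stations(stations, all_ids, seed=None):
    """Hypothetical helper: turn a `stations` argument into a list of ids."""
    rng = random.Random(seed)
    if isinstance(stations, float):   # fraction of all stations, chosen randomly
        return rng.sample(all_ids, int(stations * len(all_ids)))
    if isinstance(stations, int):     # this many randomly chosen stations
        return rng.sample(all_ids, stations)
    if isinstance(stations, str):     # a single station id
        return [stations]
    return list(stations)             # an explicit list of ids


ids = [str(i) for i in range(1, 565)]  # assumed ids '1'..'564'
print(len(resolve_stations(0.05, ids, seed=0)))  # 28
```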
__init__(time_step: str = 'daily', path=None, **kwargs)[source]
Parameters:
  • time_step (str) – one of daily, month or year

  • **kwargs – keyword arguments

dynamic_features = ['AET_mm', 'Baseflow_mm', 'Infiltration_mm', 'SM_mm', 'Streamflow_mm', 'Runoff_mm', 'Qsim_m3-s', 'Prec_mm', 'PET_mm']
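The row counts in the HYPE examples are consistent with a stacked (multiindex) layout holding one row per time step and dynamic feature, i.e. number of time steps × 9. A standard-library check of that arithmetic, assuming the daily record covers 1985-01-01 to 2019-12-31 inclusive:

```python
from datetime import date

N_FEATURES = 9  # number of HYPE dynamic_features listed above


def n_days(start, end):
    """Inclusive number of daily time steps between two dates."""
    return (end - start).days + 1


def n_months(start_year, end_year):
    """Inclusive number of monthly time steps over whole calendar years."""
    return (end_year - start_year + 1) * 12


full_daily = n_days(date(1985, 1, 1), date(2019, 12, 31))    # 12783 days
subset_daily = n_days(date(2001, 1, 1), date(2010, 12, 31))  # 3652 days
full_monthly = n_months(1985, 2019)                          # 420 months

# one row per (time step, feature) in the stacked layout
print(full_daily * N_FEATURES)    # 115047
print(subset_daily * N_FEATURES)  # 32868
print(full_monthly * N_FEATURES)  # 3780
```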
property end
fetch_static_features(stn_id, features=None)[source]

Static data is not available for HYPE.

property start
property static_features
stations() list[source]
url = ['https://zenodo.org/record/581435', 'https://zenodo.org/record/4029572']

RRLuleaSweden

class ai4water.datasets.RRLuleaSweden(path=None, **kwargs)[source]

Bases: Datasets

Rainfall-runoff data for an urban catchment from 2016 to 2019, following the work of Broekhuizen et al., 2020 [11].

__init__(path=None, **kwargs)[source]
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

fetch(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None)[source]

fetches rainfall-runoff data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00

  • en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:41
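The st and en arguments bound the returned period. A minimal pandas sketch of such bounding, assuming the data is a DataFrame with a sorted DatetimeIndex and that st/en are date strings sliced inclusively with .loc. `slice_period` and the toy data are hypothetical, and how ai4water itself handles integer or DatetimeIndex values is not shown here:

```python
import pandas as pd

# Toy flow-like table; timestamps and values are made up.
idx = pd.date_range("2016-06-16 20:50", periods=6, freq="10min")
flow = pd.DataFrame({"flow": [0.1, 0.3, 0.2, 0.5, 0.4, 0.6]}, index=idx)


def slice_period(df, st=None, en=None):
    """Keep rows between st and en (both inclusive); None leaves that end open."""
    return df.loc[st:en]


print(slice_period(flow, st="2016-06-16 21:10").shape)  # (4, 1)
print(slice_period(flow, en="2016-06-16 21:00").shape)  # (2, 1)
```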

fetch_flow(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

fetches flow data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 20:50:00

  • en (optional) – end of data to be fetched. By default the end is 2019-09-15 18:35:00

Returns:

a dataframe of shape (37_618, 3) where the columns are velocity, level and flow rate

Return type:

pd.DataFrame

Examples

>>> from ai4water.datasets import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> flow = dataset.fetch_flow()
>>> flow.shape
(37618, 3)
fetch_pcp(st: Optional[Union[str, int, DatetimeIndex]] = None, en: Optional[Union[str, int, DatetimeIndex]] = None) DataFrame[source]

fetches precipitation data

Parameters:
  • st (optional) – start of data to be fetched. By default the data starts from 2016-06-16 19:48:00

  • en (optional) – end of data to be fetched. By default the end is 2019-10-26 23:59:00

Returns:

a dataframe of shape (967_080, 1)

Return type:

pd.DataFrame

Examples

>>> from ai4water.datasets import RRLuleaSweden
>>> dataset = RRLuleaSweden()
>>> pcp = dataset.fetch_pcp()
>>> pcp.shape
(967080, 1)
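fetch_pcp returns far more rows (967,080) than fetch_flow (37,618), so the two series are not on the same time step. If a coarser, regular precipitation series is needed, pandas resampling is one option. A minimal sketch on made-up 1-minute data, where the 10-minute target step is an arbitrary choice and not something ai4water prescribes:

```python
import pandas as pd

# Toy 1-minute precipitation series; values are made up.
idx = pd.date_range("2016-06-16 19:50", periods=20, freq="min")
pcp = pd.DataFrame({"pcp": [0.0] * 10 + [0.5] * 10}, index=idx)

# Sum precipitation into 10-minute totals (bins start at 19:50 and 20:00).
pcp_10min = pcp.resample("10min").sum()
print(pcp_10min["pcp"].tolist())  # [0.0, 5.0]
```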
url = 'https://zenodo.org/record/3931582'