Laos

Ecoli Mekong River

E. coli data from the Mekong river (Houay Pano) area from 2011 to 2021 (Boithias et al., 2022 [1]).

param st:

starting time. The default starting point is 2011-05-25 10:00:00

type st:

optional

param en:

end time. The default end point is 2021-05-25 15:41:00

type en:

optional
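The st and en arguments bound the period that is returned. Below is a minimal plain-Python sketch of such a time window. It assumes that both ends are inclusive and that timestamps follow the YYYY-MM-DD HH:MM:SS form of the defaults above. select_window is a hypothetical helper, not part of ai4water.

```python
from datetime import datetime

# Defaults copied from the docstring; the format string is an assumption.
FMT = "%Y-%m-%d %H:%M:%S"
DEFAULT_ST = "2011-05-25 10:00:00"
DEFAULT_EN = "2021-05-25 15:41:00"


def select_window(records, st=DEFAULT_ST, en=DEFAULT_EN):
    """Hypothetical helper: keep (timestamp, value) pairs with st <= t <= en."""
    start = datetime.strptime(st, FMT)
    end = datetime.strptime(en, FMT)
    return [(t, v) for t, v in records if start <= t <= end]


records = [
    (datetime(2011, 5, 25, 9, 0), 1.0),     # one hour before the default start
    (datetime(2015, 7, 1, 12, 0), 2.0),     # well inside the window
    (datetime(2021, 5, 25, 15, 41), 3.0),   # exactly on the default end
]
window = select_window(records)
print([v for _, v in window])  # [2.0, 3.0]
```

Whether the real function treats the end point as inclusive is not stated above. Switching the second comparison to t < end would make it exclusive.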

param features:

Names of features to use. Use "all" to get all features. By default, the following input features are selected:

  • station_name name of station/catchment where the observation was made

  • T temperature

  • EC electrical conductivity

  • DOpercent dissolved oxygen saturation (%)

  • DO dissolved oxygen concentration

  • pH pH

  • ORP oxidation-reduction potential

  • Turbidity turbidity

  • TSS total suspended sediment concentration

  • E-coli_4dilutions Escherichia coli concentration

type features:

str, optional
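The features argument accepts None (the default selection listed above), the string "all", a single feature name, or, in some functions, a list of names. A hedged sketch of how such an argument can be normalised into a list of column names follows. resolve_features is hypothetical, and the sketch assumes that "all" may return more columns than the ten-column default that yields the (1602, 10) shape.

```python
# Default selection as listed in the docstring above.
DEFAULT_FEATURES = [
    "station_name", "T", "EC", "DOpercent", "DO", "pH", "ORP",
    "Turbidity", "TSS", "E-coli_4dilutions",
]


def resolve_features(features, available, default=DEFAULT_FEATURES):
    """Hypothetical helper: turn the features argument into a list of column names."""
    if features is None:
        return list(default)            # no choice made: use the default selection
    if features == "all":
        return list(available)          # every column the dataset provides
    if isinstance(features, str):
        features = [features]           # a single name becomes a one-item list
    unknown = [f for f in features if f not in available]
    if unknown:
        raise ValueError(f"unknown features: {unknown}")
    return list(features)


available = DEFAULT_FEATURES + ["extra_column"]  # hypothetical extra column
print(len(resolve_features(None, available)))    # 10
print(resolve_features("pH", available))         # ['pH']
```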

param overwrite:

whether to overwrite the downloaded file or not

type overwrite:

bool

returns:

with default parameters, the shape is (1602, 10)

rtype:

pd.DataFrame

Examples

>>> from ai4water.datasets import ecoli_mekong
>>> ecoli_data = ecoli_mekong()
>>> ecoli_data.shape
(1602, 10)

Ecoli Mekong River (Laos)

E. coli data from the Mekong river (northern Laos).

param st:

starting time

type st:

Union[str, pd.Timestamp, int]

param en:

end time

type en:

Union[str, pd.Timestamp, int]

param station_name:

type station_name:

str

param features:

type features:

str, optional

param overwrite:

whether to overwrite the downloaded file or not

type overwrite:

bool

returns:

with default parameters, the shape is (1131, 10)

rtype:

pd.DataFrame

Examples

>>> from ai4water.datasets import ecoli_mekong_laos
>>> ecoli = ecoli_mekong_laos()
>>> ecoli.shape
(1131, 10)

Ecoli Houay Pano (Laos)

E. coli data from the Mekong river (Houay Pano) area.

param st:

starting time. The default starting point is 2011-05-25 10:00:00

type st:

optional

param en:

end time. The default end point is 2021-05-25 15:41:00

type en:

optional

param features:

Names of features to use. Use "all" to get all features. By default, the following input features are selected:

  • station_name name of station/catchment where the observation was made

  • T temperature

  • EC electrical conductivity

  • DOpercent dissolved oxygen saturation (%)

  • DO dissolved oxygen concentration

  • pH pH

  • ORP oxidation-reduction potential

  • Turbidity turbidity

  • TSS total suspended sediment concentration

  • E-coli_4dilutions Escherichia coli concentration

type features:

str, optional

param overwrite:

whether to overwrite the downloaded file or not

type overwrite:

bool

returns:

with default parameters, the shape is (413, 10)

rtype:

pd.DataFrame

Examples

>>> from ai4water.datasets import ecoli_houay_pano
>>> ecoli = ecoli_houay_pano()
>>> ecoli.shape
(413, 10)

Ecoli data from Mekong river (2016)

ai4water.datasets.mtropics.ecoli_mekong_2016(st: Union[str, Timestamp, int] = '20160101', en: Union[str, Timestamp, int] = '20161231', features: Optional[Union[str, list]] = None, overwrite=False) → DataFrame
E. coli data from the Mekong river in 2016, from 29 catchments.

Parameters:
  • st – starting time

  • en – end time

  • features (str, optional) – names of features to use. Use "all" to get all features.

  • overwrite (bool) – whether to overwrite the downloaded file or not

Returns:

with default parameters, the shape is (58, 10)

Return type:

pd.DataFrame

Examples

>>> from ai4water.datasets import ecoli_mekong_2016
>>> ecoli = ecoli_mekong_2016()
>>> ecoli.shape
(58, 10)

MtropicsLaos

class ai4water.datasets.mtropics.MtropicsLaos(path=None, save_as_nc: bool = True, convert_to_csv: bool = False, **kwargs)

Bases: Datasets

Downloads and prepares hydrological, climate and land use data for Laos from the Mtropics website and IRD data servers.

- fetch_lu
- fetch_ecoli
- fetch_rain_gauges
- fetch_weather_station_data
- fetch_pcp
- fetch_hydro
- make_regression
__init__(path=None, save_as_nc: bool = True, convert_to_csv: bool = False, **kwargs)
Parameters:
  • name – str (default=None) name of dataset

  • units – str, (default=None) the unit system being used

  • path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded

fetch_ecoli(features: Union[list, str] = 'Ecoli_mpn100', st: Union[str, Timestamp] = '20110525 10:00:00', en: Union[str, Timestamp] = '20210406 15:05:00', remove_duplicates: bool = True) → DataFrame

Fetches E. coli data collected at the outlet. See Ribolzi et al., 2021 and Boithias et al., 2021 for reference. NaNs represent missing values. The data were randomly sampled between 2011 and 2021 during rainfall events. In total, 368 E. coli observations are currently available.

Parameters:
  • st – start of data. By default, the data is fetched from the first available point.

  • en – end of data. By default, the data is fetched until the last available point.

  • features

    E. coli concentration data. The following data are available:

    • Ecoli_LL_mpn100: Lower limit of the confidence interval

    • Ecoli_mpn100: Stream water Escherichia coli concentration

    • Ecoli_UL_mpn100: Upper limit of the confidence interval

  • remove_duplicates – whether to remove duplicate records. This option exists because some values were recorded within a minute of each other.
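How remove_duplicates decides that two records are duplicates is not documented here. One plausible rule, sketched below as an assumption, drops any record that falls less than one minute after the previously kept record. drop_near_duplicates and the sample concentrations are made up for illustration.

```python
from datetime import datetime, timedelta


def drop_near_duplicates(records, window=timedelta(minutes=1)):
    """Hypothetical helper: drop records less than `window` after the last kept one."""
    kept = []
    for ts, value in sorted(records):
        if kept and ts - kept[-1][0] < window:
            continue  # too close to the previously kept observation
        kept.append((ts, value))
    return kept


obs = [
    (datetime(2011, 5, 25, 10, 0, 0), 1200.0),
    (datetime(2011, 5, 25, 10, 0, 40), 1150.0),  # 40 s later: treated as a duplicate
    (datetime(2011, 5, 25, 10, 6, 0), 980.0),
]
print([v for _, v in drop_near_duplicates(obs)])  # [1200.0, 980.0]
```

Keeping the first record in each cluster is a design choice. Averaging the clustered values would be an equally defensible alternative.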

Return type:

a pandas dataframe consisting of features as columns.

fetch_hydro(st: Union[str, Timestamp] = '20010101 00:06:00', en: Union[str, Timestamp] = '20200101 00:06:00') → Tuple[DataFrame, DataFrame]

Fetches water level (cm) and suspended particulate matter (g L-1). Both series span 2001 to 2019 but are randomly sampled.

Parameters:
  • st (optional) – starting point of data to be fetched.

  • en (optional) – end point of data to be fetched.

Returns:

a tuple of pandas dataframes of water level and suspended particulate matter.

fetch_lu(processed=False)

Returns land use data as a list of shapefiles.

fetch_pcp(st: Union[str, Timestamp] = '20010101 00:06:00', en: Union[str, Timestamp] = '20200101 00:06:00', freq: str = '6min') → DataFrame

Fetches precipitation data collected at a 6-minute time step from 2001 to 2020.

Parameters:
  • st – starting point of data to be fetched.

  • en – end point of data to be fetched.

  • freq – frequency at which the data is to be returned.
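Precipitation is a depth accumulated over each interval, so moving from the 6-minute step to a coarser freq means summing rather than averaging. The sketch below aggregates to hourly totals in plain Python. It assumes that each 6-minute value belongs to the interval ending at its timestamp. The 00:06:00 default for st hints at this, but it is not confirmed above. The helper names and depths are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hour_ending(ts):
    """Label a timestamp with the end of the hour it falls into (00:06 -> 01:00)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    return floored if floored == ts else floored + timedelta(hours=1)


def to_hourly_totals(series):
    """Sum (timestamp, depth) pairs into hour-ending buckets."""
    totals = defaultdict(float)
    for ts, depth in series:
        totals[hour_ending(ts)] += depth
    return dict(sorted(totals.items()))


start = datetime(2001, 1, 1, 0, 6)
six_min = [(start + timedelta(minutes=6 * i), 0.2) for i in range(20)]  # made-up depths
hourly = to_hourly_totals(six_min)
for hour, total in hourly.items():
    print(hour, round(total, 6))  # two hours, each totalling 2.0
```

With pandas, the same idea would be a resample with a sum, where the closed and label options control the interval convention.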

Return type:

pandas dataframe of precipitation data

fetch_physiochem(features: Union[list, str] = 'all', st: Union[str, Timestamp] = '20110525 10:00:00', en: Union[str, Timestamp] = '20210406 15:05:00') → DataFrame

Fetches physico-chemical features of the Houay Pano catchment, Laos.

Parameters:
  • st – start of data.

  • en – end of data.

  • features

    The physico-chemical features to fetch. The following features are available:

    • T

    • EC

    • DOpercent

    • DO

    • pH

    • ORP

    • Turbidity

    • TSS

Return type:

a pandas dataframe

fetch_rain_gauges(st: Union[str, Timestamp] = '20010101', en: Union[str, Timestamp] = '20191231') → DataFrame

Fetches data from 7 rain gauges, collected at a daily time step from 2001 to 2019.

Parameters:
  • st – start of data. By default, the data is fetched from the first available point.

  • en – end of data. By default, the data is fetched until the last available point.

Returns:

a dataframe of 7 columns, where each column represents the observations of one rain gauge. The length of the dataframe depends on the range defined by the st and en arguments.
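If the frame has one row per day with no gaps in its index (an assumption, not stated above), its length follows directly from st and en. The small worked example below counts the rows for the default range. It suggests the default call would return shape (6939, 7).

```python
from datetime import date


def daily_rows(st, en):
    """Number of daily time steps from st to en, both ends inclusive."""
    return (en - st).days + 1


n_days = daily_rows(date(2001, 1, 1), date(2019, 12, 31))
print(n_days)  # 6939  (19 years x 365 days + 4 leap days: 2004, 2008, 2012, 2016)
```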

Examples

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> rg = laos.fetch_rain_gauges()
fetch_source() → DataFrame

Returns monthly source data for E. coli from 2001 to 2021, obtained from here.

Return type:

pd.DataFrame of shape (252, 19)

fetch_suro() → DataFrame
Returns surface runoff and soil detachment data from Houay Pano, Lao PDR.

Returns:

a dataframe of shape (293, 13)

Return type:

pd.DataFrame

Examples

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> suro = laos.fetch_suro()
fetch_weather_station_data(st: Union[str, Timestamp] = '20010101 01:00:00', en: Union[str, Timestamp] = '20200101 00:00:00', freq: str = 'H') → DataFrame

Fetches hourly weather station data [1], which consists of air temperature, relative humidity, wind speed, and solar radiation.

Parameters:
  • st – start of data to be fetched.

  • en – end of data to be fetched.

  • freq – frequency at which the data is to be fetched.
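Unlike precipitation, the weather station variables (air temperature, humidity, wind speed, solar radiation) describe a state rather than an accumulation. Coarsening them to a daily freq therefore usually means averaging. The plain-Python sketch below groups hourly values by calendar day and takes the mean. It assumes a calendar-day grouping, and the temperatures are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def to_daily_means(series):
    """Average (timestamp, value) pairs per calendar day."""
    buckets = defaultdict(list)
    for ts, value in series:
        buckets[ts.date()].append(value)
    return {day: mean(values) for day, values in sorted(buckets.items())}


start = datetime(2001, 1, 1, 0)
air_temp = [(start + timedelta(hours=h), 20.0 + h % 2) for h in range(48)]  # made-up values
print(to_daily_means(air_temp))
```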

Return type:

a pandas dataframe consisting of 4 columns

inputs = ['air_temp', 'rel_hum', 'wind_speed', 'sol_rad', 'water_level', 'pcp', 'susp_pm', 'Ecoli_source']
make_classification(input_features: Union[None, list] = None, output_features: Optional[Union[str, list]] = None, st: Union[None, str] = '20110525 14:00:00', en: Union[None, str] = '20181027 00:00:00', freq: str = '6min', threshold: Union[int, dict] = 400, lookback_steps: Optional[int] = None) → DataFrame

Returns data for a classification problem.

Parameters:
  • input_features – names of inputs to use.

  • output_features – feature/features to consider as target/output/label

  • st – starting date of data. The default starting date is 20110525

  • en – end date of data

  • freq – frequency of data

  • threshold – threshold used to determine classes. Values greater than or equal to the threshold are set to 1, while values smaller than the threshold are set to 0. The value of 400 is chosen for E. coli to balance the number of 0s and 1s. Note that the US-EPA recommends a threshold value of 400 cfu/100 mL.

  • lookback_steps – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.
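The threshold rule above maps each target value to 1 when it is greater than or equal to the threshold and to 0 otherwise. The sketch below covers only the integer form of threshold, not the dict form. It also keeps missing values missing, which is an assumption. A naive v >= threshold would silently turn NaN into class 0, because every comparison with NaN is False.

```python
import math


def to_classes(values, threshold=400):
    """Label values >= threshold as 1 and values below it as 0; keep NaN as NaN."""
    labels = []
    for v in values:
        if math.isnan(v):
            labels.append(math.nan)  # a missing observation is not evidence of class 0
        else:
            labels.append(1 if v >= threshold else 0)
    return labels


ecoli = [120.0, 400.0, 2500.0, math.nan]  # made-up concentrations
print(to_classes(ecoli))  # [0, 1, 1, nan]
```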

Returns:

a dataframe whose columns are the input and target features and whose rows are the time steps between st and en

Return type:

pd.DataFrame

Example

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> df = laos.make_classification()
make_regression(input_features: Union[None, list] = None, output_features: Union[str, list] = 'Ecoli_mpn100', st: Union[None, str] = '20110525 14:00:00', en: Union[None, str] = '20181027 00:00:00', freq: str = '6min', lookback_steps: Optional[int] = None, replace_zeros_in_target: bool = True) → DataFrame

Returns data for a regression problem using hydrological, environmental, and water quality data of Houay Pano.

Parameters:
  • input_features

    names of inputs to use. By default following features are used as input

    • air_temp

    • rel_hum

    • wind_speed

    • sol_rad

    • water_level

    • pcp

    • susp_pm

    • Ecoli_source

  • output_features – feature/features to consider as target/output/label

  • st – starting date of data

  • en – end date of data

  • freq – frequency of data

  • lookback_steps (int, default=None) – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.

  • replace_zeros_in_target (bool, default=True) – Replace zeros in the target column with 1s.
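The lookback_steps description says the result has ecoli_observations * lookback_steps rows with a non-continuous index. The sketch below shows why. For each E. coli observation it collects the lookback_steps rows of the regular 6-minute grid that end at that observation. Skipping observations that lack enough history is an assumption; the library might pad them instead. The zero replacement is also shown. A likely motivation is keeping a later log transform of the target finite, but that motivation is an assumption too. Both helpers and the data are hypothetical.

```python
def lookback_rows(times, values, target_times, lookback_steps):
    """Collect the lookback_steps rows ending at each target time."""
    position = {t: i for i, t in enumerate(times)}
    rows = []
    for t in target_times:
        i = position[t]
        if i + 1 < lookback_steps:
            continue  # not enough history before this observation
        for j in range(i - lookback_steps + 1, i + 1):
            rows.append((times[j], values[j]))
    return rows


def replace_zeros(target):
    """Replace zeros with 1 in the target column."""
    return [1.0 if y == 0 else y for y in target]


minutes = list(range(0, 60, 6))          # a regular 6-minute grid: 0, 6, ..., 54
pcp = [0.0, 0.2, 0.4, 0.0, 0.0, 0.0, 1.2, 0.6, 0.2, 0.0]  # made-up inputs
rows = lookback_rows(minutes, pcp, target_times=[18, 48], lookback_steps=3)
print([t for t, _ in rows])              # [6, 12, 18, 36, 42, 48]
print(replace_zeros([0.0, 150.0, 0.0]))  # [1.0, 150.0, 1.0]
```

Two observations with three steps each give six rows. The jump from 18 to 36 is the non-continuous index the docstring warns about.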

Returns:

a dataframe whose columns are the input and target features and whose rows are the time steps between st and en

Return type:

pd.DataFrame

Example

>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> ins = ['pcp', 'air_temp']
>>> out = ['Ecoli_mpn100']
>>> reg_data = laos.make_regression(ins, out, '20110101', '20181231')


physio_chem_features = {'DO_mgl': 'DO', 'DO_percent': 'DOpercent', 'EC_s/cm': 'EC', 'ORP_mV': 'ORP', 'TSS_gL': 'TSS', 'T_deg': 'T', 'Turbidity_NTU': 'Turbidity', 'pH': 'pH'}
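The physio_chem_features attribute appears to map raw column names, which carry units (for example 'T_deg' or 'TSS_gL'), to the short names listed under fetch_physiochem. Assuming that is its role, applying it amounts to a dictionary lookup that leaves unmapped columns unchanged. On a pandas DataFrame, the equivalent would be df.rename(columns=physio_chem_features). The raw header below, including 'Date_Time', is invented for illustration.

```python
physio_chem_features = {
    'DO_mgl': 'DO', 'DO_percent': 'DOpercent', 'EC_s/cm': 'EC', 'ORP_mV': 'ORP',
    'TSS_gL': 'TSS', 'T_deg': 'T', 'Turbidity_NTU': 'Turbidity', 'pH': 'pH',
}

# Hypothetical raw header; 'Date_Time' is an invented name for the timestamp column.
raw_header = ["Date_Time", "T_deg", "EC_s/cm", "pH", "TSS_gL"]

# Columns without an entry in the mapping keep their original name.
short_header = [physio_chem_features.get(col, col) for col in raw_header]
print(short_header)  # ['Date_Time', 'T', 'EC', 'pH', 'TSS']
```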
surface_features(st: Union[str, int, Timestamp] = '2000-10-14', en: Union[str, int, Timestamp] = '2016-11-12') → DataFrame

Returns soil surface features data.

target = ['Ecoli_mpn100']
url = {'ecoli_data.csv': 'https://dataverse.ird.fr/api/access/datafile/5435', 'ecoli_dict.csv': 'https://dataverse.ird.fr/api/access/datafile/5436', 'ecoli_source.csv': 'https://dataverse.ird.fr/api/access/datafile/37737', 'ecoli_source_readme.txt': 'https://dataverse.ird.fr/api/access/datafile/37736', 'ecoli_suro_gw.csv': 'https://dataverse.ird.fr/api/access/datafile/37735', 'ecoli_suro_gw_readme.txt': 'https://dataverse.ird.fr/api/access/datafile/37734', 'hydro.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=389bbea0-7279-12c1-63d0-cfc4a77ded87', 'lu.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=0f1aea48-2a51-9b42-7688-a774a8f75e7a', 'pcp.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=3c870a03-324b-140d-7d98-d3585a63e6ec', 'rain_guage.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=7bc45591-5b9f-a13d-90dc-f2a75b0a15cc', 'soilmap.zip': 'https://dataverse.ird.fr/api/access/datafile/5430', 'subs1.zip': 'https://dataverse.ird.fr/api/access/datafile/5432', 'surf_feat.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=72d9e532-8910-48d2-b9a2-6c8b0241825b', 'suro.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=f06cb605-7e59-4ba4-8faf-1beee35d2162', 'weather_station.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=353d7f00-8d6a-2a34-c0a2-5903c64e800b'}
weather_station_data = ['air_temp', 'rel_hum', 'wind_speed', 'sol_rad']