Laos
Ecoli Mekong River
E. coli data from the Mekong river (Houay Pano) area from 2011 to 2021 (Boithias et al., 2022 [1]).
- param st:
starting time. The default starting point is 2011-05-25 10:00:00
- type st:
optional
- param en:
end time. The default end point is 2021-05-25 15:41:00
- type en:
optional
- param features:
names of features to use. Use 'all' to get all features. By default, the following input features are selected:
station_name: name of station/catchment where the observation was made
T: temperature
EC: electrical conductance
DOpercent: dissolved oxygen saturation
DO: dissolved oxygen concentration
pH: pH
ORP: oxidation-reduction potential
Turbidity: turbidity
TSS: total suspended sediment concentration
E-coli_4dilutions: Escherichia coli concentration
- type features:
str, optional
- param overwrite:
whether to overwrite the downloaded file or not
- type overwrite:
bool
- returns:
with default parameters, the shape is (1602, 10)
- rtype:
pd.DataFrame
Examples
>>> from ai4water.datasets import ecoli_mekong
>>> ecoli_data = ecoli_mekong()
>>> ecoli_data.shape
(1602, 10)
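The st, en and features arguments narrow the returned frame by time and by column. The snippet below is a minimal sketch of that kind of selection on a small synthetic frame. The values are made up, and the helper `select` is hypothetical; it is not the library's implementation, and whether the library treats both bounds as inclusive is an assumption.

```python
import pandas as pd

# Synthetic stand-in for the returned frame; values are made up for illustration.
index = pd.to_datetime(["2011-05-25 10:00", "2015-03-01 09:30", "2021-05-25 15:41"])
data = pd.DataFrame(
    {
        "station_name": ["A", "B", "A"],
        "T": [24.1, 22.8, 25.3],
        "E-coli_4dilutions": [1200.0, 340.0, 5600.0],
    },
    index=index,
)


def select(frame, st=None, en=None, features="all"):
    """Hypothetical helper mimicking the st/en/features arguments."""
    subset = frame.loc[st:en]  # label-based slice, inclusive on both ends
    if isinstance(features, str):
        if features == "all":
            return subset
        features = [features]
    return subset[features]


subset = select(data, st="2012-01-01", en="2021-12-31",
                features=["T", "E-coli_4dilutions"])
print(subset.shape)  # (2, 2)
```

Calling `select(data)` with no arguments returns the whole synthetic frame, which mirrors the "default parameters" case described above.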
Ecoli Mekong River (Laos)
E. coli data from the Mekong river (Northern Laos).
- param st:
starting time
- type st:
Union[str, pandas._libs.tslibs.timestamps.Timestamp, int]
- param en:
end time
- type en:
Union[str, pandas._libs.tslibs.timestamps.Timestamp, int]
- param station_name:
- type station_name:
str
- param features:
- type features:
str, optional
- param overwrite:
whether to overwrite or not
- type overwrite:
bool
- returns:
with default parameters, the shape is (1131, 10)
- rtype:
pd.DataFrame
Examples
>>> from ai4water.datasets import ecoli_mekong_laos
>>> ecoli = ecoli_mekong_laos()
>>> ecoli.shape
(1131, 10)
Ecoli Houay Pano (Laos)
E. coli data from the Mekong river (Houay Pano) area.
- param st:
starting time. The default starting point is 2011-05-25 10:00:00
- type st:
optional
- param en:
end time. The default end point is 2021-05-25 15:41:00
- type en:
optional
- param features:
names of features to use. Use 'all' to get all features. By default, the following input features are selected:
station_name: name of station/catchment where the observation was made
T: temperature
EC: electrical conductance
DOpercent: dissolved oxygen saturation
DO: dissolved oxygen concentration
pH: pH
ORP: oxidation-reduction potential
Turbidity: turbidity
TSS: total suspended sediment concentration
E-coli_4dilutions: Escherichia coli concentration
- type features:
str, optional
- param overwrite:
whether to overwrite the downloaded file or not
- type overwrite:
bool
- returns:
with default parameters, the shape is (413, 10)
- rtype:
pd.DataFrame
Examples
>>> from ai4water.datasets import ecoli_houay_pano
>>> ecoli = ecoli_houay_pano()
>>> ecoli.shape
(413, 10)
Ecoli data from Mekong river (2016)
- ai4water.datasets.mtropics.ecoli_mekong_2016(st: Union[str, Timestamp, int] = '20160101', en: Union[str, Timestamp, int] = '20161231', features: Optional[Union[str, list]] = None, overwrite=False) -> DataFrame
E. coli data from the Mekong river for 2016, from 29 catchments
- Parameters:
- Returns:
with default parameters, the shape is (58, 10)
- Return type:
pd.DataFrame
Examples
>>> from ai4water.datasets import ecoli_mekong_2016
>>> ecoli = ecoli_mekong_2016()
>>> ecoli.shape
(58, 10)
MtropicsLaos
- class ai4water.datasets.mtropics.MtropicsLaos(path=None, save_as_nc: bool = True, convert_to_csv: bool = False, **kwargs)
Bases: Datasets
Downloads and prepares hydrological, climate and land use data for Laos from the Mtropics website and IRD data servers.
- fetch_lu
- fetch_ecoli
- fetch_rain_gauges
- fetch_weather_station_data
- fetch_pcp
- fetch_hydro
- make_regression
- __init__(path=None, save_as_nc: bool = True, convert_to_csv: bool = False, **kwargs)
- Parameters:
name – str (default=None) name of dataset
units – str, (default=None) the unit system being used
path – str (default=None) path where the data is available (manually downloaded). If None, it will be downloaded
- fetch_ecoli(features: Union[list, str] = 'Ecoli_mpn100', st: Union[str, Timestamp] = '20110525 10:00:00', en: Union[str, Timestamp] = '20210406 15:05:00', remove_duplicates: bool = True) -> DataFrame
Fetches E. coli data collected at the outlet. See Ribolzi et al., 2021 and Boithias et al., 2021 for reference. NaNs represent missing values. The data was randomly sampled between 2011 and 2021 during rainfall events. A total of 368 E. coli observations are currently available.
- Parameters:
st – start of data. By default, the data is fetched from the first available point.
en – end of data. By default, the data is fetched until the last available point.
features –
E. coli concentration data. The following data are available:
Ecoli_LL_mpn100: Lower limit of the confidence interval
Ecoli_mpn100: Stream water Escherichia coli concentration
Ecoli_UL_mpn100: Upper limit of the confidence interval
remove_duplicates – whether to remove duplicates or not. Duplicates exist because some values were recorded within a minute of each other.
- Return type:
a pandas dataframe consisting of features as columns.
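The remove_duplicates option deals with readings taken within a minute of each other. How the library resolves them is not stated here, so the following is only a sketch of one plausible approach on synthetic data: floor each timestamp to the minute and keep the first reading per minute. Both the flooring step and the keep-first rule are assumptions.

```python
import pandas as pd

# Synthetic readings; the first two fall within the same minute.
stamps = pd.to_datetime([
    "2011-05-25 10:00:05",
    "2011-05-25 10:00:40",
    "2011-05-25 16:40:00",
])
ecoli = pd.DataFrame({"Ecoli_mpn100": [1100.0, 1150.0, 14000.0]}, index=stamps)

# Assumption: two readings count as duplicates when they share a minute.
minute = ecoli.index.floor("min")
deduplicated = ecoli[~minute.duplicated(keep="first")]

print(len(deduplicated))  # 2
```

Averaging the readings within each minute would be an equally reasonable choice; keeping the first is simply the shortest to express.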
- fetch_hydro(st: Union[str, Timestamp] = '20010101 00:06:00', en: Union[str, Timestamp] = '20200101 00:06:00') -> Tuple[DataFrame, DataFrame]
Fetches water level (cm) and suspended particulate matter (g/L). Both datasets span 2001 to 2019 but are randomly sampled.
- Parameters:
st (optional) – starting point of data to be fetched.
en (optional) – end point of data to be fetched.
- Returns:
a tuple of pandas dataframes of water level and suspended particulate matter.
- fetch_pcp(st: Union[str, Timestamp] = '20010101 00:06:00', en: Union[str, Timestamp] = '20200101 00:06:00', freq: str = '6min') -> DataFrame
Fetches precipitation data, which is collected at a 6-minute time step from 2001 to 2020.
- Parameters:
st – starting point of data to be fetched.
en – end point of data to be fetched.
freq – frequency at which the data is to be returned.
- Return type:
pandas dataframe of precipitation data
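The freq argument controls the time step of the returned precipitation series. As a rough illustration of what moving from the native 6-minute step to a coarser one involves, the sketch below resamples a synthetic 6-minute series to hourly totals with pandas. Summing within each hour is an assumption made for this example; the aggregation used by the library is not stated here.

```python
import pandas as pd

# Synthetic 6-minute precipitation series (0.2 per step); not real data.
index = pd.date_range("2001-01-01 00:06:00", periods=20, freq="6min")
pcp = pd.DataFrame({"pcp": [0.2] * 20}, index=index)

# Assumption: precipitation is aggregated by summing within each hour.
hourly = pcp.resample("60min").sum()

print(len(hourly))  # 3
```

Summation keeps the total depth unchanged, which is why it is the usual choice for rainfall; a mean would suit an intensity-like variable instead.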
- fetch_physiochem(features: Union[list, str] = 'all', st: Union[str, Timestamp] = '20110525 10:00:00', en: Union[str, Timestamp] = '20210406 15:05:00') -> DataFrame
Fetches physio-chemical features of the Houay Pano catchment, Laos.
- Parameters:
st – start of data.
en – end of data.
features –
The physio-chemical features to fetch. The following features are available:
T
EC
DOpercent
DO
pH
ORP
Turbidity
TSS
- Return type:
a pandas dataframe
- fetch_rain_gauges(st: Union[str, Timestamp] = '20010101', en: Union[str, Timestamp] = '20191231') -> DataFrame
Fetches data from 7 rain gauges, collected at a daily time step from 2001 to 2019.
- Parameters:
st – start of data. By default, the data is fetched from the first available point.
en – end of data. By default, the data is fetched until the last available point.
- Returns:
a dataframe of 7 columns, where each column represents the observations of one rain gauge. The length of the dataframe depends on the range defined by the st and en arguments.
Examples
>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> rg = laos.fetch_rain_gauges()
- fetch_source() -> DataFrame
Returns monthly source data for E. coli from 2001 to 2021.
- Return type:
pd.DataFrame of shape (252, 19)
- fetch_suro() -> DataFrame
Returns surface runoff and soil detachment data from Houay Pano, Lao PDR.
- Returns:
a dataframe of shape (293, 13)
- Return type:
pd.DataFrame
Examples
>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> suro = laos.fetch_suro()
- fetch_weather_station_data(st: Union[str, Timestamp] = '20010101 01:00:00', en: Union[str, Timestamp] = '20200101 00:00:00', freq: str = 'H') -> DataFrame
Fetches hourly weather station data [1], which consists of air temperature, humidity, wind speed and solar radiation.
- Parameters:
st – start of data to be fetched.
en – end of data to be fetched.
freq – frequency at which the data is to be fetched.
- Return type:
a pandas dataframe consisting of 4 columns
- inputs = ['air_temp', 'rel_hum', 'wind_speed', 'sol_rad', 'water_level', 'pcp', 'susp_pm', 'Ecoli_source']
- make_classification(input_features: Union[None, list] = None, output_features: Optional[Union[str, list]] = None, st: Union[None, str] = '20110525 14:00:00', en: Union[None, str] = '20181027 00:00:00', freq: str = '6min', threshold: Union[int, dict] = 400, lookback_steps: Optional[int] = None) -> DataFrame
Returns data for a classification problem.
- Parameters:
input_features – names of inputs to use.
output_features – feature/features to consider as target/output/label
st – starting date of data. The default starting date is 20110525
en – end date of data
freq – frequency of data
threshold – threshold used to determine classes. Values greater than or equal to the threshold are set to 1, while values smaller than the threshold are set to 0. The value of 400 is chosen for E. coli to keep the number of 0s and 1s balanced. It should be noted that US-EPA recommends a threshold value of 400 cfu/ml.
lookback_steps – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.
- Returns:
a dataframe of shape (inputs+target, st:en)
- Return type:
pd.DataFrame
Example
>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> df = laos.make_classification()
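The thresholding rule described for the threshold argument (values greater than or equal to the threshold become 1, the rest become 0) can be sketched on synthetic concentrations as follows. The helper `binarize` is hypothetical, and reading a dict threshold as a mapping from column name to per-column limit is an assumption rather than documented behaviour.

```python
import pandas as pd

# Synthetic E. coli concentrations; not real observations.
ecoli = pd.DataFrame({"Ecoli_mpn100": [120.0, 400.0, 399.9, 5200.0]})


def binarize(frame, threshold=400):
    """Hypothetical helper: 1 where value >= threshold, else 0."""
    # Assumption: a dict maps column names to per-column thresholds.
    if not isinstance(threshold, dict):
        threshold = {column: threshold for column in frame.columns}
    labels = frame.copy()
    for column, limit in threshold.items():
        labels[column] = (frame[column] >= limit).astype(int)
    return labels


labels = binarize(ecoli, threshold=400)
print(labels["Ecoli_mpn100"].tolist())  # [0, 1, 0, 1]
```

Note that a value exactly equal to the threshold falls in class 1, matching the "greater than or equal to" wording.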
- make_regression(input_features: Union[None, list] = None, output_features: Union[str, list] = 'Ecoli_mpn100', st: Union[None, str] = '20110525 14:00:00', en: Union[None, str] = '20181027 00:00:00', freq: str = '6min', lookback_steps: Optional[int] = None, replace_zeros_in_target: bool = True) -> DataFrame
Returns data for a regression problem using hydrological, environmental, and water quality data of Houay Pano.
- Parameters:
input_features –
names of inputs to use. By default following features are used as input
air_temp
rel_hum
wind_speed
sol_rad
water_level
pcp
susp_pm
Ecoli_source
output_features – feature/features to consider as target/output/label
st – starting date of data
en – end date of data
freq – frequency of data
lookback_steps (int, default=None) – the number of previous steps to use. If this argument is used, the resultant dataframe will have (ecoli_observations * lookback_steps) rows. The resulting index will not be continuous.
replace_zeros_in_target (bool, default=True) – Replace the zeroes in target column with 1s.
- Returns:
a dataframe of shape (inputs+target, st - en)
- Return type:
pd.DataFrame
Example
>>> from ai4water.datasets import MtropicsLaos
>>> laos = MtropicsLaos()
>>> ins = ['pcp', 'air_temp']
>>> out = ['Ecoli_mpn100']
>>> reg_data = laos.make_regression(ins, out, '20110101', '20181231')
todo add HRU definition
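The lookback_steps description says the result has (ecoli_observations * lookback_steps) rows and a non-continuous index. One way to picture this is sketched below on synthetic data: for every observation time, the last lookback_steps rows of the 6-minute inputs are kept and the windows are stacked. Whether the window includes the observation step itself is an assumption, and this is not the library's implementation.

```python
import pandas as pd

# Synthetic 6-minute input series; the values are just a running counter.
inputs = pd.DataFrame(
    {"pcp": range(50)},
    index=pd.date_range("2011-05-25 14:00:00", periods=50, freq="6min"),
)

# Two synthetic E. coli observation times that fall on the input grid.
observation_times = pd.to_datetime(["2011-05-25 15:00:00", "2011-05-25 18:00:00"])
lookback_steps = 3

windows = []
for stamp in observation_times:
    # Assumption: each window ends at, and includes, the observation time.
    windows.append(inputs.loc[:stamp].tail(lookback_steps))

stacked = pd.concat(windows)
print(len(stacked))  # 6, i.e. 2 observations * 3 lookback steps
```

The stacked index jumps from 15:00 to 17:48, which is the gap the parameter description refers to as a non-continuous index.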
- physio_chem_features = {'DO_mgl': 'DO', 'DO_percent': 'DOpercent', 'EC_s/cm': 'EC', 'ORP_mV': 'ORP', 'TSS_gL': 'TSS', 'T_deg': 'T', 'Turbidity_NTU': 'Turbidity', 'pH': 'pH'}
- surface_features(st: Union[str, int, Timestamp] = '2000-10-14', en: Union[str, int, Timestamp] = '2016-11-12') -> DataFrame
soil surface features data
- target = ['Ecoli_mpn100']
- url = {'ecoli_data.csv': 'https://dataverse.ird.fr/api/access/datafile/5435', 'ecoli_dict.csv': 'https://dataverse.ird.fr/api/access/datafile/5436', 'ecoli_source.csv': 'https://dataverse.ird.fr/api/access/datafile/37737', 'ecoli_source_readme.txt': 'https://dataverse.ird.fr/api/access/datafile/37736', 'ecoli_suro_gw.csv': 'https://dataverse.ird.fr/api/access/datafile/37735', 'ecoli_suro_gw_readme.txt': 'https://dataverse.ird.fr/api/access/datafile/37734', 'hydro.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=389bbea0-7279-12c1-63d0-cfc4a77ded87', 'lu.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=0f1aea48-2a51-9b42-7688-a774a8f75e7a', 'pcp.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=3c870a03-324b-140d-7d98-d3585a63e6ec', 'rain_guage.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=7bc45591-5b9f-a13d-90dc-f2a75b0a15cc', 'soilmap.zip': 'https://dataverse.ird.fr/api/access/datafile/5430', 'subs1.zip': 'https://dataverse.ird.fr/api/access/datafile/5432', 'surf_feat.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=72d9e532-8910-48d2-b9a2-6c8b0241825b', 'suro.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=f06cb605-7e59-4ba4-8faf-1beee35d2162', 'weather_station.zip': 'https://services.sedoo.fr/mtropics/data/v1_0/download?collectionId=353d7f00-8d6a-2a34-c0a2-5903c64e800b'}
- weather_station_data = ['air_temp', 'rel_hum', 'wind_speed', 'sol_rad']