Experiments

The purpose of this module is to compare more than one model. It can also optimize the hyperparameters of these models and compare them. The Experiments class provides the basic building block for conducting experiments. The MLRegressionExperiments and MLClassificationExperiments classes compare several classical machine learning regression and classification models respectively. The DLRegressionExperiments class compares some common basic deep learning architectures for a given dataset.

Experiments

class ai4water.experiments.Experiments(cases: Optional[dict] = None, exp_name: Optional[str] = None, num_samples: int = 5, verbosity: int = 1, monitor: Optional[Union[str, list]] = None)[source]

Bases: object

Base class for all the experiments.

All experiments must be subclasses of this class. The core idea of Experiments is based upon the model. An experiment consists of one or more models, which differ from each other in their structure/idea/concept/configuration. When ai4water.experiments.Experiments.fit() is called, each model is built and trained. The user can customize the building and training process by subclassing this class and overriding the ai4water.experiments.Experiments._build() and ai4water.experiments.Experiments._fit() methods.
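A minimal sketch of such a subclass, assuming `_build` receives the suggested configuration as keyword arguments and `_fit` trains the model built by `_build` (the exact signatures should be checked against the ai4water source; the class and method bodies below are illustrative only):

```python
from ai4water.experiments import Experiments

class MyExperiments(Experiments):
    """Hypothetical subclass with custom build and train steps."""

    def _build(self, title=None, **suggested_paras):
        # construct the model from the suggested configuration,
        # e.g. instantiate ai4water.Model here
        ...

    def _fit(self, *args, **kwargs):
        # custom training logic for the model built in _build
        ...
```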

- metrics
- exp_path
- model_
- models
- fit
- taylor_plot
- loss_comparison
- plot_convergence
- from_config
- compare_errors
- plot_improvement
- compare_convergence
- plot_cv_scores
- fit_with_tpot
__init__(cases: Optional[dict] = None, exp_name: Optional[str] = None, num_samples: int = 5, verbosity: int = 1, monitor: Optional[Union[str, list]] = None)[source]
Parameters
  • cases – python dictionary defining different cases/scenarios

  • exp_name – name of experiment, used to define path in which results are saved

  • num_samples – only relevant when you want to optimize the hyperparameters of models using the grid method

  • verbosity (int, optional) – determines the amount of information being printed

  • monitor (str, list, optional) – list of performance metrics to monitor. It can be any performance metric from the SeqMetrics <https://seqmetrics.readthedocs.io> library. By default r2, corr_coeff, mse, rmse, r2_score, nse, kge, mape, pbias, bias, mae, nrmse and mase are considered for regression and accuracy, precision and recall are considered for classification.

compare_convergence(show: bool = True, save: bool = False, name: str = 'convergence_comparison', **kwargs) Optional[matplotlib.axes._axes.Axes][source]

Plots and compares the convergence of hyperparameter optimization runs. Only valid if run_type='optimize' was used during the ai4water.experiments.Experiments.fit() call.

Parameters
  • show – whether to show the plot or not

  • save – whether to save the plot or not

  • name – name of file to save the plot

  • kwargs – keyword arguments to plot function

Returns

matplotlib axes on which the figure is drawn if more than one model was optimized, otherwise None.

Examples

>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> experiment = MLRegressionExperiments()
>>> experiment.fit(data=busan_beach(), run_type="optimize", num_iterations=30)
>>> experiment.compare_convergence()
compare_errors(matric_name: str, cutoff_val: Optional[float] = None, cutoff_type: Optional[str] = None, save: bool = True, sort_by: str = 'test', ignore_nans: bool = True, name: str = 'ErrorComparison', show: bool = True, **kwargs) pandas.core.frame.DataFrame[source]

Plots a specific performance metric for all the models which were run during the ai4water.experiments.Experiments.fit() call.

Parameters
  • matric_name – performance metric whose value to plot for all the models

  • cutoff_val – if provided, only those models will be plotted for whom the metric is greater/smaller than this value. This works in conjunction with cutoff_type.

  • cutoff_type – one of greater, greater_equal, less or less_equal. Criterion applied to cutoff_val. For example, to show only those models whose $R^2$ is > 0.5, set cutoff_val to 0.5 and cutoff_type to 'greater'.

  • save – whether to save the plot or not

  • sort_by – either test or train. How to sort the results for plotting. If 'test', results are sorted by test performance metrics, otherwise by train performance metrics.

  • ignore_nans – default True. If True, performance metrics with NaNs are ignored; otherwise NaN/empty bars will be shown to depict which models resulted in NaNs for the given performance metric.

  • name – name of the saved file.

  • show – whether to show the plot at the end or not

  • kwargs

    • fig_height :

    • fig_width :

    • title_fs :

    • xlabel_fs :

    • color :

Returns

pandas dataframe whose index is the model names and which has two columns, 'train' and 'test', containing the performance metrics of the models.

Return type

pd.DataFrame

Example

>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> inputs = list(data.columns)[0:-1]
>>> outputs = list(data.columns)[-1]
>>> experiment = MLRegressionExperiments(input_features=inputs, output_features=outputs)
>>> experiment.fit(data=data)
>>> experiment.compare_errors('mse')
>>> experiment.compare_errors('r2', 0.2, 'greater')
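Since compare_errors returns an ordinary DataFrame indexed by model name with 'train' and 'test' columns, its output can be post-processed with plain pandas. A sketch with a stand-in frame (the model names and metric values below are hypothetical, not real results):

```python
import pandas as pd

# stand-in for the frame returned by compare_errors('r2')
errors = pd.DataFrame(
    {"train": [0.92, 0.55, 0.31], "test": [0.81, 0.48, 0.12]},
    index=["XGBRegressor", "RandomForestRegressor", "SGDRegressor"],
)

# keep models whose test r2 exceeds 0.2, mirroring
# compare_errors('r2', 0.2, 'greater')
good = errors[errors["test"] > 0.2]
best_names = good.index.tolist()
print(best_names)  # ['XGBRegressor', 'RandomForestRegressor']
```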
eval_best(data, model_type, opt_dir, **kwargs)[source]

Evaluate the best models.

fit(data, run_type: str = 'dry_run', opt_method: str = 'bayes', num_iterations: int = 12, include: Union[None, list] = None, exclude: Union[None, list, str] = '', cross_validate: bool = False, post_optimize: str = 'eval_best', hpo_kws: Optional[dict] = None)[source]

Runs the fit loop for all the models of the experiment. The user can, however, specify the models by making use of the include and exclude keywords.

Todo: post_optimize is not yet working for 'eval_best' with ML methods.

Parameters
  • data – this will be passed to ai4water.Model.fit().

  • run_type – one of dry_run or optimize. If dry_run, all the models will be trained only once. If optimize, the hyperparameters of all the models will be optimized.

  • opt_method – which optimization method to use. options are bayes, random, grid. Only valid if run_type is optimize

  • num_iterations – number of iterations for optimization. Only valid if run_type is optimize.

  • include – names of models to include. If None, all the models found will be trained and/or optimized.

  • exclude – name of models to be excluded

  • cross_validate – whether to cross validate the model or not. This depends upon the cross_validator argument to the Model.

  • post_optimize – one of eval_best or train_best. If eval_best, the weights from the best model will be loaded again and the model will be evaluated on train, test and all the data. If train_best, a new model will be built and trained using the parameters of the best model.

  • hpo_kws – keyword arguments for ai4water.hyperopt.HyperOpt class.

Examples

>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> exp = MLRegressionExperiments()
>>> exp.fit(data=busan_beach())

If you want to compare only RandomForest, XGBRegressor, CatBoostRegressor and LGBMRegressor, use the include keyword

>>> exp.fit(data=busan_beach(), include=['RandomForestRegressor', 'XGBRegressor',
...         'CatBoostRegressor', 'LGBMRegressor'])

Similarly, if you want to exclude certain models from comparison, you can use exclude keyword

>>> exp.fit(data=busan_beach(), exclude=["SGDRegressor"])

If you want to perform cross validation for each model, you must give the cross_validator argument, which will be passed to the ai4water Model

>>> exp = MLRegressionExperiments(cross_validator={"KFold": {"n_splits": 10}})
>>> exp.fit(data=busan_beach(), cross_validate=True)

Setting cross_validate to True will populate the cv_scores_ dictionary, which can be accessed as exp.cv_scores_

If you want to optimize the hyperparameters of each model,

>>> exp.fit(data=busan_beach(), run_type="optimize", num_iterations=20)
fit_with_tpot(data, models: Optional[Union[int, List[str], dict, str]] = None, selection_criteria: str = 'mse', scoring: Optional[str] = None, **tpot_args)[source]

Runs TPOT's fit method, which finds the best pipeline for the given data.

Parameters
  • data

  • models

    It can be one of the following:

    • If a list, it is the names of the machine learning models/algorithms to consider.

    • If an integer, it is the number of top algorithms to consider for tpot. In this case you must have run the .fit method first. Running tpot with all available models can take hours to days for medium sized data (consisting of a few thousand examples). However, if you first run .fit and see, for example, what the top 5 models are, you can set this argument to 5; tpot will then search for a pipeline using only the top 5 algorithms/models found by the .fit method.

    • If a dictionary, the keys should be the names of the algorithms/models and the values should be the parameters of each model/algorithm to be optimized.

    • It can also be set to all to consider all the models available in ai4water's Experiments module.

    • The default is None, which means the tpot_config argument will be None.

  • selection_criteria – the name of a performance metric. If models is an integer, the models will be chosen according to this performance metric. By default the models are selected based upon their mse values on test data.

  • scoring – the performance metric to use for finding the pipeline.

  • tpot_args – any keyword argument for tpot’s Regressor or Classifier class. This can include arguments like generations, population_size etc.

Return type

the tpot object

Example

>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> from ai4water.utils.utils import dateandtime_now
>>> exp = MLRegressionExperiments(exp_name=f"tpot_reg_{dateandtime_now()}")
>>> exp.fit(data=busan_beach())
>>> tpot_regr = exp.fit_with_tpot(busan_beach(), 2, generations=1, population_size=2)
classmethod from_config(config_path: str, **kwargs) ai4water.experiments._main.Experiments[source]

Loads the experiment from the config file.

Parameters
  • config_path – complete path of experiment

  • kwargs – keyword arguments to experiment

Returns

an instance of Experiments class
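A hedged usage sketch (the path below is a placeholder, not a real file; it should point at the config.json saved inside an experiment's results folder):

```python
from ai4water.experiments import MLRegressionExperiments

# 'path/to/exp_dir/config.json' is a hypothetical placeholder for
# the config file saved by a previously run experiment
exp = MLRegressionExperiments.from_config("path/to/exp_dir/config.json")
```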

loss_comparison(loss_name: str = 'loss', include: Optional[list] = None, save: bool = True, show: bool = True, figsize: Optional[int] = None, start: int = 0, end: Optional[int] = None, **kwargs) matplotlib.axes._axes.Axes[source]

Plots the loss curves of the evaluated models. This method is only available if the models being compared are deep learning models.

Parameters
  • loss_name (str, optional) – the name of loss value, must be recorded during training

  • include – name of models to include

  • save – whether to save the plot or not

  • show – whether to show the plot or not

  • figsize (tuple) – size of the figure

  • **kwargs – any other keyword arguments to be passed to the plot

Return type

matplotlib axes

Example

>>> from ai4water.experiments import DLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> exp = DLRegressionExperiments(
...     input_features=data.columns.tolist()[0:-1],
...     output_features=data.columns.tolist()[-1:],
...     epochs=300,
...     train_fraction=1.0,
...     y_transformation="log",
...     x_transformation="minmax",
... )
>>> exp.fit(data=data)
>>> exp.loss_comparison()

you may wish to plot on log scale

>>> exp.loss_comparison(logy=True)
plot_cv_scores(show: bool = False, name: str = 'cv_scores', exclude: Optional[Union[str, list]] = None, include: Optional[Union[str, list]] = None, **kwargs) Optional[matplotlib.axes._axes.Axes][source]

Plots the box whisker plots of the cross validation scores.

This plot is only available if cross_validate was set to True during the ai4water.experiments.Experiments.fit() call.

Parameters
  • show – whether to show the plot or not

  • name – name of the plot

  • include – models to include

  • exclude – models to exclude

  • kwargs – any of the following keyword arguments:

    • notch

    • vert

    • figsize

    • bbox_inches

Return type

matplotlib axes if the figure is drawn otherwise None

Example

>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> exp = MLRegressionExperiments(cross_validator={"KFold": {"n_splits": 10}})
>>> exp.fit(data=busan_beach(), cross_validate=True)
>>> exp.plot_cv_scores()
plot_improvement(metric_name: str, plot_type: str = 'dumbell', save: bool = True, name: str = '', dpi: int = 200, **kwargs) pandas.core.frame.DataFrame[source]

Shows how much improvement was observed after hyperparameter optimization. This plot is only available if run_type was set to optimize in ai4water.experiments.Experiments.fit().

Parameters
  • metric_name – the performance metric for comparison

  • plot_type (str, optional) – the kind of plot to draw. Either dumbell or bar

  • save (bool) – whether to save the plot or not

  • name (str, optional) –

  • dpi (int, optional) –

  • **kwargs

    any additional keyword arguments for dumbell plot

Return type

pd.DataFrame

Examples

>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> experiment = MLRegressionExperiments()
>>> experiment.fit(data=busan_beach(), run_type="optimize", num_iterations=30)
>>> experiment.plot_improvement('r2')

or draw a bar plot

>>> experiment.plot_improvement('r2', plot_type='bar')
sort_models_by_metric(metric_name, cutoff_val=None, cutoff_type=None, ignore_nans: bool = True, sort_by='test') pandas.core.frame.DataFrame[source]

Returns the models sorted according to their performance.
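The sorting semantics can be illustrated with plain pandas on a stand-in frame (the model names and values are hypothetical; descending order is assumed here for a metric where larger is better):

```python
import pandas as pd

# stand-in for the metrics frame kept by the experiment
metrics = pd.DataFrame(
    {"train": [0.4, 0.9, 0.7], "test": [0.3, 0.8, 0.6]},
    index=["LinearRegression", "XGBRegressor", "LGBMRegressor"],
)

# mirror sort_models_by_metric('r2', sort_by='test')
ranked = metrics.sort_values(by="test", ascending=False)
print(ranked.index.tolist())  # ['XGBRegressor', 'LGBMRegressor', 'LinearRegression']
```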

taylor_plot(include: Union[None, list] = None, exclude: Union[None, list] = None, figsize: tuple = (9, 7), **kwargs) matplotlib.figure.Figure[source]

Compares the models using taylor_plot.

Parameters
  • include (str, list, optional) – if not None, must be a list of models which will be included. None will result in plotting all the models.

  • exclude (str, list, optional) – if not None, must be a list of models which will be excluded. None results in no exclusion.

  • figsize (tuple, optional) –

  • **kwargs – all the keyword arguments for taylor_plot function.

Return type

plt.Figure

Example

>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> inputs = list(data.columns)[0:-1]
>>> outputs = list(data.columns)[-1]
>>> experiment = MLRegressionExperiments(input_features=inputs, output_features=outputs)
>>> experiment.fit(data=data)
>>> experiment.taylor_plot()
train_best(data, model_type)[source]

Train the best model.

RegressionExperiments

class ai4water.experiments.MLRegressionExperiments(param_space=None, x0=None, cases=None, exp_name='MLRegressionExperiments', num_samples=5, verbosity=1, **model_kwargs)[source]

Bases: ai4water.experiments._main.Experiments

Compares the performance of 40+ machine learning models for a regression problem. The experiment consists of models which are run using the fit() method.

The user can define new models by subclassing this class. In fact, any new method in the subclass which starts with model_ will be considered a new model. Otherwise, the user has to overwrite the models attribute to redefine which methods (of the class) are to be used as models and which are not. A method which defines a model must return only keyword arguments, which will be streamed to the Model using the build_and_run method. Inside this new method the user must define which parameters to optimize, their param_space for optimization, and the initial values (x0) to use for optimization.
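A hedged sketch of this convention (the model name, hyperparameter dimensions, and starting values below are illustrative assumptions, not part of the library's API surface beyond what the text above describes):

```python
from ai4water.experiments import MLRegressionExperiments
from ai4water.hyperopt import Integer

class MyRegressionExperiments(MLRegressionExperiments):

    # any method starting with ``model_`` is treated as a new model
    def model_MyRandomForest(self, **kwargs):
        # declare which parameters to optimize and their space
        self.param_space = [
            Integer(10, 200, name='n_estimators'),
            Integer(2, 20, name='max_depth'),
        ]
        # initial values for the optimization
        self.x0 = [100, 5]
        # return only the keyword arguments streamed to Model
        return {'model': {'RandomForestRegressor': kwargs}}
```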

__init__(param_space=None, x0=None, cases=None, exp_name='MLRegressionExperiments', num_samples=5, verbosity=1, **model_kwargs)[source]

Initializes the class

Parameters
  • param_space – dimensions of parameters which are to be optimized. These can be overwritten in models.

  • x0 (list) – initial values of the parameters which are to be optimized. These can be overwritten in models.

  • exp_name (str) – name of the experiment; all results will be saved within this folder.

  • model_kwargs (dict) – keyword arguments which are to be passed to the Model and are not optimized.

Examples

>>> from ai4water.datasets import busan_beach
>>> from ai4water.experiments import MLRegressionExperiments
>>> # first compare the performance of all available models without optimizing their parameters
>>> data = busan_beach()  # read data file, in this case load the default data
>>> inputs = list(data.columns)[0:-1]  # define input and output columns in data
>>> outputs = list(data.columns)[-1]
>>> comparisons = MLRegressionExperiments(
...       input_features=inputs, output_features=outputs,
...       nan_filler= {'method': 'KNNImputer', 'features': inputs} )
>>> comparisons.fit(data=data, run_type="dry_run")
>>> comparisons.compare_errors('r2')
>>> # find out the models which resulted in r2 > 0.3
>>> best_models = comparisons.compare_errors('r2', cutoff_type='greater',
...                                          cutoff_val=0.3)
>>> best_models = best_models.index.tolist()
>>> # now build a new experiment for the best models and optimize them
>>> comparisons = MLRegressionExperiments(
...     input_features=inputs, output_features=outputs,
...     nan_filler={'method': 'KNNImputer', 'features': inputs},
...     exp_name="BestMLModels")
>>> comparisons.fit(data=data, run_type="optimize", include=best_models)
>>> comparisons.compare_errors('r2')
>>> comparisons.taylor_plot()  # see help(comparisons.taylor_plot) to tweak the taylor plot

ClassificationExperiments

class ai4water.experiments.MLClassificationExperiments(param_space=None, x0=None, cases=None, exp_name='MLClassificationExperiments', num_samples=5, **model_kwargs)[source]

Bases: ai4water.experiments._main.Experiments

Runs classification models for comparison, with or without optimization of hyperparameters. It compares around 30 classification algorithms from sklearn, xgboost, catboost and lightgbm.

Examples

>>> from ai4water.datasets import MtropicsLaos
>>> from ai4water.experiments import MLClassificationExperiments
>>> data = MtropicsLaos().make_classification(lookback_steps=2)
>>> inputs = data.columns.tolist()[0:-1]
>>> outputs = data.columns.tolist()[-1:]
>>> exp = MLClassificationExperiments(input_features=inputs,
...                                   output_features=outputs)
>>> exp.fit(data=data, include=["CatBoostClassifier", "LGBMClassifier",
...         'RandomForestClassifier', 'XGBClassifier'])
>>> exp.compare_errors('accuracy', show=False)
__init__(param_space=None, x0=None, cases=None, exp_name='MLClassificationExperiments', num_samples=5, **model_kwargs)[source]
Parameters
  • param_space (list, optional) –

  • x0 (list, optional) –

  • cases (dict, optional) –

  • exp_name (str, optional) – name of experiment

  • num_samples (int, optional) –

  • **model_kwargs – keyword arguments for ai4water.Model class

DLRegressionExperiments

class ai4water.experiments.DLRegressionExperiments(input_features: list, param_space=None, x0=None, cases: Optional[dict] = None, exp_name: Optional[str] = None, num_samples: int = 5, verbosity: int = 1, **model_kws)[source]

Bases: ai4water.experiments._main.Experiments

A framework for comparing several basic DL architectures for given data.

To check the available models:

>>> exp = DLRegressionExperiments(…)
>>> exp.models

If learning rate, batch size, and lookback are to be optimized, their space can be specified in the following way:

>>> exp = DLRegressionExperiments(…)
>>> exp.lookback_space = [Integer(1, 100, name='lookback')]

Example

>>> from ai4water.experiments import DLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> exp = DLRegressionExperiments(
...     input_features=data.columns.tolist()[0:-1],
...     output_features=data.columns.tolist()[-1:],
...     epochs=300,
...     train_fraction=1.0,
...     y_transformation="log",
...     x_transformation="minmax",
... )
>>> exp.fit(data=data)
model_CNN(**kwargs)[source]

1D CNN based model

model_CNNLSTM(**kwargs)[source]

CNN-LSTM model

model_LSTM(**kwargs)[source]

LSTM based model

model_LSTMAutoEncoder(**kwargs)[source]

LSTM based auto-encoder model.

model_MLP(**kwargs)[source]

multi-layer perceptron model

model_TCN(**kwargs)[source]

Temporal Convolution network based model.

model_TFT(**kwargs)[source]

temporal fusion transformer model.