Experiments
The purpose of this module is to compare multiple models. It can also optimize the hyperparameters of these models and compare the results. The Experiments class provides the basic building block for conducting experiments. The MLRegressionExperiments and MLClassificationExperiments classes compare several classical machine learning regression and classification models, respectively. The DLRegressionExperiments class compares some common basic deep learning algorithms for given data.
Experiments
- class ai4water.experiments.Experiments(cases: Optional[dict] = None, exp_name: Optional[str] = None, num_samples: int = 5, verbosity: int = 1, monitor: Optional[Union[str, list, Callable]] = None, show: bool = True, save: bool = True, **model_kws)[source]
Bases:
object
Base class for all the experiments.
All the experiments must be subclasses of this class. The core idea of Experiments is based upon model. An experiment consists of one or more models. The models differ from each other in their structure/idea/concept/configuration. When ai4water.experiments.Experiments.fit() is called, each model is built and trained. The user can customize the building and training process by subclassing this class and customizing the ai4water.experiments.Experiments._build() and ai4water.experiments.Experiments._fit() methods.
- - metrics
- - exp_path
- - model_
- - models
- - fit
- - taylor_plot
- - loss_comparison
- - plot_convergence
- - from_config
- - compare_errors
- - plot_improvement
- - compare_convergence
- - plot_cv_scores
- - fit_with_tpot
- __init__(cases: Optional[dict] = None, exp_name: Optional[str] = None, num_samples: int = 5, verbosity: int = 1, monitor: Optional[Union[str, list, Callable]] = None, show: bool = True, save: bool = True, **model_kws)[source]
- Parameters:
cases – python dictionary defining different cases/scenarios. See TransformationExperiments for use case.
exp_name – name of experiment, used to define path in which results are saved
num_samples – only relevant when you want to optimize hyperparameters of models using the grid method
verbosity (int, optional) – determines the amount of information displayed
monitor (str, list, optional) –
list of performance metrics to monitor. It can be any performance metric from the SeqMetrics library. By default r2, corr_coeff, mse, rmse, r2_score, nse, kge, mape, pbias, bias, mae, nrmse and mase are considered for regression, and accuracy, precision and recall are considered for classification. The user can also supply a custom metric to monitor. In that case it must be a callable which accepts two input arguments: the first is the array of true values and the second is the array of predicted values.
>>> from SeqMetrics import ClassificationMetrics
>>> def f1_score(t,p)->float:
...     return ClassificationMetrics(t, p).f1_score(average="macro")
>>> monitor = [f1_score, "accuracy"]
Here f1_score is a function which accepts two arrays.
**model_kws – keyword arguments which are to be passed to Model and are not optimized.
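A custom metric passed to monitor only has to honour the calling contract described above: true values first, predicted values second, a number returned. The sketch below is a dependency-free macro-averaged F1 written to that contract. The name macro_f1 and the sample labels in the test are illustrative and are not part of ai4water or SeqMetrics.

```python
def macro_f1(true, predicted) -> float:
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    true, predicted = list(true), list(predicted)
    labels = sorted(set(true) | set(predicted))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(true, predicted))
        fp = sum(t != label and p == label for t, p in zip(true, predicted))
        fn = sum(t == label and p != label for t, p in zip(true, predicted))
        denom = 2 * tp + fp + fn
        # A class that never occurs in either array contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# A callable can be mixed with metric names in the same list.
monitor = [macro_f1, "accuracy"]
```

Because the function takes plain sequences, it can be checked on a handful of hand-written labels before it is handed to an experiment.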
- compare_convergence(name: str = 'convergence_comparison', **kwargs) Optional[Axes] [source]
Plots and compares the convergence plots of hyperparameter optimization runs. Only valid if run_type=optimize during the ai4water.experiments.Experiments.fit() call.
- Parameters:
- Returns:
if the optimized models are >1 then it returns the matplotlib axes on which the figure is drawn, otherwise it returns None.
Examples
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> experiment = MLRegressionExperiments()
>>> experiment.fit(data=busan_beach(), run_type="optimize", num_iterations=30)
>>> experiment.compare_convergence()
- compare_edf_plots(x=None, y=None, data=None, exclude: Optional[Union[str, list]] = None, figsize=None, fname: Optional[str] = 'edf', **kwargs)[source]
compare EDF plots of all the models which have been fitted. This plot is only available for regression problems.
- Parameters:
x – input data
y – target data
data – raw unprocessed data from which x,y pairs of the test data are drawn
exclude (list) – name of models to exclude from plotting
figsize – figure size as (width, height)
fname (str, optional) – name of the file to save plot
**kwargs – any keyword arguments for ai4water.utils.utils.edf_plot()
- Returns:
matplotlib figure
- Return type:
plt.Figure
Example
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> dataset = busan_beach()
>>> inputs = list(dataset.columns)[0:-1]
>>> outputs = list(dataset.columns)[-1]
>>> experiment = MLRegressionExperiments(input_features=inputs, output_features=outputs)
>>> experiment.fit(data=dataset, include="LMs")
>>> experiment.compare_edf_plots(data=dataset, exclude="SGDRegressor")
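As background for these plots, an empirical distribution function (EDF) gives, for each value v, the fraction of samples less than or equal to v. The sketch below computes one in plain Python for absolute prediction errors. Taking the errors as absolute values is an assumption made for illustration, and the helper name edf and the numbers are hypothetical.

```python
def edf(values):
    """Return the sorted values and the cumulative fraction of samples <= each one."""
    ordered = sorted(values)
    n = len(ordered)
    fractions = [(i + 1) / n for i in range(n)]
    return ordered, fractions


# Hypothetical observations and predictions, for illustration only.
true = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.0, 6.0, 10.0]
abs_errors = [abs(t - p) for t, p in zip(true, predicted)]

xs, fs = edf(abs_errors)
for x, f in zip(xs, fs):
    print(f"{f:.0%} of the samples have an absolute error <= {x}")
```

A curve that rises faster towards 1 means more of the model's errors are small, which is what makes the curves of several models easy to compare on one axis.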
- compare_errors(matric_name: str, x=None, y=None, data=None, cutoff_val: Optional[float] = None, cutoff_type: Optional[str] = None, sort_by: str = 'test', ignore_nans: bool = True, colors=None, cmaps=None, figsize: Optional[tuple] = None, **kwargs) DataFrame [source]
Plots a specific performance metric for all the models which were run during the ai4water.experiments.Experiments.fit() call.
- Parameters:
matric_name (str) – performance metric whose value is to be plotted for all the models
x – input data. If not given, then data must be given.
y – target data
data – raw unprocessed data from which x,y pairs can be drawn. This data will be passed to the ai4water.preprocessing.DataSet() class, and its ai4water.preprocessing.DataSet.test_data() method will be used to draw x,y pairs.
cutoff_val (float) – if provided, only those models will be plotted for which the metric is greater/smaller than this value. This works in conjunction with cutoff_type.
cutoff_type (str) – one of greater, greater_equal, less or less_equal. The criterion used to compare against cutoff_val. For example, if we want to show only those models whose $R^2$ is > 0.5, it will be ‘greater’.
sort_by – either test or train. How to sort the results for plotting. If ‘test’, then the test performance metrics will be sorted, otherwise the train performance metrics will be sorted.
ignore_nans – default True. If True, then performance metrics with nans are ignored, otherwise nans/empty bars will be shown to depict which models have resulted in nans for the given performance metric.
colors – color for bar chart. To assign separate colors for both bar charts, provide a list of two.
cmaps – color map for bar chart. To assign separate cmap for both bar charts, provide a list of two.
figsize (tuple) – figure size as (width, height)
**kwargs – any keyword argument that goes to easy_mpl.bar_chart
- Returns:
pandas dataframe whose index is the models and which has two columns named ‘train’ and ‘test’. These columns contain performance metrics for the models.
- Return type:
pd.DataFrame
Example
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> inputs = list(data.columns)[0:-1]
>>> outputs = list(data.columns)[-1]
>>> experiment = MLRegressionExperiments(input_features=inputs, output_features=outputs)
>>> experiment.fit(data=data)
>>> experiment.compare_errors('mse', data=data)
>>> experiment.compare_errors('r2', data=data, cutoff_val=0.2, cutoff_type='greater')
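The interplay of cutoff_val, cutoff_type and sort_by can be pictured with a small stand-alone sketch. It maps the four documented cutoff_type names to comparison operators and filters a dictionary of scores. Applying the cutoff to the sort_by column, the ascending sort order, the helper name apply_cutoff and the scores themselves are all assumptions for illustration, not the library's implementation.

```python
import operator

# The four documented cutoff_type names mapped to comparison operators.
CUTOFFS = {
    "greater": operator.gt,
    "greater_equal": operator.ge,
    "less": operator.lt,
    "less_equal": operator.le,
}


def apply_cutoff(scores, cutoff_val, cutoff_type, sort_by="test"):
    """Keep models whose `sort_by` score passes the cutoff, sorted in ascending order."""
    compare = CUTOFFS[cutoff_type]
    kept = {name: s for name, s in scores.items() if compare(s[sort_by], cutoff_val)}
    return dict(sorted(kept.items(), key=lambda item: item[1][sort_by]))


# Hypothetical R^2 values, for illustration only.
r2 = {
    "ModelA": {"train": 0.90, "test": 0.62},
    "ModelB": {"train": 0.55, "test": 0.41},
    "ModelC": {"train": 0.80, "test": 0.55},
}
selected = apply_cutoff(r2, cutoff_val=0.5, cutoff_type="greater")
print(list(selected))
```

For an error metric such as mse, where smaller is better, the same helper would be called with cutoff_type set to less or less_equal.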
- compare_precision_recall_curves(x, y, figsize: Optional[tuple] = None, **kwargs)[source]
Compares precision-recall curves of all the models.
- Parameters:
x – input data
y – labels for the input data
figsize (tuple) – figure size
**kwargs – any keyword arguments for matplotlib's plot function
- Returns:
matplotlib axes on which figure is drawn
- Return type:
plt.Axes
Example
>>> from ai4water.datasets import MtropicsLaos
>>> from ai4water.experiments import MLClassificationExperiments
>>> data = MtropicsLaos().make_classification(lookback_steps=1)
>>> # define inputs and outputs
>>> inputs = data.columns.tolist()[0:-1]
>>> outputs = data.columns.tolist()[-1:]
>>> # initiate the experiment
>>> exp = MLClassificationExperiments(
...     input_features=inputs,
...     output_features=outputs)
>>> # run the experiment
>>> exp.fit(data=data, include=["model_LGBMClassifier",
...                             "model_XGBClassifier",
...                             "RandomForestClassifier"])
>>> # Compare Precision Recall curves
>>> exp.compare_precision_recall_curves(data[inputs].values, data[outputs].values)
- compare_regression_plots(x=None, y=None, data=None, include: Union[None, list] = None, figsize: Optional[tuple] = None, fname: Optional[str] = 'regression', **kwargs) Figure [source]
compare regression plots of all the models which have been fitted. This plot is only available for regression problems.
- Parameters:
x – input data
y – target data
data – raw unprocessed data from which x,y pairs of the test data are drawn
include (str, list, optional) – if not None, must be a list of models which will be included. None will result in plotting all the models.
figsize – figure size as (width, height)
fname (str, optional) – name of the file to save the plot
**kwargs – any keyword arguments for easy_mpl.reg_plot
- Returns:
matplotlib figure
- Return type:
plt.Figure
Example
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> dataset = busan_beach()
>>> inputs = list(dataset.columns)[0:-1]
>>> outputs = list(dataset.columns)[-1]
>>> experiment = MLRegressionExperiments(input_features=inputs, output_features=outputs)
>>> experiment.fit(data=dataset)
>>> experiment.compare_regression_plots(data=dataset)
- compare_residual_plots(x=None, y=None, data=None, include: Union[None, list] = None, figsize: Optional[tuple] = None, fname: Optional[str] = 'residual') Figure [source]
compare residual plots of all the models which have been fitted. This plot is only available for regression problems.
- Parameters:
x – input data
y – target data
data – raw unprocessed data from which test x,y pairs are drawn using the ai4water.preprocessing.DataSet() class. Only valid if x and y are not given.
include (str, list, optional) – if not None, must be a list of models which will be included. None will result in plotting all the models.
figsize (tuple) – figure size as (width, height)
fname (str, optional) – name of file to save the plot
- Returns:
matplotlib figure
- Return type:
plt.Figure
Example
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> dataset = busan_beach()
>>> inputs = list(dataset.columns)[0:-1]
>>> outputs = list(dataset.columns)[-1]
>>> experiment = MLRegressionExperiments(input_features=inputs, output_features=outputs)
>>> experiment.fit(data=dataset)
>>> experiment.compare_residual_plots(data=dataset)
- compare_roc_curves(x, y, figsize: Optional[tuple] = None, **kwargs)[source]
Compares ROC curves of all the models.
- Parameters:
x – input data
y – labels for the input data
figsize (tuple) – figure size
**kwargs – any keyword arguments for matplotlib's plot function
- Returns:
matplotlib axes on which figure is drawn
- Return type:
plt.Axes
Example
>>> from ai4water.datasets import MtropicsLaos
>>> from ai4water.experiments import MLClassificationExperiments
>>> data = MtropicsLaos().make_classification(lookback_steps=1)
>>> # define inputs and outputs
>>> inputs = data.columns.tolist()[0:-1]
>>> outputs = data.columns.tolist()[-1:]
>>> # initiate the experiment
>>> exp = MLClassificationExperiments(
...     input_features=inputs,
...     output_features=outputs)
>>> # run the experiment
>>> exp.fit(data=data, include=["model_LGBMClassifier",
...                             "model_XGBClassifier",
...                             "RandomForestClassifier"])
>>> # Compare ROC curves
>>> exp.compare_roc_curves(data[inputs].values, data[outputs].values)
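For readers who want to see what a single ROC curve is made of, the following is a minimal plain-Python sketch. It sweeps a decision threshold over the predicted scores, records the false positive rate and true positive rate at each step, and integrates the curve with the trapezoidal rule. The function names and the four-sample data are hypothetical, and the sketch assumes binary labels (0/1) with both classes present.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists obtained by lowering the threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(labels, scores)
print(list(zip(fpr, tpr)), auc(fpr, tpr))
```

Comparing models then amounts to drawing one such curve per model on shared axes; a curve closer to the top-left corner indicates better separation of the two classes.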
- fit(x=None, y=None, data=None, validation_data: Optional[tuple] = None, run_type: str = 'dry_run', opt_method: str = 'bayes', num_iterations: int = 12, include: Union[None, str, list] = None, exclude: Union[None, list, str] = '', cross_validate: bool = False, post_optimize: str = 'eval_best', **hpo_kws)[source]
Runs the fit loop for all the models of the experiment. The user can, however, specify the models by making use of the include and exclude keywords.
The data should be defined according to one of the following three rules:
only x,y should be given (val will be taken from it according to splitting schemes)
or x,y and validation_data should be given
or only data should be given (train and validation data will be taken according to splitting schemes)
- Parameters:
x – input data. When run_type is dry_run, then each model is trained on this data. If run_type is optimize and validation_data is not given, then x,y pairs of validation data are extracted from this data based upon the splitting scheme, i.e. the val_fraction argument.
y – label/true/observed data
data – raw unprepared data from which x,y pairs for training and validation will be extracted. This will be passed to ai4water.Model.fit(). This is only required if x and y are not given.
validation_data – a tuple which consists of x,y pairs for validation data. This can only be given if x and y are given and data is not given.
run_type (str, optional (default="dry_run")) – one of dry_run or optimize. If dry_run, then all the models will be trained only once. If optimize, then hyperparameters of all the models will be optimized.
opt_method (str, optional (default="bayes")) – which optimization method to use. Options are bayes, random, grid. Only valid if run_type is optimize.
num_iterations (int, optional) – number of iterations for optimization. Only valid if run_type is optimize.
include (list/str optional (default="DTs")) – name of models to include. If None, all the models found will be trained and/or optimized. Default is “DTs”, which means all decision tree based models will be used.
exclude – name of models to be excluded
cross_validate (bool, optional (default=False)) – whether to cross validate the model or not. This depends upon the cross_validator argument to the Model.
post_optimize (str, optional) – one of eval_best or train_best. If eval_best, the weights from the best models will be loaded again and the model will be evaluated on train, test and all the data. If train_best, then a new model will be built and trained using the parameters of the best model.
**hpo_kws – keyword arguments for the ai4water.hyperopt.HyperOpt class.
Examples
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> exp = MLRegressionExperiments()
>>> exp.fit(data=busan_beach())
If you want to compare only RandomForest, XGBRegressor, CatBoostRegressor and LGBMRegressor, use the include keyword
>>> exp.fit(data=busan_beach(), include=['RandomForestRegressor', 'XGBRegressor',
...                                      'CatBoostRegressor', 'LGBMRegressor'])
Similarly, if you want to exclude certain models from comparison, you can use the exclude keyword
>>> exp.fit(data=busan_beach(), exclude=["SGDRegressor"])
If you want to perform cross validation for each model, you must give the cross_validator argument, which will be passed to the ai4water Model
>>> exp = MLRegressionExperiments(cross_validator={"KFold": {"n_splits": 10}})
>>> exp.fit(data=busan_beach(), cross_validate=True)
Setting cross_validate to True will populate the cv_scores_ dictionary, which can be accessed as exp.cv_scores_
If you want to optimize the hyperparameters of each model,
>>> exp.fit(data=busan_beach(), run_type="optimize", num_iterations=20)
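When opt_method is grid, a continuous hyperparameter range has to be turned into a finite set of candidates, which is the situation num_samples is documented as being relevant for. The following is a minimal sketch of that idea, assuming evenly spaced samples per parameter and a full Cartesian product. The helper names, parameter names and ranges are hypothetical, and the sampling actually used by ai4water.hyperopt may differ.

```python
from itertools import product


def linspace(low, high, num_samples):
    """Evenly spaced values from low to high, both ends included."""
    if num_samples == 1:
        return [low]
    step = (high - low) / (num_samples - 1)
    return [low + i * step for i in range(num_samples)]


def build_grid(space, num_samples=5):
    """Expand {name: (low, high)} into a list of candidate parameter dicts."""
    names = list(space)
    axes = [linspace(low, high, num_samples) for low, high in space.values()]
    return [dict(zip(names, combo)) for combo in product(*axes)]


# Hypothetical search space with two continuous hyperparameters.
space = {"learning_rate": (0.01, 0.1), "subsample": (0.5, 1.0)}
grid = build_grid(space, num_samples=5)
print(len(grid))  # num_samples ** number_of_parameters candidates
```

The grid grows exponentially with the number of parameters, which is why a small num_samples is usually preferred for grid search.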
- fit_with_tpot(data, models: Optional[Union[int, List[str], dict, str]] = None, selection_criteria: str = 'mse', scoring: Optional[str] = None, **tpot_args)[source]
Fits the tpot’s fit method which finds out the best pipeline for the given data.
- Parameters:
data –
models –
It can be one of the following types.
- If list, it will be the names of machine learning models/algorithms to consider.
- If integer, it will be the number of top algorithms to consider for tpot. In such a case, you must have first run the .fit method before running this method. If you run tpot using all available models, it will take hours to days for medium sized data (consisting of a few thousand examples). However, if you first run .fit and see, for example, what the top 5 models are, then you can set this argument to 5. In such a case, tpot will search for a pipeline using only the top 5 algorithms/models that have been found using the .fit method.
- If dictionary, then the keys should be the names of algorithms/models and the values should be the parameters for each model/algorithm to be optimized.
- You can also set it to all to consider all models available in ai4water’s Experiment module.
- The default is None, which means the tpot_config argument will be None.
selection_criteria – the name of the performance metric. If models is an integer, then the models will be chosen according to this performance metric. By default the models will be selected based upon their mse values on test data.
scoring – the performance metric to use for finding the pipeline.
tpot_args – any keyword argument for tpot’s Regressor or Classifier class. This can include arguments like generations, population_size etc.
- Return type:
the tpot object
Example
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> from ai4water.utils.utils import dateandtime_now
>>> exp = MLRegressionExperiments(exp_name=f"tpot_reg_{dateandtime_now()}")
>>> exp.fit(data=busan_beach())
>>> tpot_regr = exp.fit_with_tpot(busan_beach(), 2, generations=1, population_size=2)
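When models is an integer, the candidates for tpot are the best performers from an earlier .fit run, ranked by selection_criteria. A stand-alone sketch of such a ranking is given below. It assumes that error metrics such as mse are ranked with lower values first; the set of higher-is-better metrics, the helper name top_models and the scores are hypothetical.

```python
# Metrics for which a larger value means a better model (illustrative subset).
HIGHER_IS_BETTER = {"r2", "nse", "kge", "accuracy"}


def top_models(test_scores, n, selection_criteria="mse"):
    """Return the names of the n best models according to one metric."""
    reverse = selection_criteria in HIGHER_IS_BETTER
    ranked = sorted(test_scores, key=test_scores.get, reverse=reverse)
    return ranked[:n]


# Hypothetical mse values on test data, for illustration only.
mse_on_test = {"ModelA": 0.42, "ModelB": 0.17, "ModelC": 0.30, "ModelD": 0.95}
print(top_models(mse_on_test, n=2))
```

Restricting the search to a few strong candidates is what keeps the subsequent pipeline search short compared with running it over every available model.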
- classmethod from_config(config_path: str, **kwargs) Experiments [source]
Loads the experiment from the config file.
- Parameters:
config_path – complete path of experiment
kwargs – keyword arguments to experiment
- Returns:
an instance of Experiments class
- loss_comparison(loss_name: str = 'loss', include: Optional[list] = None, figsize: Optional[int] = None, start: int = 0, end: Optional[int] = None, **kwargs) Axes [source]
Plots the loss curves of the evaluated models. This method is only available if the models which are being compared are deep learning models.
- Parameters:
- Return type:
matplotlib axes
Example
>>> from ai4water.experiments import DLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> exp = DLRegressionExperiments(
...     input_features = data.columns.tolist()[0:-1],
...     output_features = data.columns.tolist()[-1:],
...     epochs=300,
...     train_fraction=1.0,
...     y_transformation="log",
...     x_transformation="minmax",
... )
>>> exp.fit(data=data)
>>> exp.loss_comparison()
You may wish to plot on log scale
>>> exp.loss_comparison(ax_kws={'logy':True})
- plot_cv_scores(name: str = 'cv_scores', exclude: Optional[Union[str, list]] = None, include: Optional[Union[str, list]] = None, **kwargs) Optional[Axes] [source]
Plots the box whisker plots of the cross validation scores.
This plot is only available if cross_validate was set to True during ai4water.experiments.Experiments.fit().
- Parameters:
name (str) – name of the file to save the plot
include (str/list) – models to include
exclude (str/list) – models to exclude
**kwargs – any of the following keyword arguments:
notch
vert
figsize
bbox_inches
- Return type:
matplotlib axes if the figure is drawn otherwise None
Example
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> exp = MLRegressionExperiments(cross_validator={"KFold": {"n_splits": 10}})
>>> exp.fit(data=busan_beach(), cross_validate=True)
>>> exp.plot_cv_scores()
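A box and whisker plot condenses each model's cross-validation scores into a few order statistics. The sketch below computes a five-number summary per model with the standard library. The layout assumed for the scores (model name mapped to a list of per-fold scores) is an assumption and may not match the structure of cv_scores_; the helper name and the numbers are hypothetical.

```python
from statistics import quantiles


def five_number_summary(scores):
    """Minimum, quartiles and maximum of a list of per-fold scores."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return {"min": min(scores), "q1": q1, "median": median, "q3": q3, "max": max(scores)}


# Hypothetical per-fold scores for two models, for illustration only.
cv_scores = {
    "ModelA": [0.61, 0.58, 0.66, 0.63, 0.60],
    "ModelB": [0.40, 0.52, 0.47, 0.35, 0.49],
}
summary = {name: five_number_summary(s) for name, s in cv_scores.items()}
for name, stats in summary.items():
    print(name, stats)
```

A narrow box indicates that a model's score is stable across folds, while a wide one suggests sensitivity to how the data were split.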
- plot_improvement(metric_name: str, plot_type: str = 'dumbbell', lower_limit: Union[int, float] = -1.0, upper_limit: Optional[Union[int, float]] = None, name: str = '', **kwargs) DataFrame [source]
Shows how much improvement was observed after hyperparameter optimization. This plot is only available if run_type was set to optimize in ai4water.experiments.Experiments.fit().
- Parameters:
metric_name – the performance metric for comparison
plot_type (str, optional) – the kind of plot to draw. Either dumbbell or bar
lower_limit (float/int, optional (default=-1.0)) – clip the values below this value. Set this value to None to avoid clipping.
upper_limit (float/int, optional (default=None)) – clip the values above this value
name (str, optional) – name of file to save the figure
**kwargs – any additional keyword arguments for the dumbbell plot or bar_chart
- Return type:
pd.DataFrame
Examples
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> experiment = MLRegressionExperiments()
>>> experiment.fit(data=busan_beach(), run_type="optimize", num_iterations=30)
>>> experiment.plot_improvement('r2')
>>> # or draw a bar plot instead of the default dumbbell plot
>>> experiment.plot_improvement('r2', plot_type='bar')
- sort_models_by_metric(metric_name, cutoff_val=None, cutoff_type=None, ignore_nans: bool = True, sort_by='test') DataFrame [source]
returns the models sorted according to their performance
- taylor_plot(x=None, y=None, data=None, include: Union[None, list] = None, exclude: Union[None, list] = None, figsize: tuple = (5, 8), **kwargs) Figure [source]
Compares the models using taylor_plot.
- Parameters:
x – input data. If not given, then data must be given.
y – target data
data – raw unprocessed data from which x,y pairs can be drawn. This data will be passed to the DataSet class, and its ai4water.preprocessing.DataSet.test_data() method will be used to draw x,y pairs.
include (str, list, optional) – if not None, must be a list of models which will be included. None will result in plotting all the models.
exclude (str, list, optional) – if not None, must be a list of models which will be excluded. None will result in no exclusion.
figsize (tuple, optional) – figure size as (width,height)
**kwargs – all the keyword arguments for taylor_plot function.
- Return type:
plt.Figure
Example
>>> from ai4water.experiments import MLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> inputs = list(data.columns)[0:-1]
>>> outputs = list(data.columns)[-1]
>>> experiment = MLRegressionExperiments(input_features=inputs, output_features=outputs)
>>> experiment.fit(data=data)
>>> experiment.taylor_plot(data=data)
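As background, a Taylor diagram positions each model by the standard deviation of its predictions and their correlation with the observations. The centred root-mean-square difference E' then follows from E'^2 = s_p^2 + s_o^2 - 2 s_p s_o R. The sketch below computes these statistics in plain Python using population standard deviations. The function name and the data are hypothetical, and it is not taken from ai4water's plotting code.

```python
from math import sqrt


def taylor_statistics(observed, predicted):
    """Standard deviations, correlation and centred RMS difference."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    std_o = sqrt(sum(d * d for d in dev_o) / n)
    std_p = sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    # Centred RMS difference: both series have their means removed first.
    crmsd = sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return {"std_obs": std_o, "std_pred": std_p, "corr": corr, "crmsd": crmsd}


# Hypothetical observations and predictions, for illustration only.
stats = taylor_statistics([1.0, 2.0, 3.0, 4.0], [1.2, 1.9, 3.4, 3.8])
print(stats)
```

Because the three quantities are tied together by the law of cosines, a single point on the diagram conveys all of them at once, which is what makes the plot convenient for comparing many models.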
- train_best(x, y, model_type)[source]
Finds the best model, builds it, fits it and makes predictions from it.
- update_model_weight(model: Model, config_path: str)[source]
updates the weight of model. If no saved weight is found, a warning is raised.
- verify_data(x=None, y=None, data=None, validation_data: Optional[tuple] = None, test_data: Optional[tuple] = None) tuple [source]
- verifies that either
only x,y should be given (val will be taken from it according to splitting schemes)
or x,y and validation_data should be given (means no test data)
or x, y and validation_data and test_data are given
or only data should be given (train, validation and test data will be taken according to splitting schemes)
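The accepted argument combinations listed above can be written as a small validation routine. The following is only a sketch of those rules: the function name check_data_args, the returned labels and the choice of ValueError are assumptions, and the real verify_data method may behave differently.

```python
def check_data_args(x=None, y=None, data=None, validation_data=None, test_data=None):
    """Return a label for the accepted combination or raise ValueError."""
    if data is not None:
        # `data` is exclusive: splitting into train/val/test happens later.
        if any(arg is not None for arg in (x, y, validation_data, test_data)):
            raise ValueError("give either `data` or `x`/`y`, not both")
        return "data"
    if x is None or y is None:
        raise ValueError("`x` and `y` must be given together when `data` is absent")
    if test_data is not None and validation_data is None:
        raise ValueError("`test_data` requires `validation_data`")
    if test_data is not None:
        return "x, y, validation_data, test_data"
    if validation_data is not None:
        return "x, y, validation_data"
    return "x, y"


print(check_data_args(x=[[1.0]], y=[2.0]))
print(check_data_args(data=[[1.0, 2.0]]))
```

Checking the combination up front gives the user one clear error instead of a failure deep inside model training.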
RegressionExperiments
- class ai4water.experiments.MLRegressionExperiments(param_space=None, x0=None, cases=None, exp_name='MLRegressionExperiments', num_samples=5, verbosity=1, **model_kws)[source]
Bases:
Experiments
Compares performance of 40+ machine learning models for a regression problem. The experiment consists of models which are run using the fit() method. A model is one experiment.
The user can define new models by subclassing this class. In fact, any new method in the sub-class whose name starts with model_ will be considered a new model. Otherwise the user has to overwrite the models attribute to redefine which methods (of the class) are to be used as models and which are not. A method which is a model must only return keyword arguments, which will be streamed to the Model using the build_and_run method. Inside this new method the user must define which parameters to optimize, their param_space for optimization and the initial values to use for optimization.
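The prefix-based convention for registering models can be illustrated without ai4water at all. In the sketch below, a base class collects every method whose name starts with model_, and a subclass adds one more simply by defining it. BaseExperiment, MyExperiment, the model names and the shape of the returned dictionaries are hypothetical stand-ins, not the library's classes.

```python
class BaseExperiment:
    @property
    def models(self):
        """Names of all methods whose name starts with ``model_``."""
        cls = type(self)
        return sorted(
            name for name in dir(cls)
            if name.startswith("model_") and callable(getattr(cls, name))
        )

    def model_Ridge(self, **kwargs):
        # A model method only returns keyword arguments describing the model.
        return {"model": {"Ridge": kwargs}}


class MyExperiment(BaseExperiment):
    # Defining a method with the prefix is enough to register a new model.
    def model_MyRegressor(self, **kwargs):
        return {"model": {"MyRegressor": kwargs}}


print(MyExperiment().models)
```

Discovery by naming convention keeps subclasses short: no registry has to be updated, and the inherited models remain available alongside the new one.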
- __init__(param_space=None, x0=None, cases=None, exp_name='MLRegressionExperiments', num_samples=5, verbosity=1, **model_kws)[source]
Initializes the class
- Parameters:
param_space – dimensions of parameters which are to be optimized. These can be overwritten in models.
x0 (list) – initial values of the parameters which are to be optimized. These can be overwritten in models.
exp_name (str) – name of experiment, all results will be saved within this folder
model_kws (dict) – keyword arguments which are to be passed to Model and are not optimized.
Examples
>>> from ai4water.datasets import busan_beach
>>> from ai4water.experiments import MLRegressionExperiments
>>> # first compare the performance of all available models without optimizing their parameters
>>> data = busan_beach()  # read data file, in this case load the default data
>>> inputs = list(data.columns)[0:-1]  # define input and output columns in data
>>> outputs = list(data.columns)[-1]
>>> comparisons = MLRegressionExperiments(
...     input_features=inputs, output_features=outputs,
...     nan_filler={'method': 'KNNImputer', 'features': inputs})
>>> comparisons.fit(data=data, run_type="dry_run")
>>> comparisons.compare_errors('r2', data=data)
>>> # find out the models which resulted in r2 > 0.3
>>> best_models = comparisons.compare_errors('r2', cutoff_type='greater',
...                                          cutoff_val=0.3, data=data)
>>> # now build a new experiment for the best models and optimize them
>>> comparisons = MLRegressionExperiments(
...     input_features=inputs, output_features=outputs,
...     nan_filler={'method': 'KNNImputer', 'features': inputs},
...     exp_name="BestMLModels")
>>> comparisons.fit(data=data, run_type="optimize", include=best_models.index)
>>> comparisons.compare_errors('r2', data=data)
>>> comparisons.taylor_plot(data=data)  # see help(comparisons.taylor_plot) to tweak the taylor plot
- class ai4water.experiments.DLRegressionExperiments(input_features: list, param_space=None, x0=None, cases: Optional[dict] = None, exp_name: Optional[str] = None, num_samples: int = 5, verbosity: int = 1, **model_kws)[source]
Bases:
Experiments
A framework for comparing several basic DL architectures for given data. This class can also be used for hyperparameter optimization of more than one DL model/architecture. However, the parameters which determine the dimensions of input data, such as lookback, are not allowed to be optimized when using random or grid search.
To check the available models
>>> exp = DLRegressionExperiments(...)
>>> exp.models
If learning rate, batch size, and lookback are to be optimized, their space can be specified in the following way:
>>> from ai4water.hyperopt import Integer
>>> exp = DLRegressionExperiments(...)
>>> exp.lookback_space = [Integer(1, 100, name='lookback')]
Example
>>> from ai4water.experiments import DLRegressionExperiments
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> exp = DLRegressionExperiments(
...     input_features = data.columns.tolist()[0:-1],
...     output_features = data.columns.tolist()[-1:],
...     epochs=300,
...     train_fraction=1.0,
...     y_transformation="log",
...     x_transformation="minmax",
...     ts_args={'lookback':9}
... )
>>> # run the experiments
>>> exp.fit(data=data)
ClassificationExperiments
- class ai4water.experiments.MLClassificationExperiments(param_space=None, x0=None, cases=None, exp_name='MLClassificationExperiments', num_samples=5, monitor=None, **model_kws)[source]
Bases:
Experiments
Runs classification models for comparison, with or without optimization of hyperparameters. It compares around 30 classification algorithms from sklearn, xgboost, catboost and lightgbm.
Examples
>>> from ai4water.datasets import MtropicsLaos
>>> from ai4water.experiments import MLClassificationExperiments
>>> data = MtropicsLaos().make_classification(lookback_steps=2)
>>> inputs = data.columns.tolist()[0:-1]
>>> outputs = data.columns.tolist()[-1:]
>>> exp = MLClassificationExperiments(input_features=inputs,
...                                   output_features=outputs)
>>> exp.fit(data=data, include=["CatBoostClassifier", "LGBMClassifier",
...                             'RandomForestClassifier', 'XGBClassifier'])
>>> exp.compare_errors('accuracy', data=data)
DLClassificationExperiments
- class ai4water.experiments.DLClassificationExperiments(exp_name='DLClassificationExperiments_20230210_155908', *args, **kwargs)[source]
Bases:
DLRegressionExperiments
Compares multiple neural network architectures for a classification problem.
Examples
>>> from ai4water.experiments import DLClassificationExperiments
>>> from ai4water.datasets import MtropicsLaos
>>> data = MtropicsLaos().make_classification(
...     input_features=['air_temp', 'rel_hum'],
...     lookback_steps=5)
>>> # define inputs and outputs
>>> inputs = data.columns.tolist()[0:-1]
>>> outputs = data.columns.tolist()[-1:]
>>> # create the experiments class
>>> exp = DLClassificationExperiments(
...     input_features=inputs,
...     output_features=outputs,
...     epochs=5,
...     ts_args={"lookback": 5}
... )
>>> # run the experiments
>>> exp.fit(data=data, include=["TFT", "MLP"])