Model

BaseModel

The core of AI4Water is the Model class which builds and trains the machine learning model. This class interacts with pre-processing and post-processing modules.

The Model class uses a python dictionary to build layers of neural networks.

To build tensorflow based models using a python dictionary, see the guide for declarative model definition for tensorflow. To build pytorch based NN models using a python dictionary, see the guide for declarative model definition for pytorch.

class ai4water._main.BaseModel(model: Optional[Union[dict, str, Callable]] = None, x_transformation: Optional[Union[str, dict, list]] = None, y_transformation: Optional[Union[str, dict, list]] = None, lr: float = 0.001, optimizer='Adam', loss: Union[str, Callable] = 'mse', quantiles=None, epochs: int = 14, min_val_loss: float = 0.0001, patience: int = 100, save_model: bool = True, monitor: Optional[Union[str, list]] = None, val_metric: Optional[str] = None, cross_validator: Optional[dict] = None, wandb_config: Optional[dict] = None, seed: int = 313, prefix: Optional[str] = None, path: Optional[str] = None, verbosity: int = 1, accept_additional_args: bool = False, **kwargs)[source]

Model class that implements logic of AI4Water.

__init__(model: Optional[Union[dict, str, Callable]] = None, x_transformation: Optional[Union[str, dict, list]] = None, y_transformation: Optional[Union[str, dict, list]] = None, lr: float = 0.001, optimizer='Adam', loss: Union[str, Callable] = 'mse', quantiles=None, epochs: int = 14, min_val_loss: float = 0.0001, patience: int = 100, save_model: bool = True, monitor: Optional[Union[str, list]] = None, val_metric: Optional[str] = None, cross_validator: Optional[dict] = None, wandb_config: Optional[dict] = None, seed: int = 313, prefix: Optional[str] = None, path: Optional[str] = None, verbosity: int = 1, accept_additional_args: bool = False, **kwargs)[source]

The Model class can take a large number of possible arguments depending upon the machine learning model/algorithm used. Not all the arguments are applicable in each case. The user must define only the relevant/applicable parameters and leave the others as they are.

Parameters:
  • model

    a dictionary defining the machine learning model. If you are building a non-neural network model then this dictionary must consist of the name of the model as key and the keyword arguments to that model as a dictionary. For example, to build a decision tree based model

    >>> model = {'DecisionTreeRegressor': {"max_depth": 3,
    ...                                    "criterion": "mae"}}
    

    The key ‘DecisionTreeRegressor’ should exactly match the name of the model from one of the supported libraries (e.g. scikit-learn, xgboost, catboost or lightgbm).

    The value {"max_depth": 3, "criterion": "mae"} is another dictionary, which can contain any keyword arguments that the model (DecisionTreeRegressor in this case) accepts. The user must refer to the documentation of the underlying library (scikit-learn for DecisionTreeRegressor) to find the complete keyword arguments applicable to a particular model. See examples to learn how to build machine learning models.

    If you are building a deep learning model using tensorflow, then the key must be ‘layers’ and the value must itself be a dictionary defining the layers of the neural network. For example, we can build an MLP as follows

    >>> model = {'layers': {
    ...             "Dense_0": {'units': 64, 'activation': 'relu'},
    ...              "Flatten": {},
    ...              "Dense_3": {'units': 1}
    ...             }}
    

    The MLP in this case consists of Dense and Flatten layers. The user can define any keyword argument which is accepted by that layer in TensorFlow. For example, the Dense layer in TensorFlow can accept units and activation keyword arguments, among others. For details on how to build neural networks using such a layered API, see examples

  • x_transformation

    type of transformation to be applied on x/input data. The transformation can be any transformation name from ai4water.preprocessing.transformations.Transformation. The user can specify more than one transformation. Moreover, the user can also determine which transformation is applied on which input feature. Default is ‘minmax’. To apply a single transformation on all the data

    >>> x_transformation = 'minmax'
    

    To apply different transformations on different input and output features

    >>> x_transformation = [{'method': 'minmax', 'features': ['input1', 'input2']},
    ...                {'method': 'zscore', 'features': ['input3', 'input4']}
    ...                 ]
    

    Here input1, input2, input3 and input4 are the columns in the data. For more info see ai4water.preprocessing.Transformations and ai4water.preprocessing.Transformation classes.

  • y_transformation – type of transformation to be applied on y/label/output data.
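
    It accepts the same values as x_transformation (see the signature above). For example, to apply a single log transformation on the output (a minimal sketch; log is among the transformation names listed under optimize_transformations below)

    >>> y_transformation = 'log'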

  • lr (float, default 0.001) – learning rate.

  • optimizer (str/keras.optimizers like) – the optimizer to be used for neural network training. Default is ‘Adam’

  • loss (str/callable, default 'mse') – the cost/loss function to be used for training neural networks.

  • quantiles (list Default is None) – quantiles to be used when the problem is quantile regression.

  • epochs (int Default is 14) – number of epochs to be used.

  • min_val_loss (float, default 0.0001) – minimum value of validation loss/error to be considered for early stopping.

  • patience (int) – number of epochs to wait before early stopping. Set this value to None if you don’t want to use EarlyStopping.

  • save_model (bool) – whether to save the model or not. For neural networks, the model will be saved only if an improvement in training/validation loss is observed. Otherwise the model is not saved.

  • monitor (str/list) – metrics to be monitored. e.g. [‘nse’, ‘pbias’]

  • val_metric (str) – performance metric to be used for validation/cross-validation. This metric will be used for hyperparameter optimization and experiment comparison. If not defined then r2_score will be used for regression and accuracy will be used for classification.

  • cross_validator (dict) –

    selects the type of cross validation to be applied. It can be any cross validator from sklearn.model_selection. Default is None, which means validation will be done using validation_data. To use k-fold cross validation,

    >>> cross_validator = {'KFold': {'n_splits': 5}}
    

  • batches (str) – either 2d or 3d.

  • wandb_config (dict) –

    Only valid if the wandb package is installed. Default value is None, which means wandb will not be utilized. For the simplest case, pass a dictionary with at least two keys, namely project and entity. Otherwise use a dictionary of all the arguments for wandb.init, wandb.log and WandbCallback. For training_data and validation_data in WandbCallback, pass True instead of providing a tuple, as shown below

    >>> wandb_config = {'entity': 'entity_name', 'project': 'project_name',
    ...                 'training_data':True, 'validation_data': True}
    

  • seed (int) – random seed for reproducibility. This can be set to None. The seed is set for the os, tf, torch and random modules simultaneously. Please note that this seed is not set for numpy because that would result in constant sampling during hyperparameter optimization. If you want to seed everything, then use the following method

    >>> model.seed_everything()

  • prefix (str) – prefix to be used for the folder in which the results are saved. default is None, which means within ./results/model_path

  • path (str/path like) – if given, a new model path will not be created and the results will be saved at the given path.

  • verbosity (int, default 1) – determines the amount of information being printed. 0 means no print information. Can be between 0 and 3. Setting this value to 0 will also result in not showing some plots such as loss curve or regression plot. These plots will only be saved in self.path.

  • accept_additional_args (bool, default False) – If you want to pass any additional argument, then this argument must be set to True, otherwise an error will be raised.

  • **kwargs – keyword arguments for ai4water.preprocessing.DataSet.__init__()

Note

The transformations applied on x and y data using x_transformation and y_transformation are part of the model. See transformation

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> df = busan_beach()
>>> ann = Model(input_features=df.columns.tolist()[0:-1],
...              batch_size=16,
...              output_features=df.columns.tolist()[-1:],
...              model={'layers': {'Dense_0': 64, 'Dense_1': 1}},
... )
>>> history = ann.fit(data=df)
>>> y = ann.predict()
all_data(x=None, y=None, data=None) tuple[source]

it returns all the data, i.e. training+validation+test, after extracting it from data.

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model="XGBRegressor")
>>> train_x, train_y = model.training_data(data=data)
>>> print(train_x.shape, train_y.shape)
>>> val_x, val_y = model.validation_data(data=data)
>>> print(val_x.shape, val_y.shape)
... # all_data will contain both training and validation data
>>> all_x, all_y = model.all_data(data=data)
>>> print(all_x.shape, all_y.shape)
cross_val_score(x=None, y=None, data: Optional[Union[DataFrame, ndarray, str]] = None, scoring: Optional[Union[str, list]] = None, refit: bool = False, process_results: bool = False) list[source]

computes cross validation score

Parameters:
  • x – input data

  • y – output corresponding to x.

  • data – raw unprepared data which will be given to ai4water.preprocessing.DataSet to prepare x,y from it.

  • scoring – performance metric to use for cross validation. If None, it will be taken from config[‘val_metric’]

  • refit (bool, optional (default=False)) – If True, the model will be trained on the whole training+validation data after calculating the cross validation score.

  • process_results (bool, optional) – whether to process results at each cv iteration or not

Returns:

cross validation score for each metric in scoring

Return type:

list

Example

>>> from ai4water.datasets import busan_beach
>>> from ai4water import Model
>>> model = Model(model="RandomForestRegressor",
...               cross_validator={"KFold": {"n_splits": 5}})
>>> model.cross_val_score(data=busan_beach())

Note

Currently not working for deep learning models.

eda(data, freq: Optional[str] = None)[source]

Performs comprehensive Exploratory Data Analysis.

Parameters:
  • data

  • freq – if specified, small chunks of data will be plotted instead of the whole data at once. The data will NOT be resampled. This is valid only for plot_data and box_plot. Possible values are yearly, weekly, and monthly.

Return type:

an instance of the ai4water.eda.EDA class
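
Examples

A minimal usage sketch, assuming the busan_beach data used in the other examples on this page:

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> model = Model(model="XGBRegressor")
>>> eda = model.eda(data=busan_beach())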

evaluate(x=None, y=None, data=None, metrics=None, **kwargs)[source]

Evaluates the performance of the model on given data. Calls the evaluate method of the underlying model. If the evaluate method is not available in the underlying model, then predict is called.

Parameters:
  • x – inputs

  • y – outputs/true data corresponding to x

  • data – Raw unprepared data which will be fed to ai4water.preprocessing.DataSet to prepare x and y. If x and y are given, this argument will have no meaning.

  • metrics

    the metrics to evaluate. It can be a string indicating the metric to evaluate. It can also be a list of metrics to evaluate. Any metric name from RegressionMetrics or ClassificationMetrics can be given. It can also be the name of a group of metrics to evaluate. Following groups are available

    • minimal

    • all

    • hydro_metrics

    If this argument is given, the evaluate function of the underlying class is not called. Rather the model is evaluated manually for given metrics. Otherwise, if this argument is not given, then evaluate method of underlying model is called, if available.

  • kwargs – any keyword argument for the evaluate method of the underlying model.

Returns:

If metrics is not given then this method returns whatever is returned by evaluate method of underlying model. Otherwise the model is evaluated for given metric or group of metrics and the result is returned

Examples

>>> import numpy as np
>>> from ai4water import Model
>>> from ai4water.models import MLP
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model=MLP(),
...               input_features=data.columns.tolist()[0:-1],
...               output_features=data.columns.tolist()[-1:])
>>> model.fit(data=data)

for evaluation on test data

>>> model.evaluate(data=data)
...

evaluate on any metric from SeqMetrics library

>>> model.evaluate(data=data, metrics='pbias')
...
... # to evaluate on custom data, the user can provide its own x and y
>>> new_inputs = np.random.random((10, 13))
>>> new_outputs = np.random.random((10, 1, 1))
>>> model.evaluate(new_inputs, new_outputs)

Backward compatibility: since ai4water’s Model is supposed to behave the same as Keras’ Model, the following expressions are equally valid.

>>> model.evaluate(x, y=y)
>>> model.evaluate(x=x, y=y)
evaluate_on_all_data(data, metrics=None, **kwargs)[source]

evaluates the model on all data, i.e. training+validation+test.

Examples

>>> from ai4water import Model
>>> from ai4water.models import MLP
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model=MLP(),
...               input_features=data.columns.tolist()[0:-1],
...               output_features=data.columns.tolist()[-1:])
>>> model.fit(data=data)
... # for evaluation on all data
>>> print(model.evaluate_on_all_data(data=data))
>>> print(model.evaluate_on_all_data(data=data, metrics='pbias'))
evaluate_on_test_data(data, metrics=None, **kwargs)[source]

evaluates the model on test data.

Parameters:
  • data – Raw unprepared data which will be fed to ai4water.preprocessing.DataSet to prepare x and y. If x and y are given, this argument will have no meaning.

  • metrics

    the metrics to evaluate. It can be a string indicating the metric to evaluate. It can also be a list of metrics to evaluate. Any metric name from RegressionMetrics or ClassificationMetrics can be given. It can also be the name of a group of metrics to evaluate. Following groups are available

    • minimal

    • all

    • hydro_metrics

    If this argument is given, the evaluate function of the underlying class is not called. Rather the model is evaluated manually for given metrics. Otherwise, if this argument is not given, then evaluate method of underlying model is called, if available.

  • kwargs – any keyword argument for the evaluate method of the underlying model.

Returns:

If metrics is not given then this method returns whatever is returned by the evaluate method of the underlying model. Otherwise the model is evaluated for the given metric or group of metrics and the result is returned as float or dictionary.

Examples

>>> from ai4water import Model
>>> from ai4water.models import MLP
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model=MLP(),
...               input_features=data.columns.tolist()[0:-1],
...               output_features=data.columns.tolist()[-1:])
>>> model.fit(data=data)
... # for evaluation on test data
>>> model.evaluate_on_test_data(data=data)
>>> model.evaluate_on_test_data(data=data, metrics='pbias')
evaluate_on_training_data(data, metrics=None, **kwargs)[source]

evaluates the model on training data.

Parameters:
  • data – Raw unprepared data which will be fed to ai4water.preprocessing.DataSet to prepare x and y. If x and y are given, this argument will have no meaning.

  • metrics

    the metrics to evaluate. It can be a string indicating the metric to evaluate. It can also be a list of metrics to evaluate. Any metric name from RegressionMetrics or ClassificationMetrics can be given. It can also be the name of a group of metrics to evaluate. Following groups are available

    • minimal

    • all

    • hydro_metrics

    If this argument is given, the evaluate function of the underlying class is not called. Rather the model is evaluated manually for given metrics. Otherwise, if this argument is not given, then evaluate method of underlying model is called, if available.

  • kwargs – any keyword argument for the evaluate method of the underlying model.

Returns:

If metrics is not given then this method returns whatever is returned by the evaluate method of the underlying model. Otherwise the model is evaluated for the given metric or group of metrics and the result is returned as float or dictionary.

Examples

>>> from ai4water import Model
>>> from ai4water.models import MLP
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model=MLP(),
...               input_features=data.columns.tolist()[0:-1],
...               output_features=data.columns.tolist()[-1:])
>>> model.fit(data=data)
... # for evaluation on training data
>>> model.evaluate_on_training_data(data=data)
>>> model.evaluate_on_training_data(data=data, metrics='pbias')
evaluate_on_validation_data(data, metrics=None, **kwargs)[source]

evaluates the model on validation data.

Parameters:
  • data – Raw unprepared data which will be fed to ai4water.preprocessing.DataSet to prepare x and y. If x and y are given, this argument will have no meaning.

  • metrics

    the metrics to evaluate. It can be a string indicating the metric to evaluate. It can also be a list of metrics to evaluate. Any metric name from RegressionMetrics or ClassificationMetrics can be given. It can also be the name of a group of metrics to evaluate. Following groups are available

    • minimal

    • all

    • hydro_metrics

    If this argument is given, the evaluate function of the underlying class is not called. Rather the model is evaluated manually for given metrics. Otherwise, if this argument is not given, then evaluate method of underlying model is called, if available.

  • kwargs – any keyword argument for the evaluate method of the underlying model.

Returns:

If metrics is not given then this method returns whatever is returned by the evaluate method of the underlying model. Otherwise the model is evaluated for the given metric or group of metrics and the result is returned as float or dictionary.

Examples

>>> from ai4water import Model
>>> from ai4water.models import MLP
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model=MLP(),
...               input_features=data.columns.tolist()[0:-1],
...               output_features=data.columns.tolist()[-1:])
>>> model.fit(data=data)
... # for evaluation on validation data
>>> model.evaluate_on_validation_data(data=data)
>>> model.evaluate_on_validation_data(data=data, metrics='pbias')
explain(*args, **kwargs)[source]
Calls the ai4water.postprocessing.explain.explain_model function to explain the model.

Example

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model="RandomForestRegressor")
>>> model.fit(data=data)
>>> model.explain(total_data=data, examples_to_explain=2)
explain_example(data, example_num: int, method='shap')[source]

explains a single example using either shap or lime

Parameters:
  • data – the data to use

  • example_num – the example/sample number/index to explain

  • method – either shap or lime

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model="RandomForestRegressor")
>>> model.fit(data=data)
>>> model.explain_example(data=data, example_num=2)
fit(x=None, y=None, data: Union[ndarray, DataFrame, DataSet, str] = 'training', callbacks: Optional[Union[list, dict]] = None, **kwargs)[source]

Trains the model. The training data is either given explicitly as x (and y) or is prepared from data by feeding it to DataSet.

Parameters:
  • x – The input data consisting of input features. It can also be tf.Dataset or TorchDataset.

  • y – Correct labels/observations/true data corresponding to ‘x’.

  • data – Raw data from which x,y pairs are prepared. This will be passed to ai4water.preprocessing.DataSet. It can also be an instance of ai4water.preprocessing.DataSet or ai4water.preprocessing.DataSetPipeline. It can also be the name of a dataset from ai4water.datasets.all_datasets

  • callbacks

    Any callback compatible with keras. If you want to log the output to tensorboard, then just use callbacks={'tensorboard': {}} or, to provide additional arguments

    >>> callbacks={'tensorboard': {'histogram_freq': 1}}
    

  • kwargs – Any keyword argument for the fit method of the underlying library. If ‘x’ is present in kwargs, that will take precedence over data.

Returns:

A keras history object in case of deep learning model with tensorflow as backend or anything returned by fit method of underlying model.

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> model = Model(model="XGBRegressor")
>>> model.fit(data=busan_beach())

using your own data for training

>>> import numpy as np
>>> new_inputs = np.random.random((100, 10))
>>> new_outputs = np.random.random(100)
>>> model.fit(x=new_inputs, y=new_outputs)
fit_on_all_training_data(x=None, y=None, data=None, **kwargs)[source]

This function trains the model on training + validation data.

Parameters:
  • x – x data which is supposed to consist of training and validation data. If not given, then data must be given.

  • y – label/target data corresponding to x data.

  • data

    raw data which will be passed to ai4water.preprocessing.DataSet to get training and validation x,y pairs. The x data from training and validation is concatenated. Similarly, the y data from training and validation is concatenated.

  • **kwargs – any keyword arguments for fit method.
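
Example

A minimal usage sketch following the fit examples on this page:

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> model = Model(model="XGBRegressor")
>>> model.fit_on_all_training_data(data=busan_beach())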

classmethod from_config(config: dict, make_new_path: bool = False, **kwargs) BaseModel[source]

Loads the model from config dictionary i.e. model.config

Parameters:
  • config (dict) – dictionary containing model’s parameters i.e. model.config

  • make_new_path (bool, optional) – whether to make a new path or not

  • **kwargs – any additional keyword arguments to Model class.

Return type:

an instance of ai4water.Model

Example

>>> from ai4water import Model
>>> import numpy as np
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> old_model = Model(model="XGBRegressor")
>>> old_model.fit(data=data)
... # now construct a new model instance from config dictionary
>>> model = Model.from_config(old_model.config)
>>> model.update_weights()
>>> x = np.random.random((100, 14))
>>> prediction = model.predict(x=x)
classmethod from_config_file(config_path: str, make_new_path: bool = False, **kwargs) BaseModel[source]

Loads the model from a config file.

Parameters:
  • config_path – complete path of config file

  • make_new_path (bool, optional) – If true, then it means we want to use the config file only to build the model, and a new path will be made. We would not normally update the weights in such a case.

  • **kwargs – any additional keyword arguments for the ai4water.Model

Return type:

an instance of ai4water.Model class

Example

>>> from ai4water import Model
>>> import numpy as np
>>> config_file_path = "../file/to/config.json"
>>> model = Model.from_config_file(config_file_path)
>>> x = np.random.random((100, 14))
>>> prediction = model.predict(x=x)
interpret(**kwargs)[source]

Interprets the underlying model. Call it after training.

Returns:

An instance of ai4water.postprocessing.interpret.Interpret class

Example

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> model = Model(model=...)
>>> model.fit(data=busan_beach())
>>> model.interpret()
optimize_hyperparameters(data: Union[tuple, list, DataFrame, ndarray], algorithm: str = 'bayes', num_iterations: int = 14, process_results: bool = True, refit: bool = True, **kwargs)[source]

optimizes the hyperparameters of the built model

The parameters that need to be optimized must be given as space, e.g. Integer, Real or Categorical from ai4water.hyperopt.

Parameters:
  • data

    It can be one of following

    • raw unprepared data in the form of a numpy array or pandas dataframe

    • a tuple of x,y pairs

    If it is unprepared data, it is passed to ai4water.preprocessing.DataSet, which prepares x,y pairs from it. The DataSet class also splits the data into training, validation and test sets. If it is a tuple of x,y pairs, it is split into training and validation. In both cases, the loss on the validation set is used as objective function. The loss is calculated using val_metric.

  • algorithm – str, optional (default=”bayes”) the algorithm to use for optimization

  • num_iterations – int, optional (default=14) number of iterations for optimization.

  • process_results – bool, optional (default=True) whether to perform postprocessing of optimization results or not

  • refit – bool, optional (default=True) whether to retrain the model using both training and validation data

Returns:

an instance of ai4water.hyperopt.HyperOpt which is used for optimization

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> from ai4water.hyperopt import Integer, Categorical, Real
>>> model_config = {"XGBRegressor": {"n_estimators": Integer(low=10, high=20),
...                 "max_depth": Categorical([10, 20, 30]),
...                 "learning_rate": Real(0.00001, 0.1)}}
>>> model = Model(model=model_config)
>>> optimizer = model.optimize_hyperparameters(data=busan_beach())

Same can be done if a model is defined using neural networks

>>> lookback = 14
>>> model_config = {"layers": {
...     "Input": {"input_shape": (lookback, 13)},
...     "LSTM": {"config": {"units": Integer(32, 64), "activation": "relu"}},
...     "Dense": {"units": 1,
...               "activation": Categorical(["relu", "tanh"], name="dense1_act")}}}
>>> model = Model(model=model_config, ts_args={"lookback": lookback})
>>> optimizer = model.optimize_hyperparameters(data=busan_beach(),
...                                            refit=False)

optimize_transformations(data: Union[ndarray, DataFrame], transformations: Optional[Union[str, list]] = None, include: Optional[Union[str, dict, list]] = None, exclude: Optional[Union[str, list]] = None, append: Optional[dict] = None, y_transformations: Optional[Union[list, dict]] = None, algorithm: str = 'bayes', num_iterations: int = 12, process_results: bool = True, update_config: bool = True)[source]

optimizes the transformations for the input/output features

The val_metric parameter given as input to the Model is used as the objective function for the optimization problem.

Parameters:
  • data

    It can be one of following

    • raw unprepared data in the form of a numpy array or pandas dataframe

    • a tuple of x,y pairs

    If it is unprepared data, it is passed to ai4water.preprocessing.DataSet, which prepares x,y pairs from it. The DataSet class also splits the data into training, validation and test sets. If it is a tuple of x,y pairs, it is split into training and validation. In both cases, the loss on the validation set is used as objective function. The loss is calculated using val_metric.

  • transformations

    the transformations to consider for input features. By default, following transformations are considered for input features

    • minmax: rescale from 0 to 1

    • center: center the data by subtracting its mean

    • scale: scale the data by dividing it by its standard deviation

    • zscore: first performs centering and then scaling

    • box-cox

    • yeo-johnson

    • quantile

    • robust

    • log

    • log2

    • log10

    • sqrt: square root

  • include – list, dict, str, optional. The name/names of input features to include. If you don’t want to include any feature, set this to an empty list

  • exclude – the name/names of input features to exclude

  • append

    the input features with custom candidate transformations. For example, if we want to try only minmax and zscore on the feature tide_cm, then it can be done as follows

    >>> append={"tide_cm": ["minmax", "zscore"]}
    

  • y_transformations

    It can either be a list of transformations to be considered for output features for example

    >>> y_transformations = ['log', 'log10', 'log2', 'sqrt']
    

    would mean that log, log10, log2 and sqrt are to be considered as output transformations during optimization. It can also be a dictionary whose keys are names of output features and whose values are lists of transformations to be considered for those output features. For example

    >>> y_transformations = {'output1': ['log2', 'log10'], 'output2': ['log', 'sqrt']}
    

    Default is None, which means do not optimize transformation for output features.

  • algorithm – str The algorithm to use for optimizing transformations

  • num_iterations – int The number of iterations for the optimization algorithm.

  • process_results – whether to perform postprocessing of optimization results or not

  • update_config – whether to update the config of model or not.

Returns:

an instance of the ai4water.hyperopt.HyperOpt class which is used for optimization

Example

>>> from ai4water.datasets import busan_beach
>>> from ai4water import Model
>>> model = Model(model="XGBRegressor")
>>> optimizer_ = model.optimize_transformations(data=busan_beach(), exclude="tide_cm")
>>> print(optimizer_.best_paras())  # find the best/optimized transformations
>>> model.fit(data=busan_beach())
>>> model.predict()
partial_dependence_plot(x=None, data=None, data_type='all', feature_name=None, num_points: int = 100, show: bool = True)[source]

Shows partial dependence plot for a feature.

Parameters:
  • x – the input data to use. If not given, then data must be given.

  • data – raw unprepared data from which x,y pairs are to be made. If given, x must not be given.

  • data_type (str) – the kind of the data to be used. It is only valid when data is given.

  • feature_name (str/list) – name/names of features. If only one feature is given, 1 dimensional partial dependence plot is plotted. You can also provide a list of two feature names, in which case 2d interaction plot will be plotted.

  • num_points (int) – number of points. It is used to define grid.

  • show (bool) – whether to show the plot or not

Return type:

an instance of ai4water.postprocessing.PartialDependencePlot

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model="RandomForestRegressor")
>>> model.fit(data=data)
>>> model.partial_dependence_plot(x=data.iloc[:, 0:-1], feature_name="tide_cm")
...
>>> model.partial_dependence_plot(data=data, feature_name="tide_cm")
permutation_importance(data=None, data_type: str = 'test', x=None, y=None, scoring: Union[str, Callable] = 'r2', n_repeats: int = 5, noise: Optional[Union[str, ndarray]] = None, use_noise_only: bool = False, weights=None, plot_type: Optional[str] = None)[source]

Calculates the permutation importance on the given data

Parameters:
  • data – Raw unprepared data from which x,y pairs of training and test data are prepared.

  • data_type (str) – one of training, test or validation. By default, test data is used, based upon the recommendations of Christoph Molnar’s book. Only valid if the data argument is given.

  • x – inputs for the model. alternative to data

  • y – target/observation data for the model. alternative to data

  • scoring – the scoring to use to calculate importance

  • n_repeats – number of times the permutation for each feature is performed.

  • noise – the noise to add when a feature is permuted. It can be a 1D array of length equal to len(data) or a string defining the distribution

  • use_noise_only – If True, then the feature being perturbed is replaced by the noise instead of adding the noise into the feature. This argument is only valid if noise is not None.

  • weights

  • plot_type – if not None, it must be either heatmap or boxplot or bar_chart

Return type:

an instance of ai4water.postprocessing.PermutationImportance

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> model = Model(model="XGBRegressor")
>>> model.fit(data=busan_beach())
>>> perm_imp = model.permutation_importance(data=busan_beach(),
...  data_type="validation", plot_type="boxplot")
>>> perm_imp.importances
predict(x=None, y=None, data: Union[str, DataFrame, ndarray, DataSet] = 'test', process_results: bool = True, metrics: str = 'minimal', return_true: bool = False, plots: Optional[Union[str, list]] = None, **kwargs)[source]

Makes prediction from the trained model.

Parameters:
  • x – The data on which to make prediction. If given, it will override data. It can also be tf.Dataset or TorchDataset

  • y – Used for post-processing etc. If given, it will override data

  • data – It can also be unprepared/raw data which will be given to ai4water.preprocessing.DataSet to prepare x,y values.

  • process_results – bool post processing of results

  • metrics – str only valid if process_results is True. The metrics to calculate. Valid values are minimal, all, hydro_metrics

  • return_true – bool whether to return the true values along with predicted values or not. Default is False, so that this method behaves like sklearn’s predict method.

  • plots – optional (default=None) The kind of plots to draw. Only valid if process_results is True

  • kwargs – any keyword argument for predict method.

Returns:

A numpy array of predicted values. If return_true is True then a tuple of arrays: the first is true and the second is predicted. If x is given but y is not given, then the first returned array is None.

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> model = Model(model="RandomForestRegressor")
>>> model.fit(data=busan_beach())
>>> pred = model.predict(data=busan_beach())

get true values

>>> true, pred = model.predict(data=busan_beach(), return_true=True)

postprocessing of results

>>> pred = model.predict(data=busan_beach(), process_results=True)

calculate all metrics during postprocessing

>>> pred = model.predict(data=busan_beach(), process_results=True, metrics="all")

using your own data

>>> import numpy as np
>>> new_input = np.random.random((10, 13))
>>> pred = model.predict(x = new_input)
predict_log_proba(x=None, data='test', **kwargs)[source]

Since preprocessing is part of Model, a trained model with sklearn/xgboost/catboost/lgbm as backend must also be able to apply preprocessing on inputs before calling predict_log_proba from the underlying library. Currently this method just calls the predict_log_proba function of the underlying library after first transforming x.

predict_on_all_data(data, process_results=True, return_true=False, metrics='minimal', plots: Optional[Union[str, list]] = None, **kwargs)[source]

It makes prediction on training+validation+test data.

Parameters:
  • data – raw, unprepared data from which x,y pairs will be generated.

  • process_results (bool, optional) – whether to post-process the results or not

  • return_true (bool, optional) – If true, the returned value will be tuple, first is true and second is predicted array

  • metrics (str, optional) – the metrics to calculate during post-processing

  • plots (optional (default=None)) –

    The kind of plots to draw. Only valid if process_results is True. Following plots are available:

    residual, regression, prediction errors, fdc, murphy, edf

  • **kwargs – any keyword argument for .predict method.
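
Example

A minimal usage sketch following the other examples on this page; the chosen plot names are taken from the list above:

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model="XGBRegressor")
>>> model.fit(data=data)
>>> prediction = model.predict_on_all_data(data=data, plots=["regression", "residual"])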

predict_on_test_data(data, process_results=True, return_true=False, metrics='minimal', plots: Optional[Union[str, list]] = None, **kwargs)[source]

makes prediction on test data.

Parameters:
  • data – raw, unprepared data from which test data (x,y pairs) will be generated.

  • process_results (bool, optional) – whether to post-process the results or not

  • return_true (bool, optional) – If true, the returned value will be tuple, first is true and second is predicted array

  • metrics (str, optional) – the metrics to calculate during post-processing

  • plots (optional (default=None)) –

    The kind of plots to draw. Only valid if process_results is True. Following plots are available:

    residual, regression, prediction errors, fdc, murphy, edf

  • **kwargs – any keyword argument for .predict method.

predict_on_training_data(data, process_results=True, return_true=False, metrics='minimal', plots: Optional[Union[str, list]] = None, **kwargs)[source]

makes prediction on training data.

Parameters:
  • data – raw, unprepared data from which training data (x,y pairs) will be generated.

  • process_results (bool, optional) – whether to post-process the results or not

  • return_true (bool, optional) – If true, the returned value will be tuple, first is true and second is predicted array

  • metrics (str, optional) – the metrics to calculate during post-processing

  • plots (optional (default=None)) –

    The kind of plots to draw. Only valid if process_results is True. Following plots are available:

    residual, regression, prediction errors, fdc, murphy, edf

  • **kwargs – any keyword argument for .predict method.

predict_on_validation_data(data, process_results=True, return_true=False, metrics='minimal', plots: Optional[Union[str, list]] = None, **kwargs)[source]

makes prediction on validation data.

Parameters:
  • data – raw, unprepared data from which validation data (x,y pairs) will be generated.

  • process_results (bool, optional) – whether to post-process the results or not

  • return_true (bool, optional) – If true, the returned value will be tuple, first is true and second is predicted array

  • metrics (str, optional) – the metrics to calculate during post-processing

  • plots (optional (default=None)) –

    The kind of plots to draw. Only valid if process_results is True. Following plots are available:

    residual, regression, prediction errors, fdc, murphy, edf

  • **kwargs – any keyword argument for .predict method.

predict_proba(x=None, data='test', **kwargs)[source]

Since preprocessing is part of Model, a trained model with sklearn/xgboost/catboost/lgbm as backend must also be able to apply preprocessing on inputs before calling predict_proba from the underlying library. Currently this method just calls the predict_proba function of the underlying library after first transforming x.
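
Example

A minimal sketch, assuming a classification problem; the XGBClassifier model name and the mode="classification" argument are assumptions here and should be adapted to your setup:

>>> import numpy as np
>>> from ai4water import Model
>>> model = Model(model="XGBClassifier", mode="classification")
>>> model.fit(x=np.random.random((100, 5)), y=np.random.randint(0, 2, (100, 1)))
>>> probabilities = model.predict_proba(x=np.random.random((10, 5)))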

prediction_analysis(features: Union[list, str], x: Optional[Union[ndarray, DataFrame]] = None, y: Optional[ndarray] = None, data=None, data_type: str = 'all', feature_names: Optional[Union[str, list]] = None, num_grid_points: Optional[int] = None, grid_types='percentile', percentile_ranges=None, grid_ranges=None, custom_grid: Optional[list] = None, show_percentile: bool = False, show_outliers: bool = False, end_point: bool = True, which_classes=None, ncols=2, figsize: Optional[tuple] = None, annotate: bool = True, annotate_kws: Optional[dict] = None, cmap='YlGn', border=False, show: bool = True, save_metadata: bool = True) Axes[source]

shows prediction distribution with respect to one or two input features.

Parameters:
  • x – input data to the model.

  • y – true data corresponding to x.

  • data – raw unprepared data from which x,y pairs for training, validation and test are generated. It must only be given if x is not given.

  • data_type (str, optional (default="all")) – The kind of data to be used. It is only valid if the data argument is used. It should be one of training, validation, test or all.

  • features (str/list) – name or names of features to investigate

  • feature_names (list) – feature names

  • num_grid_points (list, optional, default=None) – number of grid points for each feature

  • grid_types (list, optional, default=None) – type of grid points for each feature

  • percentile_ranges (list of tuple, optional, default=None) – percentile range to investigate for each feature

  • grid_ranges (list of tuple, optional, default=None) – value range to investigate for each feature

  • custom_grid (list of (Series, 1d-array, list), optional, default=None) – customized list of grid points for each feature

  • show_percentile (bool, optional, default=False) – whether to display the percentile buckets for both features

  • show_outliers (bool, optional, default=False) – whether to display the out of range buckets for both features

  • end_point (bool, optional, default=True) – If True, stop is the last grid point; otherwise it is not included

  • which_classes (list, optional, default=None) – which classes to plot, only use when it is a multi-class problem

  • figsize (tuple or None, optional, default=None) – size of the figure, (width, height)

  • ncols (integer, optional, default=2) – number of subplot columns, used when it is a multi-class problem

  • annotate (bool, default=True) – whether to annotate the points

  • annotate_kws (dict, optional) –

    a dictionary of keyword arguments with following keys

    annotate_counts : bool, default=False
        whether to annotate counts or not.

    annotate_colors : tuple
        pair of colors

    annotate_color_threshold : float
        threshold value for annotation

    annotate_fmt : str
        format string for annotation.

    annotate_fontsize : int, optional (default=7)
        fontsize for annotation

  • cmap

  • border

  • show (bool, optional (default=True)) – whether to show the plot or not

  • save_metadata (bool, optional, default=True) – whether to save the information as csv or not

Returns:

a pandas dataframe and matplotlib Axes

Return type:

tuple

Examples

>>> from ai4water.datasets import busan_beach
>>> from ai4water import Model
...
>>> model = Model(model="XGBRegressor")
>>> model.fit(data=busan_beach())
>>> model.prediction_analysis(features="tide_cm",
... data=busan_beach(), show_percentile=True)
... # for multiple features
>>> model.prediction_analysis(
...     ['tide_cm', 'sal_psu'],
...     data=busan_beach(),
...     annotate_kws={"annotate_counts":True,
...     "annotate_colors":("black", "black"),
...     "annotate_fontsize":10},
...     custom_grid=[[-41.4, -20.0, 0.0, 20.0, 42.0],
...                       [33.45, 33.7, 33.9, 34.05, 34.4]],
... )
score(x=None, y=None, data='test', **kwargs)[source]

Since preprocessing is part of Model, a trained model with sklearn as backend must also be able to apply preprocessing on inputs before calculating the score from sklearn. Currently this method just calls the score function of sklearn after first transforming x and y.
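
Example

A minimal usage sketch following the other examples on this page:

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> model = Model(model="RandomForestRegressor")
>>> model.fit(data=busan_beach())
>>> model.score(data=busan_beach())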

seed_everything(seed=None) None[source]

resets seeds of numpy, os, random, tensorflow, torch. If any of these modules is not available, the seed for that module is not set.
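
Example

A minimal sketch, reusing a model instance from the examples above; the seed value is arbitrary

>>> model.seed_everything(313)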

sensitivity_analysis(data=None, bounds=None, sampler='morris', analyzer: Union[str, list] = 'sobol', sampler_kwds: Optional[dict] = None, analyzer_kwds: Optional[dict] = None, save_plots: bool = True, names: Optional[List[str]] = None) dict[source]

performs sensitivity analysis of the model w.r.t input features in data.

The model and its hyperparameters remain fixed while the input data is changed.

Parameters:
  • data – data which will be used to get the bounds/limits of input features. If given, it must be a 2d numpy array. It should be remembered that the given data is not used during sensitivity analysis; instead, new synthetic data is prepared on which sensitivity analysis is performed.

  • bounds (list,) – alternative to data

  • sampler (str, optional) – any sampler from SALib library. For example morris, fast_sampler, ff, finite_diff, latin, saltelli, sobol_sequence

  • analyzer (str, optional) – any analyzer from the SALib library, for example sobol, dgsm, fast, ff, hdmr, morris, pawn, rbd_fast. You can also choose more than one analyzer. This is useful when you want to compare the results of more than one analyzer. It should be noted that having more than one analyzer does not increase computation time, except for the hdmr and delta analyzers, which are computationally heavy. For example

    >>> analyzer = ["morris", "sobol", "rbd_fast"]

  • sampler_kwds (dict) – keyword arguments for sampler

  • analyzer_kwds (dict) – keyword arguments for analyzer

  • save_plots (bool, optional) –

  • names (list, optional) – names of input features. If not given, the model’s input feature names will be used.

Returns:

a dictionary whose keys are names of analyzers and whose values are the sensitivity results for that analyzer.

Return type:

dict

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> df = busan_beach()
>>> input_features=df.columns.tolist()[0:-1]
>>> output_features = df.columns.tolist()[-1:]
... # build the model
>>> model=Model(model="RandomForestRegressor",
...     input_features=input_features,
...     output_features=output_features)
... # train the model
>>> model.fit(data=df)
... # perform sensitivity analysis
>>> si = model.sensitivity_analysis(data=df[input_features].values,
...                    sampler="morris", analyzer=["morris", "sobol"],
...                    sampler_kwds={'N': 100})
shap_values(data, layer=None) ndarray[source]

returns shap values

Parameters:
  • data – raw unprepared data from which training and test data are extracted.

  • layer

Examples

>>> from ai4water import Model
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model="RandomForestRegressor")
>>> model.fit(data=data)
>>> model.shap_values(data=data)
test_data(x=None, y=None, data='test', key='test') tuple[source]

returns the x,y pairs for the test set. x,y are not used but are only given so that the user can overwrite this method for further processing of x, y, as shown below.

>>> from ai4water import Model
>>> class MyModel(Model):
>>>     def test_data(self, *args, **kwargs) ->tuple:
>>>         test_x, test_y = super().test_data(*args, **kwargs)
...         # further process x, y
>>>         return test_x, test_y
training_data(x=None, y=None, data='training', key='train') tuple[source]

returns the x,y pairs for training. x,y are not used but are only given so that the user can overwrite this method for further processing of x, y, as shown below.

>>> from ai4water import Model
>>> class MyModel(Model):
>>>     def training_data(self, *args, **kwargs) ->tuple:
>>>         train_x, train_y = super().training_data(*args, **kwargs)
...         # further process x, y
>>>         return train_x, train_y
update_weights(weight_file: Optional[str] = None)[source]

Updates the weights of the underlying model.

Parameters:

weight_file (str, optional) – complete path of the weight file. If not given, the weights are updated from the model.w_path directory. For neural network based models, the best weights are updated if more than one weight file is present in model.w_path.

Return type:

None
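
Example

A minimal sketch, assuming a previously saved model config; the config path below is illustrative:

>>> from ai4water import Model
>>> model = Model.from_config_file("../file/to/config.json")
>>> model.update_weights()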

validation_data(x=None, y=None, data='validation', key='val') tuple[source]

returns the x,y pairs for validation. x,y are not used but are only given so that the user can overwrite this method for further processing of x, y, as shown below.

>>> from ai4water import Model
>>> class MyModel(Model):
>>>     def validation_data(self, *args, **kwargs) ->tuple:
>>>         val_x, val_y = super().validation_data(*args, **kwargs)
...         # further process x, y
>>>         return val_x, val_y
view(layer_name: Optional[Union[str, list]] = None, data=None, data_type: str = 'training', x=None, y=None, examples_to_view=None, show=False)[source]

shows all activations, weights and gradients of the model.

Parameters:
  • layer_name – the layer to view. If not given, all the layers will be viewed. This argument is only required when the model consists of layers of neural networks.

  • data – the data to use when making calls to model for activation calculation or for gradient calculation.

  • data_type – str. It can be either training, validation, test or all.

  • x – input, alternative to data. If given it will override data argument.

  • y – target/observed/label, alternative to data. If given it will override data argument.

  • examples_to_view – the examples to view.

  • show – whether to show the plot or not

Returns:

An instance of the ai4water.postprocessing.visualize.Visualize class.
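
Example

A minimal sketch, assuming a trained neural network based model (following the MLP examples above):

>>> from ai4water import Model
>>> from ai4water.models import MLP
>>> from ai4water.datasets import busan_beach
>>> data = busan_beach()
>>> model = Model(model=MLP(),
...               input_features=data.columns.tolist()[0:-1],
...               output_features=data.columns.tolist()[-1:])
>>> model.fit(data=data)
>>> vis = model.view(data=data, data_type="training")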

Model subclassing

Model subclassing is different from the functional API in the way the model (neural network) is constructed. To understand the difference between the model-subclassing API and the functional API, see Model subclassing vs functional API

This class inherits from BaseModel. This class is a subclass of keras.Model/torch.nn.Module depending upon the backend used. For scikit-learn/xgboost/catboost type models, this class only inherits from BaseModel. For deep learning/neural network based models, this class directly exposes all the functionalities of the underlying Model. Thus self is now a keras Model or torch.nn.Module. If the user wishes to create his/her own NN architecture, he/she should overwrite the initialize_layers and call/forward methods, as sketched below.
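
A minimal sketch of such a subclass, assuming the tensorflow backend; the architecture and layer names here are purely illustrative:

>>> import tensorflow as tf
>>> from ai4water import Model
...
>>> class MyModel(Model):
...     def initialize_layers(self, layers_config: dict, inputs=None):
...         # define the layers to be used in the call method
...         self.dense1 = tf.keras.layers.Dense(64, activation="relu")
...         self.out = tf.keras.layers.Dense(1)
...     def call(self, inputs, *args, **kwargs):
...         # forward pass of the custom architecture
...         return self.out(self.dense1(inputs))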

ai4water.main.Model.__init__(self, verbosity=1, model=None, path=None, prefix=None, **kwargs)

Initializes the layers of the NN model using the initialize_layers method. All other input arguments go to BaseModel.

ai4water.main.Model.fit_pytorch(self, x, **kwargs)

Trains the pytorch model.

ai4water.main.Model.forward(self, *inputs: Any, **kwargs: Any)

implements the forward pass for pytorch based NN models.

ai4water.main.Model.initialize_layers(self, layers_config: dict, inputs=None)

Initializes the layers/weights/variables which are to be used in forward or call method.

Parameters:
  • layers_config (dict) – python dictionary to define the neural network. For details see https://ai4water.readthedocs.io/en/latest/build_dl_models.html

  • inputs – if None, it will be assumed that the Input layer either exists in layers_config or an Input layer will be created within this method before adding any other layer. If not None, then it must be an Input layer and the remaining NN architecture will be built as defined in layers_config. This can be handy when we want to use this method several times to build a complex or parallel NN structure. Avoid Input in layer names.

Model for functional API

class ai4water.functional.Model(*args, **kwargs)[source]

Model class with Functional API and inherits from BaseModel.

For ML/non-Neural Network based models, there is no difference between the functional and subclassing APIs. For DL/NN-based models, this class implements the functional API and differs from the subclassing API in the internal implementation of the NN. This class is useful if you want to use the functional API of keras to build your own NN structure. In such a case you can construct your NN structure by overwriting add_layers. Another advantage of this class is that sometimes model subclassing is not possible, for example due to some bugs in tensorflow. In such a case this class can be used. Otherwise all the features of ai4water are available in this class as well.

Example

>>> from ai4water.functional import Model

__init__(*args, **kwargs)[source]

Initializes and builds the NN/ML model.

add_layers(layers_config: dict, inputs=None)[source]

Builds the NN from dictionary.

Parameters:
  • layers_config

    a dictionary whose keys are layer names and whose values are dictionaries that can contain the following keys:

    config : dict/lambda
        Every layer must contain initializing arguments as a config dictionary. The config dictionary for every layer can contain a name key and its value must be str type. If the name key is not provided in the config, the provided layer name will be used as its name, e.g. in the following case

        layers = {'LSTM': {'config': {'units': 16}}}

        the name of the LSTM layer will be LSTM, while in the following case

        layers = {'LSTM': {'config': {'units': 16, 'name': 'MyLSTM'}}}

        the name of the lstm will be MyLSTM.

    inputs : str/list
        The calling arguments for the layer. If the inputs key is missing for a layer, it will be assumed that either this is an Input layer or it uses the previous layer's outputs as inputs.

    outputs : str/list
        We can specify the outputs from a layer by using the outputs key. The value of outputs must be a string or list of strings specifying the name of outputs from the current layer which can be used later in the model.

    call_args : str/list
        We can also specify additional call arguments with the call_args key. The value of call_args must be a string or a list of strings.

  • inputs – if None, it will be assumed that the Input layer either exists in layers_config or an Input layer will be created within this method before adding any other layer. If not None, then it must be an Input layer and the remaining NN architecture will be built as defined in layers_config. This can be handy when we want to use this method several times to build a complex or parallel NN structure. Avoid Input in layer names.

Returns:

inputs and outputs of the constructed network
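
Example

A minimal sketch of a layers_config using the keys described above; the shapes and layer choices are illustrative only

>>> layers_config = {
...     "Input": {"config": {"shape": (14, 13)}},
...     "LSTM": {"config": {"units": 16, "name": "MyLSTM"}},
...     "Dense": {"config": {"units": 1}, "inputs": "MyLSTM"}
... }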

Pytorch Learner

This module can be used to train models which are built outside AI4Water’s model class. Thus, this module does not do any pre-processing, model building and post-processing of results.

This module is inspired by fastai’s Learner and keras’s Model class.

class ai4water.models._torch.Learner(model, batch_size: int = 32, num_epochs: int = 14, patience: int = 100, shuffle: bool = True, to_monitor: Optional[list] = None, use_cuda: bool = False, path: Optional[str] = None, wandb_config: Optional[dict] = None, verbosity=1, **kwargs)[source]

Bases: AttributeContainer

Trains the pytorch model. Motivated from fastai

__init__(model, batch_size: int = 32, num_epochs: int = 14, patience: int = 100, shuffle: bool = True, to_monitor: Optional[list] = None, use_cuda: bool = False, path: Optional[str] = None, wandb_config: Optional[dict] = None, verbosity=1, **kwargs)[source]

Initializes the Learner class

Parameters:
  • model

    a pytorch model having the following attributes and methods

    • num_outs

    • w_path

    • loss

    • get_optimizer

  • batch_size – batch size

  • num_epochs – Number of epochs for which to train the model

  • patience – how many epochs to wait before stopping the training in case to_monitor does not improve.

  • shuffle

  • use_cuda – whether to use cuda or not

  • to_monitor – list of metrics to monitor

  • path – path to save results/weights

  • wandb_config – config for wandb

Example

>>> from torch import nn
>>> import torch
>>> from ai4water.models._torch import Learner
...
>>> class Net(nn.Module):
>>>    def __init__(self, D_in, H, D_out):
...        super(Net, self).__init__()
...        # hidden layer
...        self.linear1 = nn.Linear(D_in, H)
...        self.linear2 = nn.Linear(H, D_out)
>>>    def forward(self, x):
...        l1 = self.linear1(x)
...        a1 = torch.sigmoid(l1)
...        yhat = torch.sigmoid(self.linear2(a1))
...        return yhat
...
>>> learner = Learner(model=Net(1, 2, 1),
...                      num_epochs=501,
...                      patience=50,
...                      batch_size=1,
...                      shuffle=False)
...
>>> learner.optimizer = torch.optim.SGD(learner.model.parameters(), lr=0.1)
>>> def criterion_cross(labels, outputs):
...    out = -1 * torch.mean(labels * torch.log(outputs) + (1 - labels) * torch.log(1 - outputs))
...    return out
>>> learner.loss = criterion_cross
...
>>> X = torch.arange(-20, 20, 1).view(-1, 1).type(torch.FloatTensor)
>>> Y = torch.zeros(X.shape[0])
>>> Y[(X[:, 0] > -4) & (X[:, 0] < 4)] = 1.0
...
>>> learner.fit(X, Y)
>>> metrics = learner.evaluate(X, y=Y, metrics=['r2', 'nse', 'mape'])
>>> t = learner.predict(X, y=Y, name='training')
evaluate(x, y, batch_size: Optional[int] = None, metrics: Union[str, list] = 'r2', **kwargs)[source]

Evaluates the model on the given data.

Parameters:
  • x

    data on which to evaluate. It can be

    • a torch.utils.data.Dataset

    • a torch.utils.data.DataLoader

    • a torch.Tensor

    • a numpy.ndarray

    • a list of torch tensors or numpy arrays

  • y – It comprises labels corresponding to x.

  • batch_size – None means make prediction on whole data in one go

  • metrics – name of performance metric to measure. It can be a single metric or a list of metrics. Allowed metrics are anyone from ai4water.post_processing.SeqMetrics.RegressionMetrics

  • kwargs

Returns:

if metrics is a string, the returned value is a float, otherwise it will be a dictionary

fit(x, y=None, validation_data=None, **kwargs)[source]

Runs the training loop for pytorch model.

Parameters:
  • x

    Can be one of following

    • an instance of torch.Dataset, y will be ignored

    • an instance of torch.DataLoader, y will be ignored

    • a torch tensor containing input data for each example

    • a numpy array or pandas DataFrame

    • a list of torch tensors or numpy arrays

  • y – if x is torch tensor, then y is the label/target for each corresponding example.

  • validation_data –

    can be one of following

    • an instance of torch.Dataset

    • an instance of torch.DataLoader

    • a tuple of x,y pairs where x and y are tensors

    Default is None, which means no validation is performed.

  • kwargs

    can include callbacks. For example, to use a callable as a callback use the following

    >>> callbacks = [{'after_epochs': 300, 'func': PlotStuff}]
    

    where PlotStuff is a callable. Each callable is provided with following keyword arguments

    • epoch : the current epoch at which callable is called.

    • model : the model

    • train_data : training data_loader

    • val_data : validation data_loader

plot_model(y=None)[source]

Helper function to plot a dot diagram of the model using the torchviz module.

Parameters:

y (torch.Tensor) – output tensor
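
Example

A minimal usage sketch, reusing the learner and X from the class example above; passing the model's output tensor as y is an assumption here

>>> learner.plot_model(y=learner.model(X))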

predict(x, y=None, batch_size: Optional[int] = None, reg_plot: bool = True, name: Optional[str] = None, **kwargs) ndarray[source]

Makes prediction on the given data

Parameters:
  • x

    data on which to evaluate. It can be

    • a torch.utils.data.Dataset

    • a torch.utils.data.DataLoader

    • a torch.Tensor

    • a numpy array

    • a list of torch tensors or numpy arrays

  • y – only relevant if x is a torch.Tensor. It comprises labels corresponding to x.

  • batch_size – None means make prediction on whole data in one go

  • reg_plot – whether to plot regression line or not

  • name – string to be used for title and name of saved plot

Returns:

predicted output as numpy array

update_metrics()[source]
update_weights(weight_file_path: Optional[str] = None)[source]

If weight_file_path is not given, then it finds the best weights and updates the model with the best weights.

Parameters:

weight_file_path – complete path of weights which are to be loaded