utility functions

Some utility functions

prepare_data

ai4water.utils.utils.prepare_data(data: ndarray, lookback: int, num_inputs: Optional[int] = None, num_outputs: Optional[int] = None, input_steps: int = 1, forecast_step: int = 0, forecast_len: int = 1, known_future_inputs: bool = False, output_steps: int = 1, mask: Optional[Union[int, float, ndarray]] = None) → Tuple[ndarray, ndarray, ndarray][source]

Converts an n-dimensional numpy array into input/output examples for a supervised machine learning problem.

Parameters:

data – nd numpy array whose first dimension represents the number of examples and the second dimension represents the number of features. Some of those features will be used as inputs and some will be considered as outputs depending upon the values of num_inputs and num_outputs.
lookback – number of previous steps/values to be used at one step.
num_inputs – number of input features in data (default None). If None, it is calculated as the number of features minus num_outputs. The inputs are the leading columns of data, i.e. all columns before the output columns.
num_outputs – number of columns (counted from the last) in data to be used as output. If None, it is calculated as the number of features minus num_inputs.
input_steps – strides/number of steps in input data
forecast_step – must be greater than or equal to 0. It defines which t+ith value to use as target, where i is the horizon. For time series prediction, it is the horizon to predict.
forecast_len – number of horizons/future values to predict.
known_future_inputs – only useful if forecast_len>1. If True, the ‘future inputs’ are assumed to be known and are used while making predictions at t>0.
output_steps – step size in outputs. If 2, every second value from the targets is predicted.
mask – If int, the examples with this value in the output will be skipped. If array, it must be a boolean mask indicating which examples to include/exclude, and its length must be equal to the number of generated examples. The number of generated examples is difficult to predict because it depends upon lookback, input_steps, and forecast_step. It is therefore better to provide an integer indicating which values in the outputs are to be considered invalid. Default is None, which means all the generated examples will be returned.

Returns:

x – numpy array of shape (examples, lookback, ins) consisting of input examples
prev_y – numpy array consisting of previous outputs
y – numpy array consisting of target values
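
The integer form of mask can be pictured with plain numpy. This is only a sketch and not the code used by prepare_data: the sentinel value -999, the array contents, and the rule that an example is dropped when any of its target values equals the sentinel are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical targets for 5 generated examples,
# shaped (examples, num_outputs, forecast_len).
y = np.array([[[1.0]], [[-999.0]], [[3.0]], [[-999.0]], [[5.0]]])
# Matching hypothetical inputs, shaped (examples, lookback, num_inputs).
x = np.arange(5 * 2 * 3, dtype=float).reshape(5, 2, 3)

invalid_value = -999  # assumed sentinel that marks an invalid target

# Keep an example only if none of its target values equals the sentinel.
keep = ~(y == invalid_value).reshape(len(y), -1).any(axis=1)

x_valid, y_valid = x[keep], y[keep]
print(keep)
print(x_valid.shape, y_valid.shape)
```

The boolean array `keep` built here has one entry per generated example, which is the form an array mask is described as having.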

Given the following data consisting of input/output pairs

input1	input2	output1	output2	output3
1	11	21	31	41
2	12	22	32	42
3	13	23	33	43
4	14	24	34	44
5	15	25	35	45
6	16	26	36	46
7	17	27	37	47

If we use the following two time series as input

input1	input2
1	11
2	12
3	13
4	14
5	15
6	16
7	17

then num_inputs=2, lookback=7, input_steps=1

and if we want to predict

output1	output2	output3
27	37	47

then num_outputs=3, forecast_len=1, forecast_step=0.

If we want to predict

output1	output2	output3
28	38	48

then num_outputs=3, forecast_len=1, forecast_step=1.

If we want to predict

output1	output2	output3
27	37	47
28	38	48

then num_outputs=3, forecast_len=2, forecast_step=0.

If we want to predict

output1	output2	output3
28	38	48
29	39	49
30	40	50

then num_outputs=3, forecast_len=3, forecast_step=1.

if we want to predict

output2
38
39
40

then num_outputs=1, forecast_len=3, forecast_step=1.

if we predict

output2
39

then num_outputs=1, forecast_len=1, forecast_step=2.

if we predict

output2
39
40
41

then num_outputs=1, forecast_len=3, forecast_step=2.

If we use the following two time series as input

input1	input2
1	11
3	13
5	15
7	17

then num_inputs=2, lookback=4, input_steps=2.
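
The strided selection above can be reproduced with ordinary numpy slicing. This sketch rebuilds the 7-row table from this section and is only meant to show which rows input_steps=2 picks; it does not call prepare_data.

```python
import numpy as np

# The 7-row table from this section: input1, input2, output1, output2, output3.
data = np.column_stack([np.arange(1, 8) + 10 * k for k in range(5)])

num_inputs = 2
input_steps = 2

# Take every second row of the two input columns.
strided_inputs = data[::input_steps, :num_inputs]
print(strided_inputs)
```

The result has 4 rows, which is why lookback is 4 in this configuration.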

If the input is

input1	input2
1	11
2	12
3	13
4	14
5	15
6	16
7	17

and the target/output is

output1	output2	output3
25	35	45
26	36	46
27	37	47

This means we make use of known future inputs. This can be achieved using the following configuration: num_inputs=2, num_outputs=3, lookback=4, forecast_len=3, forecast_step=1, known_future_inputs=True

The general shape of output/target/label is (examples, num_outputs, forecast_len)

The general shape of inputs/x is (examples, lookback + forecast_len-1, ….num_inputs)
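
The windowing idea behind these illustrations can be sketched in a few lines of numpy. The helper `sliding_windows` is hypothetical and is not the implementation of prepare_data. It follows the convention of the tables above, where forecast_step=0 pairs a window with the output row aligned with its last input row; the real function may index targets differently and also handles forecast_len, input_steps and the other options, which this sketch omits.

```python
import numpy as np

def sliding_windows(data, lookback, num_outputs, forecast_step=0):
    """Hypothetical helper: build (x, y) pairs from a 2D array.

    x has shape (examples, lookback, num_inputs) and
    y has shape (examples, num_outputs); forecast_len is fixed to 1.
    """
    num_inputs = data.shape[1] - num_outputs
    xs, ys = [], []
    last_start = len(data) - lookback - forecast_step
    for start in range(last_start + 1):
        last = start + lookback - 1  # index of the last input row of this window
        xs.append(data[start:start + lookback, :num_inputs])
        ys.append(data[last + forecast_step, num_inputs:])
    return np.stack(xs), np.stack(ys)

# The 7-row table from this section.
data = np.column_stack([np.arange(1, 8) + 10 * k for k in range(5)])

x, y = sliding_windows(data, lookback=7, num_outputs=3)
print(x.shape, y)

x2, y2 = sliding_windows(data, lookback=4, num_outputs=3, forecast_step=1)
print(x2.shape, y2[0])
```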

Examples

>>> import numpy as np
>>> from ai4water.utils.utils import prepare_data
>>> num_examples = 50
>>> dataframe = np.arange(int(num_examples*5)).reshape(-1, num_examples).transpose()
>>> dataframe[0:10]
array([[  0,  50, 100, 150, 200],
       [  1,  51, 101, 151, 201],
       [  2,  52, 102, 152, 202],
       [  3,  53, 103, 153, 203],
       [  4,  54, 104, 154, 204],
       [  5,  55, 105, 155, 205],
       [  6,  56, 106, 156, 206],
       [  7,  57, 107, 157, 207],
       [  8,  58, 108, 158, 208],
       [  9,  59, 109, 159, 209]])
>>> x, prevy, y = prepare_data(dataframe, num_outputs=2, lookback=4,
...    input_steps=2, forecast_step=2, forecast_len=4)
>>> x[0]
array([[  0.,  50., 100.],
       [  2.,  52., 102.],
       [  4.,  54., 104.],
       [  6.,  56., 106.]], dtype=float32)
>>> y[0]
array([[158., 159., 160., 161.],
       [208., 209., 210., 211.]], dtype=float32)

>>> x, prevy, y = prepare_data(dataframe, num_outputs=2, lookback=4,
...    forecast_len=3, known_future_inputs=True)
>>> x[0]
array([[  0,  50, 100],
       [  1,  51, 101],
       [  2,  52, 102],
       [  3,  53, 103],
       [  4,  54, 104],
       [  5,  55, 105],
       [  6,  56, 106]])       # (7, 3)
>>> # note that although lookback=4, x[0] has 7 rows
>>> y[0]
array([[154., 155., 156.],
       [204., 205., 206.]], dtype=float32)  # (2, 3)

get_attributes

tensorflow, torch, numpy, matplotlib, random and other libraries are imported here once and then used all over ai4water. This file does not import anything from other files of ai4water.

ai4water.backend.get_attributes(aus, what: str, retain: Optional[str] = None, case_sensitive: bool = False) → dict[source]

Gets all callable attributes of the child module what of aus and saves them in a dictionary with their names as keys. If case_sensitive is False, then all keys are capitalized so that calling them becomes case insensitive. It is possible that some attributes of tf.keras.layers are callable but still not valid layers, or that some attributes of tf.keras.losses are callable but still not valid losses. In that case the error will be raised by tensorflow. These errors are currently not caught.

Parameters:

aus – parent module
what (str) – child module/package
retain (str, optional (default=None)) – if duplicates exist in ‘what’, whether to prefer the class or the function. For example, fastica and FastICA both exist in sklearn.decomposition. If retain is ‘function’ then fastica will be kept, and if retain is ‘class’ then FastICA is kept. If retain is None, then whichever comes later will overwrite the previously kept object.
case_sensitive (bool, optional (default=False)) – whether to treat the attribute names in what as case-sensitive or not. If True, fastica and FastICA will both be saved as separate objects.

Example

>>> import tensorflow as tf
>>> from ai4water.backend import get_attributes
>>> get_attributes(tf.keras, 'layers')  # will get all layers from tf.keras.layers
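
The idea of collecting the callables of a child module into a dictionary can be sketched with the standard library alone. The helper `collect_callables` below is hypothetical and is not the code of get_attributes: it ignores the retain option, and it assumes that making the keys case insensitive means upper-casing them.

```python
import os

def collect_callables(parent, what, case_sensitive=False):
    """Hypothetical stand-in: map names of public callables in parent.what to the objects."""
    child = getattr(parent, what)
    found = {}
    for name in dir(child):
        attr = getattr(child, name)
        if callable(attr) and not name.startswith("_"):
            # Assumption: case-insensitive lookup is achieved by upper-casing keys.
            key = name if case_sensitive else name.upper()
            found[key] = attr
    return found

path_funcs = collect_callables(os, "path")
print(path_funcs["JOIN"] is os.path.join)

exact_funcs = collect_callables(os, "path", case_sensitive=True)
print("join" in exact_funcs)
```

With a dictionary built this way, a lookup only needs to upper-case the requested name before indexing.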

murphy_diagram

ai4water.utils.visualizations.murphy_diagram(observed: Union[list, ndarray, Series, DataFrame], predicted: Union[list, ndarray, Series, DataFrame], reference: Optional[Union[list, ndarray, Series, DataFrame]] = None, reference_model: Optional[Union[str, Callable]] = None, inputs=None, plot_type: str = 'scores', xaxis: str = 'theta', ax: Optional[Axes] = None, line_colors: Optional[tuple] = None, fill_color: str = 'lightgray', show: bool = True) → Axes[source]

Murphy diagram as introduced by Ehm et al., 2015 and illustrated by Rob Hyndman.

Parameters:

observed – observed or true values
predicted – model’s prediction
reference – reference prediction
reference_model – the model for the reference prediction. Only relevant if reference is None and plot_type is diff. It can be a callable or a string. If it is a string, it can be any model name from sklearn.linear_model.
inputs – inputs for the reference model. Only relevant if reference_model is not None and plot_type is diff.
plot_type – either scores or diff
xaxis – either theta or time
ax – the axis to use for plotting
line_colors – colors of line
fill_color – color to fill confidence interval
show – whether to show the plot or not

Returns:

matplotlib axes

Example

>>> import numpy as np
>>> from ai4water.utils.visualizations import murphy_diagram
>>> yy = np.random.randint(1, 1000, 100)
>>> ff1 = np.random.randint(1, 1000, 100)
>>> ff2 = np.random.randint(1, 1000, 100)
>>> murphy_diagram(yy, ff1, ff2)
...
>>> murphy_diagram(yy, ff1, ff2, plot_type="diff")

fdc_plot

ai4water.utils.visualizations.fdc_plot(sim: Union[list, ndarray, Series, DataFrame], obs: Union[list, ndarray, Series, DataFrame], ax: Optional[Axes] = None, legend: bool = True, xlabel: str = 'Exceedence [%]', ylabel: str = 'Flow', show: bool = True) → Axes[source]

Plots flow duration curve

Parameters:

sim – simulated flow
obs – observed flow
ax – axis on which to plot
legend – whether to apply legend or not
xlabel – label to set on x-axis. Set to None for no x-label.
ylabel – label to set on y-axis
show – whether to show the plot or not

Returns:

matplotlib axes

Example

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from ai4water.utils.visualizations import fdc_plot
>>> simulated = np.random.random(100)
>>> observed = np.random.random(100)
>>> fdc_plot(simulated, observed)
>>> plt.show()
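
For readers unfamiliar with flow duration curves, the quantity on the two axes can be sketched as follows. The helper `flow_duration_curve` is hypothetical, and the Weibull plotting position rank/(n+1) is an assumption chosen for illustration; fdc_plot may compute exceedance differently.

```python
import numpy as np

def flow_duration_curve(flow):
    """Return (exceedance in percent, flow sorted from largest to smallest)."""
    flow = np.asarray(flow, dtype=float)
    sorted_flow = np.sort(flow)[::-1]            # largest flow first
    rank = np.arange(1, len(flow) + 1)
    # Percent of time a given flow is equalled or exceeded (Weibull position).
    exceedance = 100.0 * rank / (len(flow) + 1)
    return exceedance, sorted_flow

exceedance, sorted_flow = flow_duration_curve([3.0, 10.0, 1.0, 6.0])
print(exceedance)
print(sorted_flow)
```

Plotting `sorted_flow` against `exceedance` for both the simulated and the observed series gives the two curves that such a plot compares.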

edf_plot

ai4water.utils.visualizations.edf_plot(y: ndarray, num_points: int = 100, xlabel='Objective Value', marker: str = '-', ax: Optional[Axes] = None, show: bool = True, **kwargs) → Axes[source]

Plots the empirical distribution function.

Parameters:

y (np.ndarray) – array of values
num_points (int) –
xlabel (str) –
marker (str) –
ax (plt.Axes, optional) –
show (bool, optional (default=True)) – whether to show the plot or not
**kwargs – key word arguments for plot

Return type:

plt.Axes
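
An empirical distribution function gives, for each value on the x-axis, the fraction of observations that are less than or equal to it. A minimal numpy sketch is shown below; the helper `empirical_distribution` is hypothetical, and evaluating the function on an evenly spaced grid of num_points values is an assumption about how such a parameter could be used, not a description of edf_plot.

```python
import numpy as np

def empirical_distribution(y, num_points=100):
    """Evaluate the empirical distribution function of y on an even grid."""
    y = np.asarray(y, dtype=float)
    grid = np.linspace(y.min(), y.max(), num_points)
    # Fraction of observations less than or equal to each grid value.
    fractions = (y[:, None] <= grid[None, :]).mean(axis=0)
    return grid, fractions

grid, fractions = empirical_distribution([1.0, 2.0, 2.0, 4.0], num_points=4)
print(grid)
print(fractions)
```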

jsonize

ai4water.utils.utils.jsonize(obj, type_converters: Optional[dict] = None)[source]

Serializes an object to python’s native types so that it can be saved in json file format. If the object is a sequence, then each member of the sequence is serialized. The same goes for nested sequences like lists of lists or lists of dictionaries.

Parameters:

obj – any python object that needs to be serialized.
type_converters (dict) – a dictionary defining how to serialize any particular type. The keys of the dictionary should be types and the values should be callables that serialize that type.

Return type:

a serialized python object

Examples

>>> import numpy as np
>>> from ai4water.utils import jsonize
>>> a = np.array([2.0])
>>> b = jsonize(a)
>>> type(b)  # int
>>> # if a data container consists of a mix of native and third party types,
>>> # only the third party types are converted into native types
>>> print(jsonize({1: [1, None, True, np.array(3)], 'b': np.array([1, 3])}))
{1: [1, None, True, 3], 'b': [1, 3]}

The user can define the methods to serialize some types, e.g. we can serialize tensorflow’s tensors using the serialize method

>>> from tensorflow.keras.layers import Lambda, serialize
>>> tensor = Lambda(lambda _x: _x[Ellipsis, -1, :])
>>> jsonize({'my_tensor': tensor}, {Lambda: serialize})
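
The role of a type-to-callable dictionary can be illustrated with a small recursive converter. The function `to_native` is a hypothetical sketch and not the implementation of jsonize; the `config` contents and the choice of `sorted` as the converter for sets are likewise made up for the example.

```python
import json
import numpy as np

def to_native(obj, type_converters=None):
    """Hypothetical recursive converter to json-compatible python types."""
    type_converters = type_converters or {}
    # User supplied converters take precedence over the built-in rules.
    for typ, converter in type_converters.items():
        if isinstance(obj, typ):
            return converter(obj)
    if isinstance(obj, dict):
        return {key: to_native(val, type_converters) for key, val in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(item, type_converters) for item in obj]
    if isinstance(obj, np.ndarray):
        return to_native(obj.tolist(), type_converters)
    if isinstance(obj, np.generic):  # numpy scalars such as np.int64
        return obj.item()
    return obj

config = {"lookback": np.int64(4), "weights": np.array([0.5, 1.5]), "tags": {"b", "a"}}
native = to_native(config, {set: sorted})
print(json.dumps(native))
```

Without the conversion, `json.dumps(config)` would raise a TypeError because neither numpy types nor sets are json serializable.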

TrainTestSplit

class ai4water.utils.utils.TrainTestSplit(test_fraction: float = 0.3, seed: Optional[int] = None, train_indices: Optional[Union[list, ndarray]] = None, test_indices: Optional[Union[list, ndarray]] = None)[source]

train_test_split of sklearn cannot be used for a list of arrays, so this class provides that functionality.

Examples

>>> import numpy as np
>>> from ai4water.utils.utils import TrainTestSplit
>>> x1 = np.random.random((100, 10, 4))
>>> x2 = np.random.random((100, 4))
>>> x = [x1, x2]
>>> y = np.random.random(100)
...
>>> train_x, test_x, train_y, test_y = TrainTestSplit().split_by_random(x, y)
>>> # works as well when only x is provided
>>> train_x, test_x, _, _ = TrainTestSplit().split_by_random(x)
>>> # if we have time-series like data, where we want to use earlier samples
>>> # for training and later samples for testing, then we can split by slicing
>>> train_x, test_x, train_y, test_y = TrainTestSplit().split_by_slicing(x, y)
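
What makes splitting a list of arrays slightly awkward is that every array must be split with the same row indices. The numpy sketch below shows one way to do that; `split_arrays_randomly` is a hypothetical helper and not the code of TrainTestSplit, and the rounding of the test size is an assumption.

```python
import numpy as np

def split_arrays_randomly(arrays, test_fraction=0.3, seed=None):
    """Hypothetical helper: split every array in a list with the same random indices."""
    n = len(arrays[0])
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n)
    num_test = int(round(test_fraction * n))
    test_idx, train_idx = shuffled[:num_test], shuffled[num_test:]
    train = [a[train_idx] for a in arrays]
    test = [a[test_idx] for a in arrays]
    return train, test, train_idx, test_idx

x1 = np.arange(100 * 10 * 4).reshape(100, 10, 4)
x2 = np.arange(100 * 4).reshape(100, 4)
train, test, train_idx, test_idx = split_arrays_randomly([x1, x2], seed=0)
print(train[0].shape, train[1].shape, test[0].shape, test[1].shape)
```

Because one permutation is shared, the i-th training row of `x1` and the i-th training row of `x2` still belong to the same original example.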

split_by_indices(x: Union[list, ndarray, Series, DataFrame, List[ndarray]], y: Optional[Union[list, ndarray, Series, DataFrame, List[ndarray]]] = None)[source]

splits the x and y by user defined train_indices and test_indices

split_by_random(x: Union[list, ndarray, Series, DataFrame, List[ndarray]], y: Optional[Union[list, ndarray, Series, DataFrame, List[ndarray]]] = None) → Tuple[Any, Any, Any, Any][source]

splits the x and y by random splitting.

Parameters:

x – arrays to split; either an array like object such as a list, numpy array or pandas dataframe/series, or a list of array like objects
y – array like; either an array like object such as a list, numpy array or pandas dataframe/series, or a list of array like objects

split_by_slicing(x: Union[list, ndarray, Series, DataFrame, List[ndarray]], y: Optional[Union[list, ndarray, Series, DataFrame, List[ndarray]]] = None)[source]

splits the x and y by slicing, which is defined by test_fraction.

Parameters:

x – arrays to split; either an array like object such as a list, numpy array or pandas dataframe/series, or a list of array like objects
y – array like; either an array like object such as a list, numpy array or pandas dataframe/series, or a list of array like objects
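
A slice-based split keeps the order of the samples, so the earliest samples go to training and the latest to testing. A minimal sketch, assuming the training part is simply the first (1 - test_fraction) share of the rows; `split_by_position` is a hypothetical helper and the exact rounding used by split_by_slicing is not known from this page.

```python
import numpy as np

def split_by_position(array, test_fraction=0.3):
    """Hypothetical helper: first rows for training, remaining rows for testing."""
    num_train = int(round((1.0 - test_fraction) * len(array)))
    return array[:num_train], array[num_train:]

series = np.arange(10)
train_part, test_part = split_by_position(series, test_fraction=0.3)
print(train_part)
print(test_part)
```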