utility functions
Some utility functions
prepare_data
- ai4water.utils.utils.prepare_data(data: ndarray, lookback: int, num_inputs: Optional[int] = None, num_outputs: Optional[int] = None, input_steps: int = 1, forecast_step: int = 0, forecast_len: int = 1, known_future_inputs: bool = False, output_steps: int = 1, mask: Optional[Union[int, float, ndarray]] = None) Tuple[ndarray, ndarray, ndarray] [source]
converts a numpy nd array into a supervised machine learning problem.
- Parameters:
data – nd numpy array whose first dimension represents the number of examples and the second dimension represents the number of features. Some of those features will be used as inputs and some will be considered as outputs depending upon the values of num_inputs and num_outputs.
lookback – number of previous steps/values to be used at one step.
num_inputs – default None, number of input features in data. If None, it will be calculated as features-outputs. The input data will be all columns from the start up to the output columns in the second dimension.
num_outputs – number of columns (from last) in data to be used as output. If None, it will be calculated as features-inputs.
input_steps – strides/number of steps in input data
forecast_step – must be greater than or equal to 0. Determines which t+ith value to use as target, where i is the horizon. For time series prediction, this is the horizon to predict.
forecast_len – number of horizons/future values to predict.
known_future_inputs – Only useful if forecast_len>1. If True, this means, we know and use ‘future inputs’ while making predictions at t>0
output_steps – step size in outputs. If =2, it means we want to predict every second value from the targets
mask – If int, then the examples with these values in the output will be skipped. If array, then it must be a boolean mask indicating which examples to include/exclude. The length of mask should be equal to the number of generated examples. The number of generated examples is difficult to predict because it depends upon lookback, input_steps, and forecast_step. Thus it is better to provide an integer indicating which values in outputs are to be considered invalid. Default is None, which indicates that all the generated examples will be returned.
- Returns:
x (numpy array of shape (examples, lookback, ins) consisting of) – input examples
prev_y (numpy array consisting of previous outputs)
y (numpy array consisting of target values)
Given the following data consisting of input/output pairs

input1  input2  output1  output2  output3
1       11      21       31       41
2       12      22       32       42
3       13      23       33       43
4       14      24       34       44
5       15      25       35       45
6       16      26       36       46
7       17      27       37       47

If we use the following 2 time series as input

input1  input2
1       11
2       12
3       13
4       14
5       15
6       16
7       17

then num_inputs=2, lookback=7, input_steps=1, and if we want to predict

output1  output2  output3
27       37       47

then num_outputs=3, forecast_len=1, forecast_step=0. If we want to predict

output1  output2  output3
28       38       48

then num_outputs=3, forecast_len=1, forecast_step=1. If we want to predict

output1  output2  output3
27       37       47
28       38       48

then num_outputs=3, forecast_len=2, horizon/forecast_step=0. If we want to predict

output1  output2  output3
28       38       48
29       39       49
30       40       50

then num_outputs=3, forecast_len=3, forecast_step=1. If we want to predict

output2
38
39
40

then num_outputs=1, forecast_len=3, forecast_step=0. If we predict

output2
39

then num_outputs=1, forecast_len=1, forecast_step=2. If we predict

output2
39
40
41

then num_outputs=1, forecast_len=3, forecast_step=2.

If we use the following two time series as input

input1  input2
1       11
3       13
5       15
7       17

then num_inputs=2, lookback=4, input_steps=2.

If the input is

input1  input2
1       11
2       12
3       13
4       14
5       15
6       16
7       17

and target/output is

output1  output2  output3
25       35       45
26       36       46
27       37       47

This means we make use of known future inputs. This can be achieved using the following configuration: num_inputs=2, num_outputs=3, lookback=4, forecast_len=3, forecast_step=1, known_future_inputs=True.

The general shape of output/target/label is (examples, num_outputs, forecast_len)
The general shape of inputs/x is (examples, lookback + forecast_len-1, ….num_inputs)
Examples
>>> import numpy as np
>>> from ai4water.utils.utils import prepare_data
>>> num_examples = 50
>>> dataframe = np.arange(int(num_examples*5)).reshape(-1, num_examples).transpose()
>>> dataframe[0:10]
array([[  0,  50, 100, 150, 200],
       [  1,  51, 101, 151, 201],
       [  2,  52, 102, 152, 202],
       [  3,  53, 103, 153, 203],
       [  4,  54, 104, 154, 204],
       [  5,  55, 105, 155, 205],
       [  6,  56, 106, 156, 206],
       [  7,  57, 107, 157, 207],
       [  8,  58, 108, 158, 208],
       [  9,  59, 109, 159, 209]])
>>> x, prevy, y = prepare_data(dataframe, num_outputs=2, lookback=4,
...                            input_steps=2, forecast_step=2, forecast_len=4)
>>> x[0]
array([[  0.,  50., 100.],
       [  2.,  52., 102.],
       [  4.,  54., 104.],
       [  6.,  56., 106.]], dtype=float32)
>>> y[0]
array([[158., 159., 160., 161.],
       [208., 209., 210., 211.]], dtype=float32)

>>> x, prevy, y = prepare_data(dataframe, num_outputs=2, lookback=4,
...                            forecast_len=3, known_future_inputs=True)
>>> x[0]
array([[  0,  50, 100],
       [  1,  51, 101],
       [  2,  52, 102],
       [  3,  53, 103],
       [  4,  54, 104],
       [  5,  55, 105],
       [  6,  56, 106]])   # (7, 3)
>>> # it is important to note that although lookback=4, x[0] has 7 rows
>>> y[0]
array([[154., 155., 156.],
       [204., 205., 206.]], dtype=float32)  # (2, 3)
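The windowing idea behind prepare_data can be sketched in plain NumPy. The helper below, make_windows, is hypothetical and not part of ai4water. It assumes that the first target row lies forecast_step rows after the last input row, and it leaves out known_future_inputs, output_steps, mask and prev_y, so it is only a rough illustration and may generate a different number of examples than the library does.

```python
import numpy as np


def make_windows(data, lookback, num_outputs, input_steps=1,
                 forecast_step=0, forecast_len=1):
    """Hypothetical sliding-window helper (not the ai4water implementation)."""
    num_inputs = data.shape[1] - num_outputs
    # distance in rows between the first and the last input row of one example
    span = (lookback - 1) * input_steps
    # last start index for which the whole target block still fits in data
    last_start = len(data) - span - forecast_step - forecast_len
    xs, ys = [], []
    for i in range(last_start + 1):
        rows = np.arange(i, i + span + 1, input_steps)  # strided input rows
        t = i + span + forecast_step                    # first target row (assumed)
        xs.append(data[rows, :num_inputs])
        # transpose so that targets have shape (num_outputs, forecast_len)
        ys.append(data[t:t + forecast_len, num_inputs:].T)
    return np.array(xs, dtype=np.float32), np.array(ys, dtype=np.float32)


data = np.arange(250).reshape(-1, 50).transpose()  # 50 rows, 5 columns
x, y = make_windows(data, lookback=4, num_outputs=2,
                    input_steps=2, forecast_step=2, forecast_len=4)
print(x[0])  # input rows 0, 2, 4 and 6 of the first three columns
print(y[0])  # rows 8 to 11 of the last two columns, transposed
```

Under these assumptions each x[i] has shape (lookback, num_inputs) and each y[i] has shape (num_outputs, forecast_len).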
get_attributes
tensorflow, torch, numpy, matplotlib, random and other libraries are imported here once and then used all over ai4water. This file does not import anything from other files of ai4water.
- ai4water.backend.get_attributes(aus, what: str, retain: Optional[str] = None, case_sensitive: bool = False) dict [source]
gets all callable attributes of aus from what and saves them in a dictionary with their names as keys. If case_sensitive is False, then all the keys are capitalized so that calling them becomes case insensitive. It is possible that some of the attributes of tf.keras.layers are callable but still not a valid layer, or some attributes of tf.keras.losses are callable but still not valid losses. In that case the error will be generated from tensorflow. We are not catching those errors right now.
- Parameters:
aus – parent module
what (str) – child module/package
retain (str, optional (default=None)) – if duplicates of ‘what’ exist then whether to prefer class or function. For example, fastica and FastICA exist in sklearn.decomposition then if retain is ‘function’ then fastica will be kept, if retain is ‘class’ then FastICA is kept. If retain is None, then what comes later will overwrite the previously kept object.
case_sensitive (bool, optional (default=False)) – whether to treat the attribute names in what as case-sensitive or not. If True, fastica and FastICA will both be saved as separate objects.
Example
>>> get_attributes(tf.keras, 'layers') # will get all layers from tf.keras.layers
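The lookup described above can be illustrated without tensorflow. The sketch below uses a hypothetical helper, collect_callables, on the standard library's os.path. Skipping names that start with an underscore and upper-casing the keys are assumptions made for illustration, not confirmed details of get_attributes, and the retain option is not covered.

```python
import os


def collect_callables(parent, child, case_sensitive=False):
    """Hypothetical helper: map names of callables in parent.child to the objects."""
    module = getattr(parent, child)
    found = {}
    for name in dir(module):
        attr = getattr(module, name)
        # assumption: private names are skipped
        if callable(attr) and not name.startswith("_"):
            # assumption: keys are upper-cased for case-insensitive lookup
            key = name if case_sensitive else name.upper()
            found[key] = attr
    return found


path_funcs = collect_callables(os, "path")
joined = path_funcs["JOIN"]("a", "b")  # same as os.path.join("a", "b")
```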
murphy_diagram
- ai4water.utils.visualizations.murphy_diagram(observed: Union[list, ndarray, Series, DataFrame], predicted: Union[list, ndarray, Series, DataFrame], reference: Optional[Union[list, ndarray, Series, DataFrame]] = None, reference_model: Optional[Union[str, Callable]] = None, inputs=None, plot_type: str = 'scores', xaxis: str = 'theta', ax: Optional[Axes] = None, line_colors: Optional[tuple] = None, fill_color: str = 'lightgray', show: bool = True) Axes [source]
- Parameters:
observed – observed or true values
predicted – model’s prediction
reference – reference prediction
reference_model – The model for reference prediction. Only relevant if reference is None and plot_type is diff. It can be callable or a string. If it is a string, then it can be any model name from sklearn.linear_model
inputs – inputs for reference model. Only relevant if reference_model is not None and plot_type is diff
plot_type – either of scores or diff
xaxis – either of theta or time
ax – the axis to use for plotting
line_colors – colors of line
fill_color – color to fill confidence interval
show – whether to show the plot or not
- Returns:
matplotlib axes
Example
>>> import numpy as np
>>> from ai4water.utils.visualizations import murphy_diagram
>>> yy = np.random.randint(1, 1000, 100)
>>> ff1 = np.random.randint(1, 1000, 100)
>>> ff2 = np.random.randint(1, 1000, 100)
>>> murphy_diagram(yy, ff1, ff2)
...
>>> murphy_diagram(yy, ff1, ff2, plot_type="diff")
fdc_plot
- ai4water.utils.visualizations.fdc_plot(sim: Union[list, ndarray, Series, DataFrame], obs: Union[list, ndarray, Series, DataFrame], ax: Optional[Axes] = None, legend: bool = True, xlabel: str = 'Exceedence [%]', ylabel: str = 'Flow', show: bool = True) Axes [source]
Plots flow duration curve
- Parameters:
sim – simulated flow
obs – observed flow
ax – axis on which to plot
legend – whether to apply legend or not
xlabel – label to set on x-axis. set to None for no x-label
ylabel – label to set on y-axis
show – whether to show the plot or not
- Returns:
matplotlib axes
Example
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from ai4water.utils.visualizations import fdc_plot
>>> simulated = np.random.random(100)
>>> observed = np.random.random(100)
>>> fdc_plot(simulated, observed)
>>> plt.show()
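A flow duration curve relates each flow value to the percentage of time that value is equalled or exceeded. The quantities on the two axes can be computed as in the minimal sketch below. It assumes the Weibull plotting position rank / (n + 1), which is a common convention but not necessarily the one fdc_plot uses, and flow_duration_curve is a hypothetical helper name.

```python
import numpy as np


def flow_duration_curve(flow):
    """Hypothetical helper: return (exceedance in percent, flows sorted descending)."""
    flow = np.asarray(flow, dtype=float)
    sorted_flow = np.sort(flow)[::-1]            # largest flow first
    ranks = np.arange(1, len(sorted_flow) + 1)   # rank 1 for the largest flow
    # assumption: Weibull plotting position
    exceedance = 100.0 * ranks / (len(sorted_flow) + 1)
    return exceedance, sorted_flow


exceedance, sorted_flow = flow_duration_curve([3.0, 1.0, 4.0, 2.0])
print(exceedance)   # [20. 40. 60. 80.]
print(sorted_flow)  # [4. 3. 2. 1.]
```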
edf_plot
jsonize
- ai4water.utils.utils.jsonize(obj, type_converters: Optional[dict] = None)[source]
Serializes an object to python’s native types so that it can be saved in json file format. If the object is a sequence, then each member of the sequence is serialized. The same goes for nested sequences like lists of lists or lists of dictionaries.
- Parameters:
obj – any python object that needs to be serialized.
type_converters (dict) – a dictionary defining how to serialize any particular type. The keys of the dictionary should be types and the values should be callables that serialize that type.
- Return type:
a serialized python object
Examples
>>> import numpy as np
>>> from ai4water.utils import jsonize
>>> a = np.array([2.0])
>>> b = jsonize(a)
>>> type(b)  # int
... # if a data container consists of mix of native and third party types
... # only third party types are converted into native types
>>> print(jsonize({1: [1, None, True, np.array(3)], 'b': np.array([1, 2, 3])}))
{1: [1, None, True, 3], 'b': [1, 2, 3]}
The user can define the methods to serialize some types, e.g., we can serialize tensorflow’s tensors using the serialize method
>>> from tensorflow.keras.layers import Lambda, serialize
>>> tensor = Lambda(lambda _x: _x[Ellipsis, -1, :])
>>> jsonize({'my_tensor': tensor}, {Lambda: serialize})
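The reason such a conversion is needed is that the standard json module rejects NumPy arrays and scalars. The sketch below shows the general idea with a hypothetical to_native function that handles only NumPy types, dicts, lists and tuples. It is not the implementation of jsonize and does not support type_converters.

```python
import json

import numpy as np


def to_native(obj):
    """Hypothetical helper: recursively convert NumPy objects to native types."""
    if isinstance(obj, np.ndarray):
        return to_native(obj.tolist())   # 0-d arrays become scalars
    if isinstance(obj, np.generic):
        return obj.item()                # NumPy scalar to python scalar
    if isinstance(obj, dict):
        return {to_native(k): to_native(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(v) for v in obj]
    return obj                           # already native (int, str, None, ...)


record = {"a": [1, None, True, np.array(3)], "b": np.array([1, 2, 3])}
native = to_native(record)
text = json.dumps(native)  # would raise TypeError if called on record directly
```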
TrainTestSplit
- class ai4water.utils.utils.TrainTestSplit(test_fraction: float = 0.3, seed: Optional[int] = None, train_indices: Optional[Union[list, ndarray]] = None, test_indices: Optional[Union[list, ndarray]] = None)[source]
sklearn's train_test_split cannot be used for a list of arrays, so this class provides that functionality.
Examples
>>> import numpy as np
>>> from ai4water.utils.utils import TrainTestSplit
>>> x1 = np.random.random((100, 10, 4))
>>> x2 = np.random.random((100, 4))
>>> x = [x1, x2]
>>> y = np.random.random(100)
...
>>> train_x, test_x, train_y, test_y = TrainTestSplit().split_by_random(x, y)
>>> # also works when only x is provided
>>> train_x, test_x, _, _ = TrainTestSplit().split_by_random(x)
... # if we have time-series like data, where we want to use earlier samples
... # for training and later samples for test, then we can do a slice based split
>>> train_x, test_x, train_y, test_y = TrainTestSplit().split_by_slicing(x, y)
- split_by_indices(x: Union[list, ndarray, Series, DataFrame, List[ndarray]], y: Optional[Union[list, ndarray, Series, DataFrame, List[ndarray]]] = None)[source]
splits the x and y by user defined train_indices and test_indices
- split_by_random(x: Union[list, ndarray, Series, DataFrame, List[ndarray]], y: Optional[Union[list, ndarray, Series, DataFrame, List[ndarray]]] = None) Tuple[Any, Any, Any, Any] [source]
splits the x and y by random splitting.
- Parameters:
x – arrays to split; either an array like such as list, numpy array or pandas dataframe/series, or a list of array like objects
y – array like; either an array like such as list, numpy array or pandas dataframe/series, or a list of array like objects
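Splitting a list of arrays consistently comes down to drawing one set of indices and applying it to every array. A minimal sketch of that idea follows. The function split_list_of_arrays is hypothetical, and the rounding of the test size is an assumption, so the result is not guaranteed to match what split_by_random returns.

```python
import numpy as np


def split_list_of_arrays(arrays, test_fraction=0.3, seed=None):
    """Hypothetical helper: split every array in the list with the same indices."""
    n = len(arrays[0])
    indices = np.random.default_rng(seed).permutation(n)
    n_test = int(round(n * test_fraction))   # assumption: round to nearest
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [a[train_idx] for a in arrays]
    test = [a[test_idx] for a in arrays]
    return train, test


x1 = np.random.random((100, 10, 4))
x2 = np.random.random((100, 4))
train_x, test_x = split_list_of_arrays([x1, x2], test_fraction=0.3, seed=0)
print([a.shape for a in train_x])  # [(70, 10, 4), (70, 4)]
print([a.shape for a in test_x])   # [(30, 10, 4), (30, 4)]
```

Because the same indices are used for every array, the i-th training row of x1 still corresponds to the i-th training row of x2.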
- split_by_slicing(x: Union[list, ndarray, Series, DataFrame, List[ndarray]], y: Optional[Union[list, ndarray, Series, DataFrame, List[ndarray]]] = None)[source]
splits the x and y by slicing which is defined by test_fraction.
- Parameters:
x – arrays to split; either an array like such as list, numpy array or pandas dataframe/series, or a list of array like objects
y – array like; either an array like such as list, numpy array or pandas dataframe/series, or a list of array like objects