utility functions
Some utility functions
prepare_data
ai4water.utils.utils.prepare_data(data: ndarray, lookback: int, num_inputs: Optional[int] = None, num_outputs: Optional[int] = None, input_steps: int = 1, forecast_step: int = 0, forecast_len: int = 1, known_future_inputs: bool = False, output_steps: int = 1, mask: Optional[Union[int, float, ndarray]] = None) → Tuple[ndarray, ndarray, ndarray]
Converts a NumPy nd-array into a supervised machine learning problem.
 Parameters
data – 2D NumPy array whose first dimension represents the number of examples and whose second dimension represents the number of features. Depending on the values of num_inputs and num_outputs, some of these features are used as inputs and the rest as outputs.
lookback – number of previous steps/values used as input for each example.
num_inputs – default None, number of input features in data. If None, it is calculated as features - num_outputs. The inputs are taken from the leading columns of data, i.e. all columns before the last num_outputs columns.
num_outputs – number of columns (counted from the last) in data to be used as outputs. If None, it is calculated as features - num_inputs.
input_steps – stride (number of steps) between consecutive input rows within one example.
forecast_step – must be greater than or equal to 0; it specifies which t+i-th value to use as target, where i is the horizon. For time series prediction, it determines which horizon to predict.
forecast_len – number of horizons/future values to predict.
known_future_inputs – only useful if forecast_len > 1. If True, the ‘future inputs’ are known and used when making predictions at t > 0.
output_steps – step size in outputs. If 2, every second value of the targets is predicted.
mask – if int, examples whose outputs contain this value are skipped. If an array, it must be a boolean mask indicating which examples to include/exclude, and its length must equal the number of generated examples. The number of generated examples is hard to predict because it depends on lookback, input_steps and forecast_step, so it is usually better to provide an integer indicating which output values are to be considered invalid. Default is None, which means all generated examples are returned.
 Returns
x – numpy array of shape (examples, lookback, ins) consisting of input examples
prev_y – numpy array consisting of previous outputs
y – numpy array consisting of target values
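The interplay of lookback, input_steps, forecast_step and forecast_len can be illustrated with a plain NumPy sliding window. The sketch below is not ai4water's implementation: sliding_windows is a hypothetical helper that takes all leading columns as inputs, ignores mask, output_steps and known_future_inputs, and assumes consecutive examples are shifted by one row. On the data layout of the first doctest in Examples it reproduces the x[0] and y[0] shown there.

```python
import numpy as np


def sliding_windows(data, lookback, num_outputs, input_steps=1,
                    forecast_step=0, forecast_len=1):
    """Hypothetical helper: build (x, y) windows as the parameters describe.

    x[i] has shape (lookback, num_inputs); y[i] has shape (num_outputs, forecast_len).
    Consecutive examples are assumed to start one row apart.
    """
    data = np.asarray(data)
    num_inputs = data.shape[1] - num_outputs          # inputs are the leading columns
    inputs, outputs = data[:, :num_inputs], data[:, num_inputs:]
    span = (lookback - 1) * input_steps               # rows between first and last input row

    xs, ys = [], []
    start = 0
    while True:
        last = start + span                           # index of the last input row ("t")
        first_target = last + forecast_step           # t + forecast_step
        stop_target = first_target + forecast_len     # forecast_len consecutive targets
        if stop_target > len(data):
            break
        xs.append(inputs[start:last + 1:input_steps])
        ys.append(outputs[first_target:stop_target].T)
        start += 1
    return np.array(xs, dtype=np.float32), np.array(ys, dtype=np.float32)


# Same column layout as the first doctest: row i is [i, 50+i, 100+i, 150+i, 200+i]
data = np.arange(250).reshape(5, 50).T
x, y = sliding_windows(data, lookback=4, num_outputs=2, input_steps=2,
                       forecast_step=2, forecast_len=4)
print(x.shape, y.shape)  # (39, 4, 3) (39, 2, 4)
```

The loop is written for readability; numpy.lib.stride_tricks.sliding_window_view could replace it for large arrays.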
Given the following data consisting of input/output pairs
input1  input2  output1  output2  output3
1       11      21       31       41
2       12      22       32       42
3       13      23       33       43
4       14      24       34       44
5       15      25       35       45
6       16      26       36       46
7       17      27       37       47
If we use the following 2 time series as input
input1  input2
1       11
2       12
3       13
4       14
5       15
6       16
7       17
then num_inputs=2, lookback=7, input_steps=1, and if we want to predict
output1  output2  output3
27       37       47
then num_outputs=3, forecast_len=1, forecast_step=0.
If we want to predict
output1  output2  output3
28       38       48
then num_outputs=3, forecast_len=1, forecast_step=1.
If we want to predict
output1  output2  output3
27       37       47
28       38       48
then num_outputs=3, forecast_len=2, horizon/forecast_step=0.
If we want to predict
output1  output2  output3
28       38       48
29       39       49
30       40       50
then num_outputs=3, forecast_len=3, forecast_step=1.
If we want to predict
output2
38
39
40
then num_outputs=1, forecast_len=3, forecast_step=1.
If we predict
output2
39
then num_outputs=1, forecast_len=1, forecast_step=2.
If we predict
output2
39
40
41
then num_outputs=1, forecast_len=3, forecast_step=2.
If we use the following two time series as input
input1  input2
1       11
3       13
5       15
7       17
then num_inputs=2, lookback=4, input_steps=2.
If the input is
input1  input2
1       11
2       12
3       13
4       14
5       15
6       16
7       17
and the target/output is
output1  output2  output3
25       35       45
26       36       46
27       37       47
This means we make use of known future inputs. This can be achieved with the following configuration: num_inputs=2, num_outputs=3, lookback=4, forecast_len=3, forecast_step=1, known_future_inputs=True.
The general shape of output/target/label is (examples, num_outputs, forecast_len).
The general shape of inputs/x is (examples, lookback + forecast_len - 1, ..., num_inputs).
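The worked tables above follow simple index arithmetic: the first target sits forecast_step rows after the last input row t, followed by forecast_len - 1 further rows, and input rows are taken every input_steps rows. The helpers below (input_window, target_block) are hypothetical illustrations of that reading, not ai4water functions. The toy table is extended to 12 rows, an assumption that matches the way the examples already refer to values such as 28, 29 and 30.

```python
import numpy as np

# Toy table from the examples: row r holds input1=r, input2=10+r,
# output1=20+r, output2=30+r, output3=40+r. Extended to 12 rows (assumption)
# so that horizons beyond row 7 exist.
r = np.arange(1, 13)
table = np.stack([r, 10 + r, 20 + r, 30 + r, 40 + r], axis=1)


def input_window(table, first_row, lookback, input_steps=1, num_inputs=2):
    """Hypothetical: `lookback` rows of the input columns, every `input_steps` rows."""
    stop = first_row + (lookback - 1) * input_steps + 1
    return table[first_row:stop:input_steps, :num_inputs]


def target_block(table, last_row, forecast_step=0, forecast_len=1, num_outputs=3):
    """Hypothetical: targets at rows last_row + forecast_step, ..., as (num_outputs, forecast_len)."""
    start = last_row + forecast_step
    return table[start:start + forecast_len, -num_outputs:].T


# lookback=7, input_steps=1: the window covers rows 1..7, i.e. 0-based index 6 is t
print(target_block(table, 6).ravel())                                  # [27 37 47]
print(target_block(table, 6, forecast_step=1).ravel())                 # [28 38 48]
print(target_block(table, 6, forecast_step=2, forecast_len=3)[1])      # output2: [39 40 41]
print(input_window(table, 0, lookback=4, input_steps=2))               # rows 1, 3, 5, 7
```

Row 1 of the target block is output2, which is why the single-output examples are read from index 1 here.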
Examples
>>> import numpy as np
>>> from ai4water.utils.utils import prepare_data
>>> num_examples = 50
>>> data = np.arange(int(num_examples*5)).reshape(5, num_examples).transpose()
>>> data[0:10]
array([[  0,  50, 100, 150, 200],
       [  1,  51, 101, 151, 201],
       [  2,  52, 102, 152, 202],
       [  3,  53, 103, 153, 203],
       [  4,  54, 104, 154, 204],
       [  5,  55, 105, 155, 205],
       [  6,  56, 106, 156, 206],
       [  7,  57, 107, 157, 207],
       [  8,  58, 108, 158, 208],
       [  9,  59, 109, 159, 209]])
>>> x, prevy, y = prepare_data(data, num_outputs=2, lookback=4,
...     input_steps=2, forecast_step=2, forecast_len=4)
>>> x[0]
array([[  0.,  50., 100.],
       [  2.,  52., 102.],
       [  4.,  54., 104.],
       [  6.,  56., 106.]], dtype=float32)
>>> y[0]
array([[158., 159., 160., 161.],
       [208., 209., 210., 211.]], dtype=float32)
>>> x, prevy, y = prepare_data(data, num_outputs=2, lookback=4,
...     forecast_len=3, known_future_inputs=True)
>>> x[0]  # shape (7, 3)
array([[  0,  50, 100],
       [  1,  51, 101],
       [  2,  52, 102],
       [  3,  53, 103],
       [  4,  54, 104],
       [  5,  55, 105],
       [  6,  56, 106]])
>>> # note that although lookback=4, x[0] has 7 time steps
>>> y[0]  # shape (2, 3)
array([[154., 155., 156.],
       [204., 205., 206.]], dtype=float32)
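The mask parameter can be thought of as a filter applied to the generated examples. The sketch below is a hypothetical apply_mask helper, not ai4water's code; it assumes that True in a boolean mask means "keep" and that an example is dropped when any of its target values equals the mask value (NaN is handled explicitly because NaN never compares equal to itself).

```python
import numpy as np


def apply_mask(x, y, mask=None):
    """Hypothetical post-filter mirroring the documented `mask` options.

    mask=None       -> keep every example
    boolean ndarray -> keep examples where mask is True (assumed convention)
    int/float value -> drop examples whose targets contain that value anywhere
    """
    if mask is None:
        return x, y
    if isinstance(mask, np.ndarray) and mask.dtype == bool:
        keep = mask
    else:
        flat = np.asarray(y, dtype=float).reshape(len(y), -1)   # one row per example
        invalid = np.isnan(flat) if np.isnan(mask) else (flat == mask)
        keep = ~invalid.any(axis=1)
    return x[keep], y[keep]


x = np.arange(8).reshape(4, 2)
y = np.array([1.0, -9999.0, 3.0, np.nan]).reshape(4, 1, 1)     # (examples, num_outputs, forecast_len)
x_kept, y_kept = apply_mask(x, y, -9999)
print(x_kept.tolist())  # [[0, 1], [4, 5], [6, 7]]
```

Filtering by value after the windows are built avoids having to know the number of generated examples in advance, which is the difficulty the mask description points out.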
get_attributes
The ai4water.backend module imports tensorflow, torch, numpy, matplotlib, random and other libraries once; they are then used throughout ai4water. This module does not import anything from other ai4water modules.
ai4water.backend.get_attributes(aus, what: str, retain: Optional[str] = None, case_sensitive: bool = False) → dict
Gets all callable attributes of aus.what and saves them in a dictionary with their names as keys. If case_sensitive is False, all keys are capitalized so that lookups become case-insensitive. Some attributes of tf.keras.layers may be callable but still not valid layers, and some attributes of tf.keras.losses may be callable but still not valid losses; in that case the error is raised by TensorFlow. These errors are not currently caught.
 Parameters
aus – parent module
what (str) – child module/package
retain (str, optional (default=None)) – if duplicates of an attribute exist in ‘what’, whether to prefer the class or the function. For example, fastica and FastICA both exist in sklearn.decomposition: if retain is ‘function’, fastica is kept, and if retain is ‘class’, FastICA is kept. If retain is None, whichever comes later overwrites the previously kept object.
case_sensitive (bool, optional (default=False)) – whether to treat attribute names as case-sensitive. If True, fastica and FastICA are saved as separate objects.
Example
>>> import tensorflow as tf
>>> from ai4water.backend import get_attributes
>>> get_attributes(tf.keras, 'layers')  # gets all layers from tf.keras.layers
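The lookup described for get_attributes can be approximated with dir(), callable() and inspect.isclass(). The function below is a hypothetical stand-in (collect_callables), not ai4water's code; skipping names that start with an underscore and upper-casing keys when case_sensitive is False are assumptions. A small namespace replaces sklearn.decomposition so the example runs without third-party packages.

```python
import inspect
import types
from typing import Optional


def collect_callables(parent, what: str, retain: Optional[str] = None,
                      case_sensitive: bool = False) -> dict:
    """Hypothetical sketch: map attribute names of parent.<what> to callables."""
    child = getattr(parent, what)
    found = {}
    for name in dir(child):                      # dir() returns names in sorted order
        if name.startswith('_'):
            continue                             # assumption: private/dunder names are skipped
        obj = getattr(child, name)
        if not callable(obj):
            continue
        key = name if case_sensitive else name.upper()
        if key in found and retain is not None:
            existing = found[key]
            # keep the already stored object if it is the preferred kind
            if retain == 'class' and inspect.isclass(existing) and not inspect.isclass(obj):
                continue
            if retain == 'function' and not inspect.isclass(existing) and inspect.isclass(obj):
                continue
        found[key] = obj                         # retain=None: later names overwrite earlier ones
    return found


# Stand-in for a package that exposes both a class and a function with the same name
class FastICA:
    pass


def fastica():
    return None


decomposition = types.SimpleNamespace(FastICA=FastICA, fastica=fastica, n_components=3)
parent = types.SimpleNamespace(decomposition=decomposition)
print(sorted(collect_callables(parent, 'decomposition', case_sensitive=True)))  # ['FastICA', 'fastica']
```

Because dir() sorts names and upper-case letters sort first, 'FastICA' is visited before 'fastica', which is what "comes later" amounts to when retain is None in this sketch.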
murphy_diagram
ai4water.utils.visualizations.murphy_diagram(observed: Union[list, ndarray, Series, DataFrame], predicted: Union[list, ndarray, Series, DataFrame], reference: Optional[Union[list, ndarray, Series, DataFrame]] = None, reference_model: Optional[Union[str, Callable]] = None, inputs=None, plot_type: str = 'scores', xaxis: str = 'theta', ax: Optional[Axes] = None, line_colors: Optional[tuple] = None, fill_color: str = 'lightgray', show: bool = True) → Axes

 Parameters
observed – observed or true values
predicted – model’s prediction
reference – reference prediction
reference_model – the model for the reference prediction. Only relevant if reference is None and plot_type is ‘diff’. It can be a callable or a string. If it is a string, it can be any model name from sklearn.linear_model
inputs – inputs for the reference model. Only relevant if reference_model is not None and plot_type is ‘diff’
plot_type – either ‘scores’ or ‘diff’
xaxis – either ‘theta’ or ‘time’
ax – the axis to use for plotting
line_colors – colors of the lines
fill_color – color to fill confidence interval
show – whether to show the plot or not
 Returns
matplotlib axes
Example
>>> import numpy as np
>>> from ai4water.utils.visualizations import murphy_diagram
>>> yy = np.random.randint(1, 1000, 100)
>>> ff1 = np.random.randint(1, 1000, 100)
>>> ff2 = np.random.randint(1, 1000, 100)
>>> murphy_diagram(yy, ff1, ff2)
...
>>> murphy_diagram(yy, ff1, ff2, plot_type="diff")
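A Murphy diagram is commonly built from elementary scores: for each threshold θ, the forecast is penalised only when θ lies between the forecast x and the observation y. The sketch below uses the elementary score for the mean, S_θ(x, y) = 0.5·|y − θ| when min(x, y) ≤ θ < max(x, y) and 0 otherwise; whether murphy_diagram uses exactly this score and scaling is an assumption, and elementary_scores is a hypothetical name. It returns the curve that a plot_type='scores' panel would show against θ.

```python
import numpy as np


def elementary_scores(observed, predicted, thetas):
    """Mean elementary score (mean functional, expectile level 0.5) at each theta.

    S_theta(x, y) = 0.5 * |y - theta| if min(x, y) <= theta < max(x, y), else 0.
    The scaling is an assumption about what murphy_diagram plots.
    """
    y = np.asarray(observed, dtype=float)[np.newaxis, :]    # (1, n) observations
    x = np.asarray(predicted, dtype=float)[np.newaxis, :]   # (1, n) forecasts
    t = np.asarray(thetas, dtype=float)[:, np.newaxis]      # (m, 1) thresholds
    lower, upper = np.minimum(x, y), np.maximum(x, y)
    between = (lower <= t) & (t < upper)                    # theta lies between forecast and observation
    return (0.5 * np.abs(y - t) * between).mean(axis=1)     # average over the n samples -> (m,)


print(elementary_scores([0.0], [2.0], [0, 1, 2, 3]))  # [0.  0.5 0.  0. ]
```

A ‘diff’ curve would then be the difference between the scores of the model and of the reference at each θ. With this scaling, integrating the curve over θ gives one quarter of the mean squared error, which is a handy sanity check.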
fdc_plot
ai4water.utils.visualizations.fdc_plot(sim: Union[list, ndarray, Series, DataFrame], obs: Union[list, ndarray, Series, DataFrame], ax: Optional[Axes] = None, legend: bool = True, xlabel: str = 'Exceedence [%]', ylabel: str = 'Flow', show: bool = True) → Axes
Plots the flow duration curve.
 Parameters
sim – simulated flow
obs – observed flow
ax – axis on which to plot
legend – whether to show the legend or not
xlabel – label for the x-axis; set to None for no x-label
ylabel – label for the y-axis
show – whether to show the plot or not
 Returns
matplotlib axes
Example
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from ai4water.utils.visualizations import fdc_plot
>>> simulated = np.random.random(100)
>>> observed = np.random.random(100)
>>> fdc_plot(simulated, observed)
>>> plt.show()
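A flow duration curve ranks flows from highest to lowest and plots each one against the percentage of time it is equalled or exceeded. The sketch below computes those coordinates using the Weibull plotting position rank / (n + 1); fdc_plot may use a different plotting position, so that choice and the helper name flow_duration_curve are assumptions.

```python
import numpy as np


def flow_duration_curve(flow):
    """Hypothetical helper: (exceedance in %, flows sorted from high to low)."""
    q = np.sort(np.asarray(flow, dtype=float).ravel())[::-1]   # largest flow first
    ranks = np.arange(1, q.size + 1)                            # rank 1 = largest flow
    exceedance = 100.0 * ranks / (q.size + 1)                   # % of time the flow is equalled or exceeded
    return exceedance, q


exceedance, q = flow_duration_curve([3.0, 1.0, 2.0])
print(exceedance, q)  # [25. 50. 75.] [3. 2. 1.]
```

Calling it once for the simulated and once for the observed series and passing each pair to Axes.plot gives the two curves that fdc_plot compares.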
edf_plot
ai4water.utils.visualizations.edf_plot(y: ndarray, num_points: int = 100, xlabel='Objective Value', marker: str = '', ax: Optional[Axes] = None, show: bool = True, **kwargs) → Axes
Plots the empirical distribution function.
 Parameters
y (np.ndarray) – array of values
num_points (int) –
xlabel (str) – label for the x-axis
marker (str) – marker style for the plotted line
ax (plt.Axes, optional) – axis on which to plot
show (bool, optional (default=True)) – whether to show the plot or not
**kwargs – keyword arguments passed to the plot
 Return type
plt.Axes
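The empirical distribution function gives, for each value v, the fraction of observations that are less than or equal to v. The sketch below evaluates it on num_points evenly spaced values between min(y) and max(y); reading num_points this way, and the helper name empirical_cdf, are assumptions rather than a description of edf_plot's internals.

```python
import numpy as np


def empirical_cdf(y, num_points=100):
    """Hypothetical helper: EDF of y evaluated on an even grid over [min(y), max(y)]."""
    y = np.asarray(y, dtype=float).ravel()
    grid = np.linspace(y.min(), y.max(), num_points)
    # (n, m) comparison table; averaging over the n observations gives F(grid)
    fraction = (y[:, np.newaxis] <= grid[np.newaxis, :]).mean(axis=0)
    return grid, fraction


grid, fraction = empirical_cdf([4, 1, 3, 2], num_points=4)
print(grid, fraction)  # [1. 2. 3. 4.] [0.25 0.5  0.75 1.  ]
```

The resulting pairs can be drawn with Axes.plot (or Axes.step for a staircase look); the last grid point always has a fraction of 1.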