Imputation

class ai4water.preprocessing.imputation.Imputation(data: Union[pandas.core.frame.DataFrame, numpy.ndarray, list], method: str = 'KNNImputer', features=None, imputer_args: Optional[dict] = None)[source]

Bases: object

Implements imputation of missing values using a range of methods.

  • pandas:

    The pandas library provides two methods for filling missing data.

    interpolate:
    Fills missing values by interpolation. Example of imputer_args:

    {'method': 'spline', 'order': 2}

    For the full list of accepted arguments, see interpolate.

    fillna:
    Example of imputer_args:

    {'method': 'ffill'}

    For the full list of accepted arguments, see fillna.

  • sklearn:

    The scikit-learn library provides three imputation methods.

    SimpleImputer:

    For details see SimpleImputer.

    IterativeImputer:

    imputer_args example: {'n_nearest_features': 2}. For details see IterativeImputer.

    KNNImputer:

    All arguments accepted by sklearn's KNNImputer can be passed in imputer_args. imputer_args example: {'n_neighbors': 3}. For details see KNNImputer.

  • fancyimpute:

    knn
    NuclearNormMinimization
    SoftImpute
    Biscaler

  • transdim:

- plot(): plots the imputed values.
- missing_indices(): returns the indices of missing data.
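
As a rough illustration of what the pandas-based methods do, the sketch below calls pandas directly rather than going through Imputation. It assumes that imputer_args is forwarded as keyword arguments to the named pandas method; the column name x and the variable names are illustrative only. Recent pandas releases deprecate fillna(method='ffill') in favour of ffill(), so the forward fill is written with ffill() here.

```python
import numpy as np
import pandas as pd

# Toy series with gaps (same values as the class example; "x" is an illustrative name).
df = pd.DataFrame({"x": [1, 3, np.nan, np.nan, 9, np.nan, 11]})

# Assumption: an imputer_args dict is unpacked into the pandas call.
interpolate_args = {"method": "linear"}
interpolated = df["x"].interpolate(**interpolate_args)

# Forward fill: each gap takes the last observed value before it.
forward_filled = df["x"].ffill()

print(interpolated.tolist())
print(forward_filled.tolist())
```

Linear interpolation spaces the filled values evenly between the neighbouring observations, whereas forward filling repeats the previous observation, so the two methods can give quite different results on long gaps.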

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from ai4water.preprocessing.imputation import Imputation
>>> df = pd.DataFrame([1, 3, np.nan, np.nan, 9, np.nan, 11])
>>> imputer = Imputation(df, method='fillna', imputer_args={'method': 'ffill'})
>>> imputer()
>>> # change the imputation method
>>> imputer.method = 'interpolate'
>>> imputer(method='cubic')
>>> # now try KNN imputation
>>> imputer.method = 'KNNImputer'
>>> imputer(n_neighbors=3)
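
The KNN-based method can also be illustrated with scikit-learn on its own. This is a minimal sketch, assuming that imputer_args such as {'n_neighbors': 2} are forwarded to sklearn.impute.KNNImputer; the array values and variable names are made up for illustration and are not taken from ai4water.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Two features; the second one is missing in the third row.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [4.0, 8.0],
    [5.0, 10.0],
])

# Assumption: this dict plays the role of imputer_args.
knn_args = {"n_neighbors": 2}
filled = KNNImputer(**knn_args).fit_transform(X)

# The gap is filled with the mean of the second feature over the two rows
# whose first feature is closest to 3.0 (the rows with 2.0 and 4.0).
print(filled[2, 1])
```

With a single column, every row that has a gap has no observed feature from which to measure distance, so a KNN-style imputer has little to work with; the method is most useful when several correlated features are available.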
__init__(data: Union[pandas.core.frame.DataFrame, numpy.ndarray, list], method: str = 'KNNImputer', features=None, imputer_args: Optional[dict] = None)[source]
Parameters
  • data – the data which contains missing values

  • method – the method used to impute missing values

  • features – the features on which imputation is to be applied

  • imputer_args – arguments for underlying imputer function

call(*args, **kwargs)[source]
maybe_make_df(data)[source]
property method
missing_indices() dict[source]
plot(cols=None, st=0, en=None)[source]

cols: columns to plot from data
st: int
en: int

Example

>>> imputer.plot(cols=['in1', 'in2'], st=0, en=25)