nan_mask
- pybear.utilities.nan_mask(X)
This function combines pybear's nan_mask_numerical() and nan_mask_string(), giving a centralized location for masking numerical and non-numerical data. For full details, see the docs for nan_mask_numerical() and nan_mask_string().
Briefly, for both numerical and non-numerical data, this function accepts Python built-ins, numpy arrays, pandas dataframes/series, and polars dataframes/series of shape (n_samples, n_features) or (n_samples,) and returns an identically shaped numpy array of booleans indicating the locations of nan-like representations. For numerical data, it also accepts scipy sparse matrices/arrays of all formats except dok and lil; in that case, it returns a numpy boolean vector with the same shape as the sparse object’s ‘data’ attribute. “nan-like representations” include, at least, np.nan, pandas.NA, pandas.NaT, None (the None object, not the string “None”), and string representations of “nan”. This function does not accept ragged Python built-ins, numpy recarrays, or numpy masked arrays.
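To make "nan-like" concrete, here is a minimal pure-Python sketch of the idea for a flat list. The helpers `is_nan_like` and `sketch_nan_mask` are hypothetical names used only for illustration; they are not part of pybear and do not reproduce its implementation, which also handles pandas.NA, pandas.NaT, and the container types listed above. Treating strings case-insensitively is an assumption of this sketch.

```python
import math


# Hypothetical helper for illustration only; not part of pybear.
def is_nan_like(value):
    """Return True if `value` looks like a missing-value marker."""
    if value is None:
        # The None object itself counts; the string "None" does not.
        return True
    if isinstance(value, float):
        return math.isnan(value)
    if isinstance(value, str):
        # Assumption of this sketch: ignore case and surrounding whitespace.
        return value.strip().lower() == "nan"
    return False


# Hypothetical helper: build a list of booleans, one per input value.
def sketch_nan_mask(values):
    return [is_nan_like(v) for v in values]


mixed = [1.5, float("nan"), None, "nan", "None", "abc"]
mask = sketch_nan_mask(mixed)
print(mask)  # [False, True, True, True, False, False]
```

Note that the string "None" is left unmasked, matching the distinction the description draws between the None object and its string form.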
- Parameters:
- X : XContainer of shape (n_samples, n_features) or (n_samples,)
The object for which to locate nan-like representations.
- Returns:
- mask : numpy.ndarray[bool]
shape (n_samples, n_features) or (n_samples,) or (n_non_zero_values,)
Indicates the locations of nan-like representations in X with boolean True. Values that are not nan-like are False.
Notes
- PythonTypes:
list | tuple | set | list[list] | tuple[tuple]
- NumpyTypes:
numpy.ndarray
- PandasTypes:
pandas.DataFrame | pandas.Series
- PolarsTypes:
polars.DataFrame | polars.Series
- ScipySparseTypes:
ss._csr.csr_matrix | ss._csc.csc_matrix | ss._coo.coo_matrix | ss._dia.dia_matrix | ss._bsr.bsr_matrix | ss._csr.csr_array | ss._csc.csc_array | ss._coo.coo_array | ss._dia.dia_array | ss._bsr.bsr_array
- XContainer:
PythonTypes | NumpyTypes | PandasTypes | PolarsTypes | ScipySparseTypes
Examples
>>> from pybear.utilities import nan_mask
>>> import numpy as np
>>> X1 = np.arange(6).astype(np.float64)
>>> X1[0] = np.nan
>>> X1[-1] = np.nan
>>> X1
array([nan,  1.,  2.,  3.,  4., nan])
>>> nan_mask(X1)
array([ True, False, False, False, False,  True])

>>> X2 = list('vwxyz')
>>> X2[0] = 'nan'
>>> X2[2] = 'nan'
>>> X2
['nan', 'w', 'nan', 'y', 'z']
>>> nan_mask(X2)
array([ True, False,  True, False, False])
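A boolean mask like the one returned above is typically used to count or drop missing entries. The sketch below shows this for a 2D float array. It imports nan_mask from pybear when available; otherwise it falls back to a stand-in built on numpy.isnan, which is an assumption made so the block runs without pybear and only covers float data.

```python
import numpy as np

try:
    from pybear.utilities import nan_mask
except ImportError:
    # Stand-in (assumption): covers float arrays only, unlike pybear's nan_mask.
    def nan_mask(X):
        return np.isnan(np.asarray(X, dtype=np.float64))

X = np.array([
    [1.0, np.nan],
    [3.0, 4.0],
    [np.nan, 6.0],
])

# Same shape as X: True wherever a nan-like value sits.
mask = nan_mask(X)

# Count nan-like entries per column.
nan_per_column = mask.sum(axis=0)

# Keep only the rows that contain no nan-like entries.
clean_rows = X[~mask.any(axis=1)]

print(nan_per_column)  # [1 1]
print(clean_rows)      # [[3. 4.]]
```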