nan_mask

pybear.utilities.nan_mask(X)

This function combines pybear's nan_mask_numerical() and nan_mask_string(), providing a single entry point for masking numerical and non-numerical data.

For full details, see the docs for nan_mask_numerical and nan_mask_string.

Briefly, for both numerical and non-numerical data, this function accepts Python built-ins, numpy arrays, pandas dataframes/series, and polars dataframes/series of shape (n_samples, n_features) or (n_samples,), and returns an identically shaped numpy array of booleans indicating the locations of nan-like representations.

For numerical data, this function also accepts scipy sparse matrices/arrays of all formats except dok and lil. In that case, it returns a numpy boolean vector with the same shape as the sparse object's 'data' attribute.

"Nan-like representations" include, at least, np.nan, pandas.NA, pandas.NaT, None (the None object, not the string "None"), and string representations of "nan". This function does not accept ragged Python built-ins, numpy recarrays, or numpy masked arrays.
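To make the idea of a nan-like representation concrete, here is a minimal pure-Python sketch. It is not pybear's implementation: `is_nan_like` and `nan_mask_sketch` are hypothetical names, the case-insensitive string match is an assumption, and pandas.NA / pandas.NaT are deliberately left out so the snippet needs only the standard library.

```python
import math


def is_nan_like(value):
    """Hypothetical helper: True for None, float nan, or the string 'nan'."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    if isinstance(value, str):
        # Assumption: match 'nan' regardless of case or surrounding whitespace.
        return value.strip().lower() == "nan"
    return False


def nan_mask_sketch(X):
    """Hypothetical element-wise mask for a 1D or non-ragged 2D list/tuple."""
    return [
        [is_nan_like(v) for v in row] if isinstance(row, (list, tuple))
        else is_nan_like(row)
        for row in X
    ]


mask_1d = nan_mask_sketch([float("nan"), 1.0, None, "nan", "None"])
mask_2d = nan_mask_sketch([[1, None], ["nan", "x"]])
```

Note that the string "None" is not flagged, while the None object is, which mirrors the distinction drawn above.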

Parameters:
X : XContainer of shape (n_samples, n_features) or (n_samples,)

The object for which to locate nan-like representations.

Returns:
mask : numpy.ndarray[bool] of shape (n_samples, n_features), (n_samples,), or (n_non_zero_values,)

Indicates the locations of nan-like representations in X with the boolean value True. Values that are not nan-like are False.
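Because the returned mask has the same shape as a dense X, it can be used directly for counting or filtering. The sketch below is illustrative only: np.isnan stands in for nan_mask so that the snippet runs without pybear installed, on the assumption that the two agree for a plain float ndarray.

```python
import numpy as np

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 6.0]])

# Stand-in for nan_mask(X): a boolean array with the same shape as X.
mask = np.isnan(X)

# Number of nan-like values in each feature (column).
nan_per_column = mask.sum(axis=0)

# Keep only the samples (rows) that contain no nan-like values.
clean_rows = X[~mask.any(axis=1)]
```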

Notes

PythonTypes:

list | tuple | set | list[list] | tuple[tuple]

NumpyTypes:

numpy.ndarray

PandasTypes:

pandas.DataFrame | pandas.Series

PolarsTypes:

polars.DataFrame | polars.Series

ScipySparseTypes:

ss._csr.csr_matrix | ss._csc.csc_matrix | ss._coo.coo_matrix | ss._dia.dia_matrix | ss._bsr.bsr_matrix | ss._csr.csr_array | ss._csc.csc_array | ss._coo.coo_array | ss._dia.dia_array | ss._bsr.bsr_array

XContainer:

PythonTypes | NumpyTypes | PandasTypes | PolarsTypes | ScipySparseTypes

Examples

>>> from pybear.utilities import nan_mask
>>> import numpy as np
>>> X1 = np.arange(6).astype(np.float64)
>>> X1[0] = np.nan
>>> X1[-1] = np.nan
>>> X1
array([nan,  1.,  2.,  3.,  4., nan])
>>> nan_mask(X1)
array([ True, False, False, False, False,  True])
>>> X2 = list('vwxyz')
>>> X2[0] = 'nan'
>>> X2[2] = 'nan'
>>> X2
['nan', 'w', 'nan', 'y', 'z']
>>> nan_mask(X2)
array([ True, False,  True, False, False])
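For the scipy sparse case, the mask follows the 'data' attribute rather than the dense shape. The sketch below illustrates that shape relationship. As an assumption, np.isnan applied to X.data stands in for nan_mask(X), so the snippet does not require pybear. The COO conversion used to recover row and column positions is an illustrative extra, not something the function itself is documented to do.

```python
import numpy as np
from scipy import sparse

dense = np.array([[0.0, 1.0, np.nan],
                  [2.0, 0.0, 0.0]])
X = sparse.csr_matrix(dense)

# Stand-in for nan_mask(X): one boolean per stored value, not per dense cell.
mask = np.isnan(X.data)

# Map the flagged stored values back to (row, column) positions.
coo = X.tocoo()
nan_rows = coo.row[mask]
nan_cols = coo.col[mask]
```

Here the mask has one entry per stored value (three in this matrix), so it cannot be used to index the dense array directly.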