inf_mask

pybear.utilities.inf_mask(X)

Return a boolean numpy array or vector indicating the locations of infinity-like values in the data.

“Infinity-like values” include, at least, numpy.inf, -numpy.inf, numpy.PINF, numpy.NINF, math.inf, -math.inf, str('inf'), str('-inf'), float('inf'), float('-inf'), decimal.Decimal('Infinity'), and decimal.Decimal('-Infinity'). Note that numpy.PINF and numpy.NINF are only available in NumPy versions before 2.0.

This function accepts Python lists, tuples, and sets; numpy arrays; pandas series and dataframes; polars series and dataframes; and all scipy sparse matrices/arrays except the dok and lil formats. It does not accept ragged Python built-in containers, numpy recarrays, or numpy masked arrays.

In all cases, the given container is ultimately coerced to a numpy representation of the data, and the boolean mask is generated from that numpy container. Numpy arrays are handled as is. Pandas objects are converted to a numpy array via the ‘to_numpy’ method. Polars objects are first converted to pandas by the ‘to_pandas’ method; it is up to the user to ensure that the particular infinity-like values in a polars container are preserved by this conversion. The new pandas container is then handled in the same way as any other passed pandas container. For scipy sparse objects, the ‘data’ attribute (which is a numpy ndarray) is extracted.
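The conversion order described above can be sketched roughly as follows. This is a minimal illustration that uses duck typing, not pybear's actual code; the helper name `to_numpy_sketch`, the attribute checks, and the handling of Python built-ins are all assumptions made for the example.

```python
import numpy as np


def to_numpy_sketch(X):
    """Hypothetical helper: reduce a supported container to a numpy array."""
    if isinstance(X, np.ndarray):
        return X  # numpy arrays are handled as is
    if hasattr(X, "to_pandas"):
        X = X.to_pandas()  # polars objects go to pandas first
    if hasattr(X, "to_numpy"):
        return X.to_numpy()  # pandas objects go to numpy
    if hasattr(X, "nnz") and hasattr(X, "data"):
        return X.data  # scipy sparse: only the stored values
    # Assumption: Python built-ins are handed to numpy.array
    return np.array(list(X))


from_list = to_numpy_sketch([0.0, float("inf"), 2.0])
from_tuple = to_numpy_sketch(((1.0, float("-inf")), (3.0, 4.0)))

print(np.isinf(from_list))
print(np.isinf(from_tuple))
```

The numpy check comes first because numpy arrays also expose a `data` attribute, which would otherwise be mistaken for the sparse case.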

For 1D and 2D objects of shape (n_samples, ) or (n_samples, n_features), inf_mask returns an identically shaped boolean numpy array. For scipy sparse objects, it returns a boolean numpy vector with the same shape as the ‘data’ attribute of the sparse object.
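To illustrate the sparse case, the sketch below builds a small CSR matrix and masks its ‘data’ attribute directly with numpy.isinf. It assumes scipy is installed and does not call inf_mask itself; it only shows why the mask has one entry per stored value rather than one per cell.

```python
import numpy as np
import scipy.sparse as ss

dense = np.array([[0.0, np.inf],
                  [1.0, 0.0]])
X = ss.csr_matrix(dense)

# Only the stored (non-zero) values live in X.data, in row-major order for CSR
stored = X.data
mask = np.isinf(stored)

print(stored.shape, mask.shape)
print(mask)
```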

‘dok’ is the only scipy sparse format that does not have a ‘data’ attribute, and for that reason it is not handled by inf_mask. scipy sparse ‘lil’ cannot be masked in an elegant way, and for that reason it is also not handled by inf_mask. All other scipy sparse formats only take numeric data.

This module relies heavily on numpy.isinf to locate infinity-like values in float dtype data. All infinity-like forms mentioned above are found by this function in float dtype data.
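As a quick check of this claim, the following sketch places several of the forms listed earlier into a float64 array and applies numpy.isinf. The explicit `float(...)` conversion is a choice made for this example so that every value, including the string and Decimal forms, ends up in float dtype; numpy.PINF and numpy.NINF are left out because they no longer exist in NumPy 2.0 and later.

```python
import math
from decimal import Decimal

import numpy as np

values = [
    np.inf, -np.inf,
    math.inf, float("-inf"),
    Decimal("Infinity"), "inf",
    1.5,  # the only finite value
]

# Convert each value to a Python float, then store as float64
X = np.array([float(v) for v in values], dtype=np.float64)
mask = np.isinf(X)

print(mask)
```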

None of the third-party containers handled by this module allows infinity-like values in integer dtype data. Handling these objects is therefore straightforward: every position in the returned boolean mask must be False.
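A small numpy-only sketch of the integer case is shown below. It demonstrates that an integer array rejects an infinity assignment, so an all-False mask of the same shape is the only possible result. Building the mask with `numpy.zeros` is an assumption about one reasonable way to do it, not a statement of how pybear does it.

```python
import numpy as np

X = np.arange(6, dtype=np.int64).reshape(3, 2)

# Integer arrays cannot hold infinity: the assignment raises
try:
    X[0, 0] = np.inf
    rejected = False
except OverflowError:
    rejected = True

# So the mask for integer data is all False, with the same shape as X
mask = np.zeros(X.shape, dtype=bool)

print(rejected, mask.shape, mask.any())
```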

String and object dtype data are not handled by the numpy.isinf function. Fortunately, when a string dtype numpy array is created, almost all float or string infinity-like values in it are coerced to str('inf') or str('-inf'). The exceptions are decimal.Decimal('Infinity') and decimal.Decimal('-Infinity'), which are coerced to str('Infinity') and str('-Infinity'). Building a mask from these strings is straightforward. Object dtype numpy arrays, however, do not make these conversions, so float infinity-likes stay in the object array in float format. This poses a problem, because numpy.isinf cannot take object dtype data even though the array may well contain infinity-likes. Object dtype data are therefore cast to string dtype, which forces the conversion.
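The object-to-string route described above can be sketched as follows. This is an illustration under stated assumptions, not pybear's implementation: the name `INF_STRINGS` and the use of `numpy.isin` to compare against the four string spellings are choices made for the example.

```python
from decimal import Decimal

import numpy as np

X = np.array([1.0, np.inf, "-inf", Decimal("Infinity"), "abc"], dtype=object)

# numpy.isinf has no loop for object dtype, so this raises TypeError
try:
    np.isinf(X)
    isinf_ok = True
except TypeError:
    isinf_ok = False

# Casting to string calls str() on each element, which exposes the
# infinity-likes as 'inf', '-inf', 'Infinity', or '-Infinity'
as_str = X.astype(str)

# Assumption: these four spellings are the ones to look for
INF_STRINGS = ["inf", "-inf", "Infinity", "-Infinity"]
mask = np.isin(as_str, INF_STRINGS)

print(as_str)
print(mask)
```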

Parameters:
X : XContainer of shape (n_samples, n_features) or (n_samples, )

The object for which to mask infinity-like representations.

Returns:
mask : numpy.ndarray[bool]

Boolean array of shape (n_samples, n_features), (n_samples, ), or (n_non_zero_values, ). Indicates the locations of infinity-like representations in X with the boolean value True. Values that are not infinity-like are False.

See also

numpy.isinf
numpy.inf
numpy.PINF
numpy.NINF
math.inf
decimal.Decimal

Notes

Type Aliases

PythonTypes:

list | tuple | set | list[list] | tuple[tuple]

NumpyTypes:

numpy.ndarray

PandasTypes:

pandas.Series | pandas.DataFrame

PolarsTypes:

polars.Series | polars.DataFrame

ScipySparseTypes:

ss._csr.csr_matrix | ss._csc.csc_matrix | ss._coo.coo_matrix | ss._dia.dia_matrix | ss._bsr.bsr_matrix | ss._csr.csr_array | ss._csc.csc_array | ss._coo.coo_array | ss._dia.dia_array | ss._bsr.bsr_array

XContainer:

PythonTypes | NumpyTypes | PandasTypes | PolarsTypes | ScipySparseTypes

Examples

>>> from pybear.utilities import inf_mask
>>> import numpy as np
>>> X = np.arange(5).astype(np.float64)
>>> X[1] = float('inf')
>>> X[-1] = float('-inf')
>>> X
array([  0.,  inf,   2.,   3., -inf])
>>> inf_mask(X)
array([False,  True, False, False,  True])