check_feature_names

pybear.base.check_feature_names(X, feature_names_in_, reset)

Set or check the feature_names_in_ attribute.

pybear recommends setting reset=True in fit() and in the first call to partial_fit(). All other methods that validate X should set reset=False.

If reset is True:

Get the feature names from X and return them. If X does not have valid string feature names, return None. Any value passed for the feature_names_in_ parameter is ignored.

If reset is False:

When feature_names_in_ exists and the checks of this module are satisfied then feature_names_in_ is always returned.

If feature_names_in_ exists (a header was seen on first fit) and:

X has a header: Validate that the feature names of X have exactly the same names, in the same order, as those seen during fit. If they are equal, return the feature names; if they are not equal, raise ValueError.

X does not have a header: Warn and return feature_names_in_.

If feature_names_in_ does not exist and the checks of this module are satisfied then None is always returned regardless of any header that the current X may have.

If feature_names_in_ does not exist (a header was not seen on first fit) and:

X has a header: Warn and return None.

X does not have a header: Return None.
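
The branches above can be summarized in a short sketch. This is not the pybear implementation: sketch_check_feature_names and its header argument (the names already extracted from X, or None when X has no valid header) are hypothetical names used only for illustration, and the warning and error messages are invented.

```python
import warnings


def sketch_check_feature_names(header, feature_names_in_, reset):
    """Illustrative decision logic only; not the pybear source.

    header: list of str taken from X, or None if X has no valid header
            (hypothetical argument; the real function receives X itself).
    """
    if reset:
        # feature_names_in_ is ignored; whatever X provides is returned.
        return header

    if feature_names_in_ is not None:
        if header is None:
            # Names were seen during fit but the current X has none.
            warnings.warn("X has no header but names were seen during fit.")
            return feature_names_in_
        if list(header) != list(feature_names_in_):
            # Names or their order differ from those seen during fit.
            raise ValueError("feature names differ from those seen in fit")
        return feature_names_in_

    # No names were seen during fit, so None is returned either way.
    if header is not None:
        warnings.warn("X has a header but none was seen during fit.")
    return None
```

In this sketch the comparison is order-sensitive, so a header of ['b', 'a'] checked against ['a', 'b'] raises ValueError, which matches the "exact names and order" requirement described above.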

Parameters:
X : array_like of shape (n_samples, n_features) or (n_samples, )

The data from which to extract feature names. X provides feature names if it is a dataframe constructed with a valid header of strings; pandas and polars dataframes are known to yield feature names. If X does not have a valid header then None is returned. numpy arrays and scipy sparse matrices/arrays are known not to yield feature names.
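
As a rough illustration of what "a valid header of strings" could mean in practice, the sketch below reads a columns attribute by duck typing. It assumes that a header is valid only when every column name is a str; sketch_get_header is a hypothetical helper, not part of pybear, and the SimpleNamespace object merely stands in for a dataframe so that the example needs no third-party imports.

```python
from types import SimpleNamespace


def sketch_get_header(X):
    """Return X's column names as a list of str, or None (illustration only)."""
    # pandas and polars dataframes expose `columns`; numpy arrays and
    # scipy sparse containers do not.
    columns = getattr(X, "columns", None)
    if columns is None:
        return None
    names = list(columns)
    # Assumption: an empty header or any non-string name is not valid.
    if len(names) == 0 or not all(isinstance(n, str) for n in names):
        return None
    return names


frame_like = SimpleNamespace(columns=["a", "b", "c"])  # stand-in for a dataframe
array_like = [[1, 2, 3], [4, 5, 6]]                    # no `columns` attribute
```

Here sketch_get_header(frame_like) gives the three names as a list, while sketch_get_header(array_like) gives None because the object has no columns attribute.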

feature_names_in_ : numpy.ndarray[object] of shape (n_features, )

The feature names seen on the first fit, if an object with a valid header was passed on the first fit. None if feature names were not seen on the first fit.

reset : bool

Whether to reset the feature_names_in_ attribute. If False, the feature names of X will be checked for consistency with feature names of data provided when reset was last True.

Returns:
feature_names_in_ : numpy.ndarray[object] | None

The validated feature names if feature names were seen the last time reset was set to True. None if the estimator/transformer did not see valid feature names at the first fit.

Examples

>>> from pybear.base import check_feature_names
>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randint(0, 10, (5, 3))
>>> feature_names_in_ = np.array(['a', 'b', 'c'])
>>> X = pd.DataFrame(data=data, columns=list('abc'))

>>> # Verify that a valid header is accepted and returned
>>> check_feature_names(X, feature_names_in_=feature_names_in_, reset=False)
array(['a', 'b', 'c'], dtype='<U1')