SlimPolyFeatures

class pybear.preprocessing.SlimPolyFeatures(degree=2, *, min_degree=1, interaction_only=False, scan_X=True, keep='first', sparse_output=True, feature_name_combiner='as_indices', equal_nan=True, rtol=1e-05, atol=1e-08, n_jobs=None, job_size=50)

Bases: FeatureMixin, GetParamsMixin, SetOutputMixin, SetParamsMixin, FitTransformMixin, ReprMixin

SlimPolyFeatures (SPF) performs a polynomial feature expansion on a dataset, where any feature produced that is a column of constants or is a duplicate of another column is omitted from the final output.

SPF follows the standard scikit-learn transformer API, and makes the standard transformer methods available: get_feature_names_out, fit, partial_fit, transform, fit_transform, set_params, and get_params. SPF also has a reset method which is covered elsewhere in the docs.

Numpy arrays, pandas dataframes, polars dataframes, and all scipy sparse objects (csr, csc, coo, lil, dia, dok, and bsr matrices and arrays) are accepted by partial_fit(), fit(), and transform(). SPF only accepts numerical data, and will raise an exception if passed non-numerical values.

SPF requires that data passed to it have no duplicate and no constant columns. This stipulation is explained in more detail later in the docs. If you are expanding non-binary integer or float data under this condition, there is a low (but non-zero) likelihood that polynomial terms will be constant or duplicate. Therefore, SPF has the least utility for that type of data, and the user may be better off using scikit-learn PolynomialFeatures, which has a much lower cost of operation than SPF. SPF is of greatest benefit when it is used to perform polynomial expansion on one-hot encoded data, where there is a high likelihood that polynomial terms are constant (all zeros) or duplicate.

A polynomial feature expansion generates all possible multiplicative combinations of the original features, typically from the zero-degree term up to and including a specified maximum degree. For example, if a dataset has two features, called ‘a’ and ‘b’, then the degree-2 polynomial expansion is [1, a, b, a^2, ab, b^2], where the column of ones is the zero-degree term, columns ‘a’ and ‘b’ are the first degree terms, and columns a^2, ab, and b^2 are the second degree terms. Generating these features for analysis is useful for finding non-linear relationships between the data and the target.
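As a minimal sketch of the enumeration described above, the terms of a full expansion can be listed with the standard library. This is plain Python for illustration only, not pybear's implementation, and the helper name `expansion_terms` is hypothetical.

```python
from itertools import combinations_with_replacement


def expansion_terms(names, degree):
    """List the terms of a full polynomial expansion, degree 0 through `degree`."""
    terms = []
    for d in range(degree + 1):
        # Each combination (with repetition) is one multiplicative term of degree d.
        for combo in combinations_with_replacement(names, d):
            # The empty combination is the zero-degree term, a column of ones.
            terms.append("*".join(combo) if combo else "1")
    return terms


terms = expansion_terms(["a", "b"], 2)
print(terms)
```

For features 'a' and 'b' this lists the same six terms as the example above, with a^2 written as `a*a`.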

A conventional workflow would be to perform a polynomial expansion (that fits in memory) on data, remove duplicate and constant columns, and perform an analysis. The memory occupied by the columns that were eventually removed may have prevented a higher order expansion, which may have provided useful information. Unfortunately, even low-degree polynomial expansions of modest-sized datasets quickly grow to consume large amounts of, or even all, available RAM. This limits the number of polynomial terms that can be analyzed.

But there is opportunity to reduce memory footprint for a given expansion by not generating features that do not add value. Many polynomial expansions generate columns that are duplicate or constant (think columns of all zeros from interactions of a one-hot encoded feature). SPF finds columns such as these before the data is fully expanded and prevents them from ever appearing in the final array and occupying memory. This affords the opportunity to (possibly) work with more data and do higher order expansions than otherwise possible because the non-value-added columns are never even created. SPF also saves the time of manually finding and removing such columns after the expansion.
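To see why one-hot encoded data produces so many non-value-added columns, consider the small plain-Python sketch below. It does not use pybear; the two columns and the helper `product` are made up for illustration.

```python
# Two one-hot columns derived from the same categorical feature (one value per sample).
a = [1, 0, 0, 1]
b = [0, 1, 1, 0]


def product(u, v):
    """Element-wise product of two columns."""
    return [x * y for x, y in zip(u, v)]


a_squared = product(a, a)  # equals 'a', because 1*1 == 1 and 0*0 == 0
a_times_b = product(a, b)  # all zeros, because no sample is in both categories

is_duplicate = a_squared == a
is_constant = len(set(a_times_b)) == 1
print(is_duplicate, is_constant)
```

Both second-degree columns carry no new information: one duplicates an original column and the other is a column of zeros.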

To robustly and tractably do this, SPF requires that the dataset to undergo expansion has no duplicate or constant columns. There is further discussion in these documents about how SPF handles these conditions in-situ during multiple partial fits, but ultimately SPF requires that the totality of seen data has no constants and no duplicates. When scan_X is True, SPF is able to find any columns in the original data that are constant/duplicate and prevent transform until the condition is fixed; it could only be fixed in-situ with more partial fits. To properly pre-condition your data beforehand, remove constant columns from your data with pybear InterceptManager, and remove duplicate columns from your data with pybear ColumnDeduplicator. If there are no constant or duplicate columns in the data, setting scan_X to False can reduce the cost of the polynomial expansion. See more discussion in the scan_X parameter.

During the fitting process, SPF learns what columns in the expansion would be constant and/or duplicate. SPF performs this preliminary expansion by brute force, which can be expensive. Even with all that work, the preliminary expansion is not directly returned as a result at transform time. At transform, the polynomial expansion is built based on what was learned about constant and duplicate columns during the building of the preliminary expansion.

At transform time, SPF applies the rules it learned during fitting and only builds the polynomial features that could add value to the dataset. The internal construction of the polynomial expansion is always as a scipy sparse csc array to minimize the RAM footprint of the expansion. The expansion is also always done with float64 datatypes, regardless of datatype of the passed data, to prevent any overflow problems that might arise from multiplication of low bit number types. However, overflow is still possible with 64 bit datatypes, especially when the data has large values or the degree of the expansion is high. SPF HAS NO PROTECTIONS FOR OVERFLOW. IT IS UP TO THE USER TO AVOID OVERFLOW CONDITIONS AND VERIFY RESULTS.
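Since verifying results is left to the user, one simple check is to look for infinities in the output. The sketch below is plain Python (whose floats are 64 bit) and assumes nothing about pybear; the helper name `has_overflow` is hypothetical.

```python
import math


def has_overflow(column):
    """Return True if any value in the column is +inf or -inf."""
    return any(math.isinf(v) for v in column)


col = [1e200, 2.0, 3.0]
# 1e200 * 1e200 exceeds the largest representable float64 and becomes inf.
squared = [v * v for v in col]

print(has_overflow(col), has_overflow(squared))
```

A check like this only catches values that have already overflowed to infinity; it says nothing about precision lost on very large finite values.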

Even though transform always constructs the expansion as scipy sparse csc, SPF will return the expansion in the format of the data passed to transform, unless instructed otherwise. If dense data is passed to transform, this negates the memory savings from building as sparse csc because the sparse format will be converted to the dense format for return. SPF can be instructed to return the output in sparse format via the sparse_output parameter, which preserves the lower memory footprint of the internal expansion. If sparse_output is set to True, the expansion is returned as scipy sparse csr array. If sparse_output is set to False, then SPF will convert the polynomial expansion from a sparse csc array to the same format passed to transform.

The SPF partial_fit method allows for incremental fitting. Through this method, even if the data is bigger-than-memory, SPF is able to learn what columns in X are constant/duplicate and what columns in the expansion are constant/duplicate, and carry out instructions to build the expansion batch-wise. This makes SPF amenable to batch-wise fitting and transforming via dask_ml Incremental and ParallelPostFit wrappers.
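The idea of learning column properties batch by batch can be sketched as follows. This is an illustrative toy in plain Python, not pybear's algorithm: it tracks only constant columns, ignores nan handling and tolerances, and the class name `ConstantTracker` is hypothetical.

```python
class ConstantTracker:
    """Track, batch by batch, which columns have held a single value so far."""

    def __init__(self):
        self.first_values = None    # first value seen in each column
        self.still_constant = None  # per-column flag, updated on every batch

    def partial_fit(self, batch):
        """`batch` is a list of rows; each row is a list of numbers."""
        if self.first_values is None:
            self.first_values = list(batch[0])
            self.still_constant = [True] * len(batch[0])
        for row in batch:
            for j, value in enumerate(row):
                if value != self.first_values[j]:
                    self.still_constant[j] = False
        return self


tracker = ConstantTracker()
tracker.partial_fit([[1, 5], [1, 6]])
after_first = list(tracker.still_constant)
tracker.partial_fit([[2, 7]])
after_second = list(tracker.still_constant)
print(after_first, after_second)
```

The first column looks constant after one batch and stops being constant once the second batch is seen, which is the kind of condition that further partial fits can resolve.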

SPF takes parameters to set the minimum (min_degree) and maximum (degree) degrees of the polynomial terms produced during the expansion. The edge case of returning only the 0-degree column of ones is disallowed. SPF never returns the 0-degree column of ones under any circumstance. The lowest degree SPF ever returns is degree one (the original data in addition to whatever other order terms are required). SPF terminates if min_degree is set to zero (minimum allowed setting is 1). To append a zero-degree column to your data, use pybear InterceptManager after using SPF. Also, SPF does not allow the no-op case of degree = 1, where the original data would be returned unchanged without any polynomial features. The minimum setting for degree is 2.

When searching for constant and duplicate columns, SPF ONLY LOOKS IN WHAT IS TO BE RETURNED. That is, if you specify min_degree to be 1, then SPF will compare the created polynomial terms against the original data and themselves. If you specify min_degree to be 2 or greater, SPF will only compare the created polynomial terms against themselves. The ramification of the latter case is that if you were to merge your original data onto the SPF output, there may be duplicates.

During fitting, SPF is able to tolerate constants and duplicates in the passed data. While this condition exists, however, SPF remains in a state where it waits for further partial fits to remedy the situation and does no-ops with warnings on most other actions (such as calls to attributes and transform, among others). Only when the internal state of SPF is satisfied that there are no constant or duplicate columns in the training data will SPF allow access to the other functionality.

SPF MUST ONLY TRANSFORM DATA IT HAS BEEN FITTED ON. TRANSFORMATION OF ANY OTHER DATA WILL PRODUCE OUTPUT THAT MAY CONTAIN CONSTANTS, DUPLICATES, AND NONSENSICAL RESULTS.

SPF has 5 property attributes that are accessible at any point after fitting. They can only be accessed if there are no constants or duplicates in the training data, otherwise attempts to access them will result in a no-op that gives a warning and returns None.

Once SPF is fit, setting of most params via set_params() is blocked. This is to prevent SPF from failing because of new learning states that cannot be reconciled with earlier learning states. The only parameters that can be set after a fit are keep, n_jobs, job_size, sparse_output, and feature_name_combiner. SPF has a reset() method that resets the data-dependent state of SPF. This allows for re-initializing the instance and setting different learning parameters without forcing the user to create a new instance.

Parameters:
degree : int, default = 2

The maximum polynomial degree of the generated features. The minimum value accepted by SPF is 2; the no-op case of simply returning the original degree-one data is not allowed.

min_degree : int, default = 1

The minimum polynomial degree of the generated features. Polynomial terms with degree below min_degree are not included in the final output array. The minimum value accepted by SPF is 1; SPF cannot be used to generate a zero-degree column (a column of all ones).

interaction_only : bool, default = False

If True, only interaction features are produced, that is, polynomial features that are products of ‘degree’ distinct input features. Terms with power of 2 or higher for any feature are excluded. If False, produce the full polynomial expansion.

Consider 3 features ‘a’, ‘b’, and ‘c’. If interaction_only is True, min_degree is 1, and degree is 2, then only the first degree interaction terms [‘a’, ‘b’, ‘c’] and the second degree interaction terms [‘ab’, ‘ac’, ‘bc’] are returned in the polynomial expansion.
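The difference between the two settings can be sketched with itertools: interaction terms draw distinct columns, while the full expansion allows a column to repeat. This is an illustration only; the helper name `index_combinations` is hypothetical and the ordering pybear uses internally is not assumed to match.

```python
from itertools import combinations, combinations_with_replacement


def index_combinations(n_features, min_degree, degree, interaction_only):
    """Column-index tuples for every term from min_degree through degree."""
    # Without replacement: each column appears at most once in a term.
    picker = combinations if interaction_only else combinations_with_replacement
    combos = []
    for d in range(min_degree, degree + 1):
        combos.extend(picker(range(n_features), d))
    return combos


interactions = index_combinations(3, 1, 2, interaction_only=True)
full = index_combinations(3, 1, 2, interaction_only=False)
print(interactions)
print(full)
```

With columns 0, 1, 2 standing for 'a', 'b', 'c', the interaction-only result holds the three first-degree terms plus (0, 1), (0, 2), (1, 2), while the full expansion additionally holds the squares (0, 0), (1, 1), (2, 2).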

scan_X : bool, default = True

SPF requires that the data being fit has no columns of constants and no duplicate columns. When scan_X is True, SPF does not assume that the analyst knows these states of the data and diagnoses them during fitting, which can be very expensive to do, especially finding duplicate columns. If the analyst knows that there are no constant or duplicate columns in the data, setting this to False can greatly reduce the cost of the polynomial expansion. When in doubt, pybear recommends setting this to True (the default). When this is False, it is possible to pass columns of constants or duplicates, but SPF will continue to operate under the assumptions of the stated design requirement, and the output will be nonsensical.

keep : Literal[‘first’, ‘last’, ‘random’], default = ‘first’

The strategy for keeping a single representative from a set of identical columns in the polynomial expansion. This is overruled if a polynomial feature is a duplicate of one of the original features, as the original feature will always be kept and the polynomial duplicates will always be dropped. One of SPF’s design rules is to never alter the originally passed data, so the original feature will always be kept. Under SPF’s design rule that the original data has no duplicate columns, an expansion feature cannot be identical to 2 of the original features. In all cases where the duplicates are only within the polynomial expansion, ‘first’ retains the column left-most in the expansion (lowest degree); ‘last’ keeps the column right-most in the expansion (highest degree); ‘random’ keeps a single randomly selected feature of the set of duplicates.

sparse_output : bool, default = True

If set to True, the polynomial expansion is returned from transform as a scipy sparse csr array. If set to False, the polynomial expansion is returned in the same format as passed to transform.

feature_name_combiner : FeatureNameCombinerType, default = ‘as_indices’

Sets the naming convention for the created polynomial features. This does not set nor change any original feature names that may have been seen during fitting on containers that have a header.

feature_name_combiner must be literal ‘as_feature_names’, literal ‘as_indices’, or a user-defined function (callable) for mapping polynomial column index combination tuples to polynomial feature names.

If literal ‘as_feature_names’ is used, SPF generates new polynomial feature names based on the feature names in the original data. For example, if the feature names of X are [‘x0’, ‘x1’, …, ‘xn’] and the polynomial column index tuple is (1, 1, 3), then the polynomial feature name is ‘x1^2_x3’.

If the default literal ‘as_indices’ is used, SPF generates new polynomial feature names based on the polynomial column index tuple itself. For example, if the polynomial column index tuple is (2, 2, 4), then the polynomial feature name is ‘(2, 2, 4)’.
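The two built-in naming conventions can be mimicked in a few lines. The functions below are reconstructed only from the two examples given above, so their behavior on other inputs is an assumption; the names `name_as_indices` and `name_as_feature_names` are hypothetical and are not part of pybear.

```python
from collections import Counter


def name_as_indices(combo):
    """Mimic 'as_indices': the index tuple rendered as a string."""
    return str(combo)


def name_as_feature_names(feature_names, combo):
    """Mimic 'as_feature_names', e.g. (1, 1, 3) -> 'x1^2_x3'."""
    parts = []
    # Count how often each column index appears; the count is the power.
    for idx, power in sorted(Counter(combo).items()):
        name = feature_names[idx]
        parts.append(name if power == 1 else f"{name}^{power}")
    return "_".join(parts)


names = ["x0", "x1", "x2", "x3", "x4"]
print(name_as_indices((2, 2, 4)))
print(name_as_feature_names(names, (1, 1, 3)))
```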

If a user-defined callable is passed, it must:

  1. Accept 2 arguments:
    1. a 1D vector of strings that contains the original feature names of X, as is used internally in SPF,
    2. the polynomial column combination tuple, which is a tuple of integers of variable length. The minimum length of the tuple must be min_degree, and the maximum length must be degree, with each integer falling in the range of [0, n_features_in_-1]

  2. Return a string that:
    1. is not a duplicate of any originally seen feature name
    2. is not a duplicate of any other polynomial feature name
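A callable meeting the requirements above might look like the following sketch. The function name `star_combiner` and its naming format are hypothetical choices made for illustration. It assumes the original feature names are unique and do not themselves contain '*', so that distinct index tuples yield distinct names.

```python
def star_combiner(feature_names, combo):
    """Join the names of the multiplied columns with '*', e.g. 'x1*x1*x3'.

    feature_names: sequence of the original feature names of X.
    combo: tuple of column indices that are multiplied together.
    """
    return "*".join(feature_names[i] for i in combo)


example_name = star_combiner(["x0", "x1", "x2", "x3"], (1, 1, 3))
print(example_name)
```

Following the parameter description above, such a function would be passed as feature_name_combiner=star_combiner when the SPF instance is created.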

equal_nan : bool, default = True

When comparing two columns for equality:

If equal_nan is True, assume that a nan value would otherwise be the same as the compared non-nan counterpart, or if both compared values are nan, consider them as equal (contrary to the default numpy handling of nan, where numpy.nan != numpy.nan). If equal_nan is False and either one or both of the values in the compared pair of values is/are nan, consider the pair to be not equivalent, thus making the column pair not equal. This is in line with the normal numpy handling of nan values.

When assessing if a column is constant:

If equal_nan is True, assume any nan values equal the mean of all non-nan values in the respective column. If equal_nan is False, any nan-values could never take the value of the mean of the non-nan values in the column, making the column not constant.
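The column-equality rules above can be sketched in plain Python. This is one reading of the documented behavior, not pybear's code: the helper name `columns_equal` is hypothetical, columns are assumed to have equal length, and the closeness test is the one numpy.allclose documents, |a - b| <= atol + rtol * |b|.

```python
import math


def columns_equal(col1, col2, *, equal_nan=True, rtol=1e-5, atol=1e-8):
    """Compare two columns value by value under the documented nan rules."""
    for a, b in zip(col1, col2):
        if math.isnan(a) or math.isnan(b):
            if equal_nan:
                continue      # a nan is assumed to match its counterpart
            return False      # any nan makes the pair, and so the columns, unequal
        if abs(a - b) > atol + rtol * abs(b):
            return False
    return True


nan = float("nan")
x = [1.0, nan, 3.0]
y = [1.0, 2.0, nan]

print(columns_equal(x, y, equal_nan=True))
print(columns_equal(x, y, equal_nan=False))
```

Note that this differs from numpy.allclose(..., equal_nan=True), which treats only nan against nan as equal.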

rtol : numbers.Real, default = 1e-5

The relative difference tolerance for equality. Must be a non-boolean, non-negative, real number. See numpy.allclose.

atol : numbers.Real, default = 1e-8

The absolute difference tolerance for equality. Must be a non-boolean, non-negative, real number. See numpy.allclose.

n_jobs : int | None, default = None

The number of joblib parallel jobs to use when looking for duplicate columns. The default is to use processes, but can be overridden externally using a joblib parallel_config context manager. The default number of jobs is None, which uses the joblib default setting. To get maximum speed benefit, pybear recommends using -1, which means use all processors.

job_size : int, default = 50

The number of columns to send to a joblib job. Must be an integer greater than or equal to 2. This allows the user to optimize CPU utilization for their particular circumstance. Long, thin datasets should use fewer columns, and wide, flat datasets should use more columns. Bear in mind that the columns sent to joblib jobs are deep copies of the original data, and larger job sizes increase RAM usage. Note that joblib is only engaged in scanning the original data when the number of columns in the data is at least 2*job_size. Also, joblib is only engaged in scanning the polynomial expansion if the number of columns in it is at least 2*job_size. For example, if job_size is 10, data with 20 or more columns will be processed with joblib, data with 19 or fewer columns will be processed linearly.
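The threshold rule described above reduces to a one-line comparison. The helper below is a hypothetical illustration of that rule, not a pybear function.

```python
def uses_joblib(n_columns, job_size=50):
    """True when a scan over `n_columns` columns would be handed to joblib."""
    if job_size < 2:
        raise ValueError("job_size must be an integer greater than or equal to 2")
    # joblib is engaged only when there are at least 2 * job_size columns.
    return n_columns >= 2 * job_size


print(uses_joblib(20, job_size=10), uses_joblib(19, job_size=10))
```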

Attributes:
n_features_in_ : int

The number of features in the fitted data, i.e., number of features before expansion.

feature_names_in_ : numpy.ndarray[object]

The names of the features as seen during fitting. Only accessible if X is passed to partial_fit() or fit() in a container that has a header.

poly_combinations_

Get the poly_combinations_ attribute.

poly_constants_

Get the poly_constants_ attribute.

poly_duplicates_

Get the poly_duplicates_ attribute.

dropped_poly_duplicates_

Get the dropped_poly_duplicates_ attribute.

kept_poly_duplicates_

Get the kept_poly_duplicates_ attribute.

Methods

fit(X[, y])

Perform a single fitting on a dataset.

fit_transform(X[, y])

Fit to data, then transform it.

get_feature_names_out([input_features])

Get the feature names for the output of transform.

get_metadata_routing()

get_metadata_routing is not implemented.

get_params([deep])

Get parameters for this instance.

partial_fit(X[, y])

Incrementally train the SPF transformer instance on one or more batches of data.

reset()

Reset the internal data-dependent state of SPF.

score(X[, y])

Dummy method to spoof dask Incremental and ParallelPostFit wrappers.

set_output([transform])

Set the output container when the transform and fit_transform methods of the transformer are called.

set_params(**params)

Set the parameters of the SPF instance.

transform(X)

Apply the expansion footprint that was learned during fitting to the given data.

See also

numpy.ndarray
pandas.DataFrame
polars.DataFrame
scipy.sparse
numpy.allclose
numpy.array_equal

Notes

Concerning the handling of nan-like representations. SPF accepts data in the form of numpy arrays, pandas dataframes, polars dataframes, and scipy sparse matrices/arrays. Regardless of the format of the passed data, during the construction of the preliminary (learning) expansion and during transform columns are extracted from the data as a numpy array with float64 dtype (see below for more detail about how scipy sparse is handled.) After the conversion to numpy array and prior to calculating the products of the columns in the extraction, SPF identifies any nan-like representations in the extracted numpy array and standardizes all of them to numpy.nan. The user is advised that whatever is used to indicate ‘not-a-number’ in the original data must first survive the conversion to numpy array and then be recognized by SPF as nan-like, so that SPF can standardize it to numpy.nan. nan-like representations that are recognized by SPF include, at least, numpy.nan, pandas.NA, None (of type None, not string ‘None’), and string representations of ‘nan’ (not case sensitive).

Concerning the handling of infinity. SPF has no special handling for the various infinity-types, e.g., numpy.inf, -numpy.inf, float(‘inf’), float(‘-inf’), etc. This is a design decision to not force infinity values to numpy.nan. SPF falls back to the native handling of these values for Python and numpy. Specifically, numpy.inf==numpy.inf and float(‘inf’)==float(‘inf’).

Concerning the handling of scipy sparse arrays. When constructing the preliminary (learning) expansion and during transform, the columns extracted from the data are converted to dense numpy arrays via the ‘toarray’ method, then undergo multiplication. This is a compromise that causes some memory expansion but allows for efficient handling of polynomial calculations.

Type Aliases

XContainer:

numpy.ndarray | pandas.DataFrame | polars.DataFrame | ss._csr.csr_matrix | ss._csc.csc_matrix | ss._coo.coo_matrix | ss._dia.dia_matrix | ss._lil.lil_matrix | ss._dok.dok_matrix | ss._bsr.bsr_matrix | ss._csr.csr_array | ss._csc.csc_array | ss._coo.coo_array | ss._dia.dia_array | ss._lil.lil_array | ss._dok.dok_array | ss._bsr.bsr_array

FeatureNameCombinerType:

Callable[[Sequence[str], tuple[int, …]], str] | Literal[‘as_feature_names’, ‘as_indices’]

CombinationsType:

tuple[tuple[int, …], …]

PolyDuplicatesType:

list[list[tuple[int, …]]]

KeptPolyDuplicatesType:

dict[tuple[int, …], list[tuple[int, …]]]

DroppedPolyDuplicatesType:

dict[tuple[int, …], tuple[int, …]]

PolyConstantsType:

dict[tuple[int, …], Any]

FeatureNamesInType:

numpy.ndarray[object]

Examples

>>> from pybear.preprocessing import SlimPolyFeatures as SPF
>>> import numpy as np
>>> trf = SPF(
...     degree=2, min_degree=1, interaction_only=False,
...     sparse_output=False, feature_name_combiner='as_indices'
... )
>>> X = np.array([[0,1],[0,1],[1,1],[1,0]], dtype=np.uint8)
>>> out = trf.fit_transform(X)
>>> out
array([[0., 1., 0.],
       [0., 1., 0.],
       [1., 1., 1.],
       [1., 0., 0.]])
>>> trf.n_features_in_
2
>>> trf.poly_combinations_
((0, 1),)
>>> trf.poly_constants_
{}
>>> trf.poly_duplicates_
[[(0,), (0, 0)], [(1,), (1, 1)]]
>>> trf.kept_poly_duplicates_
{(0,): [(0, 0)], (1,): [(1, 1)]}
>>> trf.dropped_poly_duplicates_
{(0, 0): (0,), (1, 1): (1,)}
>>> trf.get_feature_names_out()
array(['x0', 'x1', '(0, 1)'], dtype=object)

property dropped_poly_duplicates_

Get the dropped_poly_duplicates_ attribute.

A dictionary whose keys are the tuples that are removed from the polynomial expansion because they produced a duplicate of another column. The values of the dictionary are the tuples of indices of the respective duplicate that was kept.

Returns:
dropped_poly_duplicates_ : DroppedPolyDuplicatesType | None

keys: the poly combinations that were dropped from the expansion; values: the respective duplicate that was kept.

fit(X, y=None)

Perform a single fitting on a dataset.

Parameters:
X : XContainer of shape (n_samples, n_features)

Required. The data to undergo polynomial expansion.

y : Any, default = None

Ignored. The target for the data.

Returns:
self : object

The fitted SlimPolyFeatures instance.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array_like of shape (n_samples, n_features)

Required. The data.

y : array_like of shape (n_samples, n_outputs) or (n_samples,)

Optional, default=None. Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_tr : array_like of shape (n_samples, n_features_new)

Transformed array.

get_feature_names_out(input_features=None)

Get the feature names for the output of transform.

Use input_features and feature_name_combiner to build the feature names for the polynomial component of the transformed data.

Parameters:
input_features : Sequence[str] | None, default = None

Externally provided feature names for the fitted data, not the transformed data.

If input_features is None:

if feature_names_in_ is defined, then feature_names_in_ is used as the input features.

if feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].

If input_features is not None:

if feature_names_in_ is not defined, then input_features is used as the input features.

if feature_names_in_ is defined, then input_features must exactly match the features in feature_names_in_.

Returns:
feature_names_out : FeatureNamesInType

The feature names of the transformed data.
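The four cases for choosing the input feature names can be written as a small decision function. This is a sketch of the documented rules only: the helper name `resolve_input_features` is hypothetical, and raising ValueError on a mismatch is an assumption, since the exception type is not stated here.

```python
def resolve_input_features(input_features, feature_names_in, n_features_in):
    """Pick the input feature names following the four documented cases.

    feature_names_in is None when no header was seen during fitting.
    """
    if input_features is None:
        if feature_names_in is not None:
            return list(feature_names_in)
        # No names available anywhere: generate 'x0', 'x1', ...
        return [f"x{i}" for i in range(n_features_in)]
    if feature_names_in is not None:
        if list(input_features) != list(feature_names_in):
            # Assumed exception type; the docs only say the names must match.
            raise ValueError("input_features must match feature_names_in_")
    return list(input_features)


print(resolve_input_features(None, None, 3))
print(resolve_input_features(["p", "q"], None, 2))
```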

get_metadata_routing()

get_metadata_routing is not implemented.

get_params(deep=True)

Get parameters for this instance.

The ‘instance’ may be a pybear estimator, transformer, or a gridsearch module that wraps a nested estimator or pipeline.

Parameters:
deep : bool, default = True

For instances that do not have nested instances in an estimator attribute (such as estimators or transformers), this parameter is ignored and the same (full) set of parameters for the instance is returned regardless of the value of this parameter.

For instances that have nested instances (such as a GridSearch instance with a nested estimator or pipeline) in the estimator attribute, deep=False will only return the parameters for the wrapping instance. For example, a GridSearch module wrapping an estimator will only return parameters for the GridSearch instance, ignoring the parameters of the nested instance. When deep=True, this method returns the parameters of the wrapping instance as well as the parameters of the nested instance. When the nested instance is a single estimator, the full set of parameters for the single estimator are returned in addition to the parameters of the wrapping instance. If the nested object is a pipeline, the parameters of the pipeline and the parameters of each of the steps in the pipeline are returned in addition to the parameters of the wrapping instance. The estimator’s parameters are prefixed with estimator__.

Returns:
params : dict[str, Any]

Parameter names mapped to their values.

property kept_poly_duplicates_

Get the kept_poly_duplicates_ attribute.

A dictionary whose keys are tuples of the indices of the columns of X that produced a polynomial column that was kept from the sets of duplicates. The dictionary values are lists of the tuples of indices that created polynomial columns that were duplicate of the column indicated by the dictionary key, but were removed from the polynomial expansion.

Returns:
kept_poly_duplicates_ : KeptPolyDuplicatesType | None

A dictionary whose keys are the columns that were kept out of the sets of duplicates and the values are lists of the columns that were duplicates of the respective key.

partial_fit(X, y=None)

Incrementally train the SPF transformer instance on one or more batches of data.

Parameters:
X : XContainer of shape (n_samples, n_features)

Required. The data to undergo polynomial expansion.

y : Any, default = None

Ignored. The target for the data.

Returns:
self : object

The fitted SlimPolyFeatures instance.

property poly_combinations_

Get the poly_combinations_ attribute.

The polynomial column combinations from X that are in the polynomial expansion part of the final output. An example might be ((0,0), (0,1), …), where each tuple holds the column indices from X that are multiplied to produce a feature in the polynomial expansion. This matches one-to-one with the created features, and similarly does not have any combinations that are excluded from the polynomial expansion for being duplicate or constant.

Returns:
poly_combinations_ : CombinationsType | None

The polynomial column combinations from X that are in the polynomial expansion part of the final output.

property poly_constants_

Get the poly_constants_ attribute.

A dictionary whose keys are tuples of indices in the original data that produced a column of constants in the polynomial expansion. The dictionary values are the constant values in those columns. For example, if an expansion has a constant column that was produced by multiplying the second and third columns in X (index positions 1 and 2, respectively) and the constant value is 0, then poly_constants_ will be {(1,2): 0}. If there are no constant columns, then poly_constants_ is an empty dictionary. These are always excluded from the polynomial expansion.

Returns:
poly_constants_ : PolyConstantsType | None

keys: the poly combinations that produced a column of constants; values: the constant value for that poly feature. These are always omitted from the final expansion.

property poly_duplicates_

Get the poly_duplicates_ attribute.

A list of the groups of identical polynomial features, indicated by tuples of their zero-based column index positions in the originally fit data. Columns from the original data itself can be in a group of duplicates, along with any duplicates from the polynomial expansion. For example, poly_duplicates_ for some dataset might look like: [[(1,), (2,3), (2,4)], [(5,6), (5,7), (6,7)]]

Returns:
poly_duplicates_ : PolyDuplicatesType | None

The groups of identical polynomial features.
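The relationship between poly_duplicates_, kept_poly_duplicates_, and dropped_poly_duplicates_ can be sketched as below. This is an illustration of the documented rules, not pybear's code. The helper name `split_duplicates` is hypothetical, it handles only 'first' and 'last', and it assumes each group is ordered left to right as the columns appear in the expansion.

```python
def split_duplicates(groups, keep="first"):
    """Derive kept/dropped mappings from groups of duplicate combinations."""
    kept, dropped = {}, {}
    for group in groups:
        # A length-1 tuple is a column of the original data.
        originals = [combo for combo in group if len(combo) == 1]
        if originals:
            keeper = originals[0]   # an original column is always kept
        elif keep == "first":
            keeper = group[0]
        elif keep == "last":
            keeper = group[-1]
        else:
            raise ValueError("this sketch handles only 'first' and 'last'")
        losers = [combo for combo in group if combo != keeper]
        kept[keeper] = losers
        for combo in losers:
            dropped[combo] = keeper
    return kept, dropped


groups = [[(0,), (0, 0)], [(1,), (1, 1)]]
kept, dropped = split_duplicates(groups)
print(kept)
print(dropped)
```

With the groups from the Examples section, the two dictionaries match the values shown there for kept_poly_duplicates_ and dropped_poly_duplicates_.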

reset()

Reset the internal data-dependent state of SPF.

__init__ parameters are not changed. reset is part of the external API because setting most params after a partial fit is blocked, and this allows for re-initializing the instance without forcing the user to create a new instance.

Returns:
self : object

The SlimPolyFeatures instance.

score(X, y=None)

Dummy method to spoof dask Incremental and ParallelPostFit wrappers.

This method has been verified to be required by the dask wrappers.

Parameters:
X : Any

The data. Ignored.

y : Any, default = None

The target for the data. Ignored.

Returns:
None

set_output(transform=None)

Set the output container when the transform and fit_transform methods of the transformer are called.

Parameters:
transform : Literal[‘default’, ‘pandas’, ‘polars’] | None, default = None

Configure the output of transform and fit_transform.

‘default’: Default output format (numpy array)

‘pandas’: pandas dataframe output

‘polars’: polars dataframe output

None: The output container is the same as the given container.

Returns:
self : object

The transformer instance.

set_params(**params)

Set the parameters of the SPF instance.

Pass the exact parameter name and its value as a keyword argument to the set_params method call. Or use ** dictionary unpacking on a dictionary keyed with exact parameter names and the new parameter values as the dictionary values. Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of SlimPolyFeatures.

Once SPF is fitted, only the following parameters can be changed via set_params: sparse_output, keep, n_jobs, job_size, and feature_name_combiner. All other parameters are blocked. To use different parameters without creating a new instance of SPF, call SPF reset() on the instance, otherwise create a new SPF instance.

Parameters:
**params : dict[str, Any]

SlimPolyFeatures parameters.

Returns:
self : object

The SlimPolyFeatures instance.

transform(X)

Apply the expansion footprint that was learned during fitting to the given data.

pybear strongly urges that only data that was seen during fitting be passed here.

Parameters:
X : XContainer of shape (n_samples, n_features)

The data to undergo polynomial expansion.

Returns:
X_tr : XContainer of shape (n_samples, n_transformed_features)

The polynomial feature expansion for X.