preprocessing

ColumnDeduplicator

ColumnDeduplicator (CDT) is a scikit-style transformer that removes duplicate columns from data, keeping one column from each set of duplicates.
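The idea of column deduplication can be sketched in plain Python. This is a minimal illustration, not the transformer's implementation or API: the helper name `drop_duplicate_columns` is hypothetical, and keeping the first column of each duplicate set is an assumption made for the example.

```python
def drop_duplicate_columns(rows):
    """Keep the first column of each set of identical columns.

    Hypothetical helper for illustration only. Returns the reduced
    rows and the indices of the columns that were kept.
    """
    columns = list(zip(*rows))  # transpose: one tuple per column
    seen = set()
    keep = []  # indices of columns to retain
    for idx, col in enumerate(columns):
        if col not in seen:
            seen.add(col)
            keep.append(idx)
    return [[row[i] for i in keep] for row in rows], keep


data = [
    [1, 1, 7],
    [2, 2, 8],
    [3, 3, 9],
]
# Column 1 duplicates column 0, so only columns 0 and 2 survive.
deduped, kept = drop_duplicate_columns(data)
```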

InterceptManager

A scikit-style transformer that identifies and manages the constant columns in a dataset.
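As a rough sketch of the identification step only, the following plain-Python snippet finds constant columns and records their values. The helper name `find_constant_columns` is hypothetical; how the transformer itself manages the columns it finds is not shown here.

```python
def find_constant_columns(rows):
    """Map column index -> constant value for every constant column.

    Hypothetical helper for illustration only.
    """
    constants = {}
    for idx, col in enumerate(zip(*rows)):
        if len(set(col)) == 1:  # a single distinct value means constant
            constants[idx] = col[0]
    return constants


data = [
    [1, 5, 0.0],
    [1, 6, 0.0],
    [1, 7, 0.0],
]
# Columns 0 and 2 never change; column 1 does.
constant_cols = find_constant_columns(data)
```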

MinCountTransformer

Remove examples that contain values whose frequencies within their respective feature fall below the specified count threshold.
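A single pass of this kind of filtering might look like the sketch below. The helper name `drop_rare_rows` and its `count_threshold` argument are hypothetical, chosen for illustration, and the sketch makes one pass only; removing rows can change the remaining frequencies, which this example does not revisit.

```python
from collections import Counter


def drop_rare_rows(rows, count_threshold):
    """Drop rows holding any value seen fewer than count_threshold
    times within its own column.

    Hypothetical helper for illustration only; single pass.
    """
    # One frequency table per column.
    counts = [Counter(col) for col in zip(*rows)]
    return [
        row
        for row in rows
        if all(counts[j][v] >= count_threshold for j, v in enumerate(row))
    ]


data = [
    ["a", "x"],
    ["a", "x"],
    ["b", "x"],  # "b" appears once in column 0
    ["a", "y"],  # "y" appears once in column 1
]
kept_rows = drop_rare_rows(data, count_threshold=2)
```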

NanStandardizer

Convert all nan-like representations in a dataset to the same value.
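The sketch below shows one way such standardization could work in plain Python. The set of strings treated as nan-like, and the names `NAN_LIKE_STRINGS`, `is_nan_like`, and `standardize_nans`, are assumptions made for this example and do not describe what the transformer actually recognizes.

```python
import math

# Assumed for illustration: strings treated as nan-like (case-insensitive).
NAN_LIKE_STRINGS = {"nan", "na", "n/a", "null", "none", ""}


def is_nan_like(value):
    """Return True for None, float NaN, and the assumed nan-like strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip().lower() in NAN_LIKE_STRINGS:
        return True
    return False


def standardize_nans(rows, new_value=float("nan")):
    """Replace every nan-like entry with the same new_value."""
    return [[new_value if is_nan_like(v) else v for v in row] for row in rows]


data = [
    [1.0, "NaN"],
    [None, "b"],
    [float("nan"), "n/a"],
]
# Use None as the single replacement so the result is easy to compare.
cleaned = standardize_nans(data, new_value=None)
```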

SlimPolyFeatures

SlimPolyFeatures (SPF) performs a polynomial feature expansion on a dataset, omitting from the final output any generated feature that is a column of constants or a duplicate of another column.
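To make the omission rule concrete, here is a minimal degree-2 sketch in plain Python. The helper name `slim_poly_degree2` is hypothetical, and returning the original columns followed by the surviving products is an assumption of this example rather than a statement about the transformer's output layout.

```python
from itertools import combinations_with_replacement


def slim_poly_degree2(rows):
    """Append degree-2 product columns, skipping any product that is
    constant or duplicates a column already present.

    Hypothetical helper for illustration only.
    """
    columns = [tuple(c) for c in zip(*rows)]
    seen = set(columns)
    new_columns = []
    for i, j in combinations_with_replacement(range(len(columns)), 2):
        product = tuple(a * b for a, b in zip(columns[i], columns[j]))
        if len(set(product)) == 1:  # column of constants
            continue
        if product in seen:  # duplicate of an existing column
            continue
        seen.add(product)
        new_columns.append(product)
    return [list(r) for r in zip(*(columns + new_columns))]


data = [
    [1, 0, 2],
    [0, 1, 3],
    [1, 0, 4],
    [0, 1, 5],
]
# x0*x0 and x1*x1 duplicate x0 and x1; x0*x1 is all zeros (constant).
# Only x0*x2, x1*x2 and x2*x2 are appended.
expanded = slim_poly_degree2(data)
```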