preprocessing
ColumnDeduplicator (CDT) is a scikit-style transformer that removes duplicate columns from data, keeping one column from each set of duplicates.

A scikit-style transformer that identifies and manages the constant columns in a dataset.

Remove examples (rows) that contain values whose frequency within their respective feature falls below the specified count threshold.

Convert all nan-like representations in a dataset to the same value.

SlimPolyFeatures (SPF) performs a polynomial feature expansion on a dataset, omitting from the final output any generated feature that is a column of constants or a duplicate of another column.
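
To make the constant-column and duplicate-column ideas above concrete, here is a minimal pure-Python sketch of a "slim" polynomial expansion. It does not use this package's API: the function name `slim_poly_expand` is hypothetical, including the degree-1 terms in the output is an assumption, and columns are compared by exact equality (real data with floats would likely need a tolerance).

```python
from itertools import combinations_with_replacement
from math import prod


def slim_poly_expand(rows, degree=2):
    """Hypothetical helper: polynomial expansion that skips constant and
    duplicate columns. `rows` is a list of equal-length numeric rows."""
    n_features = len(rows[0])
    kept = {}  # column values (tuple) -> feature-index combination that made it
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(n_features), d):
            # Multiply the selected features together, row by row.
            column = tuple(prod(row[i] for i in combo) for row in rows)
            if len(set(column)) <= 1:
                continue  # constant column: carries no information
            if column in kept:
                continue  # duplicate of a column already kept
            kept[column] = combo
    combos = list(kept.values())
    # Transpose the kept columns back into row-major form.
    expanded = [list(r) for r in zip(*kept.keys())]
    return combos, expanded


# Features 0 and 1 are mutually exclusive dummies; feature 2 is numeric.
X = [
    [1, 0, 2],
    [0, 1, 3],
    [1, 0, 4],
]
combos, expanded = slim_poly_expand(X, degree=2)
print(combos)
for row in expanded:
    print(row)
```

In this example the squares of the two dummy columns are dropped as duplicates of the originals, and their product is dropped because it is all zeros, so only six of the nine candidate columns survive.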