Preprocessing

This package provides tools for manipulating Dataset objects.

Base classes

class sekupy.preprocessing.base.PreprocessingPipeline(name='pipeline', nodes=None, nodes_kwargs=None)

Pipeline for chaining multiple preprocessing transformers.

This class allows combining multiple preprocessing steps into a single pipeline that can be applied to datasets sequentially.

Parameters:
  • name (str, optional) – Name of the pipeline, by default ‘pipeline’

  • nodes (list, optional) – List of transformer nodes or node names to include in the pipeline

  • nodes_kwargs (dict, optional) – Keyword arguments passed to the nodes when they are specified by name (as strings)

nodes

List of transformer nodes in the pipeline

Type:

list

sliced_nodes

Copy of the nodes list, kept for internal use

Type:

list

add(node)

Add a transformer node to the pipeline.

Parameters:

node (Transformer) – The transformer node to add to the pipeline

Returns:

Self, for method chaining

Return type:

PreprocessingPipeline
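Because add returns the pipeline itself, calls can be chained. The following is a minimal stand-in sketch, not sekupy's implementation: SketchPipeline is a hypothetical class that mirrors only the documented return-self behaviour, and the strings "node_a" and "node_b" are placeholders for real Transformer instances.

```python
class SketchPipeline:
    """Hypothetical stand-in mirroring only the documented behaviour of add()."""

    def __init__(self, name="pipeline", nodes=None):
        self.name = name
        # Copy the input so the caller's list is never mutated.
        self.nodes = list(nodes) if nodes is not None else []

    def add(self, node):
        self.nodes.append(node)
        return self  # returning self is what makes chaining possible


# Placeholder strings stand in for transformer nodes.
pipeline = SketchPipeline().add("node_a").add("node_b")
print(pipeline.nodes)
```

Nodes end up in the order in which they were added, which is also the order in which a pipeline would apply them.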

transform(ds)

Transform the dataset through all nodes in the pipeline.

This method applies each transformer in the pipeline sequentially to the dataset.

Parameters:

ds (Dataset) – The dataset to transform

Returns:

The transformed dataset after applying all pipeline nodes

Return type:

Dataset
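Sequential application means that the output of each node becomes the input of the next, so node order matters. A minimal sketch of that idea follows, assuming only that every node exposes a transform method; AddOne, Double, and run_pipeline are hypothetical names, and a plain list stands in for a Dataset.

```python
class AddOne:
    """Hypothetical node: adds 1 to every value."""

    def transform(self, ds):
        return [x + 1 for x in ds]


class Double:
    """Hypothetical node: multiplies every value by 2."""

    def transform(self, ds):
        return [x * 2 for x in ds]


def run_pipeline(nodes, ds):
    # Feed the result of each node into the next one.
    for node in nodes:
        ds = node.transform(ds)
    return ds


result = run_pipeline([AddOne(), Double()], [1, 2, 3])
swapped = run_pipeline([Double(), AddOne()], [1, 2, 3])
print(result)   # [4, 6, 8]
print(swapped)  # [3, 5, 7]
```

Swapping the two nodes changes the result, which is why the order of the nodes list is significant.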

class sekupy.preprocessing.base.Transformer(name='transformer', **kwargs)

Base class for data transformation components.

Transformers are used to preprocess datasets in the sekupy framework. They inherit from Node and provide functionality to transform datasets while tracking the applied transformations.

Parameters:
  • name (str, optional) – Name of the transformer, by default ‘transformer’

  • **kwargs (dict) – Additional parameters for the transformer

_mapper

Dictionary storing the transformer’s configuration

Type:

dict

map_transformer(ds)

Map the transformer to the dataset’s preprocessing history.

This method records the transformer configuration in the dataset’s preprocessing attribute for reproducibility.

Parameters:

ds (Dataset) – The dataset to which the transformer mapping is applied
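The bookkeeping described above can be pictured with a small stand-in. This is a sketch under stated assumptions, not sekupy's code: the layout of _mapper (a dict keyed by the transformer name), the use of a list for the history, and the attribute name preprocessing on a SimpleNamespace object are all illustrative choices.

```python
from types import SimpleNamespace


class SketchTransformer:
    """Hypothetical stand-in for a transformer that records its configuration."""

    def __init__(self, name="transformer", **kwargs):
        self.name = name
        # Assumed layout: configuration stored under the transformer's name.
        self._mapper = {name: dict(kwargs)}

    def map_transformer(self, ds):
        # Create the history on first use, then append this configuration.
        history = getattr(ds, "preprocessing", None)
        if history is None:
            history = []
            ds.preprocessing = history
        history.append(self._mapper)


ds = SimpleNamespace(samples=[1, 2, 3])  # stand-in for a Dataset
SketchTransformer("scaler", factor=2).map_transformer(ds)
SketchTransformer("filter", keep="even").map_transformer(ds)
print(ds.preprocessing)
```

After both calls the history lists the two configurations in the order they were applied, which is the information needed to reproduce the preprocessing later.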

save(path=None)

Save the node to a specified path.

Base implementation that should be overridden by subclasses to provide actual saving functionality.

Parameters:

path (str, optional) – Path to which the node is saved, by default None

Return type:

None
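A subclass might override save along the following lines. This is only a sketch: SketchNode and JsonSavingNode are hypothetical classes, the JSON layout and the fallback file name used when path is None are assumptions, and the stand-in base class is assumed to write nothing.

```python
import json
import os
import tempfile


class SketchNode:
    """Hypothetical stand-in for the base class; its save() does nothing."""

    def __init__(self, name="transformer", **kwargs):
        self.name = name
        self.params = dict(kwargs)

    def save(self, path=None):
        return None


class JsonSavingNode(SketchNode):
    """Hypothetical subclass that writes its configuration as JSON."""

    def save(self, path=None):
        if path is None:
            # Assumed fallback: a file named after the node.
            path = f"{self.name}.json"
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"name": self.name, "params": self.params}, fh)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "scaler.json")
    JsonSavingNode("scaler", factor=2).save(target)
    with open(target, encoding="utf-8") as fh:
        saved = json.load(fh)

print(saved)
```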

transform(ds)

Transform the provided dataset.

This method applies the transformation to the dataset and records the transformation in the dataset’s preprocessing history.

Parameters:

ds (Dataset) – The dataset to transform

Returns:

The transformed dataset

Return type:

Dataset
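One way to combine the two responsibilities described above (changing the data and recording the step) is for a subclass to do its own work and then delegate to the base transform. The sketch below assumes that pattern; SketchTransformer and Scale are hypothetical, and a SimpleNamespace with samples and preprocessing attributes stands in for a Dataset.

```python
from types import SimpleNamespace


class SketchTransformer:
    """Hypothetical base: records the step and returns the dataset."""

    def __init__(self, name="transformer", **kwargs):
        self.name = name
        self._mapper = {name: dict(kwargs)}

    def transform(self, ds):
        # Append this step's configuration to the (assumed) history list.
        ds.preprocessing = getattr(ds, "preprocessing", []) + [self._mapper]
        return ds


class Scale(SketchTransformer):
    """Hypothetical subclass: multiplies every sample by a factor."""

    def __init__(self, factor=1.0):
        super().__init__(name="scale", factor=factor)
        self.factor = factor

    def transform(self, ds):
        ds.samples = [x * self.factor for x in ds.samples]
        # Delegate to the base class so the step is recorded.
        return super().transform(ds)


ds = SimpleNamespace(samples=[1, 2, 3])
out = Scale(factor=10).transform(ds)
print(out.samples)        # [10, 20, 30]
print(out.preprocessing)  # [{'scale': {'factor': 10}}]
```

Keeping the recording logic in the base class means each subclass only has to implement the data change itself.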

Transformers