Preprocessing

This package provides tools for manipulating Dataset objects.

Base classes

class sekupy.preprocessing.base.PreprocessingPipeline(name='pipeline', nodes=None, nodes_kwargs=None)

Pipeline for chaining multiple preprocessing transformers.

This class allows combining multiple preprocessing steps into a single pipeline that can be applied to datasets sequentially.

Parameters:

name (str, optional) – Name of the pipeline, by default ‘pipeline’
nodes (list, optional) – List of transformer nodes or node names to include in the pipeline
nodes_kwargs (dict, optional) – Keyword arguments for nodes if nodes are specified as strings

nodes

List of transformer nodes in the pipeline

Type: list

sliced_nodes

Copy of nodes list for internal use

Type: list

add(node)

Add a transformer node to the pipeline.

Parameters: node (Transformer) – The transformer node to add to the pipeline
Returns: Self, for method chaining
Return type: PreprocessingPipeline

transform(ds)

Transform the dataset through all nodes in the pipeline.

This method applies each transformer in the pipeline sequentially to the dataset.

Parameters: ds (Dataset) – The dataset to transform
Returns: The transformed dataset after applying all pipeline nodes
Return type: Dataset
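
The chaining behavior described above can be sketched without sekupy installed. The classes below (SketchPipeline, AddOne, Double) are hypothetical stand-ins, not sekupy's implementation; they only mirror the documented contract that add returns the pipeline itself and that transform applies each node in sequence. A plain list stands in for a Dataset.

```python
# Hypothetical stand-ins that mirror the documented contract only;
# this is not sekupy's actual implementation.

class AddOne:
    """Toy node: adds 1 to every value."""

    def transform(self, ds):
        return [x + 1 for x in ds]


class Double:
    """Toy node: multiplies every value by 2."""

    def transform(self, ds):
        return [x * 2 for x in ds]


class SketchPipeline:
    def __init__(self, name="pipeline", nodes=None):
        self.name = name
        # Copy the list so the caller's list is not mutated by add().
        self.nodes = list(nodes) if nodes is not None else []

    def add(self, node):
        self.nodes.append(node)
        return self  # returning self enables method chaining

    def transform(self, ds):
        # Apply each node in list order, feeding each output forward.
        for node in self.nodes:
            ds = node.transform(ds)
        return ds


pipeline = SketchPipeline().add(AddOne()).add(Double())
result = pipeline.transform([1, 2, 3])
print(result)  # [4, 6, 8]
```

Because each node receives the output of the previous one, the order of the nodes matters: swapping the two toy nodes above would yield a different result.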

class sekupy.preprocessing.base.Transformer(name='transformer', **kwargs)

Base class for data transformation components.

Transformers are used to preprocess datasets in the sekupy framework. They inherit from Node and provide functionality to transform datasets while tracking the applied transformations.

Parameters:

name (str, optional) – Name of the transformer, by default ‘transformer’
**kwargs (dict) – Additional parameters for the transformer

_mapper

Dictionary storing the transformer’s configuration

Type: dict

map_transformer(ds)

Map the transformer to the dataset’s preprocessing history.

This method records the transformer configuration in the dataset’s preprocessing attribute for reproducibility.

Parameters: ds (Dataset) – The dataset to which the transformer mapping is applied

save(path=None)

Save the node to the specified path.

This is a base implementation that subclasses should override to provide actual saving functionality.

Parameters: path (str, optional) – Path at which to save the node, by default None
Return type: None

transform(ds)

Transform the provided dataset.

This method applies the transformation to the dataset and records the transformation in the dataset’s preprocessing history.

Parameters: ds (Dataset) – The dataset to transform
Returns: The transformed dataset
Return type: Dataset
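
As a rough illustration of how a subclass might combine its own transformation with the history tracking described above, consider the sketch below. Everything in it is an assumption made for illustration: SketchDataset, SketchTransformer and Scale are hypothetical names, and the layout of the preprocessing history (a list of configuration dictionaries) is a guess, not sekupy's internal format.

```python
# Hypothetical sketch: class names and the history layout are assumptions,
# not sekupy's internals.

class SketchDataset:
    def __init__(self, samples):
        self.samples = samples
        self.preprocessing = []  # assumed: history kept as a list of config dicts


class SketchTransformer:
    def __init__(self, name="transformer", **kwargs):
        self.name = name
        # Analogous to the documented _mapper: the transformer's configuration.
        self._mapper = {name: dict(kwargs)}

    def map_transformer(self, ds):
        # Record this transformer's configuration on the dataset.
        ds.preprocessing.append(self._mapper)

    def transform(self, ds):
        self.map_transformer(ds)
        return ds


class Scale(SketchTransformer):
    """Toy subclass: multiplies every sample by a constant factor."""

    def __init__(self, factor=1.0):
        super().__init__(name="scale", factor=factor)
        self.factor = factor

    def transform(self, ds):
        ds.samples = [x * self.factor for x in ds.samples]
        # Delegate to the base class so the step is recorded in the history.
        return super().transform(ds)


ds = Scale(factor=2.0).transform(SketchDataset([1.0, 2.5]))
print(ds.samples)        # [2.0, 5.0]
print(ds.preprocessing)  # [{'scale': {'factor': 2.0}}]
```

Calling the base transform at the end of the subclass method keeps the recording logic in one place, so every subclass leaves a trace of its configuration on the dataset.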

Transformers

Balancing transformers: balance samples in the Dataset.
Connectivity transformers: transform the dataset for connectivity analyses.
Filtering in time: time filtering.
Basic Functions:
Mapper: list of all Transformers.
Math Transformer: transformations based on mathematical formulas (e.g. the Fisher transformation).
Memory savers: for memory management.
Normalizer: normalize features or samples.
Pipelines: used to concatenate different Transformers.
Regression: transformations based on regression and orthogonalization.
Scikit-Learn Wrapper: wrapper for the scikit-learn package.
Slicers: used to slice the dataset based on attributes.
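
For readers unfamiliar with the Fisher transformation mentioned under Math Transformer, the standalone sketch below shows the formula itself, z = atanh(r), which maps a correlation coefficient r in (-1, 1) onto the real line. It does not use sekupy's Math Transformer, whose interface is not documented in this section; the function names fisher_z and inverse_fisher_z are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient r in (-1, 1)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # atanh(r) equals 0.5 * log((1 + r) / (1 - r))
    return math.atanh(r)


def inverse_fisher_z(z):
    """Map a z value back to a correlation coefficient."""
    return math.tanh(z)


z = fisher_z(0.5)
r_back = inverse_fisher_z(z)
print(z, r_back)
```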