VerticalSizePartitioner

Bases: Partitioner

Creates vertical partitions by splitting features (columns) based on sizes.

The sizes refer to the number of columns remaining after drop_columns are dropped. shared_columns and active_party_columns are excluded from this count and are added only after the size-based division.
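The counting rule above can be checked with plain arithmetic. The following is a minimal sketch, not the library's implementation: the helper `columns_to_split` and the column names are hypothetical and serve only to show which columns the sizes apply to.

```python
def columns_to_split(columns, drop_columns=(), shared_columns=(), active_party_columns=()):
    """Return the columns that the size-based division operates on (illustrative only)."""
    excluded = set(drop_columns) | set(shared_columns) | set(active_party_columns)
    # Preserve the original column order; only filter out the excluded names.
    return [c for c in columns if c not in excluded]


# Hypothetical dataset columns, chosen only for this illustration.
columns = ["age", "sex", "race", "education", "occupation", "fnlwgt", "income"]
remaining = columns_to_split(
    columns,
    drop_columns=["fnlwgt"],
    shared_columns=["age"],
    active_party_columns=["income"],
)

partition_sizes = [3, 1]
# With integer sizes, the sizes must add up to the number of remaining columns.
sizes_are_valid = sum(partition_sizes) == len(remaining)

print(remaining)        # ['sex', 'race', 'education', 'occupation']
print(sizes_are_valid)  # True
```

Here seven columns minus one dropped, one shared, and one active party column leave four columns, so integer sizes such as [3, 1] or [2, 2] would satisfy the rule.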

Supports selecting “active party” column(s) and either placing them into a specific partition or creating a new partition just for them. Also supports dropping columns and sharing specified columns across all partitions.

Parameters:

partition_sizes (Union[list[int], list[float]]) – A list where each value represents the size of a partition. With list[int], each value represents an absolute number of columns. Size zero is allowed and results in an empty partition if no shared columns are present. With list[float], each value represents a fraction of the total number of columns. Note that these values apply to the columns without active_party_columns and shared_columns, which are added to the partition(s) afterwards. drop_columns are also not counted toward the partition sizes. In the case of list[int]: sum(partition_sizes) == len(columns) - len(drop_columns) - len(shared_columns) - len(active_party_columns)
active_party_columns (Optional[Union[str, list[str]]]) – Column(s) (typically representing labels) associated with the “active party” (which can be the server).
active_party_columns_mode (Union[Literal["add_to_first", "add_to_last", "create_as_first", "create_as_last", "add_to_all"], int]) –
Determines how to assign the active party columns:
- "add_to_first": Append active party columns to the first partition.
- "add_to_last": Append active party columns to the last partition.
- "create_as_first": Create a new partition at the start containing only these columns.
- "create_as_last": Create a new partition at the end containing only these columns.
- "add_to_all": Append active party columns to all partitions.
- int: Append active party columns to the partition at the specified index.
drop_columns (Optional[Union[str, list[str]]]) – Columns to remove entirely from the dataset before partitioning.
shared_columns (Optional[Union[str, list[str]]]) – Columns to duplicate into every partition after initial partitioning.
shuffle (bool) – Whether to shuffle the order of columns before partitioning.
seed (Optional[int]) – Random seed for shuffling columns. Has no effect if shuffle=False.
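
The active_party_columns_mode options listed above can be pictured as operations on lists of column names. The sketch below is an illustration under that assumption, not the partitioner's actual code: the function `place_active_party_columns` and the column names are hypothetical.

```python
def place_active_party_columns(partitions, active_columns, mode):
    """Return a new list of column-name partitions with the active party columns placed.

    Illustrative only: mirrors the documented modes on plain Python lists.
    """
    result = [list(p) for p in partitions]  # copy so the input stays untouched
    if mode == "add_to_first":
        result[0].extend(active_columns)
    elif mode == "add_to_last":
        result[-1].extend(active_columns)
    elif mode == "create_as_first":
        result.insert(0, list(active_columns))
    elif mode == "create_as_last":
        result.append(list(active_columns))
    elif mode == "add_to_all":
        for partition in result:
            partition.extend(active_columns)
    elif isinstance(mode, int):
        # An integer selects the partition index that receives the columns.
        result[mode].extend(active_columns)
    else:
        raise ValueError(f"Unsupported mode: {mode!r}")
    return result


# Two partitions produced by a size-based split of hypothetical columns.
base = [["age", "sex"], ["education"]]
as_last = place_active_party_columns(base, ["income"], "create_as_last")
to_all = place_active_party_columns(base, ["income"], "add_to_all")
by_index = place_active_party_columns(base, ["income"], 1)

print(as_last)   # [['age', 'sex'], ['education'], ['income']]
print(to_all)    # [['age', 'sex', 'income'], ['education', 'income']]
print(by_index)  # [['age', 'sex'], ['education', 'income']]
```

Note that only the two "create_as_*" modes change the number of partitions; the other modes keep the count given by partition_sizes.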

Examples

>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.partitioner import VerticalSizePartitioner
>>>
>>> partitioner = VerticalSizePartitioner(
...     partition_sizes=[8, 4, 2],
...     active_party_columns="income",
...     active_party_columns_mode="create_as_last"
... )
>>> fds = FederatedDataset(
...     dataset="scikit-learn/adult-census-income",
...     partitioners={"train": partitioner}
... )
>>> partitions = [fds.load_partition(i) for i in range(fds.partitioners["train"].num_partitions)]
>>> print([partition.column_names for partition in partitions])

Methods

`is_dataset_assigned`()	Check if a dataset has been assigned to the partitioner.
`load_partition`(partition_id)	Load a partition based on the partition index.

Attributes

`dataset`	Dataset property.
`num_partitions`	Number of partitions.

property dataset: Dataset

Dataset property.

is_dataset_assigned() → bool

Check if a dataset has been assigned to the partitioner.

This method returns True if a dataset is already set for the partitioner; otherwise, it returns False.

Returns: dataset_assigned – True if a dataset is assigned, otherwise False.
Return type: bool

load_partition(partition_id: int) → Dataset

Load a partition based on the partition index.

Parameters: partition_id (int) – The index that corresponds to the requested partition.
Returns: dataset_partition – Single partition of a dataset.
Return type: Dataset

property num_partitions: int

Number of partitions.