NaturalIdPartitioner

class NaturalIdPartitioner(partition_by: str)

Bases: Partitioner

Partitioner for a dataset that can be divided by a column with partition ids.

Parameters:

partition_by (str) – The name of the column whose unique values identify the partitions.

Examples

“flwrlabs/shakespeare” dataset

>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.partitioner import NaturalIdPartitioner
>>>
>>> partitioner = NaturalIdPartitioner(partition_by="character_id")
>>> fds = FederatedDataset(dataset="flwrlabs/shakespeare",
...                        partitioners={"train": partitioner})
>>> partition = fds.load_partition(0)

“sentiment140” (aka Twitter) dataset

>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.partitioner import NaturalIdPartitioner
>>>
>>> partitioner = NaturalIdPartitioner(partition_by="user")
>>> fds = FederatedDataset(dataset="sentiment140",
...                        partitioners={"train": partitioner})
>>> partition = fds.load_partition(0)

Methods

is_dataset_assigned()

Check if a dataset has been assigned to the partitioner.

load_partition(partition_id)

Load a single partition corresponding to a single partition_id.

Attributes

dataset

Dataset property.

num_partitions

Total number of partitions.

partition_id_to_natural_id

Mapping from partition id to the corresponding natural id.

property dataset: Dataset

Dataset property.

is_dataset_assigned() → bool

Check if a dataset has been assigned to the partitioner.

This method returns True if a dataset is already set for the partitioner; otherwise, it returns False.

Returns:

dataset_assigned – True if a dataset is assigned, otherwise False.

Return type:

bool

load_partition(partition_id: int) → Dataset

Load a single partition corresponding to a single partition_id.

The partition is selected using the unique integer assigned to each natural id found in the partition_by column of the dataset.

Parameters:

partition_id (int) – The index that corresponds to the requested partition.

Returns:

dataset_partition – single dataset partition

Return type:

Dataset
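
The selection described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper names (build_partition_id_to_natural_id, load_partition_sketch) and the sample rows are hypothetical, and numbering the natural ids in order of first appearance is an assumption of this sketch rather than something this page states.

```python
# Hypothetical in-memory rows standing in for a dataset split.
rows = [
    {"character_id": "HAMLET", "x": "To be"},
    {"character_id": "OPHELIA", "x": "My lord"},
    {"character_id": "HAMLET", "x": "or not to be"},
]


def build_partition_id_to_natural_id(rows, partition_by):
    """Assign an integer to every unique value in the partition_by column.

    Assumption: integers follow the order of first appearance.
    """
    unique_natural_ids = list(dict.fromkeys(row[partition_by] for row in rows))
    return dict(enumerate(unique_natural_ids))


def load_partition_sketch(rows, partition_by, partition_id):
    """Return every row whose natural id corresponds to partition_id."""
    mapping = build_partition_id_to_natural_id(rows, partition_by)
    natural_id = mapping[partition_id]
    return [row for row in rows if row[partition_by] == natural_id]


mapping = build_partition_id_to_natural_id(rows, "character_id")
partition_0 = load_partition_sketch(rows, "character_id", 0)
```

In this sketch, every row that shares a natural id ends up in the same partition, so partition sizes follow the distribution of values in the chosen column and may be uneven.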

property num_partitions: int

Total number of partitions.

property partition_id_to_natural_id: dict[int, str]

Mapping from partition id to the corresponding natural id.

Natural ids are the unique values in the partition_by column of the dataset.
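
A common need is the reverse lookup: finding which partition id holds a given natural id. The sketch below uses a hand-written dictionary as a stand-in for the value of partition_id_to_natural_id; the character names are illustrative, and the assumption that the number of partitions equals the number of mapping entries follows from one partition per natural id but is not stated explicitly on this page.

```python
# Hypothetical stand-in for the partition_id_to_natural_id property value.
partition_id_to_natural_id = {0: "HAMLET", 1: "OPHELIA", 2: "HORATIO"}

# Invert the mapping to look up a partition id by natural id.
# This is safe because natural ids are unique values of the column.
natural_id_to_partition_id = {
    natural_id: partition_id
    for partition_id, natural_id in partition_id_to_natural_id.items()
}

# Assumption: one partition per natural id.
num_partitions = len(partition_id_to_natural_id)

ophelia_partition_id = natural_id_to_partition_id["OPHELIA"]
```

The resulting integer can then be passed as partition_id to load_partition.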