GroupedNaturalIdPartitioner

class GroupedNaturalIdPartitioner(partition_by: str, group_size: int, mode: Literal['allow-smaller', 'allow-bigger', 'drop-reminder', 'strict'] = 'allow-smaller', sort_unique_ids: bool = False)

Bases: Partitioner

Partition dataset by creating groups of natural ids.

Conceptually, you can think of this partitioner as a way of creating an organization of x users instead of each user representing a separate partition. This lets you change the nature of the problem from cross-device to cross-silo (cross-organization).

Parameters:
  • partition_by (str) – The name of the column whose unique values (the natural ids) are grouped to form partitions.

  • group_size (int) – The number of unique ids that will be placed in a single group.

  • mode (Literal["allow-smaller", "allow-bigger", "drop-reminder", "strict"]) – The mode that will be used to handle the remainder of the unique ids:
      - “allow-smaller”: The last group can be smaller than the group_size.
      - “allow-bigger”: The first group can be bigger than the group_size.
      - “drop-reminder”: The last group will be dropped if it is smaller than the group_size.
      - “strict”: Raises a ValueError if the remainder is not zero. In this mode, you expect each group to have the same size.

  • sort_unique_ids (bool) – If True, the unique natural ids will be sorted before creating the groups.
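The remainder handling described above can be sketched in plain Python. This is only an illustration of the documented mode semantics, not the library's implementation: `group_ids` is a hypothetical helper, and for “allow-bigger” it appends the whole remainder to the first group, which is one reading of the description (the library may distribute the extra ids differently).

```python
def group_ids(unique_ids, group_size, mode="allow-smaller", sort_unique_ids=False):
    """Hypothetical helper: split unique ids into groups per the documented modes."""
    modes = {"allow-smaller", "allow-bigger", "drop-reminder", "strict"}
    if mode not in modes:
        raise ValueError(f"Unknown mode: {mode!r}")
    if group_size <= 0:
        raise ValueError("group_size must be a positive integer")

    ids = sorted(unique_ids) if sort_unique_ids else list(unique_ids)
    n_full, remainder = divmod(len(ids), group_size)

    # "strict" requires the ids to divide evenly into groups.
    if mode == "strict" and remainder != 0:
        raise ValueError("Number of unique ids is not divisible by group_size")

    # Build the full-size groups first, then deal with the leftover ids.
    groups = [ids[i * group_size:(i + 1) * group_size] for i in range(n_full)]
    leftover = ids[n_full * group_size:]

    if leftover:
        if mode == "allow-smaller":
            groups.append(leftover)          # last group is smaller
        elif mode == "allow-bigger":
            if groups:
                groups[0].extend(leftover)   # assumption: first group absorbs the rest
            else:
                groups.append(leftover)
        # "drop-reminder": the leftover ids are simply discarded
    return groups


users = ["a", "b", "c", "d", "e"]
smaller = group_ids(users, 2, mode="allow-smaller")
bigger = group_ids(users, 2, mode="allow-bigger")
dropped = group_ids(users, 2, mode="drop-reminder")
```

With five ids and a group size of two, only the treatment of the fifth id differs between the three non-strict modes, while “strict” raises a ValueError.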

Examples

Partition users in the “sentiment140” (aka Twitter) dataset into groups of two users following the default mode:

>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.partitioner import GroupedNaturalIdPartitioner
>>>
>>> partitioner = GroupedNaturalIdPartitioner(partition_by="user", group_size=2)
>>> fds = FederatedDataset(dataset="sentiment140",
...                        partitioners={"train": partitioner})
>>> partition = fds.load_partition(0)

Methods

is_dataset_assigned()

Check if a dataset has been assigned to the partitioner.

load_partition(partition_id)

Load a single partition corresponding to a single partition_id.

Attributes

dataset

Dataset property.

natural_id_to_partition_id

Natural id to the corresponding partition id.

num_partitions

Total number of partitions.

partition_id_to_natural_ids

Partition id to the corresponding group of natural ids present.

property dataset: Dataset

Dataset property.

is_dataset_assigned() → bool

Check if a dataset has been assigned to the partitioner.

This method returns True if a dataset is already set for the partitioner; otherwise it returns False.

Returns:

dataset_assigned – True if a dataset is assigned, otherwise False.

Return type:

bool

load_partition(partition_id: int) → Dataset

Load a single partition corresponding to a single partition_id.

The choice of the partition is based on unique integers assigned to each natural id present in the dataset in the partition_by column.

Parameters:

partition_id (int) – The index that corresponds to the requested partition.

Returns:

dataset_partition – A single dataset partition.

Return type:

Dataset
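As a rough mental model, loading a partition amounts to keeping the rows whose partition_by value belongs to the group of natural ids behind the requested partition_id. The sketch below uses plain Python lists and dicts rather than a real Dataset; `load_partition_sketch`, the sample rows, and the sample mapping are all hypothetical and only illustrate the idea.

```python
# Toy stand-in for a dataset: one dict per row (hypothetical data).
rows = [
    {"user": "alice", "text": "good morning"},
    {"user": "bob", "text": "nice weather"},
    {"user": "carol", "text": "so tired"},
    {"user": "alice", "text": "great game"},
]

# Assumed grouping of natural ids into partitions (group_size=2, "allow-smaller").
partition_id_to_natural_ids = {0: ["alice", "bob"], 1: ["carol"]}


def load_partition_sketch(rows, partition_by, id_groups, partition_id):
    """Hypothetical helper: keep rows whose natural id is in the requested group."""
    wanted = set(id_groups[partition_id])
    return [row for row in rows if row[partition_by] in wanted]


partition_0 = load_partition_sketch(rows, "user", partition_id_to_natural_ids, 0)
partition_1 = load_partition_sketch(rows, "user", partition_id_to_natural_ids, 1)
```

Every row of a given user ends up in exactly one partition, so a partition here represents a small "organization" of users rather than a single device.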

property natural_id_to_partition_id: dict[Any, int]

Natural id to the corresponding partition id.

property num_partitions: int

Total number of partitions.

property partition_id_to_natural_ids: dict[int, list[Any]]

Partition id to the corresponding group of natural ids present.

Natural ids are the unique values in the partition_by column of the dataset.
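The two mapping properties describe the same grouping from opposite directions. A minimal sketch with made-up values (not produced by the library) shows how one can be derived from the other by inverting the dictionary:

```python
# Made-up example of a partition-id -> natural-ids mapping.
partition_id_to_natural_ids = {0: ["alice", "bob"], 1: ["carol", "dave"]}

# Invert it: every natural id points back to the single partition that holds it.
natural_id_to_partition_id = {
    natural_id: partition_id
    for partition_id, natural_ids in partition_id_to_natural_ids.items()
    for natural_id in natural_ids
}

# In this toy example, the number of partitions is the number of groups.
num_partitions = len(partition_id_to_natural_ids)
```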