GroupedNaturalIdPartitioner

class GroupedNaturalIdPartitioner(partition_by: str, group_size: int, mode: Literal['allow-smaller', 'allow-bigger', 'drop-reminder', 'strict'] = 'allow-smaller', sort_unique_ids: bool = False)

Bases: Partitioner

Partition dataset by creating groups of natural ids.

Conceptually, this partitioner groups users into organizations of group_size users each, instead of treating each user as a separate partition. This changes the nature of the problem from cross-device to cross-silo (cross-organization).

Parameters:

partition_by (str) – The name of the column that contains the natural ids, i.e. the unique values that are grouped into partitions.
group_size (int) – The number of unique ids that will be placed in a single group.
mode (Literal["allow-smaller", "allow-bigger", "drop-reminder", "strict"]) – The mode used to handle the remainder of the unique ids when their count is not divisible by group_size:
- "allow-smaller": The last group can be smaller than group_size.
- "allow-bigger": The first group can be bigger than group_size.
- "drop-reminder": The last group is dropped if it is smaller than group_size.
- "strict": Raises a ValueError if the remainder is not zero. In this mode, every group has the same size.
sort_unique_ids (bool) – If True, the unique natural ids will be sorted before creating the groups.
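
The four modes differ only in how they treat leftover ids. The following is a minimal pure-Python sketch of that behavior as described above, not the library's implementation. The helper name group_natural_ids is hypothetical, and placing the whole remainder in the first group for "allow-bigger" is an assumption based on the wording of the parameter description; the library may distribute the extra ids differently.

```python
from typing import Any, List, Sequence

VALID_MODES = ("allow-smaller", "allow-bigger", "drop-reminder", "strict")


def group_natural_ids(
    ids: Sequence[Any], group_size: int, mode: str = "allow-smaller"
) -> List[List[Any]]:
    """Hypothetical helper: split ids into groups following the documented modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode: {mode!r}")
    ids = list(ids)
    num_full, remainder = divmod(len(ids), group_size)

    if mode == "strict" and remainder != 0:
        raise ValueError("Number of ids is not divisible by group_size.")

    if mode == "allow-bigger" and num_full > 0:
        # Assumption: the first group absorbs the whole remainder.
        first_end = group_size + remainder
        groups = [ids[:first_end]]
        rest = ids[first_end:]
    else:
        groups, rest = [], ids

    # Chunk the remaining ids; the final chunk may be shorter than group_size.
    groups += [rest[i:i + group_size] for i in range(0, len(rest), group_size)]

    if mode == "drop-reminder" and remainder != 0:
        groups.pop()  # discard the short last group
    return groups


users = ["a", "b", "c", "d", "e"]
smaller = group_natural_ids(users, 2, "allow-smaller")  # [['a', 'b'], ['c', 'd'], ['e']]
bigger = group_natural_ids(users, 2, "allow-bigger")    # [['a', 'b', 'c'], ['d', 'e']]
dropped = group_natural_ids(users, 2, "drop-reminder")  # [['a', 'b'], ['c', 'd']]
```

With five ids and group_size=2, "strict" would raise a ValueError because the remainder is one.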

Examples

Partition users in the “sentiment140” (aka Twitter) dataset into groups of two users following the default mode:

>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.partitioner import GroupedNaturalIdPartitioner
>>>
>>> partitioner = GroupedNaturalIdPartitioner(partition_by="user", group_size=2)
>>> fds = FederatedDataset(dataset="sentiment140",
...                        partitioners={"train": partitioner})
>>> partition = fds.load_partition(0)

Methods

`is_dataset_assigned`()	Check if a dataset has been assigned to the partitioner.
`load_partition`(partition_id)	Load a single partition corresponding to a single partition_id.

Attributes

`dataset`	Dataset property.
`natural_id_to_partition_id`	Natural id to the corresponding partition id.
`num_partitions`	Total number of partitions.
`partition_id_to_natural_ids`	Partition id to the corresponding group of natural ids present.

property dataset: Dataset

Dataset property.

is_dataset_assigned() → bool

Check if a dataset has been assigned to the partitioner.

This method returns True if a dataset is already set for the partitioner; otherwise it returns False.

Returns: dataset_assigned – True if a dataset is assigned, otherwise False.
Return type: bool

load_partition(partition_id: int) → Dataset

Load a single partition corresponding to a single partition_id.

The partition is selected by the integer id assigned to each group of natural ids found in the partition_by column of the dataset.

Parameters: partition_id (int) – The index that corresponds to the requested partition.
Returns: dataset_partition – A single dataset partition.
Return type: Dataset
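
To make the selection step concrete, here is a rough sketch that uses plain Python lists of dicts in place of a Dataset. The function load_partition_sketch and the sample rows are hypothetical and only illustrate the idea of keeping the rows whose natural id belongs to the requested group; the actual method operates on a Dataset object and may be implemented differently.

```python
from typing import Any, Dict, List


def load_partition_sketch(
    rows: List[Dict[str, Any]],
    partition_by: str,
    partition_id_to_natural_ids: Dict[int, List[Any]],
    partition_id: int,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: keep rows whose natural id is in the requested group."""
    wanted = set(partition_id_to_natural_ids[partition_id])
    return [row for row in rows if row[partition_by] in wanted]


# Made-up sample data: three users grouped with group_size=2 ("allow-smaller").
rows = [
    {"user": "alice", "text": "hi"},
    {"user": "bob", "text": "yo"},
    {"user": "carol", "text": "hey"},
    {"user": "alice", "text": "bye"},
]
groups = {0: ["alice", "bob"], 1: ["carol"]}

partition_0 = load_partition_sketch(rows, "user", groups, 0)  # alice's and bob's rows
partition_1 = load_partition_sketch(rows, "user", groups, 1)  # carol's rows
```

Every row of a given user lands in exactly one partition, because each natural id belongs to exactly one group.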

property natural_id_to_partition_id: dict[Any, int]

Natural id to the corresponding partition id.

property num_partitions: int

Total number of partitions.

property partition_id_to_natural_ids: dict[int, list[Any]]

Partition id to the corresponding group of natural ids present.

Natural ids are the unique values in the partition_by column of the dataset.
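
The two mapping properties describe the same grouping from opposite directions. As an illustration only, the sketch below derives a natural-id-to-partition-id dictionary from a partition-id-to-natural-ids dictionary. The helper invert_groups and the sample values are hypothetical; how the library builds and stores these mappings internally is not shown here.

```python
from typing import Any, Dict, List


def invert_groups(partition_id_to_natural_ids: Dict[int, List[Any]]) -> Dict[Any, int]:
    """Hypothetical helper: map every natural id to the partition id of its group."""
    return {
        natural_id: partition_id
        for partition_id, natural_ids in partition_id_to_natural_ids.items()
        for natural_id in natural_ids
    }


# Made-up grouping of three users with group_size=2.
partition_id_to_natural_ids = {0: ["alice", "bob"], 1: ["carol"]}
natural_id_to_partition_id = invert_groups(partition_id_to_natural_ids)
# {'alice': 0, 'bob': 0, 'carol': 1}
```

In this sketch the number of partitions equals the number of keys in partition_id_to_natural_ids.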