GroupedNaturalIdPartitioner
- class GroupedNaturalIdPartitioner(partition_by: str, group_size: int, mode: Literal['allow-smaller', 'allow-bigger', 'drop-reminder', 'strict'] = 'allow-smaller', sort_unique_ids: bool = False)
Bases: Partitioner
Partition dataset by creating groups of natural ids.
Conceptually, this partitioner creates organizations of group_size users each, instead of treating each user as a separate partition. This changes the nature of the problem from cross-device to cross-silo (cross-organization).
- Parameters:
partition_by (str) – The name of the column that contains the natural ids, i.e. the unique values that are grouped into partitions.
group_size (int) – The number of unique ids that will be placed in a single group.
mode (Literal["allow-smaller", "allow-bigger", "drop-reminder", "strict"]) – The mode that will be used to handle the remainder of the unique ids.
- "allow-smaller": The last group can be smaller than the group_size.
- "allow-bigger": The first group can be bigger than the group_size.
- "drop-reminder": The last group will be dropped if it is smaller than the group_size.
- "strict": Raises a ValueError if the remainder is not zero. In this mode, you expect each group to have the same size.
sort_unique_ids (bool) – If True, the unique natural ids will be sorted before creating the groups.
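The effect of each mode on the remainder can be illustrated with a small, library-independent sketch. The helper `group_natural_ids` below is hypothetical and is not the partitioner's actual implementation; in particular, it assumes that in "allow-bigger" mode all leftover ids join the first group, which is only one reading of the description above.

```python
from typing import Any, Literal

Mode = Literal["allow-smaller", "allow-bigger", "drop-reminder", "strict"]


def group_natural_ids(
    unique_ids: list[Any], group_size: int, mode: Mode = "allow-smaller"
) -> list[list[Any]]:
    """Hypothetical helper: split unique ids into groups per the documented modes."""
    if group_size < 1:
        raise ValueError("group_size must be a positive integer")
    num_full, remainder = divmod(len(unique_ids), group_size)

    if mode == "strict" and remainder != 0:
        raise ValueError("number of unique ids is not divisible by group_size")

    if mode == "allow-bigger" and num_full > 0:
        # Assumption: every leftover id is merged into the first group.
        first_end = group_size + remainder
        groups = [unique_ids[:first_end]]
        groups += [
            unique_ids[i : i + group_size]
            for i in range(first_end, len(unique_ids), group_size)
        ]
        return groups

    # Full-size groups only; the leftover ids are handled below.
    groups = [
        unique_ids[i : i + group_size]
        for i in range(0, num_full * group_size, group_size)
    ]
    if mode == "allow-smaller" and remainder:
        # The last group keeps the leftover ids and is smaller than group_size.
        groups.append(unique_ids[num_full * group_size :])
    # In "drop-reminder" mode the leftover ids are simply discarded.
    return groups


ids = ["a", "b", "c", "d", "e"]
for mode in ("allow-smaller", "allow-bigger", "drop-reminder"):
    print(mode, group_natural_ids(ids, 2, mode))
```

With five ids and a group size of two, this sketch yields three groups in "allow-smaller" mode (the last holding only "e"), two groups in "allow-bigger" mode (the first holding three ids), two groups in "drop-reminder" mode, and a ValueError in "strict" mode.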
Examples
Partition users in the “sentiment140” (aka Twitter) dataset into groups of two users following the default mode:
>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.partitioner import GroupedNaturalIdPartitioner
>>>
>>> partitioner = GroupedNaturalIdPartitioner(partition_by="user", group_size=2)
>>> fds = FederatedDataset(dataset="sentiment140",
...                        partitioners={"train": partitioner})
>>> partition = fds.load_partition(0)
Methods
is_dataset_assigned() – Check if a dataset has been assigned to the partitioner.
load_partition(partition_id) – Load a single partition corresponding to a single partition_id.
Attributes
dataset – Dataset property.
natural_id_to_partition_id – Natural id to the corresponding partition id.
num_partitions – Total number of partitions.
partition_id_to_natural_ids – Partition id to the corresponding group of natural ids present.
- property dataset: Dataset
Dataset property.
- is_dataset_assigned() -> bool
Check if a dataset has been assigned to the partitioner.
This method returns True if a dataset is already set for the partitioner; otherwise, it returns False.
- Returns:
dataset_assigned – True if a dataset is assigned, otherwise False.
- Return type:
bool
- load_partition(partition_id: int) -> Dataset
Load a single partition corresponding to a single partition_id.
The partition is chosen via the integer partition id assigned to each natural id present in the partition_by column of the dataset.
- Parameters:
partition_id (int) – The index that corresponds to the requested partition.
- Returns:
dataset_partition – A single dataset partition.
- Return type:
Dataset
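As a rough illustration of the selection described above, the following sketch filters plain Python rows by partition id. It does not use the library: `select_partition`, the sample rows, and the mapping are all hypothetical stand-ins, and the real method operates on a Dataset rather than a list of dicts.

```python
from typing import Any


def select_partition(
    rows: list[dict[str, Any]],
    partition_by: str,
    natural_id_to_partition_id: dict[Any, int],
    partition_id: int,
) -> list[dict[str, Any]]:
    """Hypothetical helper: keep rows whose natural id maps to partition_id."""
    return [
        row
        for row in rows
        # .get() skips natural ids that belong to no group (e.g. dropped ones).
        if natural_id_to_partition_id.get(row[partition_by]) == partition_id
    ]


# Made-up sample data: three users, two of them grouped into partition 0.
rows = [
    {"user": "alice", "text": "hi"},
    {"user": "bob", "text": "hello"},
    {"user": "carol", "text": "hey"},
    {"user": "alice", "text": "bye"},
]
natural_id_to_partition_id = {"alice": 0, "bob": 0, "carol": 1}

partition_0 = select_partition(rows, "user", natural_id_to_partition_id, 0)
partition_1 = select_partition(rows, "user", natural_id_to_partition_id, 1)
```

Here partition 0 contains every row written by "alice" or "bob", and partition 1 contains only the row written by "carol".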
- property natural_id_to_partition_id: dict[Any, int]
Natural id to the corresponding partition id.
- property num_partitions: int
Total number of partitions.
- property partition_id_to_natural_ids: dict[int, list[Any]]
Partition id to the corresponding group of natural ids present.
Natural ids are the unique values in the partition_by column of the dataset.
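The relationship between the two mapping properties and num_partitions can be sketched as follows. The group contents are made up for illustration, and the code only mirrors the documented types (dict[int, list[Any]] and dict[Any, int]); it is not how the library builds them.

```python
from typing import Any

# Made-up groups of natural ids, one list per partition.
groups: list[list[Any]] = [["alice", "bob"], ["carol", "dave"], ["erin"]]

# Partition id -> group of natural ids in that partition.
partition_id_to_natural_ids: dict[int, list[Any]] = dict(enumerate(groups))

# Natural id -> partition id, obtained by inverting the mapping above.
natural_id_to_partition_id: dict[Any, int] = {
    natural_id: partition_id
    for partition_id, natural_ids in partition_id_to_natural_ids.items()
    for natural_id in natural_ids
}

# One partition per group.
num_partitions = len(partition_id_to_natural_ids)
```

Under these assumptions, looking up a natural id in one mapping and then its partition id in the other always returns a group that contains that natural id.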