IdToSizeFncPartitioner
- class IdToSizeFncPartitioner(num_partitions: int, partition_id_to_size_fn: Callable)
Bases: Partitioner
Base class for deterministic-size partitioning based on the partition_id.
The number of samples assigned to a partition is proportional to the value of partition_id_to_size_fn evaluated at its partition_id:
partition_id_to_size_fn(partition_id) ∝ number of samples for partition_id
If the function does not transform the partition_id (i.e., it is the identity), the number of samples in a partition grows linearly with the value of its partition_id. For instance, if the partition ids range from 1 to M, partition 1 gets 1 unit of data, partition 2 gets 2 units, and so on, up to partition M, which gets M units.
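The proportional split described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the library's implementation: `sketch_partition_sizes` is a hypothetical helper, the function is evaluated at ids 1..M as in the example, each share is floored to a whole number of samples, and any leftover samples are assumed to go to the last partition.

```python
from typing import Callable


def sketch_partition_sizes(
    num_samples: int,
    num_partitions: int,
    partition_id_to_size_fn: Callable[[int], float],
) -> list[int]:
    """Hypothetical helper: split num_samples in proportion to fn(id).

    Entry k of the result is the size for the partition evaluated at id k + 1,
    i.e. the function is called on ids 1..num_partitions (an assumption).
    """
    units = [partition_id_to_size_fn(pid) for pid in range(1, num_partitions + 1)]
    total_units = sum(units)
    # Floor each proportional share to a whole number of samples.
    sizes = [int(u * num_samples / total_units) for u in units]
    # Assumption: samples lost to flooring are added to the last partition.
    sizes[-1] += num_samples - sum(sizes)
    return sizes


# Identity function: shares grow linearly (1, 2, 3, 4 units out of 10).
print(sketch_partition_sizes(100, 4, lambda pid: pid))       # [10, 20, 30, 40]
# Square function: 1, 4, 9 units out of 14.
print(sketch_partition_sizes(140, 3, lambda pid: pid ** 2))  # [10, 40, 90]
```

Because only the ratios of the function values matter in this sketch, scaling partition_id_to_size_fn by a constant leaves the sizes unchanged.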
Note that the size corresponding to each partition_id is deterministic; however, if the dataset is shuffled differently, the assignment of specific samples to each partition_id will vary.
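To illustrate why the sizes stay fixed while the assigned samples change, here is a minimal sketch assuming each partition is a contiguous slice of the (possibly shuffled) dataset. `sketch_split` and `shuffled` are hypothetical helpers, and the sizes [2, 4, 6, 8] are what the identity function over ids 1..4 yields for 20 samples.

```python
import random
from typing import Sequence


def sketch_split(samples: Sequence[int], sizes: Sequence[int]) -> list[list[int]]:
    """Hypothetical helper: cut `samples` into consecutive chunks of `sizes`."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(list(samples[start:start + size]))
        start += size
    return chunks


def shuffled(samples: Sequence[int], seed: int) -> list[int]:
    """Return a copy of `samples` shuffled with a seeded RNG."""
    out = list(samples)
    random.Random(seed).shuffle(out)
    return out


dataset = list(range(20))
sizes = [2, 4, 6, 8]  # identity function over ids 1..4 for 20 samples

split_a = sketch_split(shuffled(dataset, seed=0), sizes)
split_b = sketch_split(shuffled(dataset, seed=1), sizes)

# Partition sizes are identical regardless of the shuffle ...
print([len(p) for p in split_a], [len(p) for p in split_b])
# ... but the concrete samples in each partition generally differ.
print(split_a[0], split_b[0])
```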
- Parameters:
num_partitions (int) – The total number of partitions that the data will be divided into.
partition_id_to_size_fn (Callable) – Function that defines the relationship between partition id and the number of samples.
Methods
is_dataset_assigned() – Check if a dataset has been assigned to the partitioner.
load_partition(partition_id) – Load a single partition based on the partition index.
Attributes
dataset – Dataset property.
num_partitions – Total number of partitions.
partition_id_to_indices – Partition id to the list of sample indices.
partition_id_to_size – Partition id to the number of samples.
- property dataset: Dataset
Dataset property.
- is_dataset_assigned() → bool
Check if a dataset has been assigned to the partitioner.
This method returns True if a dataset is already set for the partitioner; otherwise, it returns False.
- Returns:
dataset_assigned – True if a dataset is assigned, otherwise False.
- Return type:
bool
- load_partition(partition_id: int) → Dataset
Load a single partition based on the partition index.
The number of samples depends on the partition_id.
- Parameters:
partition_id (int) – the index that corresponds to the requested partition
- Returns:
dataset_partition – single dataset partition
- Return type:
Dataset
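The lazy behaviour of load_partition can be pictured with a small stand-in class. This is a sketch under several assumptions, not the library's code: `SketchIdToSizePartitioner` is hypothetical, a plain Python list stands in for `Dataset`, partition indices are 0-based with the function evaluated at index + 1 (matching the 1..M example), rounding leftovers go to the last partition, and each partition is a contiguous slice.

```python
from typing import Callable, Optional, Sequence


class SketchIdToSizePartitioner:
    """Hypothetical stand-in mirroring the documented interface."""

    def __init__(self, num_partitions: int, partition_id_to_size_fn: Callable[[int], float]) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self._num_partitions = num_partitions
        self._fn = partition_id_to_size_fn
        self._dataset: Optional[Sequence] = None
        self._id_to_indices: Optional[dict[int, list[int]]] = None

    @property
    def dataset(self) -> Sequence:
        if self._dataset is None:
            raise ValueError("no dataset assigned")  # choice made for this sketch
        return self._dataset

    @dataset.setter
    def dataset(self, value: Sequence) -> None:
        self._dataset = value
        self._id_to_indices = None  # recompute the assignment for new data

    @property
    def num_partitions(self) -> int:
        return self._num_partitions

    def is_dataset_assigned(self) -> bool:
        return self._dataset is not None

    def _compute_indices(self) -> dict[int, list[int]]:
        n = len(self.dataset)
        units = [self._fn(i + 1) for i in range(self._num_partitions)]
        total = sum(units)
        sizes = [int(u * n / total) for u in units]
        sizes[-1] += n - sum(sizes)  # leftovers to the last partition
        indices, start = {}, 0
        for pid, size in enumerate(sizes):
            indices[pid] = list(range(start, start + size))
            start += size
        return indices

    def load_partition(self, partition_id: int) -> list:
        # The full assignment is computed once, on the first request.
        if self._id_to_indices is None:
            self._id_to_indices = self._compute_indices()
        return [self.dataset[i] for i in self._id_to_indices[partition_id]]


partitioner = SketchIdToSizePartitioner(4, lambda pid: pid)
partitioner.dataset = list(range(100))
print(len(partitioner.load_partition(0)), len(partitioner.load_partition(3)))  # 10 40
```

Computing every partition's indices on the first call keeps repeated load_partition calls cheap and guarantees the partitions never overlap.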
- property num_partitions: int
Total number of partitions.
- property partition_id_to_indices: dict[int, list[int]]
Partition id to the list of sample indices.
- property partition_id_to_size: dict[int, int]
Partition id to the number of samples.
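The two mappings are related: given the sizes, the indices follow. The sketch below assumes each partition receives a contiguous run of indices in partition order; `sketch_indices_from_sizes` is a hypothetical helper, and the example sizes are those of the identity function over four partitions of 20 samples.

```python
def sketch_indices_from_sizes(partition_id_to_size: dict[int, int]) -> dict[int, list[int]]:
    """Hypothetical helper: give each partition a contiguous run of indices."""
    partition_id_to_indices: dict[int, list[int]] = {}
    start = 0
    for pid in sorted(partition_id_to_size):
        size = partition_id_to_size[pid]
        partition_id_to_indices[pid] = list(range(start, start + size))
        start += size
    return partition_id_to_indices


sizes = {0: 2, 1: 4, 2: 6, 3: 8}
indices = sketch_indices_from_sizes(sizes)
print(indices[1])  # [2, 3, 4, 5]
```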