IdToSizeFncPartitioner

class IdToSizeFncPartitioner(num_partitions: int, partition_id_to_size_fn: Callable)

Bases: Partitioner

Base class for deterministic size partitioning, in which the size of each partition is determined by its partition_id.

The number of samples assigned to a partition is related to its partition_id as follows:

partition_id_to_size_fn(partition_id) ~ number of samples for partition_id

If the function does not transform the partition_id (that is, it is the identity), the number of samples in a partition grows linearly with its partition_id. For instance, if the partition ids range from 1 to M, the partition with id 1 gets 1 unit of data, the partition with id 2 gets 2 units, and so on, up to partition M, which gets M units.

Note that the size corresponding to each partition_id is deterministic; however, if the dataset is shuffled differently, the samples assigned to each partition_id will differ.
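The relationship above can be illustrated with a minimal, library-independent sketch. It assumes that sizes are made proportional to partition_id_to_size_fn evaluated on the ids 1 to M, rounded down, with any leftover samples given to the last partition. The helper name sketch_partition_sizes, the 0-based keys of the result, and the rounding policy are assumptions made for illustration; they are not confirmed details of this class.

```python
from typing import Callable, Dict


def sketch_partition_sizes(
    num_samples: int,
    num_partitions: int,
    partition_id_to_size_fn: Callable[[int], float],
) -> Dict[int, int]:
    """Hypothetical helper: split num_samples proportionally to fn(id)."""
    # Evaluate the function on ids 1..M to get each partition's "units".
    units = [partition_id_to_size_fn(i) for i in range(1, num_partitions + 1)]
    total_units = sum(units)
    # Convert units to sample counts, rounding down.
    sizes = [int(num_samples * u / total_units) for u in units]
    # Assumption: samples left over from rounding go to the last partition.
    sizes[-1] += num_samples - sum(sizes)
    # Assumption: the returned keys are 0-based partition ids.
    return dict(enumerate(sizes))


# Identity function: sizes grow linearly with the id.
linear_sizes = sketch_partition_sizes(100, 4, lambda pid: pid)
# Squaring function: sizes grow quadratically with the id.
square_sizes = sketch_partition_sizes(100, 4, lambda pid: pid ** 2)
```

For a fixed function, dataset length, and number of partitions, the resulting sizes are always the same, which is the sense in which the partitioning is deterministic.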

Parameters:
  • num_partitions (int) – The total number of partitions that the data will be divided into.

  • partition_id_to_size_fn (Callable) – Function that defines the relationship between partition id and the number of samples.

Methods

  • is_dataset_assigned() – Check if a dataset has been assigned to the partitioner.

  • load_partition(partition_id) – Load a single partition based on the partition index.

Attributes

  • dataset – Dataset property.

  • num_partitions – Total number of partitions.

  • partition_id_to_indices – Mapping from partition id to the list of sample indices.

  • partition_id_to_size – Mapping from partition id to the number of samples.

property dataset: Dataset

Dataset property.

is_dataset_assigned() → bool

Check if a dataset has been assigned to the partitioner.

This method returns True if a dataset is already set for the partitioner; otherwise it returns False.

Returns:

dataset_assigned – True if a dataset is assigned, otherwise False.

Return type:

bool

load_partition(partition_id: int) → Dataset

Load a single partition based on the partition index.

The number of samples in the returned partition depends on partition_id.

Parameters:

partition_id (int) – the index that corresponds to the requested partition

Returns:

dataset_partition – single dataset partition

Return type:

Dataset
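As a rough illustration of why partition sizes stay fixed while partition contents depend on dataset order, the sketch below maps sizes to contiguous blocks of row indices and applies them to two differently ordered copies of a toy dataset. The helper sketch_indices_from_sizes and the contiguous-block layout are assumptions made for this example, not a description of how load_partition is implemented.

```python
import random
from typing import Dict, List


def sketch_indices_from_sizes(sizes: Dict[int, int]) -> Dict[int, List[int]]:
    """Hypothetical helper: give each partition a contiguous block of rows."""
    indices: Dict[int, List[int]] = {}
    start = 0
    for partition_id in sorted(sizes):
        end = start + sizes[partition_id]
        indices[partition_id] = list(range(start, end))
        start = end
    return indices


# Sizes for three partitions (1, 2, and 3 samples).
sizes = {0: 1, 1: 2, 2: 3}
index_map = sketch_indices_from_sizes(sizes)

# The same toy dataset in its original order and in a shuffled order.
rows = ["a", "b", "c", "d", "e", "f"]
shuffled = rows[:]
random.Random(0).shuffle(shuffled)

# Partition 2 always has 3 samples, but which samples it holds
# depends on the order of the underlying dataset.
partition_2_original = [rows[i] for i in index_map[2]]
partition_2_shuffled = [shuffled[i] for i in index_map[2]]
```

In this sketch, index_map plays the role of a partition-id-to-indices mapping and sizes the role of a partition-id-to-size mapping.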

property num_partitions: int

Total number of partitions.

property partition_id_to_indices: dict[int, list[int]]

Mapping from partition id to the list of sample indices.

property partition_id_to_size: dict[int, int]

Mapping from partition id to the number of samples.