divide_dataset

divide_dataset(dataset: Dataset, division: list[float] | tuple[float, ...] | dict[str, float]) → list[Dataset] | DatasetDict

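Divide the dataset into splits according to the division.

The division supports a varying number of splits, which you can name by passing a dict. The splits are contiguous slices taken in order from the beginning of the dataset: each split starts where the previous one ends.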
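To make "taken in order from the beginning" concrete, the sketch below maps a list of fractions to contiguous index ranges. `division_ranges` is a hypothetical helper, not part of flwr_datasets, and it assumes each split size is rounded down with `int()`; the library's exact rounding may differ.

```python
def division_ranges(length: int, fractions: list[float]) -> list[range]:
    """Map split fractions to contiguous index ranges, starting at index 0.

    Hypothetical helper for illustration. Assumes each split size is
    rounded down with int(), so any remainder stays unused at the end.
    """
    ranges = []
    start = 0
    for fraction in fractions:
        end = start + int(length * fraction)  # split size rounded down (assumption)
        ranges.append(range(start, end))
        start = end  # the next split begins where this one ended
    return ranges


# 8 samples split 50% / 25%: indices 6 and 7 fall outside both splits.
print(division_ranges(8, [0.5, 0.25]))  # [range(0, 4), range(4, 6)]
```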

Parameters:
  • dataset (Dataset) – Dataset to be divided.

  • division (Union[List[float], Tuple[float, ...], Dict[str, float]]) – Configuration specifying how the dataset is divided. Each fraction must satisfy 0 < fraction <= 1, and the fractions must sum to at most 1. A sum below 1 is allowed; the samples left over at the end of the dataset are then not included in any split.
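The constraints on division can be expressed as a small validation routine. This is a minimal sketch: `check_division` is a hypothetical helper, not the library's own validator, and the use of `math.fsum` with a `1e-9` tolerance is an assumption added so that inputs such as `[0.1, 0.2, 0.7]` are not rejected because of floating-point rounding.

```python
from __future__ import annotations

import math


def check_division(division: list[float] | tuple[float, ...] | dict[str, float]) -> None:
    """Raise ValueError if `division` breaks the documented constraints.

    Hypothetical helper for illustration only.
    """
    # A dict carries its fractions as values; a list or tuple carries them directly.
    fractions = list(division.values()) if isinstance(division, dict) else list(division)
    for fraction in fractions:
        if not 0 < fraction <= 1:
            raise ValueError(f"Each fraction must be in (0, 1], got {fraction}.")
    # fsum plus a small tolerance (assumption) guards against rounding error.
    total = math.fsum(fractions)
    if total > 1 + 1e-9:
        raise ValueError(f"Fractions must sum to at most 1, got {total}.")


check_division([0.8, 0.2])                   # valid: sums to 1
check_division({"train": 0.5, "test": 0.3})  # valid: a sum below 1 is allowed
```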

Returns:

divided_dataset – If division is a list or tuple, a list of Dataset objects is returned, one per fraction and in the same order. If division is a dict, a DatasetDict is returned whose keys are the keys of division.

Return type:

Union[List[Dataset], DatasetDict]
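The list-versus-dict return behavior can be sketched with plain Python types, where a list stands in for Dataset and a dict stands in for DatasetDict. `divide_sequence` is a hypothetical helper for illustration, not the flwr_datasets implementation, and it again assumes split sizes are rounded down with `int()`.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def divide_sequence(
    data: Sequence[Any],
    division: list[float] | tuple[float, ...] | dict[str, float],
) -> list[Sequence[Any]] | dict[str, Sequence[Any]]:
    """Slice `data` into consecutive splits, mirroring the documented return types.

    Hypothetical helper: a list or tuple division yields a list of splits in
    the same order; a dict division yields a dict keyed by the split names.
    """
    fractions = list(division.values()) if isinstance(division, dict) else list(division)
    splits = []
    start = 0
    for fraction in fractions:
        end = start + int(len(data) * fraction)  # split size rounded down (assumption)
        splits.append(data[start:end])
        start = end
    if isinstance(division, dict):
        # Pair each split with its name, preserving the dict's key order.
        return dict(zip(division.keys(), splits))
    return splits


data = list(range(10))
print(divide_sequence(data, [0.5, 0.5]))                   # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(divide_sequence(data, {"train": 0.5, "test": 0.5}))  # {'train': [0, 1, 2, 3, 4], 'test': [5, 6, 7, 8, 9]}
```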

Examples

Use divide_dataset with division specified as a list.

>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.utils import divide_dataset
>>>
>>> fds = FederatedDataset(dataset="mnist", partitioners={"train": 100})
>>> partition = fds.load_partition(0)
>>> division = [0.8, 0.2]
>>> train, test = divide_dataset(dataset=partition, division=division)

Use divide_dataset with division specified as a dict (this accomplishes the same goal as the example with a list above).

>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.utils import divide_dataset
>>>
>>> fds = FederatedDataset(dataset="mnist", partitioners={"train": 100})
>>> partition = fds.load_partition(0)
>>> division = {"train": 0.8, "test": 0.2}
>>> train_test = divide_dataset(dataset=partition, division=division)
>>> train, test = train_test["train"], train_test["test"]