
Announcing Flower Datasets 0.2.0

The Flower Team is excited to announce the release of Flower Datasets 0.2.0!

Flower Datasets (flwr-datasets) is a library to quickly and easily create datasets for federated learning, federated evaluation, and federated analytics. It was created by the Flower Labs team that also created Flower: A Friendly Federated Learning Framework.

What's new?

  • Add label distribution visualization (#3451)

    Introduce visualize.plot_label_distributions and visualize.plot_comparison_label_distribution. To learn how to use the new functions, visit link.

    [Figures: label distribution visualization as a single plot and as a comparison plot]

  • Add label count utils (#3551)

    Introduce compute_counts and compute_frequencies as public functions.

  • Improve speed of NaturalIdPartitioner (#3276)

    Datasets with many unique IDs and many samples benefit most from this improvement.

  • Documentation improvements

  • Fix utils.divide_dataset for more than 2 divisions (#3192)

    Support division into more than 2 divisions (exclusive parts of a dataset) in utils.divide_dataset. Previously, it incorrectly returned a dataset of size 0 for divisions with an index of 2 or higher.

  • Add fixed seed in train_test_split in examples (#3211)

    Add the random_state parameter to the train_test_split function in the examples to ensure reproducibility. The examples now produce the same train-test splits across multiple rounds and runs.

  • Add telemetry (#3479)

    Start collecting usage data about the datasets and partitioners used, the type of load_ function called on FederatedDataset, and the visualization utils. To learn more, visit link.

  • Limit the datasets versions (#3607)

    Limit the datasets library version to datasets = ">=2.14.6 <2.20.0" so that users do not have to pass the otherwise obligatory trust_remote_code=True to the datasets.load_dataset function. This is a temporary change that can be relaxed once we support passing kwargs via FederatedDataset to datasets.load_dataset.
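
To make the label count utilities listed above more concrete, here is a minimal pure-Python sketch of what per-partition label counts and frequencies mean. The names sketch_counts and sketch_frequencies and the toy partitions are hypothetical; this is not the implementation or the signature of compute_counts or compute_frequencies in flwr-datasets.

```python
from collections import Counter


def sketch_counts(labels, all_labels):
    """Count how often each label in all_labels occurs in labels (zeros included)."""
    counts = Counter(labels)
    return {label: counts.get(label, 0) for label in all_labels}


def sketch_frequencies(labels, all_labels):
    """Normalize the counts so they sum to 1 (all zeros for an empty partition)."""
    counts = sketch_counts(labels, all_labels)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in all_labels}
    return {label: count / total for label, count in counts.items()}


# Toy partitions (hypothetical data): partition ID -> labels held by that partition
partitions = {0: [0, 0, 1, 2], 1: [1, 1, 1, 1], 2: [2, 0]}
all_labels = [0, 1, 2]

for partition_id, labels in partitions.items():
    print(
        partition_id,
        sketch_counts(labels, all_labels),
        sketch_frequencies(labels, all_labels),
    )
```

Reporting zero counts for absent labels keeps every partition's result the same shape, which is what makes partitions easy to compare side by side.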

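The utils.divide_dataset fix listed above concerns splitting one dataset into several exclusive parts. The sketch below shows that idea on a plain Python list. divide_by_fractions is a hypothetical helper, not the signature of utils.divide_dataset, and the rounding policy (truncating each cumulative boundary) is an assumption.

```python
def divide_by_fractions(items, fractions):
    """Split items into consecutive, non-overlapping parts sized by fractions."""
    if any(f < 0 for f in fractions) or sum(fractions) > 1.0 + 1e-9:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    n = len(items)
    parts = []
    cumulative = 0.0
    start = 0
    for fraction in fractions:
        cumulative += fraction
        # Small epsilon guards against float error such as 7.999999999 -> 7
        end = min(n, int(cumulative * n + 1e-9))
        parts.append(items[start:end])
        start = end
    return parts


data = list(range(10))
train, valid, test = divide_by_fractions(data, [0.6, 0.2, 0.2])
print(len(train), len(valid), len(test))
```

Tracking a cumulative boundary, rather than computing each part's size independently, ensures that the third and later parts start where the previous part ended instead of coming out empty.
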
Incompatible changes

  • Rename resplitter parameter and type to preprocessor (#3476)

    Also, simplify the names DivideResplitter to Divider and MergeResplitter to Merger. These classes are now of type Preprocessor, not Resplitter.

  • Rename resplitter in examples (#3485)

    Update the examples to use the new preprocessor parameter in FederatedDataset instead of the resplitter.
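
For readers updating their code, the sketch below illustrates the idea behind a Merger-style preprocessor: a callable that takes a dataset's splits and returns a new set of splits. It uses plain dictionaries and lists rather than Hugging Face datasets. make_merger and the shape of its merge_config argument are assumptions for illustration, not the flwr-datasets API.

```python
def make_merger(merge_config):
    """Return a callable that merges existing splits into new, named splits.

    merge_config maps each new split name to a tuple of existing split names.
    """

    def preprocess(splits):
        merged = {}
        for new_name, source_names in merge_config.items():
            # Concatenate the source splits in the order they are listed
            merged[new_name] = [
                sample for name in source_names for sample in splits[name]
            ]
        return merged

    return preprocess


# Toy splits (hypothetical data): split name -> samples
splits = {"train": [1, 2, 3], "valid": [4], "test": [5, 6]}
merger = make_merger({"train": ("train", "valid"), "test": ("test",)})
print(merger(splits))
```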
