Flower Datasets 0.5.0

Recommended FL Datasets

This page lists recommended datasets for federated learning research that can be used with Flower Datasets (flwr-datasets). To learn about the library, see the quickstart tutorial. For a full FL example that uses Flower together with Flower Datasets, open the quickstart-pytorch example.

Note

Any dataset on the Hugging Face Hub can be used with the library. This page presents only a curated selection that you might find useful.

For more information about any dataset, visit its page by clicking the dataset name.
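As a minimal sketch of how a dataset from this page might be loaded, the snippet below wraps `FederatedDataset` in a small function. The wrapper `load_partition` and its argument check are illustrative assumptions, not part of the library; the dataset name is passed exactly as it appears in the Name column, and an integer partitioner value requests that many IID partitions.

```python
def load_partition(dataset_name, num_partitions, partition_id=0, split="train"):
    """Download `dataset_name` and return one IID partition of `split`.

    Hypothetical convenience wrapper; only FederatedDataset and its
    load_partition method come from flwr-datasets.
    """
    if not 0 <= partition_id < num_partitions:
        raise ValueError(
            f"partition_id must be in [0, {num_partitions}), got {partition_id}"
        )
    # Imported lazily so the argument check above works without the package.
    from flwr_datasets import FederatedDataset

    # An integer is shorthand for an IID partitioner with that many partitions.
    fds = FederatedDataset(
        dataset=dataset_name, partitioners={split: num_partitions}
    )
    return fds.load_partition(partition_id, split)


# Example (downloads from the Hugging Face Hub on first use):
# partition = load_partition("ylecun/mnist", num_partitions=10)
# print(partition)
```

The returned partition is a Hugging Face `Dataset`, so the usual formatting and conversion methods of that class should apply.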

Image Datasets

| Name | Size | Image Shape |
|---|---|---|
| ylecun/mnist | train 60k; test 10k | 28x28 |
| uoft-cs/cifar10 | train 50k; test 10k | 32x32x3 |
| uoft-cs/cifar100 | train 50k; test 10k | 32x32x3 |
| zalando-datasets/fashion_mnist | train 60k; test 10k | 28x28 |
| flwrlabs/femnist | train 814k | 28x28 |
| zh-plus/tiny-imagenet | train 100k; valid 10k | 64x64x3 |
| flwrlabs/usps | train 7.3k; test 2k | 16x16 |
| flwrlabs/pacs | train 10k | 227x227 |
| flwrlabs/cinic10 | train 90k; valid 90k; test 90k | 32x32x3 |
| flwrlabs/caltech101 | train 8.7k | varies |
| flwrlabs/office-home | train 15.6k | varies |
| flwrlabs/fed-isic2019 | train 18.6k; test 4.7k | varies |
| ufldl-stanford/svhn | train 73.3k; test 26k; extra 531k | 32x32x3 |
| sasha/dog-food | train 2.1k; test 0.9k | varies |
| Mike0307/MNIST-M | train 59k; test 9k | 32x32 |
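The Size column gives rounded sample counts per split. When planning an experiment, it can help to estimate how many samples each client would receive under an even split. The sketch below parses a Size cell and does that arithmetic; the helper names `parse_sizes` and `samples_per_client` are hypothetical, and because the table values are rounded the results are approximations only.

```python
import re

_MULTIPLIERS = {"": 1, "k": 1_000, "M": 1_000_000}
_ENTRY = re.compile(r"^(\w+)\s+(\d+(?:\.\d+)?)([kM]?)$")


def parse_sizes(size_cell):
    """Turn 'train 60k; test 10k' into {'train': 60000, 'test': 10000}."""
    sizes = {}
    for entry in size_cell.split(";"):
        match = _ENTRY.match(entry.strip())
        if match is None:  # e.g. "varies": no count can be recovered
            continue
        split, number, suffix = match.groups()
        sizes[split] = round(float(number) * _MULTIPLIERS[suffix])
    return sizes


def samples_per_client(size_cell, num_clients, split="train"):
    """Approximate samples per client for an even split of `split`."""
    return parse_sizes(size_cell)[split] // num_clients


# Example: MNIST's training split shared evenly among 100 clients.
# samples_per_client("train 60k; test 10k", num_clients=100)
```

Non-IID partitioners give uneven partition sizes, so this estimate is only a starting point for such setups.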

Audio Datasets

| Name | Size | Subset |
|---|---|---|
| google/speech_commands | train 64.7k | v0.01 |
| google/speech_commands | train 105.8k | v0.02 |
| flwrlabs/ambient-acoustic-context | train 70.3k | |
| fixie-ai/common_voice_17_0 | varies | 14 versions |
| fixie-ai/librispeech_asr | varies | clean/other |
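For datasets that list a Subset, the subset name is passed to `FederatedDataset` through its `subset` argument. The following is a sketch under stated assumptions: `KNOWN_SUBSETS`, `fds_kwargs`, and `load_audio_partition` are hypothetical helpers introduced here for illustration, and only the two `google/speech_commands` subsets from the table are registered. Depending on the installed `datasets` version, some Hub datasets may need extra loading arguments that are not shown.

```python
# Subsets taken from the Audio Datasets table; extend as needed.
KNOWN_SUBSETS = {"google/speech_commands": ("v0.01", "v0.02")}


def fds_kwargs(dataset_name, subset, num_partitions, split="train"):
    """Build keyword arguments for FederatedDataset (hypothetical helper)."""
    allowed = KNOWN_SUBSETS.get(dataset_name)
    if allowed is not None and subset not in allowed:
        raise ValueError(f"unknown subset {subset!r} for {dataset_name}")
    return {
        "dataset": dataset_name,
        "subset": subset,
        "partitioners": {split: num_partitions},
    }


def load_audio_partition(dataset_name, subset, num_partitions, partition_id=0):
    """Load one IID partition of the train split for the given subset."""
    # Imported lazily so fds_kwargs can be used without the package installed.
    from flwr_datasets import FederatedDataset

    fds = FederatedDataset(**fds_kwargs(dataset_name, subset, num_partitions))
    return fds.load_partition(partition_id, "train")


# Example (downloads data on first use):
# partition = load_audio_partition("google/speech_commands", "v0.02", 100)
```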

Tabular Datasets

| Name | Size |
|---|---|
| scikit-learn/adult-census-income | train 32.6k |
| jlh/uci-mushrooms | train 8.1k |
| scikit-learn/iris | train 150 |
| jiahborcn/chembl_aqsol | train 12.9k; test 3.2k |
| jiahborcn/chembl_multiassay_activity | train 350k; test 87.5k |

Text Datasets

| Name | Size | Category |
|---|---|---|
| sentiment140 | train 1.6M; test 0.5k | Sentiment |
| google-research-datasets/mbpp | full 974; sanitized 427 | General |
| openai/openai_humaneval | test 164 | General |
| lukaemon/mmlu | varies | General |
| takala/financial_phrasebank | train 4.8k | Financial |
| pauri32/fiqa-2018 | train 0.9k; validation 0.1k; test 0.2k | Financial |
| zeroshot/twitter-financial-news-sentiment | train 9.5k; validation 2.4k | Financial |
| bigbio/pubmed_qa | train 2M; validation 11k | Medical |
| openlifescienceai/medmcqa | train 183k; validation 4.3k; test 6.2k | Medical |
| bigbio/med_qa | train 10.1k; test 1.3k; validation 1.3k | Medical |

Copyright © 2025 Flower Labs GmbH