Generate Demo Data for SuperNodes

In Flower simulations, datasets are downloaded and partitioned on-the-fly. While convenient for prototyping, production deployments require SuperNodes to have pre-existing data on disk. This ensures immediate startup, data persistence across restarts, and a setup that mirrors real-world federated AI where each node owns its local data.

The Flower Datasets CLI lets you generate pre-partitioned datasets for deployment prototyping. Because the partitions are materialized to disk ahead of time, each SuperNode can read from its designated partition, just as it would in production.

Note

This guide is intended for generating demo data for testing deployments. For production deployments, ensure that each SuperNode has access to its own local data partition.

Using the Flower Datasets CLI

The flwr-datasets create command enables you to download a dataset, partition it, and save each partition to disk in a single step. For complete details on all available options, see the Flower Datasets CLI reference.

For example, to generate demo data from the MNIST dataset with five partitions and store the result in the ./demo_data directory (it will be created if it doesn’t exist), run the following command in your terminal:

# flwr-datasets create <dataset> --num-partitions <n> --out-dir <dir>
flwr-datasets create ylecun/mnist --num-partitions 5 --out-dir demo_data

# The output will look similar to this:
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 3085.94 examples/s]
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4006.59 examples/s]
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4001.21 examples/s]
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4010.60 examples/s]
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 3990.48 examples/s]
🎊 Created 5 partitions for 'ylecun/mnist' in '/path/to/demo_data'

The above command generates the following directory structure:

demo_data/
├── partition_0/
│   ├── data-00000-of-00001.arrow
│   ├── dataset_info.json
│   └── state.json
...
└── partition_4/
    ├── data-00000-of-00001.arrow
    ├── dataset_info.json
    └── state.json
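Before starting any SuperNodes, it can help to confirm that the expected partitions are actually on disk. The following is a minimal sketch using only the Python standard library. The helper names `find_partitions` and `missing_files` are hypothetical (they are not part of Flower Datasets), and the sketch assumes the `partition_<index>` naming and the metadata files shown in the listing above.

```python
from pathlib import Path

# Metadata files seen in each partition directory in the listing above.
EXPECTED_FILES = ("dataset_info.json", "state.json")


def find_partitions(root):
    """Return partition directories under `root`, ordered by partition index."""
    found = []
    for child in Path(root).iterdir():
        prefix, _, index = child.name.partition("_")
        if child.is_dir() and prefix == "partition" and index.isdecimal():
            found.append((int(index), child))
    # Sort numerically so that partition_10 comes after partition_9.
    return [path for _, path in sorted(found)]


def missing_files(partition_dir):
    """List the expected metadata files that are absent from one partition."""
    partition_dir = Path(partition_dir)
    return [name for name in EXPECTED_FILES if not (partition_dir / name).is_file()]
```

Calling `find_partitions("demo_data")` after the command above should return one path per partition, and `missing_files(path)` should return an empty list for each of them.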

Using Generated Demo Data in SuperNodes

Once you have generated the partitions, each SuperNode can be configured to load its designated partition. The recommended approach is to pass the partition path as a node configuration parameter when starting the SuperNode.

Passing the Data Path to a SuperNode

Use the --node-config flag to specify the path to the partition when launching a SuperNode. The key name data-path in the example below is arbitrary and used for illustration only; you can choose any key that suits your application, as long as your ClientApp reads the same key.

flower-supernode \
    --insecure \
    --superlink <SUPERLINK-FLEET-API> \
    --node-config="data-path=/path/to/demo_data/partition_0"
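When launching several SuperNodes, one per partition, the command above can be assembled programmatically. The sketch below only builds the argument list and prints it; it does not start any process. The helper `supernode_command` is hypothetical, the address and paths are placeholders, and the flags are copied from the command above. Whether the value needs additional quoting depends on how your Flower version parses `--node-config`, so check the CLI reference before relying on it.

```python
import shlex


def supernode_command(superlink, partition_dir, key="data-path"):
    """Build the argv list for one SuperNode, mirroring the command above."""
    return [
        "flower-supernode",
        "--insecure",
        "--superlink",
        superlink,
        f"--node-config={key}={partition_dir}",
    ]


# Placeholder address and paths, for illustration only.
for index in range(2):
    argv = supernode_command(
        "<SUPERLINK-FLEET-API>", f"/path/to/demo_data/partition_{index}"
    )
    # shlex.join quotes arguments so the line can be pasted into a POSIX shell.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids quoting mistakes if it is later passed to `subprocess.run`.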

Loading the Dataset in Your ClientApp

In your ClientApp, you can access the configured data path through the Context and load the dataset using the load_from_disk function from the Hugging Face datasets library:

from flwr.app import Context, Message
from flwr.clientapp import ClientApp
from datasets import load_from_disk

app = ClientApp()


@app.train()
def train(msg: Message, context: Context) -> Message:
    """Train the model on local data."""
    # Retrieve the data path from node configuration
    dataset_path = context.node_config["data-path"]

    # Load the partition from disk
    partition = load_from_disk(dataset_path)

    # Use the dataset for training
    # ...
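A missing or mistyped path is easier to diagnose if it is checked before loading. The following is a minimal sketch of such a check, assuming the node configuration behaves like a dictionary and uses the data-path key from the example above. The helper `resolve_data_path` is hypothetical and not part of Flower.

```python
from pathlib import Path


def resolve_data_path(node_config, key="data-path"):
    """Return the configured partition directory, or raise a descriptive error."""
    if key not in node_config:
        raise KeyError(f"node config has no '{key}' entry; pass it via --node-config")
    path = Path(str(node_config[key]))
    if not path.is_dir():
        raise FileNotFoundError(f"partition directory not found: {path}")
    return path
```

Inside the train function, the result could then be passed on as `load_from_disk(str(resolve_data_path(context.node_config)))`.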

Tip

For a complete guide on how to run Flower SuperNodes, refer to the Deployment Runtime Documentation.