Generate Demo Data for SuperNodes
In Flower simulations, datasets are downloaded and partitioned on the fly. While convenient for prototyping, production deployments require SuperNodes to have pre-existing data on disk. This ensures immediate startup, data persistence across restarts, and a setup that mirrors real-world federated AI where each node owns its local data.
The Flower Datasets CLI lets you generate pre-partitioned datasets for deployment prototyping. Materializing partitions to disk ahead of time lets each SuperNode read from its designated partition, just as it would in production.
Note
This guide is intended for generating demo data for testing deployments. For production deployments, ensure that each SuperNode has access to its own local data partition.
Using the Flower Datasets CLI
The flwr-datasets create command enables you to download a dataset,
partition it, and save each partition to disk in a single step. For complete
details on all available options, see the Flower Datasets CLI reference.
For example, to generate demo data from the MNIST dataset with five
partitions and store the result in the ./demo_data directory (created if it
doesn't exist), run the following command in your terminal:
# flwr-datasets create <dataset> --num-partitions <n> --out-dir <dir>
flwr-datasets create ylecun/mnist --num-partitions 5 --out-dir demo_data
# The output will look similar to this:
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 3085.94 examples/s]
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4006.59 examples/s]
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4001.21 examples/s]
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 4010.60 examples/s]
Saving the dataset (1/1 shards): 100%|████████████| 12000/12000 [00:00<00:00, 3990.48 examples/s]
Created 5 partitions for 'ylecun/mnist' in '/path/to/demo_data'
The above command generates the following directory structure:
demo_data/
├── partition_0/
│   ├── data-00000-of-00001.arrow
│   ├── dataset_info.json
│   └── state.json
...
└── partition_4/
    ├── data-00000-of-00001.arrow
    ├── dataset_info.json
    └── state.json
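Before pointing SuperNodes at the generated data, it can help to confirm that every partition directory is complete. The following is a minimal sketch, not part of Flower Datasets: the helper names `find_partitions` and `missing_files` are hypothetical, and the expected file names are taken from the directory listing above (larger datasets may produce more than one `.arrow` shard, so the check only requires at least one).

```python
import re
from pathlib import Path

# Directory and file names as they appear in the listing above.
PARTITION_DIR = re.compile(r"partition_(\d+)")
REQUIRED_FILES = ("dataset_info.json", "state.json")


def find_partitions(out_dir):
    """Return partition directories under out_dir, sorted by numeric index."""
    found = []
    for child in Path(out_dir).iterdir():
        match = PARTITION_DIR.fullmatch(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    # Sort numerically so partition_10 comes after partition_2.
    return [path for _, path in sorted(found)]


def missing_files(partition_dir):
    """List the expected entries that are absent from one partition directory."""
    partition_dir = Path(partition_dir)
    missing = [
        name for name in REQUIRED_FILES if not (partition_dir / name).is_file()
    ]
    # Require at least one Arrow shard, whatever its exact name.
    if not any(partition_dir.glob("*.arrow")):
        missing.append("*.arrow")
    return missing
```

Calling `find_partitions("demo_data")` and then `missing_files(p)` for each result returns an empty list for every complete partition.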
Using Generated Demo Data in SuperNodes
Once you have generated the partitions, each SuperNode can be configured to load its designated partition. The recommended approach is to pass the partition path as a node configuration parameter when starting the SuperNode.
Passing the Data Path to a SuperNode
Use the --node-config flag to specify the path to the partition when
launching a SuperNode. In the example below, the selected key data-path
is arbitrary and provided for illustration only; any application-appropriate
key may be used.
flower-supernode \
    --insecure \
    --node-config="data-path=/path/to/demo_data/partition_0"
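When starting several SuperNodes, one per partition, the same command is repeated with a different path each time. The sketch below builds that argument list in Python. It mirrors the form of the command shown above and uses only the flags that appear there; the helper name `supernode_command` is hypothetical, and any other flags your deployment needs are deliberately left out.

```python
import os
import shlex


def supernode_command(partition_dir, key="data-path"):
    """Build the argument list for one SuperNode, mirroring the command above.

    `key` is the arbitrary node-config key chosen for the data path.
    """
    value = os.fspath(partition_dir)
    return ["flower-supernode", "--insecure", f"--node-config={key}={value}"]


def supernode_commands(partition_dirs, key="data-path"):
    """Build one argument list per partition directory."""
    return [supernode_command(path, key) for path in partition_dirs]


def as_shell(command):
    """Render an argument list as a single shell-quoted line for copy and paste."""
    return shlex.join(command)
```

Keeping each command as a list of arguments avoids quoting problems if a path contains spaces; `as_shell` is only needed when the command is printed for a human to run.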
Loading the Dataset in Your ClientApp
In your ClientApp, you can access the configured data path through the
Context and load the dataset using the
load_from_disk function from the Hugging Face datasets library:
from flwr.app import Context, Message
from flwr.clientapp import ClientApp
from datasets import load_from_disk

app = ClientApp()


@app.train()
def train(msg: Message, context: Context) -> Message:
    """Train the model on local data."""
    # Retrieve the data path from node configuration
    dataset_path = context.node_config["data-path"]
    # Load the partition from disk
    partition = load_from_disk(dataset_path)
    # Use the dataset for training
    # ...
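If the `data-path` key is missing or points to the wrong place, the lookup or the load fails with an error that may not make the cause obvious. A small guard can produce a clearer message. This is a sketch under stated assumptions: `resolve_data_path` is a hypothetical helper, not a Flower API, and it treats the node configuration as a plain mapping from string keys to values.

```python
from pathlib import Path


def resolve_data_path(node_config, key="data-path"):
    """Return the partition directory named in node_config, or raise a clear error.

    `key` must match the key passed to the SuperNode via --node-config.
    """
    if key not in node_config:
        raise KeyError(
            f"node config has no '{key}' entry; pass it via --node-config"
        )
    path = Path(str(node_config[key]))
    if not path.is_dir():
        raise FileNotFoundError(
            f"'{key}' points to '{path}', which is not a directory"
        )
    return path
```

Inside the ClientApp, `resolve_data_path(context.node_config)` would stand in for the direct dictionary lookup, and the returned path can be converted with `str(...)` before it is handed to `load_from_disk`.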
Tip
For a complete guide on how to run Flower SuperNodes, refer to the Deployment Runtime Documentation.