Run simulations

Simulating Federated Learning workloads is useful in many situations: you might want to run your workload on a large cohort of clients without having to source, configure, and manage a large number of physical devices; you might want to run your FL workloads as fast as possible on the compute systems you have access to without going through a complex setup process; you might want to validate your algorithm in different scenarios at varying levels of data and system heterogeneity, client availability, privacy budgets, etc. These are some of the use cases where simulating FL workloads makes sense.

Tip

The Flower AI Simulation 2025 tutorial series is available on YouTube. The associated code for the tutorial can be found in the Flower GitHub repository.

Flower’s Simulation Engine schedules, launches, and manages ClientApp instances. It does so through a Backend, which contains several workers (i.e., Python processes) that can execute a ClientApp by passing it a Context and a Message. These ClientApp objects are identical to those used by Flower’s Deployment Engine, making alternating between simulation and deployment an effortless process. The execution of ClientApp objects through Flower’s Simulation Engine is:

  • Resource-aware: Each backend worker executing ClientApps gets assigned a portion of the compute and memory on your system. You can define these at the beginning of the simulation, allowing you to control the degree of parallelism of your simulation. For a fixed total pool of resources, the fewer the resources per backend worker, the more ClientApps can run concurrently on the same hardware.

  • Batchable: When there are more ClientApps to execute than backend workers, ClientApps are queued and executed as soon as resources are freed. This means that ClientApps are typically executed in batches of N, where N is the number of backend workers.

  • Self-managed: This means that you, as a user, do not need to launch ClientApps manually; instead, the Simulation Engine’s internals orchestrate the execution of all ClientApps.

  • Ephemeral: This means that a ClientApp is only materialized when it is required by the application (e.g., to do fit()). The object is destroyed afterward, releasing the resources it was assigned and allowing other clients to participate.

Note

You can preserve the state (e.g., internal variables, parts of an ML model, intermediate results) of a ClientApp by saving it to its Context. Check the Designing Stateful Clients guide for a complete walkthrough.

The Simulation Engine delegates to a Backend the role of spawning and managing ClientApps. The default backend is the RayBackend, which uses Ray, an open-source framework for scalable Python workloads. In particular, each worker is an Actor capable of spawning a ClientApp given its Context and a Message to process.

Launch your Flower simulation

Running a simulation is straightforward; in fact, it is the default mode of operation for flwr run. Therefore, running Flower simulations primarily requires you to first define a ClientApp and a ServerApp. A convenient way to generate a minimal but fully functional Flower app is by means of the flwr new command. There are multiple templates to choose from. The example below uses the PyTorch template.

Tip

If you haven’t already, install Flower via pip install -U flwr in a Python environment.

# or simply execute `flwr new` for a fully interactive process
flwr new my-app --framework="PyTorch" --username="alice"

Then, follow the instructions shown after completing the flwr new command. When you execute flwr run, you’ll be using the Simulation Engine.

If we take a look at the pyproject.toml that was generated from the flwr new command (and loaded upon flwr run execution), we see that a default federation is defined. It sets the number of supernodes to 10.

[tool.flwr.federations]
default = "local-simulation"

[tool.flwr.federations.local-simulation]
options.num-supernodes = 10

You can modify the size of your simulations by adjusting options.num-supernodes.

Simulation examples

In addition to the quickstart tutorials in the documentation (e.g., quickstart PyTorch Tutorial, quickstart JAX Tutorial), most examples in the Flower repository are simulation-ready.

The complete list of examples can be found in the Flower GitHub.

Defining ClientApp resources

By default, the Simulation Engine assigns two CPU cores to each backend worker. This means that if your system has 10 CPU cores, five backend workers can be running in parallel, each executing a different ClientApp instance.

More often than not, you would probably like to adjust the resources your ClientApp gets assigned based on the complexity (i.e., compute and memory footprint) of your workload. You can do so by adjusting the backend resources for your federation.

Caution

Note that the resources the backend assigns to each worker (and hence to each ClientApp being executed) are assigned in a soft manner. This means that they are primarily used to control the degree of parallelism at which ClientApp instances are executed. Resource assignment is not strictly enforced: if you declare that your ClientApp uses 25% of the available VRAM but it ends up using 50%, other ClientApp instances might crash with an out-of-memory (OOM) error.

Customizing resources can be done directly in the pyproject.toml of your app.

[tool.flwr.federations.local-simulation]
options.num-supernodes = 10
options.backend.client-resources.num-cpus = 1 # each ClientApp assumes to use 1 CPU (default is 2)
options.backend.client-resources.num-gpus = 0.0 # no GPU access to the ClientApp (default is 0.0)

With the above backend settings, your simulation will run as many ClientApps in parallel as there are CPUs in your system. GPU resources for your ClientApp can be assigned by specifying the fraction of VRAM each one should use.

[tool.flwr.federations.local-simulation]
options.num-supernodes = 10
options.backend.client-resources.num-cpus = 1 # each ClientApp assumes to use 1 CPU (default is 2)
options.backend.client-resources.num-gpus = 0.25 # each ClientApp uses 25% of VRAM (default is 0.0)

Note

If you are using TensorFlow, you need to enable memory growth so multiple ClientApp instances can share a GPU. This needs to be done before launching the simulation. To do so, set the environment variable TF_FORCE_GPU_ALLOW_GROWTH="1".

Let’s see how the above configuration results in a different number of ClientApps running in parallel depending on the resources available in your system. If your system has:

  • 10x CPUs and 1x GPU: at most 4 ClientApps will run in parallel since each requires 25% of the available VRAM.

  • 10x CPUs and 2x GPUs: at most 8 ClientApps will run in parallel (VRAM-limited).

  • 6x CPUs and 4x GPUs: at most 6 ClientApps will run in parallel (CPU-limited).

  • 10x CPUs but 0x GPUs: you won’t be able to run the simulation since not even the resources for a single ClientApp can be met.

A generalization of this is given by the following equation. It gives the maximum number of ClientApps that can be executed in parallel on available CPU cores (SYS_CPUS) and VRAM (SYS_GPUS).

\[N = \min\left(\left\lfloor \frac{\text{SYS_CPUS}}{\text{num_cpus}} \right\rfloor, \left\lfloor \frac{\text{SYS_GPUS}}{\text{num_gpus}} \right\rfloor\right)\]

Both num_cpus (an integer greater than or equal to 1) and num_gpus (a non-negative real number) should be set on a per ClientApp basis. If, for example, you want only a single ClientApp to run on each GPU, then set num_gpus=1.0. If, for example, a ClientApp requires access to two whole GPUs, you’d set num_gpus=2.
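The equation above can be checked against the four example systems with a short Python sketch. The function name `max_parallel_clientapps` is hypothetical and not part of Flower. Treating `num_gpus == 0` as "GPUs do not limit parallelism" is an assumption made here to avoid dividing by zero, in line with the CPU-only example earlier in this section.

```python
import math


def max_parallel_clientapps(sys_cpus, sys_gpus, num_cpus, num_gpus):
    """N = min(floor(SYS_CPUS / num_cpus), floor(SYS_GPUS / num_gpus))."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    by_cpu = math.floor(sys_cpus / num_cpus)
    if num_gpus == 0:
        # Assumption: a ClientApp that requests no GPU is limited by CPUs only
        return by_cpu
    # The small epsilon guards against floating-point error (e.g. 0.3 / 0.1)
    by_gpu = math.floor(sys_gpus / num_gpus + 1e-9)
    return min(by_cpu, by_gpu)


# The four systems listed above, with num_cpus=1 and num_gpus=0.25
print(max_parallel_clientapps(10, 1, 1, 0.25))  # 4 (VRAM-limited)
print(max_parallel_clientapps(10, 2, 1, 0.25))  # 8 (VRAM-limited)
print(max_parallel_clientapps(6, 4, 1, 0.25))   # 6 (CPU-limited)
print(max_parallel_clientapps(10, 0, 1, 0.25))  # 0 (no GPU available)
```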

While the options.backend.client-resources can be used to control the degree of concurrency in your simulations, this does not stop you from running hundreds or even thousands of clients in the same round and having orders of magnitude more dormant (i.e., not participating in a round) clients. Let’s say you want to have 100 clients per round but your system can only accommodate 8 clients concurrently. The Simulation Engine will schedule 100 ClientApps to run and then will execute them in a resource-aware manner in batches of 8.
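To make the 100-clients example concrete, here is a rough Python sketch that splits the clients of one round into batches of 8. The helper `plan_batches` is hypothetical. The real Simulation Engine queues ClientApps and starts the next one as soon as a worker is free, so fixed batches are only an approximation of its behavior.

```python
def plan_batches(clients_per_round, max_parallel):
    """Split the client ids of one round into consecutive batches."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    client_ids = list(range(clients_per_round))
    return [
        client_ids[start:start + max_parallel]
        for start in range(0, clients_per_round, max_parallel)
    ]


batches = plan_batches(clients_per_round=100, max_parallel=8)
print(len(batches))      # 13 batches in total
print(len(batches[-1]))  # the last batch holds the remaining 4 clients
```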

Simulation Engine resources

By default, the Simulation Engine has access to all system resources (i.e., all CPUs, all GPUs). However, in some settings, you might want to limit how many of your system resources are used for simulation. You can do this in the pyproject.toml of your app by setting the options.backend.init_args variable.

[tool.flwr.federations.local-simulation]
options.num-supernodes = 10
options.backend.client-resources.num-cpus = 1 # Each ClientApp will get assigned 1 CPU core
options.backend.client-resources.num-gpus = 0.5 # Each ClientApp will get 50% of each available GPU
options.backend.init_args.num_cpus = 1 # Only expose 1 CPU to the simulation
options.backend.init_args.num_gpus = 1 # Expose a single GPU to the simulation

With the above setup, the Backend will be initialized with a single CPU and GPU. Therefore, even if more CPUs and GPUs are available in your system, they will not be used for the simulation. The example above results in a single ClientApp running at any given point.

For a complete list of settings you can configure, check the ray.init documentation.

For the highest performance, do not set options.backend.init_args.

Simulation in Colab/Jupyter

The preferred way of running simulations should always be flwr run. However, the core functionality of the Simulation Engine can be used from within a Google Colab or Jupyter environment by means of run_simulation.

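from flwr.client import ClientApp
from flwr.server import ServerApp
from flwr.simulation import run_simulation

# `client_fn` and `server_fn` are assumed to be defined earlier in the notebook

# Construct the ClientApp passing the client generation function
client_app = ClientApp(client_fn=client_fn)

# Create your ServerApp passing the server generation function
server_app = ServerApp(server_fn=server_fn)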

run_simulation(
    server_app=server_app,
    client_app=client_app,
    num_supernodes=10,  # equivalent to setting `num-supernodes` in the pyproject.toml
)

With run_simulation, you can also control the amount of resources for your ClientApp instances. Do so by setting backend_config. If unset, the default resources are assigned (i.e., 2xCPUs per ClientApp and no GPU).

run_simulation(
    # ...
    backend_config={"client_resources": {"num_cpus": 2, "num_gpus": 0.25}}
)
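If you build `backend_config` in several notebook cells, a small helper can keep the dictionary consistent. The sketch below is hypothetical: `make_backend_config` is not a Flower function, and passing an `init_args` entry is an assumption that mirrors the `options.backend.init_args` key used in the pyproject.toml; check the `run_simulation` reference for your Flower version before relying on it.

```python
def make_backend_config(num_cpus=2, num_gpus=0.0, init_args=None):
    """Build a backend_config dictionary for run_simulation.

    The defaults match those stated above (2 CPUs and no GPU per ClientApp).
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    config = {"client_resources": {"num_cpus": num_cpus, "num_gpus": num_gpus}}
    if init_args:
        # Assumption: mirrors `options.backend.init_args` from pyproject.toml
        config["init_args"] = dict(init_args)
    return config


backend_config = make_backend_config(num_cpus=2, num_gpus=0.25)
print(backend_config)
# Intended use: run_simulation(..., backend_config=backend_config)
```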

Refer to the 30 minutes Federated AI Tutorial for a complete example on how to run Flower Simulations in Colab.

Multi-node Flower simulations

Flower’s Simulation Engine allows you to run FL simulations across multiple compute nodes so that you’re not restricted to running simulations on a single machine. Before starting your multi-node simulation, ensure that you:

  1. Have the same Python environment on all nodes.

  2. Have a copy of your code on all nodes.

  3. Have a copy of your dataset on all nodes. If you are using partitions from Flower Datasets, ensure the partitioning strategy and its parameterization are the same. The expectation is that the i-th dataset partition is identical on all nodes.

  4. Start Ray on your head node: on the terminal, type ray start --head. This command will print a few lines, one of which indicates how to attach other nodes to the head node.

  5. Attach other nodes to the head node: copy the command shown after starting the head and execute it on the terminal of a new node (before executing flwr run). For example: ray start --address='192.168.1.132:6379'. Note that to be able to attach nodes to the head node they should be discoverable by each other.

With all the above done, you can run your code from the head node as you would if the simulation were running on a single node. In other words:

# From your head node, launch the simulation
flwr run

Once your simulation is finished, if you’d like to dismantle your cluster, you simply need to run the command ray stop in each node’s terminal (including the head node).

Note

When attaching a new node to the head, all its resources (i.e., all CPUs, all GPUs) will be visible to the head node. This means that the Simulation Engine can schedule as many ClientApp instances as that node can possibly run. In some settings, you might want to exclude certain resources from the simulation. You can do this by appending --num-cpus=<NUM_CPUS_FROM_NODE> and/or --num-gpus=<NUM_GPUS_FROM_NODE> to any ray start command (including when starting the head).
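One way to check the requirement that the i-th dataset partition is identical on all nodes is to compute a fingerprint of each partition on every node and compare the results. The sketch below uses only the Python standard library. The function `partition_fingerprint` and the row format (an iterable of JSON-serializable dictionaries) are assumptions for illustration, not part of Flower or Flower Datasets.

```python
import hashlib
import json


def partition_fingerprint(rows):
    """Return a SHA-256 hex digest over the rows of one partition.

    Rows are hashed in order, so both content and ordering must match.
    """
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the encoding independent of dict key order
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical partition contents, as they might be loaded on two nodes
partition_on_node_a = [{"x": [0.1, 0.2], "y": 1}, {"x": [0.3, 0.4], "y": 0}]
partition_on_node_b = [{"y": 1, "x": [0.1, 0.2]}, {"y": 0, "x": [0.3, 0.4]}]

same = partition_fingerprint(partition_on_node_a) == partition_fingerprint(partition_on_node_b)
print(same)  # True: same rows in the same order
```

Running this on each node for a few partition indices and comparing the digests is a cheap sanity check before launching a long simulation.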

FAQ for Simulations

Can I make my ClientApp instances stateful?

Yes. Use the state attribute of the Context object that is passed to the ClientApp to save variables, parameters, or results to it. Read the Designing Stateful Clients guide for a complete walkthrough.

Can I run multiple simulations on the same machine?

Yes, but bear in mind that each simulation isn’t aware of the resource usage of the other. If your simulations make use of GPUs, consider setting the CUDA_VISIBLE_DEVICES environment variable to make each simulation use a different set of the available GPUs. Export such an environment variable before starting flwr run.
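As a sketch of this setup, the Python snippet below builds a separate environment per simulation, each with its own CUDA_VISIBLE_DEVICES value, and defines a launcher that starts `flwr run` with that environment. The helper names and the app directories are hypothetical, and the launcher is defined but not called here.

```python
import os
import subprocess


def env_for_gpus(gpu_ids, base_env=None):
    """Copy an environment and restrict the GPUs visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(gpu) for gpu in gpu_ids)
    return env


def launch_simulation(app_dir, gpu_ids):
    """Start `flwr run <app_dir>` restricted to the given GPUs."""
    return subprocess.Popen(["flwr", "run", app_dir], env=env_for_gpus(gpu_ids))


# Example: two simulations on a 4-GPU machine, two GPUs each
env_a = env_for_gpus([0, 1], base_env={})
env_b = env_for_gpus([2, 3], base_env={})
print(env_a["CUDA_VISIBLE_DEVICES"], env_b["CUDA_VISIBLE_DEVICES"])
# Intended use (hypothetical app directories):
#   launch_simulation("./app-a", [0, 1])
#   launch_simulation("./app-b", [2, 3])
```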

Do the CPU/GPU resources set for each ClientApp restrict how much compute/memory these make use of?

No. These resources are exclusively used by the simulation backend to control how many workers can be created on startup. If N backend workers are launched, at most N ClientApp instances will run in parallel. It is your responsibility to ensure ClientApp instances have enough resources to execute their workload (e.g., fine-tune a transformer model).

My ClientApp is triggering OOM on my GPU. What should I do?

It is likely that your num_gpus setting, which controls the number of ClientApp instances that can share a GPU, is too low (meaning too many ClientApps share the same GPU). Try the following:

  1. Set your num_gpus=1. This will make a single ClientApp run on a GPU.

  2. Inspect how much VRAM is being used (use nvidia-smi for this).

  3. Based on the VRAM you see your single ClientApp using, calculate how many more would fit within the remaining VRAM. One divided by the total number of ClientApps is the num_gpus value you should set.
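The arithmetic in the last step can be sketched in Python as follows. The function `suggest_num_gpus` is hypothetical, and the VRAM figures are made-up values standing in for what you would read from nvidia-smi.

```python
import math


def suggest_num_gpus(total_vram_mib, clientapp_vram_mib):
    """Return 1 / (number of ClientApps that fit on one GPU)."""
    if total_vram_mib <= 0 or clientapp_vram_mib <= 0:
        raise ValueError("VRAM values must be positive")
    clients_per_gpu = math.floor(total_vram_mib / clientapp_vram_mib)
    if clients_per_gpu < 1:
        raise ValueError("a single ClientApp does not fit on this GPU")
    return 1.0 / clients_per_gpu


# Made-up example: a 24576 MiB GPU and a ClientApp observed to use 5000 MiB
print(suggest_num_gpus(24576, 5000))  # 0.25, i.e. four ClientApps per GPU
```

Since resource assignment is soft, you may prefer to round the measured per-ClientApp usage up before calling such a helper, to leave headroom for usage spikes.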

Refer to Defining ClientApp resources for more details.

If your ClientApp is using TensorFlow, make sure you are exporting TF_FORCE_GPU_ALLOW_GROWTH="1" before starting your simulation.

How do I know what’s the right num_cpus and num_gpus for my ClientApp?

A good practice is to start by running the simulation for a few rounds with higher num_cpus and num_gpus than what is really needed (e.g., num_cpus=8 and, if you have a GPU, num_gpus=1). Then monitor your CPU and GPU utilization. For this, you can make use of tools such as htop and nvidia-smi. If you see overall resource utilization remains low, try lowering num_cpus and num_gpus (recall this will make more ClientApp instances run in parallel) until you see a satisfactory system resource utilization.

Note that if the workload on your ClientApp instances is not homogeneous (i.e., some come with a larger compute or memory footprint), you’d probably want to focus on those when coming up with a good value for num_gpus and num_cpus.

Can I assign different resources to each ClientApp instance?

No. All ClientApp objects are assumed to make use of the same num_cpus and num_gpus. When setting these values (refer to Defining ClientApp resources for more details), ensure the ClientApp with the largest memory footprint (either RAM or VRAM) can run in your system with others like it in parallel.

Can I run a single simulation across multiple compute nodes (e.g., GPU servers)?

Yes. If you are using the RayBackend (the default backend), you can first interconnect your nodes through Ray’s CLI and then launch the simulation. Refer to Multi-node Flower simulations for a step-by-step guide.

My ServerApp also needs to make use of the GPU (e.g., to do evaluation of the global model after aggregation). Is this GPU usage taken into account by the Simulation Engine?

No. The Simulation Engine only manages ClientApps and therefore is only aware of the system resources they require. If your ServerApp makes use of substantial compute or memory resources, take that into account when setting num_cpus and num_gpus.

Can I indicate on what resource a specific instance of a ClientApp should run? Can I do resource placement?

Currently, the placement of ClientApp instances is managed by the RayBackend (the only backend available as of flwr==1.13.0) and cannot be customized. Implementing a custom backend would be a way of achieving resource placement.