Run simulations¶
Simulating Federated Learning workloads is useful for a multitude of use cases: you might want to run your workload on a large cohort of clients without having to source, configure, and manage a large number of physical devices; you might want to run your FL workloads as fast as possible on the compute systems you have access to, without going through a complex setup process; or you might want to validate your algorithm in different scenarios at varying levels of data and system heterogeneity, client availability, privacy budgets, and so on. Simulating FL workloads makes sense in all of these cases.
Flower's Simulation Engine schedules, launches, and manages ClientApp instances. It does so through a Backend, which contains several workers (i.e., Python processes) that can execute a ClientApp by passing it a Context and a Message. These ClientApp objects are identical to those used by Flower's Deployment Engine, making it effortless to alternate between simulation and deployment. The execution of ClientApp objects through Flower's Simulation Engine is:
- Resource-aware: Each backend worker executing ClientApps gets assigned a portion of the compute and memory on your system. You can define these at the beginning of the simulation, allowing you to control the degree of parallelism of your simulation. For a fixed total pool of resources, the fewer the resources per backend worker, the more ClientApps can run concurrently on the same hardware.
- Batchable: When there are more ClientApps to execute than backend workers, ClientApps are queued and executed as soon as resources are freed. This means that ClientApps are typically executed in batches of N, where N is the number of backend workers.
- Self-managed: You, as a user, do not need to launch ClientApps manually; instead, the Simulation Engine's internals orchestrate the execution of all ClientApps.
- Ephemeral: A ClientApp is only materialized when it is required by the application (e.g., to do fit()). The object is destroyed afterward, releasing the resources it was assigned and allowing other clients to participate.
Note
You can preserve the state (e.g., internal variables, parts of an ML model, intermediate results) of a ClientApp by saving it to its Context. Check the Designing Stateful Clients guide for a complete walkthrough.
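As a brief illustration, the following sketch keeps a round counter in the ClientApp's Context so it survives across rounds. It is a minimal sketch, assuming the ConfigsRecord API from flwr.common; the CountingClient class and the "app" record name are hypothetical, used here only for illustration:

import numpy as np
from flwr.client import ClientApp, NumPyClient
from flwr.common import ConfigsRecord, Context

class CountingClient(NumPyClient):
    def __init__(self, rounds_seen: int):
        self.rounds_seen = rounds_seen

    def fit(self, parameters, config):
        # Return parameters unchanged; report how often this node has participated
        return parameters, 1, {"rounds_seen": self.rounds_seen}

def client_fn(context: Context):
    records = context.state.configs_records
    if "app" not in records:  # first time this node participates
        records["app"] = ConfigsRecord({"rounds_seen": 0})
    records["app"]["rounds_seen"] += 1  # survives ClientApp tear-down
    return CountingClient(records["app"]["rounds_seen"]).to_client()

app = ClientApp(client_fn=client_fn)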
The Simulation Engine delegates to a Backend the role of spawning and managing ClientApps. The default backend is the RayBackend, which uses Ray, an open-source framework for scalable Python workloads. In particular, each worker is an Actor capable of spawning a ClientApp given its Context and a Message to process.
Launch your Flower simulation¶
Running a simulation is straightforward; in fact, it is the default mode of operation for flwr run. Therefore, running Flower simulations primarily requires you to first define a ClientApp and a ServerApp. A convenient way to generate a minimal but fully functional Flower app is by means of the flwr new command. There are multiple templates to choose from. The example below uses the PyTorch template.
Tip
If you haven't already, install Flower via pip install -U flwr in a Python environment.
# or simply execute `flwr new` for a fully interactive process
flwr new my-app --framework="PyTorch" --username="alice"
Then, follow the instructions shown after completing the flwr new command. When you execute flwr run, you'll be using the Simulation Engine.

If we take a look at the pyproject.toml that was generated by the flwr new command (and loaded upon flwr run execution), we see that a default federation is defined. It sets the number of supernodes to 10.
[tool.flwr.federations]
default = "local-simulation"
[tool.flwr.federations.local-simulation]
options.num-supernodes = 10
You can modify the size of your simulations by adjusting options.num-supernodes.
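For example, you could add a second, larger federation next to the default one (the federation name below is illustrative) and select it by name when launching:

[tool.flwr.federations.local-simulation-large]
options.num-supernodes = 100

You can then select it by passing the federation name to flwr run, e.g., flwr run . local-simulation-large.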
Simulation examples¶
In addition to the quickstart tutorials in the documentation (e.g., quickstart PyTorch Tutorial, quickstart JAX Tutorial), most examples in the Flower repository are simulation-ready.
The complete list of examples can be found in the Flower GitHub.
Defining ClientApp resources¶
By default, the Simulation Engine assigns two CPU cores to each backend worker. This means that if your system has 10 CPU cores, five backend workers can be running in parallel, each executing a different ClientApp instance.

More often than not, you would probably like to adjust the resources your ClientApp gets assigned based on the complexity (i.e., compute and memory footprint) of your workload. You can do so by adjusting the backend resources for your federation.
Caution
Note that the resources the backend assigns to each worker (and hence to each ClientApp being executed) are assigned in a soft manner. This means that the resources are primarily taken into account in order to control the degree of parallelism at which ClientApp instances should be executed. Resource assignment is not strict: if you specified that your ClientApp is assumed to use 25% of the available VRAM but it ends up using 50%, it might cause other ClientApp instances to crash with an out-of-memory (OOM) error.
Customizing resources can be done directly in the pyproject.toml of your app.
[tool.flwr.federations.local-simulation]
options.num-supernodes = 10
options.backend.client-resources.num-cpus = 1 # each ClientApp is assumed to use 1 CPU (default is 2)
options.backend.client-resources.num-gpus = 0.0 # no GPU access for the ClientApp (default is 0.0)
With the above backend settings, your simulation will run as many ClientApps in parallel as you have CPUs in your system. GPU resources for your ClientApps can be assigned by specifying the ratio of VRAM each should make use of.
[tool.flwr.federations.local-simulation]
options.num-supernodes = 10
options.backend.client-resources.num-cpus = 1 # each ClientApp is assumed to use 1 CPU (default is 2)
options.backend.client-resources.num-gpus = 0.25 # each ClientApp uses 25% of VRAM (default is 0.0)
Note
If you are using TensorFlow, you need to enable memory growth so multiple ClientApp instances can share a GPU. This needs to be done before launching the simulation. To do so, set the environment variable TF_FORCE_GPU_ALLOW_GROWTH="1".
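For example, when launching from a Python process (such as a notebook), you can set this environment variable programmatically using only the standard library, as long as it happens before any TensorFlow code initializes the GPU:

import os

# Must be set before TensorFlow initializes the GPU
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "1"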
Let's see how the above configuration results in a different number of ClientApps running in parallel depending on the resources available in your system. If your system has:

- 10x CPUs and 1x GPU: at most 4 ClientApps will run in parallel, since each requires 25% of the available VRAM.
- 10x CPUs and 2x GPUs: at most 8 ClientApps will run in parallel (VRAM-limited).
- 6x CPUs and 4x GPUs: at most 6 ClientApps will run in parallel (CPU-limited).
- 10x CPUs but 0x GPUs: you won't be able to run the simulation, since not even the resources for a single ClientApp can be met.
A generalization of this is given by the following equation. It gives the maximum number N of ClientApps that can be executed in parallel on available CPU cores (SYS_CPUS) and VRAM (SYS_GPUS):

N = min(⌊SYS_CPUS / num_cpus⌋, ⌊SYS_GPUS / num_gpus⌋)

(When num_gpus is 0, only the CPU term applies.)
Both num_cpus (an integer, 1 or higher) and num_gpus (a non-negative real number) should be set on a per-ClientApp basis. If, for example, you want only a single ClientApp to run on each GPU, then set num_gpus=1.0. If, for example, a ClientApp requires access to two whole GPUs, you'd set num_gpus=2.
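To make the arithmetic concrete, here is a small, self-contained Python sketch of the equation above (the function name is illustrative, not part of Flower):

import math

def max_parallel_clientapps(sys_cpus: int, sys_gpus: float, num_cpus: int, num_gpus: float) -> int:
    # CPU-imposed ceiling: how many workers fit in the available cores
    cpu_limit = sys_cpus // num_cpus
    if num_gpus > 0:
        # GPU-imposed ceiling only applies when ClientApps request GPU resources
        gpu_limit = math.floor(sys_gpus / num_gpus)
        return min(cpu_limit, gpu_limit)
    return cpu_limit

# 10 CPUs, 1 GPU, num_cpus=1, num_gpus=0.25 -> 4 (matches the example above)
print(max_parallel_clientapps(10, 1, 1, 0.25))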
While options.backend.client-resources can be used to control the degree of concurrency in your simulations, this does not stop you from running hundreds or even thousands of clients in the same round and having orders of magnitude more dormant (i.e., not participating in a round) clients. Let's say you want to have 100 clients per round but your system can only accommodate 8 clients concurrently. The Simulation Engine will schedule 100 ClientApps to run and then execute them in a resource-aware manner in batches of 8.
Simulation Engine resources¶
By default, the Simulation Engine has access to all system resources (i.e., all CPUs, all GPUs). However, in some settings, you might want to limit how many of your system resources are used for simulation. You can do this in the pyproject.toml of your app by setting the options.backend.init_args variable.
[tool.flwr.federations.local-simulation]
options.num-supernodes = 10
options.backend.client-resources.num-cpus = 1 # Each ClientApp will get assigned 1 CPU core
options.backend.client-resources.num-gpus = 0.5 # Each ClientApp will get 50% of each available GPU
options.backend.init_args.num_cpus = 1 # Only expose 1 CPU to the simulation
options.backend.init_args.num_gpus = 1 # Expose a single GPU to the simulation
With the above setup, the Backend will be initialized with a single CPU and GPU. Therefore, even if more CPUs and GPUs are available in your system, they will not be used for the simulation. The example above results in a single ClientApp running at any given point.

For a complete list of settings you can configure, check the ray.init documentation. For the highest performance, do not set options.backend.init_args.
Simulation in Colab/Jupyter¶
The preferred way of running simulations should always be flwr run. However, the core functionality of the Simulation Engine can be used from within a Google Colab or Jupyter environment by means of run_simulation.
from flwr.client import ClientApp
from flwr.server import ServerApp
from flwr.simulation import run_simulation

# Construct the ClientApp passing the client generation function
client_app = ClientApp(client_fn=client_fn)

# Create your ServerApp passing the server generation function
server_app = ServerApp(server_fn=server_fn)

run_simulation(
    server_app=server_app,
    client_app=client_app,
    num_supernodes=10,  # equivalent to setting `num-supernodes` in the pyproject.toml
)
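In the snippet above, client_fn and server_fn are user-defined. For completeness, here is a minimal sketch of what they might look like; it assumes Flower's NumPyClient, ServerAppComponents, and FedAvg APIs, and the do-nothing EchoClient class is hypothetical, used only to show the structure:

import numpy as np
from flwr.client import NumPyClient
from flwr.common import Context
from flwr.server import ServerAppComponents, ServerConfig
from flwr.server.strategy import FedAvg

class EchoClient(NumPyClient):
    def get_parameters(self, config):
        # Provide dummy parameters so the strategy can initialize
        return [np.zeros(3)]

    def fit(self, parameters, config):
        # Return the received parameters unchanged, claiming one example
        return parameters, 1, {}

def client_fn(context: Context):
    return EchoClient().to_client()

def server_fn(context: Context):
    # Run three rounds of plain FedAvg
    return ServerAppComponents(strategy=FedAvg(), config=ServerConfig(num_rounds=3))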
With run_simulation, you can also control the amount of resources for your ClientApp instances. Do so by setting backend_config. If unset, the default resources are assigned (i.e., 2x CPUs per ClientApp and no GPU).
run_simulation(
    # ...
    backend_config={"client_resources": {"num_cpus": 2, "num_gpus": 0.25}},
)
Refer to the 30 minutes Federated AI Tutorial for a complete example on how to run Flower Simulations in Colab.
Multi-node Flower simulations¶
Flower's Simulation Engine allows you to run FL simulations across multiple compute nodes so that you're not restricted to running simulations on a single machine. Before starting your multi-node simulation, ensure that you:

- Have the same Python environment on all nodes.
- Have a copy of your code on all nodes.
- Have a copy of your dataset on all nodes. If you are using partitions from Flower Datasets, ensure the partitioning strategy and its parameterization are the same. The expectation is that the i-th dataset partition is identical on all nodes.
- Start Ray on your head node: on the terminal, type ray start --head. This command will print a few lines, one of which indicates how to attach other nodes to the head node.
- Attach other nodes to the head node: copy the command shown after starting the head and execute it on the terminal of a new node (before executing flwr run). For example: ray start --address='192.168.1.132:6379'. Note that to be able to attach nodes to the head node, they should be discoverable by each other.
With all the above done, you can run your code from the head node as you would if the simulation were running on a single node. In other words:
# From your head node, launch the simulation
flwr run
Once your simulation is finished, if you'd like to dismantle your cluster, simply run the command ray stop in each node's terminal (including the head node).
Note
When attaching a new node to the head, all its resources (i.e., all CPUs, all GPUs) will be visible to the head node. This means that the Simulation Engine can schedule as many ClientApp instances as that node can possibly run. In some settings, you might want to exclude certain resources from the simulation. You can do this by appending --num-cpus=<NUM_CPUS_FROM_NODE> and/or --num-gpus=<NUM_GPUS_FROM_NODE> to any ray start command (including when starting the head).
FAQ for Simulations¶
Can I make my ClientApp instances stateful?

Yes. Use the state attribute of the Context object that is passed to the ClientApp to save variables, parameters, or results. Read the Designing Stateful Clients guide for a complete walkthrough.
Can I run multiple simulations on the same machine?

Yes, but bear in mind that each simulation isn't aware of the resource usage of the others. If your simulations make use of GPUs, consider setting the CUDA_VISIBLE_DEVICES environment variable to make each simulation use a different set of the available GPUs (e.g., CUDA_VISIBLE_DEVICES=0 for one simulation and CUDA_VISIBLE_DEVICES=1 for another). Export such an environment variable before starting flwr run.
Do the CPU/GPU resources set for each ClientApp restrict how much compute/memory these make use of?

No. These resources are used exclusively by the simulation backend to control how many workers can be created on startup. If N backend workers are launched, then at most N ClientApp instances will be running in parallel. It is your responsibility to ensure ClientApp instances have enough resources to execute their workload (e.g., fine-tune a transformer model).
My ClientApp is triggering OOM on my GPU. What should I do?

It is likely that your num_gpus setting, which controls the number of ClientApp instances that can share a GPU, is too low (meaning too many ClientApps share the same GPU). Try the following (a worked example follows the list):

- Set num_gpus=1. This will make a single ClientApp run on each GPU.
- Inspect how much VRAM is being used (use nvidia-smi for this).
- Based on the VRAM you see your single ClientApp using, calculate how many more would fit within the remaining VRAM. One divided by the total number of ClientApps is the num_gpus value you should set.
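For example (illustrative figures): if nvidia-smi shows a single ClientApp using roughly 3 GB on a 12 GB GPU, four instances fit per GPU, so you would set num_gpus = 1/4 = 0.25.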
Refer to Defining ClientApp resources for more details.
If your ClientApp is using TensorFlow, make sure you are exporting TF_FORCE_GPU_ALLOW_GROWTH="1" before starting your simulation. For more details, see the note in the Defining ClientApp resources section above.
How do I know what's the right num_cpus and num_gpus for my ClientApp?

A good practice is to start by running the simulation for a few rounds with higher num_cpus and num_gpus than what is really needed (e.g., num_cpus=8 and, if you have a GPU, num_gpus=1). Then monitor your CPU and GPU utilization. For this, you can make use of tools such as htop and nvidia-smi. If you see that overall resource utilization remains low, try lowering num_cpus and num_gpus (recall this will make more ClientApp instances run in parallel) until you see satisfactory system resource utilization.

Note that if the workload on your ClientApp instances is not homogeneous (i.e., some come with a larger compute or memory footprint), you'd probably want to focus on those when coming up with a good value for num_gpus and num_cpus.
Can I assign different resources to each ClientApp instance?

No. All ClientApp objects are assumed to make use of the same num_cpus and num_gpus. When setting these values (refer to Defining ClientApp resources for more details), ensure the ClientApp with the largest memory footprint (either RAM or VRAM) can run in your system alongside others like it in parallel.
Can I run a single simulation across multiple compute nodes (e.g., GPU servers)?

Yes. If you are using the RayBackend (the default backend), you can first interconnect your nodes through Ray's CLI and then launch the simulation. Refer to Multi-node Flower simulations for a step-by-step guide.
My ServerApp also needs to make use of the GPU (e.g., to do evaluation of the global model after aggregation). Is this GPU usage taken into account by the Simulation Engine?

No. The Simulation Engine only manages ClientApps and therefore is only aware of the system resources they require. If your ServerApp makes use of substantial compute or memory resources, factor that in when setting num_cpus and num_gpus.
Can I indicate on which resource a specific instance of a ClientApp should run? Can I do resource placement?

Currently, the placement of ClientApp instances is managed by the RayBackend (the only backend available as of flwr==1.13.0) and cannot be customized. Implementing a custom backend would be a way of achieving resource placement.