Run simulations

Simulating Federated Learning workloads is useful for a multitude of use-cases: you might want to run your workload on a large cohort of clients without having to source, configure and manage a large number of physical devices; you might want to run your FL workloads as fast as possible on the compute systems you have access to without going through a complex setup process; you might want to validate your algorithm in different scenarios at varying levels of data and system heterogeneity, client availability, privacy budgets, etc. These are among the use-cases where simulating FL workloads makes sense. Flower can accommodate these scenarios by means of its `VirtualClientEngine <contributor-explanation-architecture.html#virtual-client-engine>`_ or VCE.

The VirtualClientEngine schedules, launches and manages virtual clients. These clients are identical to non-virtual clients (i.e. the ones you launch via the command flwr.client.start_client) in the sense that they can be configured by creating a class inheriting, for example, from flwr.client.NumPyClient, and therefore behave in an identical way. In addition to that, clients managed by the VirtualClientEngine are:

  • resource-aware: this means that each client gets assigned a portion of the compute and memory on your system. You as a user can control this at the start of the simulation, which allows you to control the degree of parallelism of your Flower FL simulation. The fewer the resources per client, the more clients can run concurrently on the same hardware.

  • self-managed: this means that you as a user do not need to launch clients manually, instead this gets delegated to VirtualClientEngine's internals.

  • ephemeral: this means that a client is only materialized when it is required in the FL process (e.g. to do fit()). The object is destroyed afterwards, releasing the resources it was assigned and allowing other clients to participate in this way.

The VirtualClientEngine implements virtual clients using Ray, an open-source framework for scalable Python workloads. In particular, Flower's VirtualClientEngine makes use of Actors to spawn virtual clients and run their workload.
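For illustration, the client returned by a client_fn could look like the dependency-free sketch below, which mirrors the three methods a flwr.client.NumPyClient subclass typically implements (get_parameters, fit, evaluate). In a real simulation this class would inherit from flwr.client.NumPyClient; the toy model and the "training" step here are placeholder assumptions.

```python
import numpy as np

# Sketch of a client mirroring the flwr.client.NumPyClient interface.
# In a real simulation this class would inherit from flwr.client.NumPyClient;
# the "training" below (adding a constant) is a placeholder assumption.
class MyFlowerClient:
    def __init__(self, cid: str = "0"):
        self.cid = cid
        self.weights = [np.zeros(3)]  # toy model parameters

    def get_parameters(self, config):
        # Return the current local model parameters as a list of NumPy arrays
        return self.weights

    def fit(self, parameters, config):
        # Receive the global parameters, "train" locally, and return the
        # updated parameters, the number of local examples, and metrics
        self.weights = [w + 1.0 for w in parameters]
        return self.weights, 10, {}

    def evaluate(self, parameters, config):
        # Return loss, number of examples, and metrics for the given parameters
        loss = float(np.sum(parameters[0]))
        return loss, 10, {"accuracy": 0.0}
```

The VirtualClientEngine only materializes such an object when the client is sampled, which is why the constructor should stay lightweight.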

Launching your Flower simulation

Running Flower simulations still requires you to define your client class, a strategy, and utilities to download and load (and potentially partition) your dataset. With that out of the way, launching your simulation is done with `start_simulation <ref-api-flwr.html#flwr.simulation.start_simulation>`_ and a minimal example looks as follows:

import flwr as fl
from flwr.server.strategy import FedAvg


def client_fn(cid: str):
    # Return a standard Flower client
    return MyFlowerClient().to_client()


# Launch the simulation
hist = fl.simulation.start_simulation(
    client_fn=client_fn,  # A function to run a _virtual_ client when required
    num_clients=50,  # Total number of clients available
    config=fl.server.ServerConfig(num_rounds=3),  # Specify number of FL rounds
    strategy=FedAvg(),  # A Flower strategy
)

VirtualClientEngine resources

By default the VCE has access to all system resources (i.e. all CPUs, all GPUs, etc) since that is also the default behavior when starting Ray. However, in some settings you might want to limit how many of your system resources are used for simulation. You can do this via the ray_init_args input argument to start_simulation which the VCE internally passes to Ray's ray.init command. For a complete list of settings you can configure check the ray.init documentation. Do not set ray_init_args if you want the VCE to use all your system's CPUs and GPUs.

import flwr as fl

# Launch the simulation by limiting resources visible to Flower's VCE
hist = fl.simulation.start_simulation(
    # ...
    # Out of all CPUs and GPUs available in your system,
    # only 8xCPUs and 1xGPUs would be used for simulation.
    ray_init_args={"num_cpus": 8, "num_gpus": 1}
)

Assigning client resources

By default the VirtualClientEngine assigns a single CPU core (and nothing else) to each virtual client. This means that if your system has 10 cores, that many virtual clients can be concurrently running.

Often, you would probably like to adjust the resources your clients get assigned based on the complexity (i.e. compute and memory footprint) of your FL workload. You can do so when starting your simulation by setting the argument client_resources to start_simulation. Two keys are internally used by Ray to schedule and spawn workloads (in our case Flower clients):

  • num_cpus indicates the number of CPU cores a client would get.

  • num_gpus indicates the ratio of GPU memory a client gets assigned.

Let's see a few examples:

import flwr as fl

# each client gets 1xCPU (this is the default if no resources are specified)
my_client_resources = {"num_cpus": 1, "num_gpus": 0.0}
# each client gets 2xCPUs and half a GPU. (with a single GPU, 2 clients run concurrently)
my_client_resources = {"num_cpus": 2, "num_gpus": 0.5}
# 10 clients can run concurrently on a single GPU, but only if you have 20 CPU threads.
my_client_resources = {"num_cpus": 2, "num_gpus": 0.1}

# Launch the simulation
hist = fl.simulation.start_simulation(
    # ...
    client_resources=my_client_resources  # A Python dict specifying CPU/GPU resources
)

While the client_resources can be used to control the degree of concurrency in your FL simulation, this does not stop you from running dozens, hundreds or even thousands of clients in the same round and having orders of magnitude more dormant (i.e. not participating in a round) clients. Let's say you want to have 100 clients per round but your system can only accommodate 8 clients concurrently. The VirtualClientEngine will schedule 100 jobs to run (each simulating a client sampled by the strategy) and then will execute them in a resource-aware manner in batches of 8.
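The batching behavior described above can be reasoned about with simple arithmetic: concurrency is bounded by whichever resource runs out first. A minimal sketch, where the system totals (16 CPUs, 1 GPU) are assumptions for illustration:

```python
import math

# Assumed system totals (for illustration only)
total_cpus, total_gpus = 16, 1

# Resources requested per client, as passed via client_resources
client_resources = {"num_cpus": 2, "num_gpus": 0.1}

# Concurrency is limited by whichever resource is exhausted first
max_by_cpu = total_cpus // client_resources["num_cpus"]       # 16 / 2  -> 8
max_by_gpu = round(total_gpus / client_resources["num_gpus"])  # 1 / 0.1 -> 10
concurrent_clients = min(max_by_cpu, max_by_gpu)               # -> 8

# With 100 clients sampled per round, jobs are executed in waves
clients_per_round = 100
waves = math.ceil(clients_per_round / concurrent_clients)      # -> 13

print(concurrent_clients, waves)
```

In this sketch the CPU budget is the bottleneck, so even though the GPU could host 10 clients, only 8 run at a time and a 100-client round completes in 13 waves.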

To understand all the intricate details of how resources are used to schedule FL clients and how to define custom resources, please take a look at the Ray documentation.

Simulation examples

A few ready-to-run complete examples for Flower simulation in Tensorflow/Keras and PyTorch are provided in the Flower repository. You can run them on Google Colab too:

Multi-node Flower simulations

Flower's VirtualClientEngine allows you to run FL simulations across multiple compute nodes. Before starting your multi-node simulation ensure that you:

  1. Have the same Python environment on all nodes.

  2. Have a copy of your code (e.g. your entire repo) on all nodes.

  3. Have a copy of your dataset on all nodes (more about this in the considerations for simulations below).

  4. Pass ray_init_args={"address": "auto"} to start_simulation so the VirtualClientEngine attaches to a running Ray instance.

  5. Start Ray on your head node: on the terminal, type ray start --head. This command will print a few lines, one of which indicates how to attach other nodes to the head node.

  6. Attach other nodes to the head node: copy the command shown after starting the head and execute it on the terminal of a new node, for example ray start --address='192.168.1.132:6379'.

With all the above done, you can run your code from the head node as you would if the simulation were running on a single node.

Once your simulation is finished, if you'd like to dismantle your cluster you simply need to run the command ray stop in each node's terminal (including the head node).

Multi-node simulations: good to know

Here we list a few interesting functionalities when running multi-node FL simulations:

Use ray status to check all nodes connected to your head node as well as the total resources available to the VirtualClientEngine.

When attaching a new node to the head, all its resources (i.e. all CPUs, all GPUs) will be visible to the head node. This means that the VirtualClientEngine can schedule as many virtual clients as that node can possibly run. In some settings you might want to exclude certain resources from the simulation. You can do this by appending --num-cpus=<NUM_CPUS_FROM_NODE> and/or --num-gpus=<NUM_GPUS_FROM_NODE> to any ray start command (including when starting the head).

Considerations for simulations

Note

We are actively working on these aspects so as to make it trivial to run any FL workload with Flower simulation.

The current VCE allows you to run Federated Learning workloads in simulation mode, whether you are prototyping simple scenarios on your personal laptop or training a complex FL pipeline across multiple high-performance GPU nodes. While we add more capabilities to the VCE, the points below highlight some of the considerations to keep in mind when designing your FL pipeline with Flower. We also highlight a couple of current limitations in our implementation.

GPU resources

The VCE assigns a share of GPU memory to a client that specifies the key num_gpus in client_resources. This being said, Ray (used internally by the VCE) is by default:

  • not aware of the total VRAM available on the GPUs. This means that if you set num_gpus=0.5 and you have two GPUs in your system with different (e.g. 32GB and 8GB) VRAM amounts, they both would run 2 clients concurrently.

  • not aware of other unrelated (i.e. not created by the VCE) workloads running on the GPU. Two takeaways from this are:

    • Your Flower server might need the GPU to evaluate the global model after aggregation (e.g. when making use of the `evaluate method <how-to-implement-strategies.html#the-evaluate-method>`_).

    • If you want to run several independent Flower simulations on the same machine you need to mask-out your GPUs with CUDA_VISIBLE_DEVICES="<GPU_IDs>" when launching your experiment.

In addition, the GPU resource limits passed to client_resources are not enforced (i.e. they can be exceeded), which can result in a situation where clients use more VRAM than the ratio specified when starting the simulation.
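Masking out GPUs as mentioned above can be done on the shell (e.g. CUDA_VISIBLE_DEVICES="0" python sim.py) or programmatically before Ray and the VCE are initialized, as in the sketch below. The GPU id "0" is an assumption for illustration:

```python
import os

# Restrict this simulation (and every worker Ray spawns from it) to GPU 0.
# This must be set BEFORE Ray / the VCE is initialized; setting it later
# has no effect on processes that have already attached to the GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# A second, independent simulation launched from another process could use
#   CUDA_VISIBLE_DEVICES="1" python sim.py
# so the two simulations never share a GPU.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

This keeps independent simulations from competing for VRAM on the same device, since the VCE itself is not aware of workloads it did not create.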

TensorFlow with GPUs

When `using a GPU with TensorFlow <https://www.tensorflow.org/guide/gpu>`_, nearly the entire GPU memory visible to the process is mapped. TensorFlow does this for optimization purposes. However, in settings such as FL simulations, where we want to split the GPU into multiple virtual clients, this is not a desirable mechanism. Luckily we can disable this default behavior by `enabling memory growth <https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth>`_.

This would need to be done in the main process (which is where the server would run) and in each Actor created by the VCE. By means of actor_kwargs we can pass the reserved key "on_actor_init_fn" to specify a function to be executed upon actor initialization; in this case, to enable GPU growth for TF workloads. It would look as follows:

import flwr as fl
from flwr.simulation.ray_transport.utils import enable_tf_gpu_growth

# Enable GPU growth in the main thread (the one used by the
# server to quite likely run global evaluation using GPU)
enable_tf_gpu_growth()

# Start Flower simulation
hist = fl.simulation.start_simulation(
    # ...
    actor_kwargs={
        "on_actor_init_fn": enable_tf_gpu_growth  # <-- To be executed upon actor init.
    },
)

This is precisely the mechanism used in the `Tensorflow/Keras Simulation <https://github.com/adap/flower/tree/main/examples/simulation-tensorflow>`_ example.

Multi-node setups

  • The VCE does not currently offer a way to control on which node a particular virtual client is executed. In other words, if more than one node has the resources needed by a client to run, then any of those nodes could get the client workload scheduled onto it. Later in the FL process (i.e. in a different round) the same client could be executed by a different node. Depending on how your clients access their datasets, this might require either replicating all dataset partitions on all nodes or using a dataset serving mechanism (e.g. NFS, a database) to circumvent data duplication.

  • By definition, virtual clients are stateless due to their ephemeral nature. A client state can be implemented as part of the Flower client class, but users need to ensure it is saved to persistent storage (e.g. a database, disk) and can later be retrieved regardless of the node the client runs on. This is also related to the point above, since in some sense the client's dataset could be seen as a type of state.
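One way to approach such client state is a small persistence helper like the sketch below, which saves and restores a per-client state dict to disk keyed by cid. The directory, file naming, and JSON format are all assumptions for illustration; in a multi-node setup the directory would need to live on storage reachable from every node (e.g. an NFS mount).

```python
import json
import os
import tempfile

# Directory that must be reachable from every node (e.g. an NFS mount).
# A temporary directory is used here only for illustration.
STATE_DIR = tempfile.mkdtemp()


def save_client_state(cid: str, state: dict) -> None:
    # Persist this client's state so a future round (possibly executed
    # on a different node) can pick it up again.
    with open(os.path.join(STATE_DIR, f"client_{cid}.json"), "w") as f:
        json.dump(state, f)


def load_client_state(cid: str) -> dict:
    # Return the previously saved state, or an empty dict on first use.
    path = os.path.join(STATE_DIR, f"client_{cid}.json")
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


# Example: a client records how many rounds it has participated in.
# This could run inside fit() of a Flower client.
state = load_client_state("7")
state["rounds_seen"] = state.get("rounds_seen", 0) + 1
save_client_state("7", state)
```

Because the VCE destroys the client object after each use, anything not written out this way is lost between rounds.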