Get started with Flower

Welcome to the Flower federated learning tutorial!

In this notebook, we’ll build a federated learning system using the Flower framework, Flower Datasets and PyTorch. In part 1, we use PyTorch for the model training pipeline and data loading. In part 2, we federate the PyTorch project using Flower.

Star Flower on GitHub ⭐️ and join the Flower community on Flower Discuss and the Flower Slack to connect, ask questions, and get help:

  • Join Flower Discuss: We’d love to hear from you in the Introduction topic! If anything is unclear, post in Flower Help - Beginners.

  • Join Flower Slack: We’d love to hear from you in the #introductions channel! If anything is unclear, head over to the #questions channel.

Let’s get started! 🌼

Step 0: Preparation

Before we begin with any actual code, let’s make sure that we have everything we need.

Install dependencies

Next, we install the necessary packages for PyTorch (torch and torchvision), Flower Datasets (flwr-datasets) and Flower (flwr):

[ ]:
!pip install -q flwr[simulation] flwr-datasets[vision] torch torchvision matplotlib

Now that we have all dependencies installed, we can import everything we need for this tutorial:

[ ]:
from collections import OrderedDict
from typing import List, Tuple

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from datasets.utils.logging import disable_progress_bar
from torch.utils.data import DataLoader

import flwr
from flwr.client import Client, ClientApp, NumPyClient
from flwr.common import Metrics, Context
from flwr.server import ServerApp, ServerConfig, ServerAppComponents
from flwr.server.strategy import FedAvg
from flwr.simulation import run_simulation
from flwr_datasets import FederatedDataset

DEVICE = torch.device("cpu")  # Try "cuda" to train on GPU
print(f"Training on {DEVICE}")
print(f"Flower {flwr.__version__} / PyTorch {torch.__version__}")
disable_progress_bar()

It is possible to switch to a runtime that has GPU acceleration enabled (on Google Colab: Runtime > Change runtime type > Hardware accelerator: GPU > Save). Note, however, that Google Colab is not always able to offer GPU acceleration. If you see an error related to GPU availability in one of the following sections, consider switching back to CPU-based execution by setting DEVICE = torch.device("cpu"). If you set DEVICE = torch.device("cuda") on a GPU-enabled runtime, you should see the output Training on cuda; otherwise, it’ll say Training on cpu.

Load the data

Federated learning can be applied to many different types of tasks across different domains. In this tutorial, we introduce federated learning by training a simple convolutional neural network (CNN) on the popular CIFAR-10 dataset. CIFAR-10 can be used to train image classifiers that distinguish between images from ten different classes: “airplane”, “automobile”, “bird”, “cat”, “deer”, “dog”, “frog”, “horse”, “ship”, and “truck”.

We simulate having multiple datasets from multiple organizations (also called the “cross-silo” setting in federated learning) by splitting the original CIFAR-10 dataset into multiple partitions. Each partition will represent the data from a single organization. We’re doing this purely for experimentation purposes. In the real world, there’s no need for data splitting because each organization already has its own data (the data is naturally partitioned).

Each organization will act as a client in the federated learning system. Having ten organizations participate in a federation means having ten clients connected to the federated learning server.

We use the Flower Datasets library (flwr-datasets) to partition CIFAR-10 into ten partitions using FederatedDataset. We will create a small training and test set for each of the ten organizations and wrap each of these into a PyTorch DataLoader:

[ ]:
NUM_CLIENTS = 10
BATCH_SIZE = 32


def load_datasets(partition_id: int):
    fds = FederatedDataset(dataset="cifar10", partitioners={"train": NUM_CLIENTS})
    partition = fds.load_partition(partition_id)
    # Divide data on each node: 80% train, 20% test
    partition_train_test = partition.train_test_split(test_size=0.2, seed=42)
    pytorch_transforms = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )

    def apply_transforms(batch):
        # Instead of passing transforms to CIFAR10(..., transform=transform)
        # we will use this function to dataset.with_transform(apply_transforms)
        # The transforms object is exactly the same
        batch["img"] = [pytorch_transforms(img) for img in batch["img"]]
        return batch

    # Create train/val for each partition and wrap it into DataLoader
    partition_train_test = partition_train_test.with_transform(apply_transforms)
    trainloader = DataLoader(
        partition_train_test["train"], batch_size=BATCH_SIZE, shuffle=True
    )
    valloader = DataLoader(partition_train_test["test"], batch_size=BATCH_SIZE)
    testset = fds.load_split("test").with_transform(apply_transforms)
    testloader = DataLoader(testset, batch_size=BATCH_SIZE)
    return trainloader, valloader, testloader

We now have a function that returns a training set and a validation set (trainloader and valloader) representing one dataset from one of ten different organizations. Each trainloader/valloader pair contains 4000 training examples and 1000 validation examples. There’s also a single testloader (we did not split the test set). Again, this is only necessary for building research or educational systems; actual federated learning systems have their data naturally distributed across multiple partitions.
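
The 4000/1000 figures follow from simple arithmetic. The sketch below reproduces them, assuming the 50,000-image CIFAR-10 training split is divided evenly across clients; partition_sizes is a hypothetical helper written for this illustration, not part of Flower Datasets.

```python
# Sketch: how many examples each simulated organization ends up with.
# Assumes an even split of the 50,000 CIFAR-10 training images.
CIFAR10_TRAIN_SIZE = 50_000
NUM_CLIENTS = 10
TEST_FRACTION = 0.2  # mirrors train_test_split(test_size=0.2)


def partition_sizes(total: int, num_clients: int, test_fraction: float):
    """Return (examples per client, local train size, local validation size)."""
    per_client = total // num_clients
    val_size = round(per_client * test_fraction)
    train_size = per_client - val_size
    return per_client, train_size, val_size


per_client, train_size, val_size = partition_sizes(
    CIFAR10_TRAIN_SIZE, NUM_CLIENTS, TEST_FRACTION
)
print(per_client, train_size, val_size)  # 5000 4000 1000
```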

Let’s take a look at the first batch of images and labels in the first training set (i.e., trainloader from partition_id=0) before we move on:

[ ]:
trainloader, _, _ = load_datasets(partition_id=0)
batch = next(iter(trainloader))
images, labels = batch["img"], batch["label"]

# Reshape and convert images to a NumPy array
# matplotlib requires images with the shape (height, width, 3)
images = images.permute(0, 2, 3, 1).numpy()

# Denormalize
images = images / 2 + 0.5

# Create a figure and a grid of subplots
fig, axs = plt.subplots(4, 8, figsize=(12, 6))

# Loop over the images and plot them
for i, ax in enumerate(axs.flat):
    ax.imshow(images[i])
    ax.set_title(trainloader.dataset.features["label"].int2str([labels[i]])[0])
    ax.axis("off")

# Show the plot
fig.tight_layout()
plt.show()

The output above shows a random batch of images from the trainloader from the first of ten partitions. It also prints the labels associated with each image (i.e., one of the ten possible labels we’ve seen above). If you run the cell again, you should see another batch of images.
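
The plotting cell undoes the normalization applied in load_datasets before showing the images. As a rough per-pixel sketch (plain floats instead of tensors, with hypothetical helper names), Normalize with mean 0.5 and standard deviation 0.5 maps values from [0, 1] to [-1, 1], and x / 2 + 0.5 maps them back:

```python
def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel operation performed by Normalize: (x - mean) / std."""
    return (pixel - mean) / std


def denormalize(value: float) -> float:
    """Inverse used before plotting: x / 2 + 0.5."""
    return value / 2 + 0.5


# Each pixel value should survive the round trip unchanged
for pixel in (0.0, 0.25, 0.5, 1.0):
    z = normalize(pixel)
    print(f"{pixel:.2f} -> {z:+.2f} -> {denormalize(z):.2f}")
```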

Step 1: Centralized Training with PyTorch

Next, we’re going to use PyTorch to define a simple convolutional neural network. This introduction assumes basic familiarity with PyTorch, so it doesn’t cover the PyTorch-related aspects in full detail. If you want to dive deeper into PyTorch, we recommend DEEP LEARNING WITH PYTORCH: A 60 MINUTE BLITZ.

Define the model

We use the simple CNN described in the PyTorch tutorial:

[ ]:
class Net(nn.Module):
    def __init__(self) -> None:
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
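
If the 16 * 5 * 5 input size of fc1 looks arbitrary, the following sketch traces the spatial size of a 32x32 CIFAR-10 image through the layers. It assumes the PyTorch defaults used above (stride 1 and no padding for the convolutions); conv_out is a hypothetical helper, not a PyTorch function.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                   # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)             # conv1 -> 28
size = conv_out(size, kernel=2, stride=2)   # pool  -> 14
size = conv_out(size, kernel=5)             # conv2 -> 10
size = conv_out(size, kernel=2, stride=2)   # pool  -> 5
flat_features = 16 * size * size            # 16 channels after conv2
print(size, flat_features)  # 5 400
```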

Let’s continue with the usual training and test functions:

[ ]:
def train(net, trainloader, epochs: int, verbose=False):
    """Train the network on the training set."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters())
    net.train()
    for epoch in range(epochs):
        correct, total, epoch_loss = 0, 0, 0.0
        for batch in trainloader:
            images, labels = batch["img"].to(DEVICE), batch["label"].to(DEVICE)
            optimizer.zero_grad()
            outputs = net(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # Metrics
            epoch_loss += loss.item()  # accumulate a Python float, not a tensor
            total += labels.size(0)
            correct += (torch.max(outputs.data, 1)[1] == labels).sum().item()
        epoch_loss /= len(trainloader.dataset)
        epoch_acc = correct / total
        if verbose:
            print(f"Epoch {epoch+1}: train loss {epoch_loss}, accuracy {epoch_acc}")


def test(net, testloader):
    """Evaluate the network on the entire test set."""
    criterion = torch.nn.CrossEntropyLoss()
    correct, total, loss = 0, 0, 0.0
    net.eval()
    with torch.no_grad():
        for batch in testloader:
            images, labels = batch["img"].to(DEVICE), batch["label"].to(DEVICE)
            outputs = net(images)
            loss += criterion(outputs, labels).item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    loss /= len(testloader.dataset)
    accuracy = correct / total
    return loss, accuracy

Train the model

We now have all the basic building blocks we need: a dataset, a model, a training function, and a test function. Let’s put them together to train the model on the dataset of one of our organizations (partition_id=0). This simulates the reality of most machine learning projects today: each organization has their own data and trains models only on this internal data:

[ ]:
trainloader, valloader, testloader = load_datasets(partition_id=0)
net = Net().to(DEVICE)

for epoch in range(5):
    train(net, trainloader, 1)
    loss, accuracy = test(net, valloader)
    print(f"Epoch {epoch+1}: validation loss {loss}, accuracy {accuracy}")

loss, accuracy = test(net, testloader)
print(f"Final test set performance:\n\tloss {loss}\n\taccuracy {accuracy}")

Training the simple CNN on our CIFAR-10 split for 5 epochs should result in a test set accuracy of about 41%, which is not good, but at the same time, it doesn’t really matter for the purposes of this tutorial. The intent was just to show a simple centralized training pipeline that sets the stage for what comes next - federated learning!

Step 2: Federated Learning with Flower

Step 1 demonstrated a simple centralized training pipeline. All data was in one place (i.e., a single trainloader and a single valloader). Next, we’ll simulate a situation where we have multiple datasets in multiple organizations and where we train a model over these organizations using federated learning.

Update model parameters

In federated learning, the server sends global model parameters to the client, and the client updates the local model with parameters received from the server. It then trains the model on the local data (which changes the model parameters locally) and sends the updated/changed model parameters back to the server (or, alternatively, it sends just the gradients back to the server, not the full model parameters).

We need two helper functions to update the local model with parameters received from the server and to get the updated model parameters from the local model: set_parameters and get_parameters. The following two functions do just that for the PyTorch model above.

The details of how this works are not really important here (feel free to consult the PyTorch documentation if you want to learn more). In essence, we use state_dict to access PyTorch model parameter tensors. The parameter tensors are then converted to/from a list of NumPy ndarrays (which the Flower NumPyClient knows how to serialize/deserialize):

[ ]:
def set_parameters(net, parameters: List[np.ndarray]):
    params_dict = zip(net.state_dict().keys(), parameters)
    state_dict = OrderedDict({k: torch.Tensor(v) for k, v in params_dict})
    net.load_state_dict(state_dict, strict=True)


def get_parameters(net) -> List[np.ndarray]:
    return [val.cpu().numpy() for _, val in net.state_dict().items()]
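
One detail worth noticing is that the parameters travel as a bare list without names, so set_parameters re-attaches the names purely by position. The sketch below illustrates that round trip with plain Python lists standing in for tensors; get_params and set_params are hypothetical stand-ins, and the length check is an extra safeguard added for this illustration.

```python
from collections import OrderedDict

# Stand-in for net.state_dict(): layer names mapped to (here, plain-list) tensors
state = OrderedDict(
    [("conv1.weight", [0.1, 0.2]), ("conv1.bias", [0.0]), ("fc3.weight", [0.3])]
)


def get_params(state_dict):
    """Drop the names and keep the values in state_dict order."""
    return [list(values) for values in state_dict.values()]


def set_params(state_dict, parameters):
    """Re-attach names by position, as zip(keys, parameters) does."""
    if len(parameters) != len(state_dict):
        raise ValueError("parameter count does not match the model")
    return OrderedDict(zip(state_dict.keys(), parameters))


params = get_params(state)
restored = set_params(state, params)
print(restored == state)  # True
```

Because the mapping is positional, client and server need to use the same model architecture for the list to be interpreted correctly.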

Define the Flower ClientApp

With that out of the way, let’s move on to the interesting part. Federated learning systems consist of a server and multiple clients. In Flower, we create a ServerApp and a ClientApp to run the server-side and client-side code, respectively.

The first step toward creating a ClientApp is to implement a subclass of flwr.client.Client or flwr.client.NumPyClient. We use NumPyClient in this tutorial because it is easier to implement and requires us to write less boilerplate. To implement NumPyClient, we create a subclass that implements the three methods get_parameters, fit, and evaluate:

  • get_parameters: Return the current local model parameters

  • fit: Receive model parameters from the server, train the model on the local data, and return the updated model parameters to the server

  • evaluate: Receive model parameters from the server, evaluate the model on the local data, and return the evaluation result to the server

We mentioned that our clients will use the previously defined PyTorch components for model training and evaluation. Let’s see a simple Flower client implementation that brings everything together:

[ ]:
class FlowerClient(NumPyClient):
    def __init__(self, net, trainloader, valloader):
        self.net = net
        self.trainloader = trainloader
        self.valloader = valloader

    def get_parameters(self, config):
        return get_parameters(self.net)

    def fit(self, parameters, config):
        set_parameters(self.net, parameters)
        train(self.net, self.trainloader, epochs=1)
        # Report the number of training examples (not batches) for weighting
        return get_parameters(self.net), len(self.trainloader.dataset), {}

    def evaluate(self, parameters, config):
        set_parameters(self.net, parameters)
        loss, accuracy = test(self.net, self.valloader)
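        # Report the number of validation examples (not batches) for weighting
        num_examples = len(self.valloader.dataset)
        return float(loss), num_examples, {"accuracy": float(accuracy)}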

Our class FlowerClient defines how local training/evaluation will be performed and allows Flower to call the local training/evaluation through fit and evaluate. Each instance of FlowerClient represents a single client in our federated learning system. Federated learning systems have multiple clients (otherwise, there’s not much to federate), so each client will be represented by its own instance of FlowerClient. If we have, for example, three clients in our workload, then we’d have three instances of FlowerClient (one on each of the machines we’d start the client on). Flower calls FlowerClient.fit on the respective instance when the server selects a particular client for training (and FlowerClient.evaluate for evaluation).

In this notebook, we want to simulate a federated learning system with 10 clients on a single machine. This means that the server and all 10 clients will live on a single machine and share resources such as CPU, GPU, and memory. Having 10 clients would mean having 10 instances of FlowerClient in memory. Doing this on a single machine can quickly exhaust the available memory resources, even if only a subset of these clients participates in a single round of federated learning.

In addition to the regular capabilities where server and clients run on multiple machines, Flower, therefore, provides special simulation capabilities that create FlowerClient instances only when they are actually necessary for training or evaluation. To enable the Flower framework to create clients when necessary, we need to implement a function that creates a FlowerClient instance on demand. We typically call this function client_fn. Flower calls client_fn whenever it needs an instance of one particular client to call fit or evaluate (those instances are usually discarded after use, so they should not keep any local state). In federated learning experiments using Flower, clients are identified by a partition ID, or partition-id. This partition-id is used to load different local data partitions for different clients, as can be seen below. The value of partition-id is retrieved from the node_config dictionary in the Context object, which holds the information that persists throughout each training round.

With this, we have the class FlowerClient which defines client-side training/evaluation and client_fn which allows Flower to create FlowerClient instances whenever it needs to call fit or evaluate on one particular client. Last, but definitely not least, we create an instance of ClientApp and pass it the client_fn. ClientApp is the entrypoint that a running Flower client uses to call your code (as defined in, for example, FlowerClient.fit).

[ ]:
def client_fn(context: Context) -> Client:
    """Create a Flower client representing a single organization."""

    # Load model
    net = Net().to(DEVICE)

    # Load data (CIFAR-10)
    # Note: each client gets a different trainloader/valloader, so each client
    # will train and evaluate on their own unique data partition
    # Read the node_config to fetch data partition associated to this node
    partition_id = context.node_config["partition-id"]
    trainloader, valloader, _ = load_datasets(partition_id=partition_id)

    # Create a single Flower client representing a single organization
    # FlowerClient is a subclass of NumPyClient, so we need to call .to_client()
    # to convert it to a subclass of `flwr.client.Client`
    return FlowerClient(net, trainloader, valloader).to_client()


# Create the ClientApp
client = ClientApp(client_fn=client_fn)

Define the Flower ServerApp

On the server side, we need to configure a strategy which encapsulates the federated learning approach/algorithm, for example, Federated Averaging (FedAvg). Flower has a number of built-in strategies, but we can also use our own strategy implementations to customize nearly all aspects of the federated learning approach. For this example, we use the built-in FedAvg implementation and customize it using a few basic parameters:

[ ]:
# Create FedAvg strategy
strategy = FedAvg(
    fraction_fit=1.0,  # Sample 100% of available clients for training
    fraction_evaluate=0.5,  # Sample 50% of available clients for evaluation
    min_fit_clients=10,  # Never sample less than 10 clients for training
    min_evaluate_clients=5,  # Never sample less than 5 clients for evaluation
    min_available_clients=10,  # Wait until all 10 clients are available
)
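
To see how the fraction and minimum settings interact, here is a small sketch of the sampling arithmetic. It assumes the strategy samples the larger of fraction × available clients (rounded down) and the configured minimum; num_sampled_clients is a hypothetical helper for illustration, so check the strategy’s source for the exact rule.

```python
def num_sampled_clients(available: int, fraction: float, min_clients: int) -> int:
    """Number of clients a FedAvg-style strategy would sample in one round."""
    return max(int(available * fraction), min_clients)


print(num_sampled_clients(10, fraction=1.0, min_clients=10))  # 10 (training)
print(num_sampled_clients(10, fraction=0.5, min_clients=5))   # 5 (evaluation)
print(num_sampled_clients(10, fraction=0.1, min_clients=5))   # 5 (minimum wins)
```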

Similar to ClientApp, we create a ServerApp using a utility function server_fn. In server_fn, we pass an instance of ServerConfig for defining the number of federated learning rounds (num_rounds) and we also pass the previously created strategy. The server_fn returns a ServerAppComponents object containing the settings that define the ServerApp behaviour. ServerApp is the entrypoint that Flower uses to call all your server-side code (for example, the strategy).

[ ]:
def server_fn(context: Context) -> ServerAppComponents:
    """Construct components that set the ServerApp behaviour.

    You can use the settings in `context.run_config` to parameterize the
    construction of all elements (e.g the strategy or the number of rounds)
    wrapped in the returned ServerAppComponents object.
    """

    # Configure the server for 5 rounds of training
    config = ServerConfig(num_rounds=5)

    return ServerAppComponents(strategy=strategy, config=config)


# Create the ServerApp
server = ServerApp(server_fn=server_fn)

Run the training

In simulation, we often want to control the amount of resources each client can use. In the next cell, we specify a backend_config dictionary with the client_resources key (required) for defining the amount of CPU and GPU resources each client can access.

[ ]:
# Specify the resources each of your clients need
# By default, each client will be allocated 1x CPU and 0x GPUs
backend_config = {"client_resources": {"num_cpus": 1, "num_gpus": 0.0}}

# When running on GPU, assign an entire GPU for each client
if DEVICE.type == "cuda":
    backend_config = {"client_resources": {"num_cpus": 1, "num_gpus": 1.0}}
    # Refer to our Flower framework documentation for more details about Flower simulations
    # and how to set up the `backend_config`
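
As a rough intuition for what these numbers mean, the sketch below estimates how many clients could run at the same time on one machine. It assumes the simulation backend only schedules a client when its requested CPU and GPU share is free; max_concurrent_clients and the machine size are hypothetical, used here only for illustration.

```python
def max_concurrent_clients(total_cpus, total_gpus, num_cpus, num_gpus):
    """Upper bound on clients that fit on the machine at the same time."""
    by_cpu = int(total_cpus // num_cpus)
    if num_gpus == 0:
        return by_cpu  # GPU is not a constraint for CPU-only clients
    by_gpu = int(total_gpus // num_gpus)
    return min(by_cpu, by_gpu)


# Hypothetical machine: 8 CPU cores and 1 GPU
print(max_concurrent_clients(8, 1, num_cpus=1, num_gpus=0.0))   # 8
print(max_concurrent_clients(8, 1, num_cpus=1, num_gpus=1.0))   # 1
print(max_concurrent_clients(8, 1, num_cpus=2, num_gpus=0.25))  # 4
```

Under this assumption, assigning an entire GPU to each client means clients take turns on a single-GPU machine, while a fractional value lets several clients share it.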

The last step is the actual call to run_simulation which - you guessed it - runs the simulation. run_simulation accepts a number of arguments:

  • server_app and client_app: the previously created ServerApp and ClientApp objects, respectively

  • num_supernodes: the number of SuperNodes to simulate, which equals the number of clients for Flower simulation

  • backend_config: the resource allocation used in this simulation

[ ]:
# Run simulation
run_simulation(
    server_app=server,
    client_app=client,
    num_supernodes=NUM_CLIENTS,
    backend_config=backend_config,
)

Behind the scenes

So how does this work? How does Flower execute this simulation?

When we call run_simulation, we tell Flower that there are 10 clients (num_supernodes=10, where 1 SuperNode launches 1 ClientApp). Flower then goes ahead and asks the ServerApp to issue instructions to those nodes using the FedAvg strategy. FedAvg knows that it should select 100% of the available clients (fraction_fit=1.0), so it goes ahead and selects 10 random clients (i.e., 100% of 10).

Flower then asks the selected 10 clients to train the model. Each of the 10 ClientApp instances receives a message, which causes it to call client_fn to create an instance of FlowerClient. It then calls .fit() on each of the FlowerClient instances and returns the resulting model parameter updates to the ServerApp. When the ServerApp receives the model parameter updates from the clients, it hands those updates over to the strategy (FedAvg) for aggregation. The strategy aggregates those updates and returns the new global model, which then gets used in the next round of federated learning.
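
The aggregation step can be pictured as a weighted average of the client models. The following is a simplified sketch of that idea, not Flower’s actual implementation: it uses flat Python lists instead of NumPy arrays, and the fedavg function and the two example clients are hypothetical.

```python
from typing import List, Tuple

Weights = List[List[float]]  # one flat list of floats per layer (simplified)


def fedavg(results: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its example count."""
    total_examples = sum(num_examples for _, num_examples in results)
    num_layers = len(results[0][0])
    aggregated: Weights = []
    for layer in range(num_layers):
        layer_size = len(results[0][0][layer])
        aggregated.append(
            [
                sum(w[layer][i] * n for w, n in results) / total_examples
                for i in range(layer_size)
            ]
        )
    return aggregated


# Two hypothetical clients with a two-layer "model"
client_a = ([[1.0, 2.0], [0.0]], 100)
client_b = ([[3.0, 4.0], [1.0]], 300)
print(fedavg([client_a, client_b]))  # [[2.5, 3.5], [0.75]]
```

Since client_b contributes three times as many examples as client_a, the aggregated weights sit closer to client_b’s values.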

Where’s the accuracy?

You may have noticed that all metrics except for losses_distributed are empty. Where did the {"accuracy": float(accuracy)} go?

Flower can automatically aggregate losses returned by individual clients, but it cannot do the same for metrics in the generic metrics dictionary (the one with the accuracy key). Metrics dictionaries can contain very different kinds of metrics and even key/value pairs that are not metrics at all, so the framework does not (and cannot) know how to handle these automatically.

As users, we need to tell the framework how to handle/aggregate these custom metrics, and we do so by passing metric aggregation functions to the strategy. The strategy will then call these functions whenever it receives fit or evaluate metrics from clients. The two possible functions are fit_metrics_aggregation_fn and evaluate_metrics_aggregation_fn.

Let’s create a simple weighted averaging function to aggregate the accuracy metric we return from evaluate:

[ ]:
def weighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    # Multiply accuracy of each client by number of examples used
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]

    # Aggregate and return custom metric (weighted average)
    return {"accuracy": sum(accuracies) / sum(examples)}
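
To get a feel for the result, here is a standalone sketch of the same logic, typed with plain dictionaries so that it runs without Flower installed. The three evaluation results are made-up values for illustration only.

```python
from typing import Dict, List, Tuple


def weighted_average(
    metrics: List[Tuple[int, Dict[str, float]]]
) -> Dict[str, float]:
    """Same logic as above, with plain dicts instead of Flower's Metrics type."""
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]
    return {"accuracy": sum(accuracies) / sum(examples)}


# Hypothetical evaluation results: (number of examples, metrics) per client
results = [
    (1000, {"accuracy": 0.5}),
    (1000, {"accuracy": 0.25}),
    (2000, {"accuracy": 0.75}),
]
print(weighted_average(results))  # {'accuracy': 0.5625}
```

A plain mean of the three accuracies would give 0.5; the weighted version is higher because the client with the most examples also has the highest accuracy.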

[ ]:
def server_fn(context: Context) -> ServerAppComponents:
    """Construct components that set the ServerApp behaviour.

    You can use settings in `context.run_config` to parameterize the
    construction of all elements (e.g the strategy or the number of rounds)
    wrapped in the returned ServerAppComponents object.
    """

    # Create FedAvg strategy
    strategy = FedAvg(
        fraction_fit=1.0,
        fraction_evaluate=0.5,
        min_fit_clients=10,
        min_evaluate_clients=5,
        min_available_clients=10,
        evaluate_metrics_aggregation_fn=weighted_average,  # <-- pass the metric aggregation function
    )

    # Configure the server for 5 rounds of training
    config = ServerConfig(num_rounds=5)

    return ServerAppComponents(strategy=strategy, config=config)


# Create a new server instance with the updated FedAvg strategy
server = ServerApp(server_fn=server_fn)

# Run simulation
run_simulation(
    server_app=server,
    client_app=client,
    num_supernodes=NUM_CLIENTS,
    backend_config=backend_config,
)

We now have a full system that performs federated training and federated evaluation. It uses the weighted_average function to aggregate custom evaluation metrics and calculates a single accuracy metric across all clients on the server side.

The other two categories of metrics (losses_centralized and metrics_centralized) are still empty because they only apply when centralized evaluation is being used. Part two of the Flower tutorial will cover centralized evaluation.

Final remarks

Congratulations, you just trained a convolutional neural network, federated over 10 clients! With that, you understand the basics of federated learning with Flower. The same approach you’ve seen can be used with other machine learning frameworks (not just PyTorch) and tasks (not just CIFAR-10 image classification), for example NLP with Hugging Face Transformers or speech with SpeechBrain.

In the next notebook, we’re going to cover some more advanced concepts. Want to customize your strategy? Initialize parameters on the server side? Or evaluate the aggregated model on the server side? We’ll cover all this and more in the next tutorial.

Next steps

Before you continue, make sure to join the Flower community on Flower Discuss (Join Flower Discuss) and on Slack (Join Slack).

There’s a dedicated #questions channel if you need help, but we’d also love to hear who you are in #introductions!

The Flower Federated Learning Tutorial - Part 2 goes into more depth about strategies and all the advanced things you can build with them.
