OpenFL Migration Guide

It was recently announced that The Open Federated Learning project (formerly known as OpenFL) is no longer being developed or maintained. This guide, written in collaboration with the OpenFL developers, aims to create an easy path for OpenFL users to bring their workloads into Flower.

Creating a Flower App for OpenFL code

Let’s start by creating a Flower app where the OpenFL code can be migrated to.

Install dependencies

First, we install the Flower package flwr:

# In a new Python environment
$ pip install -U "flwr[simulation]"

Then, we create a new Flower app called flower-tutorial using the PyTorch template. We also specify a username (flwrlabs) for the project:

$ flwr new flower-tutorial --framework pytorch --username flwrlabs

After running the command, a new directory called flower-tutorial will be created. Here is a comparison between it and the relevant files in a typical openfl-example folder:

OpenFL Example
openfl-example
├── requirements.txt
├── .workspace
├── plan
│   ├── plan.yaml
│   ├── cols.yaml
│   ├── defaults
│   └── data.yaml
├── logs
├── cert
├── save
├── data
└── src
    ├── __init__.py
    ├── taskrunner.py
    ├── utils.py
    └── dataloader.py
Flower Tutorial
flower-tutorial
├── flower_tutorial
│   ├── __init__.py
│   ├── client_app.py
│   ├── server_app.py
│   └── task.py
├── pyproject.toml
└── README.md

Let’s start with an overview of which areas of OpenFL and Flower directory structures you’ll want to focus on. We will go through these in depth in later sections of the guide:

  • Model: In OpenFL, the model is usually defined in taskrunner.py. In Flower, the model definition is usually located in task.py.

  • Train and Evaluate Functions: In OpenFL, these are part of the TaskRunner subclass in taskrunner.py. For Flower, you’ll find these in client_app.py and identified beneath the @app.train() and @app.evaluate decorators.

  • Aggregation Functions: In OpenFL, most examples use the WeightedAverage() aggregation algorithm by default. If you’re using a different aggregation algorithm, you’ll find it in plan.yaml by searching for aggregation_type. In Flower, the aggregation algorithm is defined as a Strategy.

Migrate your model

The model is very straightforward to port from OpenFL to Flower. If you are working with a PyTorch model, OpenFL has a PyTorchTaskRunner that inherits from nn.module (in taskrunner.py) - and includes other things like the train and validate functions. Flower assumes you bring a standard PyTorch model, so it’s as easy as moving the model definition to task.py in the flower_tutorial directory, and changing the inheritance of the Net back to nn.module. For a concrete example, see the following OpenFL TaskRunner code snippet:

# OpenFL PyTorch TaskRunner
class PyTorchCNN(PyTorchTaskRunner):
    """
    Simple CNN for classification.

    PyTorchTaskRunner inherits from nn.module, so you can define your model
    in the same way that you would for PyTorch
    """

    def __init__(self, device="cpu", **kwargs):
        """Initialize.

        Args:
            device: The hardware device to use for training (Default = "cpu")
            **kwargs: Additional arguments to pass to the function

        """
        super().__init__(device=device, **kwargs)

        # Define the model
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.to(device)

        # `self.optimizer` must be set for optimizer weights to be federated
        self.optimizer = optim.Adam(self.parameters(), lr=1e-4)

        # Set the loss function
        self.loss_fn = F.cross_entropy

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x: Data input to the model for the forward pass
        """
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

    def train_(
        self, train_dataloader: Iterator[Tuple[np.ndarray, np.ndarray]]
    ) -> Metric:
        """TaskRunner train function"""
        ...

    def validate_(
        self, valid_dataloader: Iterator[Tuple[np.ndarray, np.ndarray]]
    ) -> Metric:
        """TaskRunner validation function"""
        ...

And the corresponding PyTorch model used by Flower:

# Standard PyTorch model definition in Flower (Found in task.py)
class Net(nn.Module):
    """Model (simple CNN adapted from 'PyTorch: A 60 Minute Blitz')"""

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

Migrate your training and test functions

Recent versions of OpenFL had a simple way of defining training and evaluation functions. The setting and extraction of model weights was hidden from users, and a list of Metric values resulting from training or validation could be explicitly returned from the function. To make migration easy, see the highlighted blocks that can carry over directly to the Flower client_app.py file:

from openfl.federated import PyTorchTaskRunner
from openfl.utilities import Metric


class PyTorchCNN(PyTorchTaskRunner):
    """
    Simple CNN for classification.

    """

    def __init__(self, device="cpu", **kwargs):
        # Model definition
        ...

    def forward(self, x):
        # Forward function definition
        ...

    def train_(
        self, train_dataloader: Iterator[Tuple[np.ndarray, np.ndarray]]
    ) -> Metric:
        """
        Train single epoch.

        Override this function in order to use custom training.

        Args:
            train_dataloader: Train dataset batch generator. Yields (samples, targets) tuples of
            size = `self.data_loader.batch_size`.
        Returns:
            Metric: An object containing name and np.ndarray value.
        """
        losses = []
        for data, target in train_dataloader:
            data, target = data.to(self.device), target.to(self.device)
            self.optimizer.zero_grad()
            output = self(data)
            loss = self.loss_fn(output, target)
            loss.backward()
            self.optimizer.step()
            losses.append(loss.detach().cpu().numpy())
        loss = np.mean(losses)
        return Metric(name=self.loss_fn.__name__, value=np.array(loss))

    def validate_(
        self, validation_dataloader: Iterator[Tuple[np.ndarray, np.ndarray]]
    ) -> Metric:
        """
        Perform validation on PyTorch Model

        Override this function for your own custom validation function

        Args:
            validation_dataloader: Validation dataset batch generator.
                                   Yields (samples, targets) tuples
        Returns:
            Metric: An object containing name and np.ndarray value
        """

        total_samples = 0
        val_score = 0
        with torch.no_grad():
            for data, target in validation_dataloader:
                samples = target.shape[0]
                total_samples += samples
                data, target = data.to(self.device), target.to(
                    self.device, dtype=torch.int64
                )
                output = self(data)
                # get the index of the max log-probability
                pred = output.argmax(dim=1)
                val_score += pred.eq(target).sum().cpu().numpy()

        accuracy = val_score / total_samples
        return Metric(name="accuracy", value=np.array(accuracy))

In Flower more control is given to users by default. With the introduction of the Message API, the training and validation functions are assumed to be stateless, so there is some initialization that must be handled by user code. The good news is that this setup is standard and quite reusable across examples. Let’s see how the relevant OpenFL train_ function fits into Flower:

# client_app.py

...


@app.train()
def train(msg: Message, context: Context):
    """Train the model on local data."""

    # Load the model and initialize it with the received weights
    model = Net()
    model.load_state_dict(msg.content["arrays"].to_torch_state_dict())
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Load the data
    partition_id = context.node_config["partition-id"]
    num_partitions = context.node_config["num-partitions"]
    batch_size = context.run_config["batch-size"]
    trainloader, _ = load_data(partition_id, num_partitions, batch_size)

    # Adapt the OpenFL training function here
    ##############################################
    criterion = torch.nn.CrossEntropyLoss().to(device)
    lr = msg.content["config"]["lr"]
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

    losses = []
    for data, target in trainloader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        losses.append(loss.detach().cpu().numpy())
    train_loss = np.mean(losses)
    #############################################

    # Construct and return reply Message
    model_record = ArrayRecord(model.state_dict())
    metrics = {
        "train_loss": train_loss,
        "num-examples": len(trainloader.dataset),
    }
    metric_record = MetricRecord(metrics)
    content = RecordDict({"arrays": model_record, "metrics": metric_record})
    return Message(content=content, reply_to=msg)

Notice the model is reininitialized, the dataloader is initialized and configured, and hyperparameters are each set before the core training operation begins. At the conclusion of the training, the model weights are extracted and packed into an ArrayRecord and the model metrics are captured in a MetricRecord. It’s necessary to also send the num-examples as a metric, as this is needed for capturing the weight to give to the model parameters for FedAvg.

Here is the corresponding evaluation function, with the highlighted area representing the migrated code from OpenFL:

@app.evaluate()
def evaluate(msg: Message, context: Context):
    """Evaluate the model on local data."""

    # Load the model and initialize it with the received weights
    model = Net()
    model.load_state_dict(msg.content["arrays"].to_torch_state_dict())
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Load the data
    partition_id = context.node_config["partition-id"]
    num_partitions = context.node_config["num-partitions"]
    batch_size = context.run_config["batch-size"]
    _, valloader = load_data(partition_id, num_partitions, batch_size)

    # Adapt the OpenFL evaluation function here
    ########################################################
    total_samples = 0
    val_score = 0
    with torch.no_grad():
        for data, target in valloader:
            samples = target.shape[0]
            total_samples += samples
            data, target = data.to(device), target.to(self.device, dtype=torch.int64)
            output = model(data)
            # get the index of the max log-probability
            pred = output.argmax(dim=1)
            val_score += pred.eq(target).sum().cpu().numpy()

    eval_acc = val_score / total_samples
    ########################################################

    # Construct and return reply Message
    metrics = {
        "eval_acc": eval_acc,
        "num-examples": len(valloader.dataset),
    }
    metric_record = MetricRecord(metrics)
    content = RecordDict({"metrics": metric_record})
    return Message(content=content, reply_to=msg)

The code can be mostly pasted in unmodified! There are a few references to cleanup (i.e. changing self to model) to fit with the Flower variables, but the logic remains the same.

Migrating the Data Loaders

Unlike OpenFL, Flower does not require that you use their own Dataloaders when developing your application. This means you can simply DataLoaders in the same way that you would for PyTorch, Tensorflow, or any other framework. For research and experimentation purposes, a single dataset can be sharded into multiple partitions. This information is passed to each ClientApp through the Context:

# In client_app.py
@app.train()
def train(msg: Message, context: Context):
    ...

    # Load the data
    partition_id = context.node_config["partition-id"]
    num_partitions = context.node_config["num-partitions"]
    trainloader, _ = load_data(partition_id, num_partitions)

Flower also has its own library for partitioning single datasets in distributions representative of what can be expected in real world settings. For more information, see the flwr-datasets documentation for details.

Client-side Code

In OpenFL, the client side code was known as a Collaborator. In Flower, the application that data owners operate is referred to as a ClientApp. Each of the files referred to so far (client_app.py, task.py) are launched by the clients using the flwr run command. Beyond the code that is defined, Flower has the ability to insert dynamic changes through a configuration file, called pyproject.toml. This can include application specific changes like hyperparameters, but also other information like ServerApp address, etc. Importantly, this file is shared between parties operating the ClientApp and ServerApp. This concept directly maps to the Federated Learning Plan (FLPlan) concept in OpenFL captured in the plan.yaml file of every workspace.

# Flower pyproject.toml

...

[tool.flwr.app.config]
num-server-rounds = 3
fraction-evaluate = 0.5
local-epochs = 1
learning-rate = 0.1
batch-size = 32

...

Server-side Code

In OpenFL, all of the aggregator-side code is configured via the plan.yaml file through the specification of different arguments. In Flower, the exact tasks performed by the server are more configurable through code. For example, aggregation algorithms are added through a Strategy, and the logic to save models is added explictly. Here is a ServerApp (akin to an OpenFL Aggregator) compatible with the prior code snippets:

import torch
from flwr.app import ArrayRecord, ConfigRecord, Context, MetricRecord
from flwr.serverapp import Grid, ServerApp
from flwr.serverapp.strategy import FedAvg

from pytorchexample.task import Net, load_centralized_dataset, test

# Create ServerApp
app = ServerApp()


@app.main()
def main(grid: Grid, context: Context) -> None:
    """Main entry point for the ServerApp."""

    # Read run config
    fraction_evaluate: float = context.run_config["fraction-evaluate"]
    num_rounds: int = context.run_config["num-server-rounds"]
    lr: float = context.run_config["learning-rate"]

    # Load global model
    global_model = Net()
    arrays = ArrayRecord(global_model.state_dict())

    # Initialize FedAvg strategy
    strategy = FedAvg(fraction_evaluate=fraction_evaluate)

    # Start strategy, run FedAvg for `num_rounds`
    result = strategy.start(
        grid=grid,
        initial_arrays=arrays,
        train_config=ConfigRecord({"lr": lr}),
        num_rounds=num_rounds,
        evaluate_fn=global_evaluate,
    )

    # Save final model to disk
    print("\nSaving final model to disk...")
    state_dict = result.arrays.to_torch_state_dict()
    torch.save(state_dict, "final_model.pt")

You’ll notice that most ServerApp examples have specific logic for working with a given deep learning framework (in this case PyTorch) due to the saving of a final model. This functionality is optional, but mirrors the automatic saving of a model at the end of an OpenFL experiment. This ServerApp change requires only a few lines of modifications, and Flower has support for an extensive set of deep learning frameworks in it’s examples (Tensorflow, FastAI, Huggingface, etc.) should you need reference code.

추가 도움말

For a complete PyTorch example that goes into depth on various Flower components, see the Get started with Flower tutorial. While we expect this guide will help most users get migrated quickly to the Flower ecosystem, certain complex OpenFL workloads may require more clarification or help. If you have further questions, join the Flower Slack (and use the channel #questions) or join our OpenFL Continuity Program to get in touch with our team!

중요

As we work with the OpenFL community, we’ll be periodically updating this guide. Please feel free to share any feedback with us!

행복한 마이그레이션! 🚀