Custom Metrics for Federated Learning with TensorFlow and Flower

This example shows how to compute custom metrics across multiple clients, beyond the standard ones built into ML frameworks. Here it uses readily available scikit-learn metrics: accuracy, recall, precision, and F1-score.

Once both the true labels (y_test) and the predictions (y_pred) are available on the client side (client_app.py), other standard or custom metrics can be calculated as well.

The main takeaways of this implementation are:

  • returning multiple evaluation metrics from the evaluate method in client_app.py

  • using evaluate_metrics_aggregation_fn, set as part of the strategy in server_app.py, to aggregate those metrics on the server side
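As a rough illustration of the second point, the sketch below shows one possible aggregation function. It assumes the server hands the function a list of (num_examples, metrics) pairs, one per client, and it weights each metric by the number of examples. The name weighted_average, the metric keys, and the sample values are hypothetical; the example's own server_app.py may aggregate differently (for instance with an unweighted mean).

```python
from typing import Dict, List, Tuple

# Hypothetical alias: a metrics dict maps a metric name to a scalar value.
Metrics = Dict[str, float]


def weighted_average(results: List[Tuple[int, Metrics]]) -> Metrics:
    """Aggregate per-client metrics, weighting each client by its example count."""
    total_examples = sum(num_examples for num_examples, _ in results)
    if total_examples == 0:
        # Nothing to aggregate (no clients reported, or all reported zero examples).
        return {}
    # Assumption: every client reports the same metric names as the first one.
    metric_names = results[0][1].keys()
    return {
        name: sum(n * m[name] for n, m in results) / total_examples
        for name in metric_names
    }


# Made-up results from two clients, for illustration only.
client_results = [
    (100, {"accuracy": 0.80, "f1": 0.70}),
    (300, {"accuracy": 0.60, "f1": 0.50}),
]
aggregated = weighted_average(client_results)
print(aggregated)
```

In Flower, a function of this shape is typically passed to a strategy such as FedAvg through its evaluate_metrics_aggregation_fn argument. Weighting by example count keeps clients with small test sets from dominating the aggregate.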

This example is based on the quickstart-tensorflow example with CIFAR-10 (source here) and additionally uses Flower Datasets to retrieve CIFAR-10.

Because CIFAR-10 classification is a multi-class problem, the metric calculation requires two changes: passing average='micro' to the scikit-learn metrics and using np.argmax to turn the model's per-class outputs into predicted labels. For binary classification, these changes are not required. For unsupervised learning tasks, such as training a deep autoencoder, a custom metric based on reconstruction error could be implemented on the client side.
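The multi-class handling described above can be sketched as follows. This is a minimal illustration, not the code from client_app.py: the helper name compute_metrics, the dictionary keys, and the tiny three-class arrays are made up for the example, and y_prob stands in for the per-class scores a model's predict call would return.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def compute_metrics(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """Compute scikit-learn metrics from true labels and per-class scores."""
    # Pick the highest-scoring class for each sample.
    y_pred = np.argmax(y_prob, axis=1)
    return {
        "accuracy": float(accuracy_score(y_true, y_pred)),
        # average="micro" pools true/false positives and negatives over all classes.
        "precision": float(precision_score(y_true, y_pred, average="micro")),
        "recall": float(recall_score(y_true, y_pred, average="micro")),
        "f1": float(f1_score(y_true, y_pred, average="micro")),
    }


# Made-up data: four samples, three classes; the last sample is misclassified.
y_true = np.array([0, 1, 2, 2])
y_prob = np.array(
    [
        [0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.2, 0.7],
        [0.6, 0.3, 0.1],
    ]
)
metrics = compute_metrics(y_true, y_prob)
print(metrics)
```

Note that for single-label multi-class problems, micro-averaged precision, recall, and F1 all equal accuracy. If per-class differences matter, average='macro' or average='weighted' may be more informative.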

Set up the project

Clone the project

Start by cloning the example project:

git clone --depth=1 https://github.com/adap/flower.git _tmp \
              && mv _tmp/examples/custom-metrics . \
              && rm -rf _tmp && cd custom-metrics

This will create a new directory called custom-metrics containing the following files:

custom-metrics
├── README.md
├── custommetrics_example
│   ├── __init__.py
│   ├── client_app.py   # Defines your ClientApp
│   ├── server_app.py   # Defines your ServerApp
│   └── task.py         # Defines your model and dataloading functions
└── pyproject.toml      # Project metadata like dependencies and configs

Install dependencies and project

Install the dependencies defined in pyproject.toml as well as the custommetrics_example package.

pip install -e .

Run the Example

You can run your Flower project in both simulation and deployment mode without making changes to the code. If you are new to Flower, we recommend using simulation mode, as it requires fewer components to be launched manually. By default, flwr run uses the Simulation Engine.

Run with the Simulation Engine

flwr run .

You can also override some of the settings for your ClientApp and ServerApp defined in pyproject.toml. For example:

flwr run . --run-config num-server-rounds=5

Run with the Deployment Engine

Note: An update to this example will show how to run this Flower application with the Deployment Engine and TLS certificates, or with Docker.