Federated Finetuning of a Vision Transformer with Flower

This example shows how to use Flower’s Simulation Engine to federate the finetuning of a Vision Transformer (ViT-Base-16) pretrained on ImageNet. To keep things simple, we finetune it on the Oxford Flowers-102 dataset, split into 20 partitions using Flower Datasets. Only the classification (exit) head of the ViT is finetuned, so training is inexpensive: if you choose to use a GPU, each client needs only ~1GB of VRAM (for a batch size of 32 images).
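A rough parameter count shows why finetuning only the head is cheap. This sketch assumes the head is a single fully connected layer from ViT-Base's 768-dimensional embedding to the 102 classes, and uses the ~86M total parameters reported for ViT-Base in the original ViT paper for comparison. `linear_head_params` is a hypothetical helper, not part of the example's code.

```python
# Rough count of trainable parameters when only the classification head is
# finetuned. Assumptions: the head is one fully connected layer on top of the
# 768-dimensional ViT-Base embedding, and ViT-Base has ~86M parameters overall.

HIDDEN_DIM = 768               # ViT-Base embedding width
NUM_CLASSES = 102              # Oxford Flowers-102
VIT_BASE_PARAMS = 86_000_000   # approximate total, for comparison only


def linear_head_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


trainable = linear_head_params(HIDDEN_DIM, NUM_CLASSES)
share = 100 * trainable / VIT_BASE_PARAMS

print(f"trainable parameters: {trainable:,}")     # 78,438
print(f"share of the full model: ~{share:.2f}%")  # ~0.09%
```

Only this small fraction of the model needs gradients and optimizer state; the frozen backbone still runs the forward pass on every batch, which is why a GPU still speeds things up.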

Set up the project

Clone the project

Start by cloning the example project:

git clone --depth=1 https://github.com/adap/flower.git _tmp \
        && mv _tmp/examples/flowertune-vit . \
        && rm -rf _tmp \
        && cd flowertune-vit

This will create a new directory called flowertune-vit with the following structure:

flowertune-vit
├── vitexample
│   ├── __init__.py
│   ├── client_app.py   # Defines your ClientApp
│   ├── server_app.py   # Defines your ServerApp
│   └── task.py         # Defines your model, training and data loading
├── pyproject.toml      # Project metadata like dependencies and configs
└── README.md

Install dependencies and project

Install the dependencies defined in pyproject.toml as well as the vitexample package.

pip install -e .

Run the project

You can run your Flower project in both simulation and deployment mode without changing the code. If you are new to Flower, we recommend starting with simulation mode, since it requires fewer components to be launched manually. By default, flwr run uses the Simulation Engine.

Run with the Simulation Engine

Tip: This example runs faster when the ClientApps have access to a GPU. If your system has one, you can use it by configuring the backend.client-resources options in your Flower Configuration. Check the Simulation Engine documentation to learn more about Flower simulations and how to optimize them.

flwr run .

You can also override some of the settings for your ClientApp and ServerApp defined in pyproject.toml. For example:

flwr run . --run-config "num-server-rounds=5 batch-size=64"
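Conceptually, an override like the one above is a list of whitespace-separated key=value pairs that replace the matching defaults from pyproject.toml. The sketch below illustrates that merge using a hypothetical `parse_overrides` helper and made-up defaults; it is not Flower's own parser, whose rules (for example, how string values must be quoted) may differ.

```python
# Minimal sketch of turning a --run-config override string such as
# "num-server-rounds=5 batch-size=64" into typed values merged over defaults.
# NOT Flower's parser: `parse_overrides` and DEFAULTS are hypothetical.
import shlex

DEFAULTS = {"num-server-rounds": 3, "batch-size": 32}  # hypothetical defaults


def _coerce(value: str):
    """Interpret a raw string as bool, int, or float; otherwise keep it as str."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_overrides(spec: str) -> dict:
    """Split whitespace-separated key=value pairs into a dict of typed values."""
    overrides = {}
    for token in shlex.split(spec):
        key, sep, raw = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        overrides[key.strip()] = _coerce(raw.strip())
    return overrides


config = {**DEFAULTS, **parse_overrides("num-server-rounds=5 batch-size=64")}
print(config)  # {'num-server-rounds': 5, 'batch-size': 64}
```

Keys not mentioned in the override keep their defaults, which is why you only need to list the settings you want to change.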

You can add a new connection to your Flower Configuration (find it via flwr config list):

[superlink.local-gpu]
options.num-supernodes = 10
options.backend.client-resources.num-cpus = 2 # each ClientApp is allocated 2 CPUs
options.backend.client-resources.num-gpus = 0.2 # each ClientApp gets 20% of a GPU, so at most 5 share one GPU
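These settings imply a simple capacity calculation: a ClientApp can only run when both its CPU and its GPU request fit, so the number running at once is the smaller of the two limits. With num-gpus = 0.2, one GPU holds 1 / 0.2 = 5 ClientApps, and any others wait for a free slot. The hypothetical helper below reproduces that arithmetic with made-up machine sizes; the actual scheduling is done by the simulation backend.

```python
# Rough estimate of how many ClientApps can run concurrently for given
# per-ClientApp resource requests. `max_concurrent_clientapps` is hypothetical
# and the machine sizes below are example values, not measurements.
from fractions import Fraction


def max_concurrent_clientapps(total_cpus, total_gpus, cpus_per_app, gpus_per_app):
    """Return how many ClientApps' resource requests fit at the same time."""
    # Fraction(str(x)) keeps 0.2 exact, avoiding float rounding in 1 / 0.2.
    limits = [Fraction(total_cpus) // Fraction(str(cpus_per_app))]
    if gpus_per_app > 0:
        limits.append(Fraction(total_gpus) // Fraction(str(gpus_per_app)))
    return int(min(limits))


# Example machine: 16 CPU cores and one GPU -> the GPU is the limit.
print(max_concurrent_clientapps(16, 1, cpus_per_app=2, gpus_per_app=0.2))  # 5
# Same machine without a GPU request -> the CPUs are the only limit.
print(max_concurrent_clientapps(16, 0, cpus_per_app=2, gpus_per_app=0))    # 8
```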

Then run the app:

# Run with the `local-gpu` connection
flwr run . local-gpu

Running the example as-is on a single RTX 3090 Ti should take ~15 s per round, with 5 clients running in parallel (plus the global model during the centralized evaluation stages). More clients would fit in VRAM, but since GPU utilization is already at 99-100%, running more of them in parallel is unlikely to help (at least in this case).

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090 Ti     Off | 00000000:0B:00.0 Off |                  Off |
| 44%   74C    P2             441W / 450W |   7266MiB / 24564MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    173812      C   python                                     1966MiB |
|    0   N/A  N/A    174510      C   ray::ClientAppActor.run                    1056MiB |
|    0   N/A  N/A    174512      C   ray::ClientAppActor.run                    1056MiB |
|    0   N/A  N/A    174513      C   ray::ClientAppActor.run                    1056MiB |
|    0   N/A  N/A    174514      C   ray::ClientAppActor.run                    1056MiB |
|    0   N/A  N/A    174516      C   ray::ClientAppActor.run                    1056MiB |
+---------------------------------------------------------------------------------------+
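The VRAM claim above can be checked against the process table: 1966 + 5 × 1056 = 7246 MiB is in use, close to the 7266MiB in the summary, and (24564 - 7246) / 1056 ≈ 16.4, so roughly 16 more ClientApps of the same size would fit. The sketch below does this arithmetic by parsing the rows shown above; it assumes each extra ClientApp uses as much memory as the current ones, and parsing nvidia-smi's human-readable output is fragile, so treat it as illustration only. Running that many would also require lowering num-gpus, and, with utilization already at 100%, it would not necessarily make rounds faster.

```python
# Sum per-process GPU memory from the nvidia-smi process table above and
# estimate how many more ClientApps of the same size would fit in VRAM.
# Sketch only: parses human-readable output and assumes equal-sized clients.
import re

TOTAL_MIB = 24564  # VRAM capacity reported for the RTX 3090 Ti above

PROCESS_TABLE = """\
|    0   N/A  N/A    173812      C   python                                     1966MiB |
|    0   N/A  N/A    174510      C   ray::ClientAppActor.run                    1056MiB |
|    0   N/A  N/A    174512      C   ray::ClientAppActor.run                    1056MiB |
|    0   N/A  N/A    174513      C   ray::ClientAppActor.run                    1056MiB |
|    0   N/A  N/A    174514      C   ray::ClientAppActor.run                    1056MiB |
|    0   N/A  N/A    174516      C   ray::ClientAppActor.run                    1056MiB |
"""


def process_usage_mib(table: str) -> list:
    """Return (process name, MiB) pairs parsed from nvidia-smi process rows."""
    rows = []
    for line in table.splitlines():
        match = re.search(r"\s(\S+)\s+(\d+)MiB\s*\|$", line)
        if match:
            rows.append((match.group(1), int(match.group(2))))
    return rows


rows = process_usage_mib(PROCESS_TABLE)
used = sum(mib for _, mib in rows)
client_mib = max(mib for name, mib in rows if name.startswith("ray::"))
extra = (TOTAL_MIB - used) // client_mib

print(f"memory used by listed processes: {used} MiB")  # 7246 MiB
print(f"room for about {extra} more ClientApps")        # 16
```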

Run with the Deployment Engine

Follow this how-to guide to run the same app with Flower’s Deployment Engine. After that, you might be interested in setting up secure TLS-enabled communications and SuperNode authentication in your federation.

If you are already familiar with how the Deployment Engine works, you may want to learn how to run it using Docker. Check out the Flower with Docker documentation.