
@heart-ai-lab/fedllm-medical-biomedllama

Federated LLM Fine-tuning for Medical Question-answering (Llama-3-8B)

Publisher: @heart-ai-lab

Quickstart

flwr new @heart-ai-lab/fedllm-medical-biomedllama

Readme

FlowerTune LLM on Medical Dataset

This directory conducts federated instruction tuning with DoRA on a pretrained ContactDoctor/Bio-Medical-Llama-3-8B model using a Medical dataset. We use Flower Datasets to download, partition, and preprocess the dataset, and Flower's Simulation Engine to simulate the LLM fine-tuning process in a federated way, which allows users to perform the training on a single GPU.

PEFT Adapter

The fine-tuning results have been submitted as a PEFT adapter and can be accessed here:

Methodology

This experiment performs federated LLM fine-tuning using the 🤗PEFT library.

The clients' models are aggregated with the FedProx strategy.
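As a rough illustration of the two sides of FedProx (a plain-Python sketch, not the app's actual code; function names are hypothetical): each client adds a proximal term to its local loss, while the server still performs a weighted average of the updates.

```python
def fedprox_penalty(local_weights, global_weights, mu=0.1):
    """Proximal term added to the client loss: (mu / 2) * ||w - w_global||^2."""
    return 0.5 * mu * sum(
        (lw - gw) ** 2 for lw, gw in zip(local_weights, global_weights)
    )


def aggregate(client_weights, num_examples):
    """Server-side weighted average of client updates (as in FedAvg)."""
    total = sum(num_examples)
    return [
        sum(w[i] * n for w, n in zip(client_weights, num_examples)) / total
        for i in range(len(client_weights[0]))
    ]
```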

Bio-Medical-Llama-3-8B

For the Bio-Medical-Llama-3-8B Instruct model, we adopted the following fine-tuning methodology:

  • Precision: bf16 for model weights, tf32 for gradients and optimizer states.
  • Quantization: 4-bit quantization for reduced memory usage.
  • Optimizer: Paged AdamW 8-bit for effective optimization under constrained resources.
  • LoRA Configuration:
    • Rank (r): 32
    • Alpha: 64
    • Target Modules: q_proj, v_proj
  • Training Configuration:
    • Batch size: 16
    • Maximum number of steps: 6
    • Warmup steps: 2
    • Total number of rounds: 10
    • Fraction fit per round: 0.15
  • Learning Rate Scheduler: Constant learning rate scheduler with warmup steps, where:
    • Maximum LR: 5e-5
    • Minimum LR: 1e-6
  • Strategy: FedProx

When bf16 and tf32 are enabled, model weights are stored in bf16 format, while gradients are computed in half-precision and converted to full 32-bit precision for updates.
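In PyTorch terms, this combination is typically enabled along these lines (a sketch; the flag names are PyTorch's, not taken from this app's source):

```python
import torch

# Allow TF32 for matmuls and cuDNN kernels, used here for gradient and
# optimizer-state computations on Ampere-class GPUs such as the A100.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Model weights kept in bf16.
weights = torch.zeros(4, dtype=torch.bfloat16)
```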

Hardware: NVIDIA A100 (1x GPU)

Evaluation Results (accuracy)

  • pubmedqa: 0.6960
  • medmcqa: 0.5912
  • medqa: 0.6394
  • careqa: 0.5318
  • average: 0.6146

Communication Budget

3000.96 Megabytes
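As a plausibility check (the leaderboard's exact accounting may differ), the reported total is consistent with each sampled client downloading and uploading an adapter of roughly 50 MB per round; the 50.016 MB figure below is an assumption inferred from the total, not a measured value.

```python
# Hedged back-of-the-envelope check of the communication budget.
def total_budget_mb(adapter_mb, num_clients, fraction_fit, rounds):
    sampled = round(num_clients * fraction_fit)  # 20 * 0.15 = 3 clients
    return adapter_mb * sampled * 2 * rounds     # x2: download + upload
```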

Fetch the app

Install Flower:

pip install flwr

Fetch the app:

flwr new @heart-ai-lab/fedllm-medical-biomedllama

Environments setup

Project dependencies are defined in pyproject.toml. Install them in an activated Python environment with:

pip install -e .

Tip: Learn how to configure your pyproject.toml file for Flower apps in this guide.

Experimental setup

The dataset is divided into 20 partitions in an IID fashion, and one partition is assigned to each ClientApp. We randomly sample a fraction (0.15) of the total nodes to participate in each round, for a total of 10 rounds. All settings are defined in pyproject.toml.
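The partitioning and per-round sampling can be pictured with a small stand-alone sketch (plain Python; the app itself relies on Flower Datasets and the Simulation Engine for this):

```python
import random


def iid_partition(indices, num_partitions=20, seed=0):
    """Shuffle sample indices and deal them round-robin into equal IID shards."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::num_partitions] for i in range(num_partitions)]


def sample_round(num_nodes=20, fraction_fit=0.15, seed=0):
    """Pick the subset of nodes participating in one round (0.15 * 20 = 3)."""
    k = max(1, round(num_nodes * fraction_fit))
    return sorted(random.Random(seed).sample(range(num_nodes), k))
```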

Before proceeding, you need to create a new SuperLink connection and define 20 virtual SuperNodes. To do this, first locate the Flower configuration file and then edit it.

Locate the Flower Configuration file:

flwr config list
# Example output:
Flower Config file: /path/to/your/.flwr/config.toml
SuperLink connections:
 supergrid
 local (default)

Add a new connection named flowertune to your config.toml and make it the default:

[superlink.flowertune]
options.num-supernodes = 20
options.backend.client-resources.num-cpus = 6
options.backend.client-resources.num-gpus = 1.0

IMPORTANT

Please note that the options under [tool.flwr.app.config.static] must not be modified, to ensure fair competition, if you plan to participate in the LLM leaderboard. Additionally, the number of SuperNodes (i.e. options.num-supernodes) must be 20.

Running the app

First, make sure that you have access to the Bio-Medical-Llama-3-8B model with your Hugging Face account. You can request access directly from the Hugging Face website. Then, follow the instructions here to log in to your account. Note that you only need to complete this step once on your development machine:

hf auth login

Run the challenge with the default config values. The configs are defined in the [tool.flwr.app.config] entry of pyproject.toml and are loaded automatically.

flwr run

You can adjust the CPU/GPU resources assigned to each client based on your device; these are specified with options.backend.client-resources.num-cpus and options.backend.client-resources.num-gpus in the flowertune connection in your config.toml.

Run with the Deployment Engine

To run this App using Flower's Deployment Engine, we recommend first creating some demo data using Flower Datasets. For example:

# Install Flower datasets
pip install "flwr-datasets"

# Create dataset partitions and save them to disk
flwr-datasets create flwrlabs/medical-meadow-medical-flashcards --num-partitions 20 --out-dir demo_data

The above command will create 20 IID partitions of the medical-flashcards dataset and save them in a demo_data directory. Next, you can pass one partition to each of your SuperNodes like this:

flower-supernode \
    --insecure \
    --superlink <SUPERLINK-FLEET-API> \
    --node-config="data-path=/path/to/demo_data/partition_0"
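On the ClientApp side, the value passed via --node-config is exposed through the run Context (context.node_config in Flower's Python API). A minimal sketch of the lookup, with a plain dict standing in for the real Context, and a hypothetical default path:

```python
# Sketch: resolving the per-node dataset partition from node_config.
# In a real ClientApp this dict comes from context.node_config; the
# fallback path below is a hypothetical default, not the app's.
def resolve_partition_path(node_config: dict) -> str:
    return node_config.get("data-path", "demo_data/partition_0")
```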

Finally, ensure the environment of each SuperNode has all dependencies installed. Then, launch the run via flwr run, pointing to a SuperLink connection that specifies the SuperLink your SuperNodes are connected to:

flwr run . <SUPERLINK-CONNECTION> --stream

TIP

Follow this how-to guide to run the same app in this example but with Flower's Deployment Engine. After that, you might be interested in setting up secure TLS-enabled communications and SuperNode authentication in your federation.

Model saving

The global PEFT model checkpoints are saved every 5 rounds after aggregation on the server side by default; the interval can be specified with train.save-every-round under the [tool.flwr.app.config] entry in pyproject.toml.
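The cadence can be sketched as a simple predicate (a hedged illustration; the helper name and the final-round save are assumptions, not the app's actual code):

```python
def should_save(server_round: int, save_every_round: int = 5,
                total_rounds: int = 10) -> bool:
    """Save the aggregated PEFT checkpoint every N rounds and at the final round."""
    return server_round % save_every_round == 0 or server_round == total_rounds
```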

NOTE

Please provide the last PEFT checkpoint if you plan to participate in the LLM leaderboard.