FlowerTune LLM: Federated LLM Fine-tuning with Flower
Large language models (LLMs), trained on vast amounts of publicly accessible data, have shown remarkable effectiveness across a wide range of areas. However, although more data typically leads to better performance, there is a concerning prospect that the supply of high-quality public data may be depleted within a few years. Federated LLM training could unlock access to an endless pool of distributed private data by allowing multiple data owners to collaboratively train a shared model without exchanging raw data.
This introductory example conducts federated instruction tuning with pretrained OpenLLaMA models on the Alpaca-GPT4 dataset. We implement FlowerTune LLM by integrating a bundle of techniques: 1) We use Flower Datasets to download, partition, and preprocess the dataset. 2) The fine-tuning is done using the 🤗PEFT library. 3) We use Flower's Simulation Engine to simulate the LLM fine-tuning process in a federated way, which allows users to perform the training on a single GPU.
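For orientation, the dataset and model preparation steps can be sketched in a few lines. This is a minimal sketch rather than the example's exact code: the dataset identifier vicgalle/alpaca-gpt4, the number of partitions, and the LoRA hyperparameters are illustrative assumptions.

from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 1) Download Alpaca-GPT4 and split it into IID client partitions
fds = FederatedDataset(
    dataset="vicgalle/alpaca-gpt4",  # assumed Hugging Face dataset id
    partitioners={"train": IidPartitioner(num_partitions=20)},
)
partition = fds.load_partition(0, "train")  # local data for client 0

# 2) Load a 4-bit quantized base model and attach LoRA adapters via PEFT
model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b_v2",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"),
)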
Set up the project
Start by cloning the example project:
git clone --depth=1 https://github.com/adap/flower.git _tmp \
&& mv _tmp/examples/flowertune-llm . \
&& rm -rf _tmp \
&& cd flowertune-llm
This will create a new directory called flowertune-llm with the following structure:
flowertune-llm
├── flowertune_llm
│   ├── __init__.py
│   ├── client_app.py   # Defines your ClientApp
│   ├── server_app.py   # Defines your ServerApp
│   ├── dataset.py      # Defines your dataset and tokenizer
│   └── models.py       # Defines your models
│
├── pyproject.toml      # Project metadata like dependencies and configs
├── test.py             # Test pre-trained model
└── README.md
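To give a feel for how the pieces fit together, here is a condensed sketch of the shape of client_app.py and server_app.py using Flower's ClientApp/ServerApp API. The real files add the LoRA fine-tuning logic; the values shown here are placeholders.

from flwr.client import ClientApp, NumPyClient
from flwr.common import Context
from flwr.server import ServerApp, ServerAppComponents, ServerConfig
from flwr.server.strategy import FedAvg

class FlowerClient(NumPyClient):
    def fit(self, parameters, config):
        # Real app: load LoRA adapter weights, fine-tune locally, and
        # return the updated weights plus the number of training examples
        return parameters, 1, {}

def client_fn(context: Context):
    return FlowerClient().to_client()

app = ClientApp(client_fn=client_fn)

def server_fn(context: Context):
    strategy = FedAvg(fraction_fit=0.1)  # placeholder client sampling fraction
    return ServerAppComponents(strategy=strategy, config=ServerConfig(num_rounds=100))

server = ServerApp(server_fn=server_fn)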
Install dependencies and project
Install the dependencies defined in pyproject.toml as well as the flowertune_llm package.
pip install -e .
Run the project
You can run your Flower project in both simulation and deployment mode without making changes to the code. If you are starting with Flower, we recommend using the simulation mode, as it requires fewer components to be launched manually. By default, flwr run will make use of the Simulation Engine.
Run with the Simulation Engine
flwr run .
This command will run FL simulations with a 4-bit OpenLLaMA 3Bv2 model, involving 2 clients per round for 100 FL rounds. You can override configuration parameters directly from the command line. Below are a few settings you might want to test:
# Use OpenLLaMA-7B instead of 3B and 8-bits quantization
flwr run . --run-config "model.name='openlm-research/open_llama_7b_v2' model.quantization=8"
# Run for 50 rounds and increase the fraction of clients that participate per round to 25%
flwr run . --run-config "num-server-rounds=50 strategy.fraction-fit=0.25"
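These command-line overrides reach the app through the Context object that Flower passes in. A minimal sketch of how server_app.py might read them, assuming the key names used above:

from flwr.common import Context
from flwr.server import ServerAppComponents, ServerConfig
from flwr.server.strategy import FedAvg

def server_fn(context: Context):
    # Values from pyproject.toml, overridable via `flwr run --run-config`
    num_rounds = context.run_config["num-server-rounds"]
    fraction_fit = context.run_config["strategy.fraction-fit"]
    strategy = FedAvg(fraction_fit=fraction_fit)
    return ServerAppComponents(strategy=strategy, config=ServerConfig(num_rounds=num_rounds))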
Run with the Deployment Engine
Note: An update to this example will show how to run this Flower application with the Deployment Engine and TLS certificates, or with Docker.
Expected results
As expected, the OpenLLaMA-7B model achieves lower training loss than its 3B counterpart. With the hyperparameters tested, 8-bit quantization seems to deliver lower training loss for the smaller 3B model than its 4-bit version.
VRAM consumption
| Models | 7-billion (8-bit) | 7-billion (4-bit) | 3-billion (8-bit) | 3-billion (4-bit) |
| --- | --- | --- | --- | --- |
| VRAM | ~22.00 GB | ~16.50 GB | ~13.50 GB | ~10.60 GB |
We make use of the bitsandbytes library in conjunction with PEFT to derive LLMs that can be fine-tuned efficiently.
The table above shows the VRAM consumption per client for the different models considered in this example.
You can adjust the CPU/GPU resources assigned to each client based on your device.
For example, it is easy to train 2 concurrent clients on a single 24 GB GPU if you choose the 3-billion (4-bit) model: assign 50% of the GPU's VRAM to each client by setting options.backend.clientapp-gpus = 0.5 under [tool.flwr.federations.local-simulation] in pyproject.toml.
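Assuming the key name mentioned above, the relevant section of pyproject.toml would look roughly like this (the num-supernodes value is illustrative):

[tool.flwr.federations.local-simulation]
options.num-supernodes = 20
options.backend.clientapp-gpus = 0.5  # each client is assigned 50% of a GPU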
Test with your Questions
We provide a script to test your trained model by passing your specified questions. For example:
python test.py --peft-path=/path/to/trained-model-dir/ \
--question="What is the ideal 1-day plan in London?"
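Under the hood, test.py roughly loads the base model, attaches the trained PEFT adapters, and generates an answer. A minimal sketch of that flow, with the base-model name and generation settings as assumptions:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the trained LoRA adapters from --peft-path
base_model = AutoModelForCausalLM.from_pretrained("openlm-research/open_llama_7b_v2")
model = PeftModel.from_pretrained(base_model, "/path/to/trained-model-dir/")
tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_7b_v2")

inputs = tokenizer("What is the ideal 1-day plan in London?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))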
An answer generated by the federated fine-tuned 7-billion (8-bit) OpenLLaMA model:
Great choice.
London has so much to offer, and you can really soak up all the sights and sounds in just a single day.
Here's a suggested itinerary for you.
Start your day off with a hearty breakfast at an authentic British diner.
Then head to the iconic Big Ben and the Houses of Parliament to learn about the history of the city.
Next, make your way to Westminster Abbey to see the many historical monuments and memorials.
From there, cross the river Thames to the Tower of London, which is home to the Crown Jewels of England and Scotland.
Finally, end your day with a relaxing visit to the London Eye, the tallest Ferris wheel in Europe, for a beautiful view of the city.
The Vicuna template we used in this example is designed for a chat assistant, so the generated answer is expected to read as one turn of a multi-turn conversation. Feel free to try more interesting questions!
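For reference, a Vicuna-style chat template has roughly the following shape. This is an approximation based on the common Vicuna v1.1 prompt; check the example source for the exact string it uses.

# Approximate Vicuna-style prompt template (not the exact string in this example)
template = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: {question} ASSISTANT:"
)
prompt = template.format(question="What is the ideal 1-day plan in London?")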