
LLM Leaderboard

Explore the latest rankings and performance metrics of large language models evaluated with federated benchmarks.

How to Participate

Follow the instructions below to take part in the FlowerTune LLM Leaderboard:

01
Choose your challenge
  • Join the Flower Slack and say "Hi" in channel #flowertune-llm-leaderboard.
  • Check out the challenge boards, then choose a challenge.
  • Follow the instructions on the challenge page to start your project.
02
Design your LLM
  • Propose and implement your own approach to federated fine-tuning on top of the provided template code.
  • Run fine-tuning experiments using flwr run (see the instructions).
03
Evaluate your model
04
Submit your project
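The CLI workflow referenced in step 02 can be sketched as a few commands. This is a hedged sketch, not the official challenge setup: the project name is illustrative, and flwr new will prompt you for details such as the template to use, so follow the challenge page for the exact invocation.

```shell
# Install the Flower CLI (assumes a recent flwr release from PyPI)
pip install flwr

# Generate a project skeleton; "my-flowertune-app" is an illustrative
# name -- flwr new prompts for the template and project details
flwr new my-flowertune-app

# Implement your federated fine-tuning approach in the generated code,
# then launch the experiment from inside the project directory
cd my-flowertune-app
flwr run .
```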

DeepLearning.AI Courses with Andrew Ng

Learn the basics of Federated Learning in our 1-hour course “Intro to Federated Learning”, then expand to “Federated Fine-Tuning of LLMs”.

Beginner to Intermediate · 1 Hour 8 Minutes
Deep Learning Course

Frequently Asked Questions


What is FlowerTune?
FlowerTune encompasses a suite of techniques designed to enable efficient fine-tuning of pre-trained foundation models in FL environments using Flower. Specifically, for the FlowerTune LLM, it incorporates several methods, including PEFT, quantization, Flower Datasets, and Flower Simulation, among others.

Can a contribution be a team effort?
Absolutely. Colleagues are welcome to team up and work on a challenge together; just make sure you provide the details of a primary contact person when making the submission.

Can I choose multiple challenges?
Absolutely. We encourage participants to attempt as many challenges as they wish. Additionally, for the same challenge, you can even submit multiple entries using different methods. Please note that each challenge/method requires a separate submission.

How many submissions can I make?
While there is no overall limit on the number of submissions, each team is restricted to a maximum of two submissions per week to ensure the quality of both the submissions and the review process.

What LLM base models can I use for the fine-tuning?
We do not restrict the type of base LLM, but base models are limited to a maximum of 13 billion parameters.
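A quick way to sanity-check a candidate base model against the 13-billion-parameter cap is to estimate its size from the architecture config. The sketch below uses the standard rough approximation for decoder-only transformers (about 12 · layers · d_model² for the attention and MLP blocks, plus embeddings); the Llama-2-13B-like figures are illustrative assumptions, not challenge requirements.

```python
def estimate_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough decoder-only transformer size: ~12 * L * d^2 for the
    attention + MLP blocks, plus input and output embedding matrices."""
    block_params = 12 * n_layers * d_model ** 2
    embedding_params = 2 * vocab_size * d_model  # input + output embeddings
    return block_params + embedding_params

# Illustrative Llama-2-13B-like configuration (assumed figures)
estimate = estimate_params(n_layers=40, d_model=5120, vocab_size=32000)
print(f"~{estimate / 1e9:.1f}B parameters")  # ~12.9B, under the 13B cap
```

The estimate ignores biases, layer norms, and grouped-query attention variants, so treat it as a ballpark figure and check the model card for the exact count.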

How many clients can be selected for training per round?
There is no cap on the number of clients selected per round; however, there is a total communication budget of 200 gigabytes across all FL rounds, which scales with the number of clients chosen. We aim to evaluate solutions in a realistic FL environment, which is why we impose a number of real-world constraints.
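The trade-off between the 200 GB budget and the per-round client count can be made concrete with a little arithmetic. The adapter size below is an illustrative assumption (PEFT adapters are typically tens of megabytes), not a challenge rule.

```python
def max_rounds(budget_gb: float, clients_per_round: int, adapter_mb: float) -> int:
    """How many FL rounds fit in the communication budget, counting
    both the download and the upload of the adapter for every client."""
    per_round_gb = clients_per_round * adapter_mb * 2 / 1024  # down + up
    return int(budget_gb // per_round_gb)

# Illustrative: 50 MB PEFT adapter, 10 clients per round, 200 GB budget
print(max_rounds(budget_gb=200, clients_per_round=10, adapter_mb=50))  # 204
```

Doubling the clients per round halves the number of affordable rounds, which is why the budget effectively couples the two choices.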

Given each challenge has multiple evaluation datasets, how do we rank the submissions?
We rank the submissions based on the average value derived from different evaluation datasets for each challenge. Please refer to the baseline results for detailed information.
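Ranking by the per-challenge average works as sketched below; the submission names and scores are made up for illustration.

```python
def rank_submissions(scores: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank submissions by the mean of their per-dataset scores, best first."""
    averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical accuracies on one challenge's evaluation datasets
scores = {
    "team-a": [0.62, 0.58, 0.71],
    "team-b": [0.65, 0.60, 0.64],
}
for name, avg in rank_submissions(scores):
    print(f"{name}: {avg:.3f}")
```

Note that averaging rewards consistency across datasets: team-a wins here on the strength of one high score, but a submission that collapses on any single dataset drags its whole average down.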

Do I need to provide checkpoints of the trained LLM?
Yes. The PEFT model checkpoints are saved on disk within the template code generated by flwr new. Please upload the PEFT checkpoint used for evaluation to a cloud drive and include the link with your submission.

How can I cite this LLM leaderboard website?

If you find our LLM leaderboard useful, please consider citing it as follows:

@misc{FlowerTune_LLM_Leaderboard,
  author       = {Flower Team},
  title        = {FlowerTune LLM Leaderboard},
  year         = {2024},
  publisher    = {Flower Labs},
  howpublished = {\url{https://flower.ai/benchmarks/llm-leaderboard}}
}

@article{2025flowertune,
  title        = {FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models},
  author       = {Gao, Yan and Scamarcia, Massimo Roberto and Fernandez-Marques, Javier and others},
  journal      = {arXiv preprint arXiv:2506.02961},
  year         = {2025}
}

My question is not covered here. What should I do?
Please join the Flower Slack and ask your questions in the #flowertune-llm-leaderboard channel (not in #general, please!). You can also check our dedicated Flower Discuss category to see if your question was already answered.