
Code LLM Leaderboard

Embrace Federated LLM Fine-Tuning and Secure Your Spot on the Leaderboard!

| Rank | Team | Base Model | Comm. Costs | Average (↑) | MBPP | HumanEval | MultiPL-E (JS) | MultiPL-E (C++) | Code | Date |
|------|------|------------|-------------|-------------|------|-----------|----------------|-----------------|------|------|
| 1 | Baseline | Llama-3.2-3B | 27.4 GB | 28.33 | 33.80 | 31.71 | 24.84 | 22.98 | link | 09.12.24 |
| 2 | Baseline | Mistral-7B-v0.3 | 40.7 GB | 27.36 | 31.60 | 23.78 | 28.57 | 25.47 | link | 01.10.24 |

Software development and programming are increasingly complex and diverse, requiring tools that understand code context, syntax, and semantics. Federated LLM fine-tuning on coding tasks enables the collaborative improvement of models that assist with code generation, bug fixing, and programming education across various programming languages and development environments. By training models across a federation of data sources drawn from different coding projects and repositories, we ensure that the resulting coding assistants are versatile and sensitive to the subtleties of different programming paradigms and practices.

👉 Check out the other Leaderboards