Quickstart
```shell
flwr new @cuberick/tinyllama-1-1b-chat-v1-0
```
Model Details
Our method fine-tunes TinyLlama/TinyLlama-1.1B-Chat-v1.0 with a LoRA (PEFT) adapter via federated learning using Flower.
How to Get Started with the Model
First, set up the environment by following the main README.md file. Then use the command below to get started with the model.

```shell
flwr run .
```
Training Details
Training Data
We train with the default dataset supplied in the project configuration:

```toml
dataset.name = "flwrlabs/code-alpaca-20k"
```
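To see how this dataset is partitioned across federated clients, the sketch below uses the flwr-datasets package. The partition count here is illustrative, not a value taken from this run.

```python
# Minimal sketch: inspect one client's partition of the training data.
# Assumes `pip install flwr-datasets`; num_partitions is illustrative.
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

partitioner = IidPartitioner(num_partitions=20)  # illustrative client count
fds = FederatedDataset(
    dataset="flwrlabs/code-alpaca-20k",
    partitioners={"train": partitioner},
)
partition = fds.load_partition(0, "train")  # data held by client 0
print(len(partition))   # number of examples on this client
print(partition[0])     # one raw training example
```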
Training Hyperparameters
```toml
model.name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model.quantization = 4
model.gradient-checkpointing = true
model.lora.peft-lora-r = 8
model.lora.peft-lora-alpha = 16
train.save-every-round = 5
train.learning-rate-max = 5e-5
train.learning-rate-min = 1e-6
train.seq-length = 512
train.training-arguments.output-dir = ""
train.training-arguments.learning-rate = ""
train.training-arguments.per-device-train-batch-size = 16
train.training-arguments.gradient-accumulation-steps = 1
train.training-arguments.logging-steps = 10
train.training-arguments.num-train-epochs = 3
train.training-arguments.max-steps = 10
train.training-arguments.save-steps = 1000
train.training-arguments.save-total-limit = 10
train.training-arguments.gradient-checkpointing = true
train.training-arguments.lr-scheduler-type = "constant"
strategy.fraction-fit = 0.2
strategy.fraction-evaluate = 0.0
num-server-rounds = 200
```
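The paired train.learning-rate-max and train.learning-rate-min values define a per-round learning-rate schedule. The sketch below assumes cosine annealing over num-server-rounds, as used in FlowerTune-style setups; this is an interpretation, not a schedule stated by this card.

```python
# Hedged sketch: cosine-annealed per-round learning rate derived from
# train.learning-rate-max/min and num-server-rounds in the config above.
import math

def round_lr(current_round: int, total_rounds: int = 200,
             lr_max: float = 5e-5, lr_min: float = 1e-6) -> float:
    """Interpolate between lr_max (round 0) and lr_min (final round)."""
    cos_inner = math.pi * current_round / total_rounds
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(cos_inner))

for r in (0, 50, 100, 150, 200):
    print(f"round {r:3d}: lr = {round_lr(r):.2e}")
```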
Communication Cost
3448 MB
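As a rough, hedged sanity check on this figure: with LoRA, only the adapter weights travel between server and clients, so total traffic is approximately rounds × clients per round × two directions × adapter size. Every concrete number below is an illustrative assumption, not a value reported for this run.

```python
# Back-of-the-envelope communication estimate for LoRA federated fine-tuning.
# All inputs are assumptions; only the 3448 MB total comes from this card.
rounds = 200               # num-server-rounds from the config above
clients_per_round = 2      # assumption: client pool size x fraction-fit
adapter_mb = 4.3           # assumption: size of the r=8 LoRA adapter in MB
directions = 2             # server -> client download plus client -> server upload

total_mb = rounds * clients_per_round * adapter_mb * directions
print(f"~{total_mb:.0f} MB")  # compare against the reported 3448 MB
```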
Evaluation
Download the checkpoints at this link.
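Each checkpoint is a PEFT (LoRA) adapter applied on top of the base model. Below is a minimal inference sketch, assuming the adapter directory path_to_the_model/peft_200 used in the evaluation commands that follow.

```python
# Minimal inference sketch: load the base model, attach the LoRA adapter,
# and generate. The adapter path is the placeholder used elsewhere in this card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, "path_to_the_model/peft_200")
model.eval()

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```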
Procedures
See this for instructions on downloading the necessary packages and the evaluation script. Below, we provide the commands to run the evaluation for each benchmark.
For TinyLlama/TinyLlama-1.1B-Chat-v1.0 results:
```bash
# humaneval
# "pass@1": 0.12195121951219512
python main.py \
    --model=TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    --peft_model=path_to_the_model/peft_200 \
    --max_length_generation=1024 \
    --batch_size=4 \
    --use_auth_token \
    --allow_code_execution \
    --save_generations \
    --save_references \
    --tasks=humaneval \
    --metric_output_path=./tinyllama1.1b/evaluation_results_humaneval.json

# mbpp
# "pass@1": 0.026
python main.py \
    --model=TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    --peft_model=path_to_the_model/peft_200 \
    --max_length_generation=2048 \
    --batch_size=4 \
    --use_auth_token \
    --allow_code_execution \
    --save_generations \
    --save_references \
    --tasks=mbpp \
    --metric_output_path=./tinyllama1.1b/evaluation_results_mbpp.json

# multiple-js
# "pass@1": 0.09937888198757763
python main.py \
    --model=TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    --peft_model=path_to_the_model/peft_200 \
    --max_length_generation=1024 \
    --batch_size=4 \
    --use_auth_token \
    --allow_code_execution \
    --save_generations \
    --save_references \
    --tasks=multiple-js \
    --metric_output_path=./tinyllama1.1b/evaluation_results_multiple_js.json

# multiple-cpp
# "pass@1": 0.09937888198757763
python main.py \
    --model=TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    --peft_model=path_to_the_model/peft_200 \
    --max_length_generation=1024 \
    --batch_size=4 \
    --use_auth_token \
    --allow_code_execution \
    --save_generations \
    --save_references \
    --tasks=multiple-cpp \
    --metric_output_path=./tinyllama1.1b/evaluation_results_multiple_cpp.json
```
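Each command above writes a JSON file to --metric_output_path. The hedged aggregation sketch below turns those files into the percentage scores reported in the Results section; the JSON layout ({"<task>": {"pass@1": ...}}) is assumed from the harness output, not shown in this card.

```python
# Hedged sketch: collect pass@1 from the four metric_output_path files and
# convert fractions to the percentages reported in the Results section.
import json

files = {
    "HumanEval": ("./tinyllama1.1b/evaluation_results_humaneval.json", "humaneval"),
    "MBPP": ("./tinyllama1.1b/evaluation_results_mbpp.json", "mbpp"),
    "MultiPL-E (JS)": ("./tinyllama1.1b/evaluation_results_multiple_js.json", "multiple-js"),
    "MultiPL-E (C++)": ("./tinyllama1.1b/evaluation_results_multiple_cpp.json", "multiple-cpp"),
}
scores = {}
for name, (path, task) in files.items():
    with open(path) as f:
        scores[name] = json.load(f)[task]["pass@1"] * 100  # fraction -> percent
scores["Average"] = sum(scores.values()) / len(files)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```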
Results
All scores are pass@1, reported as percentages.

- Average: 8.67
- MBPP: 2.60
- HumanEval: 12.20
- MultiPL-E (JS): 9.94
- MultiPL-E (C++): 9.94
Framework versions
- PEFT 0.14.0