Open-Source Toolkits
And what do we do at Arcee when an exciting new model drops? We MERGE IT with MergeKit! In this post, we walk you through the process and share our initial results.
🦙 Meta just released Llama-3 and its evals are epic!
🚀 Training and Fine-Tuning: Trained on 15T tokens and further refined with 10M human-annotated samples.
🦙 Model Variants: 8B and 70B versions, each in both Instruct and Base formats (Note: no Mixture of Experts (MoE) versions).
📏 Benchmark Achievement: Llama-3-70B is the best open large language model on the MMLU benchmark (> 80% 🤯).
💻 Coding Proficiency: Instruct 8B achieves a 62.2% score and Instruct 70B scores 81.7% on the HumanEval coding benchmark.
✍🏻 Tokenizer and Vocabulary: Utilizes a Tiktoken-based tokenizer with a vocabulary size of 128k.
🪟 Context Window: Features a default context window of 8,192 tokens, which can be extended as needed.
📐 Alignment Techniques: Employs SFT, PPO, and DPO for alignment post-training.
✅ Commercial Use: Fully permitted for commercial applications.
🤗 Available on Hugging Face.
Shout out to Philipp Schmid for doing the homework for us!
With the Llama-3 release, what do we want to do at Arcee? Merge it! 🚀 Although there are no community fine-tunes of Llama-3 yet, we want to validate that MergeKit supports this great new model. We can experiment and validate by merging Llama-3 8B Base with Llama-3 8B Instruct!
First, we set up our server so it's ready for our merge experiments:
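The exact setup commands aren't reproduced in this excerpt; a minimal sketch, assuming a fresh Python environment, installs MergeKit from source along with the Hugging Face Hub client:

```bash
# Install MergeKit from source (assumes git and a recent Python/pip are available)
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

# The Hugging Face Hub client is used later for authentication and uploads
pip install -U huggingface_hub
```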
Now, we authenticate with the HF hub:
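One way to do this from Python, assuming the `huggingface_hub` client installed above:

```python
from huggingface_hub import login

# Prompts for a Hugging Face access token (required for the gated Llama-3 weights)
login()
```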
Then, we authenticate with the HuggingFace CLI:
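The standard command for this is:

```bash
huggingface-cli login
```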
Enter your HF token here again when prompted.
Now, we create our merge config file. We will use the SLERP merge method as an example for this experiment.
Let’s create a Python file that we can run on our server to generate our YAML config. Create create_yaml.py with the following code:
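The exact configuration we used isn't reproduced in this excerpt; below is a representative sketch that follows MergeKit's SLERP config schema and writes it to config.yaml. The layer ranges, interpolation values, and dtype are illustrative and can be tuned.

```python
# create_yaml.py -- writes a SLERP merge config for MergeKit
yaml_config = """
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B
        layer_range: [0, 32]
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: meta-llama/Meta-Llama-3-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

# Write the config to disk so mergekit-yaml can pick it up
with open("config.yaml", "w", encoding="utf-8") as f:
    f.write(yaml_config)
```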
Now, we can run the above script by executing `python create_yaml.py`.
Our config file is now created.
We will now run our merge command to perform our merge:
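The exact flags we used aren't shown here; one reasonable invocation reads config.yaml and writes the merged model into a merge/ directory:

```bash
# Merge according to config.yaml and write the result to ./merge
mergekit-yaml config.yaml merge --copy-tokenizer --lazy-unpickle
```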
Epic, the merge was successful!
But Arcee has made life even easier.
Alternatively, you can use our Mergekit Config Generator UI to generate your merge config. First, enter the names of the models and their number of layers. Then, select the merge method and the data type (dtype). Finally, click the “Create config.yaml” button and there you go, the config file is ready!
Next, copy the config file and paste it into our Mergekit GUI. Enter your HF Write Token and choose a name for the merged model. Then, click the “Merge” button. Once the merge finishes, the merged model is available in your HF repo. Congratulations!
Now, we want to create our model card for the HF Hub.
Paste the following code into a new Python script:
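Below is a minimal sketch using the `ModelCard` class from `huggingface_hub`; the file name create_model_card.py and the card text are illustrative rather than the exact card we published. The script ends by saving the card into the merge/ output directory.

```python
# create_model_card.py -- builds a simple README for the merged model (illustrative)
from huggingface_hub import ModelCard

content = """---
license: llama3
tags:
- merge
- mergekit
- llama-3
---
# Llama-3-8B SLERP merge

A SLERP merge of meta-llama/Meta-Llama-3-8B and meta-llama/Meta-Llama-3-8B-Instruct,
produced with MergeKit.
"""

card = ModelCard(content)
card.save('merge/README.md')
```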
Now, to create the model card we will run:
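Assuming the illustrative file name from above:

```bash
python create_model_card.py
```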
Our model card is now created.
Now, we will push our model to the HF hub.
First, we will create our upload_to_hf.py Python script:
Paste in the following code:
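A minimal sketch using `HfApi` from `huggingface_hub`; the repo id is a placeholder to replace with your own username and model name:

```python
# upload_to_hf.py -- pushes the merged model folder to the Hugging Face Hub
from huggingface_hub import HfApi

repo_id = "your-username/Llama-3-8B-slerp"  # placeholder: use your own repo id
api = HfApi()

# Create the target repo if it does not already exist
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

# Upload the merged weights, tokenizer files, and README from ./merge
api.upload_folder(repo_id=repo_id, folder_path="merge")
```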
Now, run our upload script:
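```bash
python upload_to_hf.py
```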
Now, we have a merged Llama-3 model on the Hugging Face Hub.
You can follow the same steps to try the other merge methods available in MergeKit (e.g., Linear, SLERP, DARE-TIES, Passthrough, and Frankenmerging). Here is a collection of merged models using Llama-3 variants:
Our experiment aimed to showcase the seamless compatibility of MergeKit with Llama-3-based model variants. It's time to elevate our game! Start fine-tuning and merging with this phenomenal model to unlock new potentials. Get ready to dive into a world where innovation meets efficiency. Let's merge into the future with Llama-3! 🌟