

Open-Source Toolkits · April 23, 2024 · 5 min read

Evolutionary Model Merging For All

We've been focused on developing this groundbreaking technique for the community, and we're now excited to announce the launch of this state-of-the-art functionality in MergeKit.

Charles Goddard

Sakana.ai made a very big splash about a month ago, releasing a paper on Evolutionary Model Merging along with the models and eval results produced by this game-changing merge method. Unfortunately for the community, they never released the algorithm behind these amazing results!

Since this release, here at Arcee we've been fully focused on developing this groundbreaking technique for the community. We're now excited to announce the launch of this state-of-the-art functionality in MergeKit.

Evolutionary Model Merging lets people target specific competencies or qualities in their merges. Without it, model merging is an extremely manual, exploratory process: trying dozens of merges, manually evaluating them, and trying to build a mental framework for how the merging parameters relate to the performance of the final model. With Evolutionary Model Merging, we can instead specify what qualities we want a model to have, and optimization will take care of it for us.

Tutorial: How to get started with Evolutionary Model Merging

I've created a tutorial to help you get started: `mergekit-evolve`.

Evolutionary Model Merging with `mergekit-evolve`

Hardware Requirements

`mergekit-evolve` needs at least one GPU. It doesn't necessarily need a huge one! You need to be able to run inference on a model in FP16. If you're working with models in the 7B size range, 24GB of VRAM will do just fine. If you're a big spender, you can use a Ray cluster with however many GPUs you want. For this little demo I'm using a RunPod instance with a single A100.

Installing

First, let's set up our environment with an installation of mergekit. We need to use the `evolve` feature flag, and I'm using `vllm` as well because it's faster.
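On a fresh machine, that setup looks roughly like this (a sketch; `evolve` and `vllm` are the extras in question, but check the mergekit README for the current install instructions):

```sh
# Install mergekit from source with the evolve feature flag, plus vllm for
# faster evaluation of candidate merges.
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e ".[evolve,vllm]"
```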

Defining Tasks

To optimize a merge recipe, we first need to decide what exactly to optimize. `mergekit-evolve` uses EleutherAI's language model evaluation harness as a backend, so in theory all of the benchmarks supported by `lm-eval` can be used. Since I'm not an evil little gnome, I'm going to define some custom tasks instead of directly optimizing against Open LLM Leaderboard scores.

Let's say that we want some spatial awareness in our model. The spartqa-mchoice dataset is a set of synthetic question-and-answer pairs about arrangements of objects, designed to test the spatial reasoning capabilities of language models. Let's take a random sample of its training split and use that for one part of our scoring.
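A minimal sketch of that sampling step (the `metaeval/spartqa-mchoice` repo id and the output path are assumptions; point them at whichever copy of the dataset and working directory you're using):

```python
import datasets

# Pull the spartqa-mchoice training split, sample 1,000 rows, and write them
# to disk so the lm-eval task below can load them as a local JSON dataset.
ds = datasets.load_dataset("metaeval/spartqa-mchoice", split="train")
ds = ds.shuffle(seed=42).select(range(1000))
ds.to_json("/workspace/spartqa_train_1k.jsonl")
```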

Now we need to define an `lm-eval` task that scores against this data. This can be done by writing a YAML file (and any necessary helper code). For more details on how to do this, see the New Task Guide.

In `/workspace/eval_tasks/spartqa_1k_train.yaml`:
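As a sketch (field names follow the lm-eval harness's YAML task schema; the choice labels and the assumption that the `answer` column indexes into them should be checked against the actual dataset):

```yaml
task: spartqa_train
dataset_path: json
dataset_kwargs:
  data_files: /workspace/spartqa_train_1k.jsonl
output_type: multiple_choice
training_split: train
test_split: train
doc_to_text: !function preprocess_spartqa.doc_to_text
doc_to_choice: ["A", "B", "C", "D"]
# Assumes the `answer` column holds the index of the correct choice.
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
```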

And in `/workspace/eval_tasks/preprocess_spartqa.py`:
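Something along these lines; the column names (`story`, `question`, `candidate_answers`) are my guesses at the dataset's schema, so adjust them to whatever your copy actually contains:

```python
def doc_to_text(doc) -> str:
    # Render the story, the question, and lettered answer options into a
    # single multiple-choice prompt ending in "Answer:".
    answer_chunks = []
    for idx, answer in enumerate(doc["candidate_answers"]):
        letter = "ABCD"[idx]
        answer_chunks.append(f"{letter}. {answer}")
    answers = "\n".join(answer_chunks)
    return (
        f"Context:\n{doc['story']}\n\n"
        f"Question: {doc['question']}\n"
        f"{answers}\n"
        "Answer:"
    )
```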

One common problem with merges is that the result often doesn't conform to any one particular prompting style. When manually creating merge recipes it's fairly easy to get the behavior you want by varying weights across layers, but since we're letting an algorithm optimize things, let's make a silly little task for it instead. Alpaca is a very common standard, and all it really requires of the model is to correctly output an EOS token after a completed response.

First, let's put together another tiny dataset to evaluate this metric against. I'll use a few hundred prompts from `vicgalle/alpaca-gpt4`.
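The idea is to render each example in the Alpaca template with its gold response already filled in, so the only decision left to the model is whether to emit an EOS token next. A sketch (the `instruction`/`input`/`output` column names follow the standard Alpaca schema, and the sample size and output path are arbitrary):

```python
import datasets

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

# Keep a few hundred prompts that don't use the optional `input` field, render
# them with their gold responses, and save as a local JSON dataset.
ds = datasets.load_dataset("vicgalle/alpaca-gpt4", split="train")
ds = ds.filter(lambda ex: not ex["input"])
ds = ds.shuffle(seed=42).select(range(500))
ds = ds.map(lambda ex: {"text": ALPACA_TEMPLATE.format(**ex)})
ds.to_json("/workspace/alpaca_prompt_format.jsonl")
```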

And now the actual task definition is quite simple:

In `/workspace/eval_tasks/alpaca_prompt_format.yaml`:
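A sketch: a multiple-choice task where the "correct" continuation after a completed response is the EOS token. The `</s>` literal assumes a Mistral/Llama-style tokenizer, and the distractor continuations are arbitrary:

```yaml
task: alpaca_prompt_format
dataset_path: json
dataset_kwargs:
  data_files: /workspace/alpaca_prompt_format.jsonl
output_type: multiple_choice
training_split: train
test_split: train
doc_to_text: "{{text}}"
doc_to_choice:
  - "</s>"   # stop cleanly after the completed response
  - " The"   # distractors: keep rambling instead of stopping
  - " In"
doc_to_target: 0
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
```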

There are definitely more robust ways to evaluate this but the multiple choice setup is nice in that it evaluates really quickly. Experiment at will!

Writing an Evolutionary Merge Config

We now have all the parts in place needed to actually define the merge we want to optimize. `mergekit-evolve` takes a YAML configuration file that defines what models we want to include, what merge method to use, and what tasks to optimize against.

For this example, I'm going to throw three models into the soup:

  • Hermes 2 Pro Mistral 7B, because it's a generally good model
  • Dan's Adventurous Winds Mk2 7B, because it's a really fun model but answers to no prompt format
  • Zephyr 7B beta for its quality instruction following

Most of the methods implemented by mergekit can be used. I chose Task Arithmetic pretty much arbitrarily.

Tasks can be weighted arbitrarily. I made the `spartqa_train` task slightly more important than `alpaca_prompt_format`, purely as an example.
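Putting that together, the config could look something like the sketch below. The `genome`/`tasks` layout follows mergekit-evolve's config format, but the Hugging Face repo ids are my best guesses for the three models above, and the layer granularity and task weights are illustrative, so verify everything against the mergekit docs and the hub:

```yaml
genome:
  models:
    - NousResearch/Hermes-2-Pro-Mistral-7B
    - PocketDoc/Dans-AdventurousWinds-Mk2-7b
    - HuggingFaceH4/zephyr-7b-beta
  merge_method: task_arithmetic
  base_model: mistralai/Mistral-7B-v0.1
  layer_granularity: 8   # optimize merge parameters in blocks of 8 layers
tasks:
  - name: spartqa_train
    weight: 1.0
  - name: alpaca_prompt_format
    weight: 0.8
```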

Running the Merge

Now we finally have all the pieces set up to actually run `mergekit-evolve`. Here's the command I used:
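In sketch form (the storage and task-search paths match the directories used above; the exact flag names should be confirmed against `mergekit-evolve --help` for your version, and `--vllm` can be dropped if you skipped that extra):

```sh
mergekit-evolve ./evol_merge_config.yml \
    --storage-path /workspace/evol_merge_storage \
    --task-search-path /workspace/eval_tasks \
    --vllm \
    --wandb
```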

This will kick off the process and start merging and evaluating models. If you used the `--wandb` option, metrics on the evaluated models will be reported to Weights & Biases. This can be useful to obsessively refresh while you should be doing other things.

By default `mergekit-evolve` will keep going until it has evaluated 100 merges or you stop it with CTRL+C. You can increase this limit by passing the `--max-fevals` argument. Once the script has terminated, the mergekit configuration for the best-scoring merge will be written to `/workspace/evol_merge_storage/best_config.yaml`. You can get your final model by running it through `mergekit-yaml`, like so:
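For example (the output directory `/workspace/final_merge` is arbitrary, and `--cuda` simply runs the merge on GPU):

```sh
mergekit-yaml /workspace/evol_merge_storage/best_config.yaml /workspace/final_merge --cuda
```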

We are thrilled this novel merging technique is now available to everyone through MergeKit. We will also be integrating evolutionary model merging into the core Arcee product, which will provide a complete compute backend and remove the need to secure your own GPUs, so stay tuned for that!

Let us know how you're using `mergekit-evolve`, and happy merging!
