

Open-Source Toolkits · April 23, 2024 · 5 min read

Evolutionary Model Merging For All

We've been focused on developing this groundbreaking technique for the community, and we're now excited to announce the launch of this state-of-the-art functionality in MergeKit.

Charles Goddard

Sakana.ai made a very big splash about a month ago, releasing a paper on Evolutionary Model Merging along with the models and eval results produced by this game-changing merge method. Unfortunately for the community, they never released the algorithm behind these amazing results!

Since this release, here at Arcee we've been fully focused on developing this groundbreaking technique for the community. We're now excited to announce the launch of this state-of-the-art functionality in MergeKit.

Evolutionary Model Merging lets people target specific competencies or qualities in their merges. Without it, model merging is an extremely manual, exploratory process: trying dozens of merges, manually evaluating them, and trying to build a mental framework for how the merging parameters relate to the performance of the final model. With Evolutionary Model Merging, we can instead specify what qualities we want a model to have, and optimization will take care of it for us.

Tutorial: How to get started with Evolutionary Model Merging

I've created a tutorial to help you get started: `mergekit-evolve`.

Evolutionary Model Merging with `mergekit-evolve`

Hardware Requirements

`mergekit-evolve` needs at least one GPU. It doesn't necessarily need a huge one! You need to be able to run inference on a model in FP16. If you're working with models in the 7B size range, 24GB of VRAM will do just fine. If you're a big spender, you can use a Ray cluster with however many GPUs you want. For this little demo I'm using a RunPod instance with a single A100.

Installing

First, let's set up our environment with an installation of mergekit. We need to use the `evolve` feature flag, and I'm using `vllm` as well because it's faster.
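On a fresh machine, that setup looks roughly like this (a sketch; `evolve` and `vllm` are the extras in question, but check the mergekit README for the current install instructions):

```sh
# Install mergekit from source with the evolve feature flag, plus vllm for
# faster evaluation of candidate merges.
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e ".[evolve,vllm]"
```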

Defining Tasks

To optimize a merge recipe, we first need to decide what exactly to optimize. `mergekit-evolve` uses EleutherAI's language model evaluation harness as a backend, so in theory all of the benchmarks supported by `lm-eval` can be used. Since I'm not an evil little gnome, I'm going to define some custom tasks instead of directly optimizing against Open LLM Leaderboard scores.

Let's say that we want some spatial awareness in our model. The spartqa-mchoice dataset is a set of synthetic question-and-answer pairs about arrangements of objects, designed to test the spatial reasoning capabilities of language models. Let's take a random sample of its training split and use that for one part of our scoring.
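A minimal sketch of that sampling step (the `metaeval/spartqa-mchoice` repo id and the output path are assumptions; point them at whichever copy of the dataset and working directory you're using):

```python
import datasets

# Pull the spartqa-mchoice training split, sample 1,000 rows, and write them
# to disk so the lm-eval task below can load them as a local JSON dataset.
ds = datasets.load_dataset("metaeval/spartqa-mchoice", split="train")
ds = ds.shuffle(seed=42).select(range(1000))
ds.to_json("/workspace/spartqa_train_1k.jsonl")
```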

Now we need to define an `lm-eval` task that scores against this data. This can be done by writing a YAML file (and any necessary helper code). For more details on how to do this, see the New Task Guide.

In `/workspace/eval_tasks/spartqa_1k_train.yaml`:
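As a sketch (field names follow the lm-eval harness's YAML task schema; the choice labels and the assumption that the `answer` column indexes into them should be checked against the actual dataset):

```yaml
task: spartqa_train
dataset_path: json
dataset_kwargs:
  data_files: /workspace/spartqa_train_1k.jsonl
output_type: multiple_choice
training_split: train
test_split: train
doc_to_text: !function preprocess_spartqa.doc_to_text
doc_to_choice: ["A", "B", "C", "D"]
# Assumes the `answer` column holds the index of the correct choice.
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
```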

And in `/workspace/eval_tasks/preprocess_spartqa.py`:
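Something along these lines; the column names (`story`, `question`, `candidate_answers`) are my guesses at the dataset's schema, so adjust them to whatever your copy actually contains:

```python
def doc_to_text(doc) -> str:
    # Render the story, the question, and lettered answer options into a
    # single multiple-choice prompt ending in "Answer:".
    answer_chunks = []
    for idx, answer in enumerate(doc["candidate_answers"]):
        letter = "ABCD"[idx]
        answer_chunks.append(f"{letter}. {answer}")
    answers = "\n".join(answer_chunks)
    return (
        f"Context:\n{doc['story']}\n\n"
        f"Question: {doc['question']}\n"
        f"{answers}\n"
        "Answer:"
    )
```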

One common problem with merges is that the result often doesn't conform to any one particular prompting style. When manually creating merge recipes it's fairly easy to get the behavior you want by varying weights across layers, but since we're letting an algorithm optimize things, let's make a silly little task for it instead. Alpaca is a very common standard, and all it really requires of the model is to correctly output an EOS token after a completed response.

First, let's put together another tiny dataset to evaluate this metric against. I'll use a few hundred prompts from `vicgalle/alpaca-gpt4`.
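The idea is to render each example in the Alpaca template with its gold response already filled in, so the only decision left to the model is whether to emit an EOS token next. A sketch (the `instruction`/`input`/`output` column names follow the standard Alpaca schema, and the sample size and output path are arbitrary):

```python
import datasets

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

# Keep a few hundred prompts that don't use the optional `input` field, render
# them with their gold responses, and save as a local JSON dataset.
ds = datasets.load_dataset("vicgalle/alpaca-gpt4", split="train")
ds = ds.filter(lambda ex: not ex["input"])
ds = ds.shuffle(seed=42).select(range(500))
ds = ds.map(lambda ex: {"text": ALPACA_TEMPLATE.format(**ex)})
ds.to_json("/workspace/alpaca_prompt_format.jsonl")
```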

And now the actual task definition is quite simple:

In `/workspace/eval_tasks/alpaca_prompt_format.yaml`:
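A sketch: a multiple-choice task where the "correct" continuation after a completed response is the EOS token. The `</s>` literal assumes a Mistral/Llama-style tokenizer, and the distractor continuations are arbitrary:

```yaml
task: alpaca_prompt_format
dataset_path: json
dataset_kwargs:
  data_files: /workspace/alpaca_prompt_format.jsonl
output_type: multiple_choice
training_split: train
test_split: train
doc_to_text: "{{text}}"
doc_to_choice:
  - "</s>"   # stop cleanly after the completed response
  - " The"   # distractors: keep rambling instead of stopping
  - " In"
doc_to_target: 0
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
```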

There are definitely more robust ways to evaluate this but the multiple choice setup is nice in that it evaluates really quickly. Experiment at will!

Writing an Evolutionary Merge Config

We now have all the parts in place needed to actually define the merge we want to optimize. `mergekit-evolve` takes a YAML configuration file that defines what models we want to include, what merge method to use, and what tasks to optimize against.

For this example, I'm going to throw three models into the soup:

  • Hermes 2 Pro Mistral 7B, because it's a generally good model
  • Dan's Adventurous Winds Mk2 7B, because it's a really fun model but answers to no prompt format
  • Zephyr 7B beta for its quality instruction following

Most of the methods implemented by mergekit can be used. I chose Task Arithmetic pretty much arbitrarily.

Tasks can be weighted arbitrarily. I made the `spartqa_train` task slightly more important than `alpaca_prompt_format`, purely as an example.
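Putting that together, the config could look something like the sketch below. The `genome`/`tasks` layout follows mergekit-evolve's config format, but the Hugging Face repo ids are my best guesses for the three models above, and the layer granularity and task weights are illustrative, so verify everything against the mergekit docs and the hub:

```yaml
genome:
  models:
    - NousResearch/Hermes-2-Pro-Mistral-7B
    - PocketDoc/Dans-AdventurousWinds-Mk2-7b
    - HuggingFaceH4/zephyr-7b-beta
  merge_method: task_arithmetic
  base_model: mistralai/Mistral-7B-v0.1
  layer_granularity: 8   # optimize merge parameters in blocks of 8 layers
tasks:
  - name: spartqa_train
    weight: 1.0
  - name: alpaca_prompt_format
    weight: 0.8
```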

Running the Merge

Now we finally have all the pieces set up to actually run `mergekit-evolve`. Here's the command I used:
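In sketch form (the storage and task-search paths match the directories used above; the exact flag names should be confirmed against `mergekit-evolve --help` for your version, and `--vllm` can be dropped if you skipped that extra):

```sh
mergekit-evolve ./evol_merge_config.yml \
    --storage-path /workspace/evol_merge_storage \
    --task-search-path /workspace/eval_tasks \
    --vllm \
    --wandb
```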

This will kick off the process and start merging and evaluating models. If you used the `--wandb` option, metrics on the evaluated models will be reported to Weights & Biases. This can be useful to obsessively refresh while you should be doing other things.

By default `mergekit-evolve` will keep going until it has evaluated 100 merges or you stop it with CTRL+C. You can increase this limit by passing the `--max-fevals` argument. Once the script has terminated, the mergekit configuration for the best-scoring merge will be written to `/workspace/evol_merge_storage/best_config.yaml`. You can get your final model by running it through `mergekit-yaml`, like so:
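For example (the output directory `/workspace/final_merge` is arbitrary, and `--cuda` simply runs the merge on GPU):

```sh
mergekit-yaml /workspace/evol_merge_storage/best_config.yaml /workspace/final_merge --cuda
```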

We are thrilled this novel merging technique is now available to everyone through MergeKit. We will also be integrating evolutionary model merging into the core Arcee product, which will provide a complete compute backend and remove the need to secure your own GPUs, so stay tuned for that!

Let us know how you're using `mergekit-evolve`, and happy merging!
