
Open-Source Toolkits

February 5, 2025 · 6 min read

Meet MergeKit v0.1: Expanded Model Support, Arcee Fusion, & Multi-GPU Acceleration

MergeKit changed the game for model merging, and today we're excited to bring you its most significant updates yet, in what we're calling MergeKit v0.1. Starting today, you'll be able to unlock the power of model merging more than ever, with enterprise hosting, premium features, and expert support.

Charles Goddard, Lucas Atkins
It's been just over a year since Arcee AI acquired MergeKit and joined forces with its creator, Charles Goddard. Since then, we've had an incredible year of constant innovation, collaboration with the open-source community, and productizing model merging as we built out our world-class model training pipeline.

To mark the one-year anniversary of Arcee AI + MergeKit, we're bringing you the most significant updates to MergeKit to date. Check them out and let us know what you think and what you build. And as always, happy merging! 


Expanded Model Support: Merge Anything, Merge Faster

MergeKit v0.1 dramatically expands the range of models you can merge: you're no longer limited to specific architectures explicitly supported by MergeKit. This release introduces two major improvements:

  • Arbitrary transformers models: MergeKit now seamlessly handles any model architecture supported by the popular transformers library. This means you can merge cutting-edge models as soon as they're released, including vision-language models like LLaVA or Qwen-VL, alongside the diverse collection of decoder-only models already supported. No more waiting for MergeKit to "catch up"–you're empowered to merge immediately.
  • Raw PyTorch models: Beyond transformers models, MergeKit now supports merging raw PyTorch models. This opens up a world of possibilities, allowing you to merge models serialized with either torch.save (pickle) or safetensors. The new mergekit-pytorch entrypoint lets you merge diffusion models (like Stable Diffusion or FLUX), audio models (like Whisper), computer vision models, and virtually any other PyTorch model you can imagine (see the sketch after this list). As with transformers models, the models being merged must share the same architecture and size.
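To sketch what this looks like in practice, here's a hypothetical mergekit-pytorch configuration that linearly averages two same-architecture checkpoints. The schema is shown by analogy with MergeKit's existing YAML configs; the paths, weights, and exact keys accepted by mergekit-pytorch are illustrative assumptions, not taken from the documentation:

```yaml
# Hypothetical config: linearly average two same-architecture checkpoints.
# Paths, weights, and schema details are illustrative assumptions.
merge_method: linear
models:
  - model: ./checkpoints/model_a.safetensors
    parameters:
      weight: 0.5
  - model: ./checkpoints/model_b.safetensors
    parameters:
      weight: 0.5
dtype: float16
```

You'd then run it through the new entrypoint, e.g. mergekit-pytorch config.yaml ./merged-model (invocation shown by analogy with mergekit-yaml; check mergekit-pytorch --help for the exact usage).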


Arcee Fusion: The Art of Selective Merging

This release also makes Arcee Fusion publicly available: a sophisticated merging method previously used internally to develop our Supernova, Medius, and Virtuoso series models. Arcee Fusion takes a more intelligent approach to merging, focusing on the importance of differences between models rather than simply merging everything indiscriminately. It works in three key stages, sketched in code after this list:

  1. Importance Scoring: Instead of blindly merging all parameters, Arcee Fusion calculates an importance score for each parameter, combining the absolute difference between model parameters with a divergence measure based on softmax distributions and KL divergence. This ensures that only meaningful changes are considered.
  2. Dynamic Thresholding: The algorithm analyzes the distribution of importance scores, calculating key quantiles (median, Q1, and Q3) and setting a dynamic threshold of median + 1.5 × IQR (a variant of the classic interquartile-range rule for outlier detection). This intelligently filters out less significant changes.
  3. Selective Integration: A fusion mask is created based on the importance scores and the threshold. Only the most significant elements are incorporated into the base model, ensuring that the merge process is adaptive and selective. This preserves the base model's stability while integrating the most valuable updates from the other model.
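Here is a minimal per-tensor sketch of those three stages in PyTorch. It follows the description above, but the exact way the absolute-difference and KL terms are combined into a single score is our illustrative assumption; the shipped implementation may differ:

```python
import torch

def fuse_tensor(base: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
    """Selectively fold `other` into `base`, one weight tensor at a time."""
    flat_base, flat_other = base.flatten().float(), other.flatten().float()

    # 1. Importance scoring: absolute parameter difference, weighted by a
    #    pointwise KL-style divergence between softmax distributions.
    #    (Combining the two terms multiplicatively is an assumption.)
    p = torch.softmax(flat_base, dim=0)
    q = torch.softmax(flat_other, dim=0)
    kl = (p * (p / q).log()).clamp_min(0)   # keep contributions nonnegative
    importance = (flat_base - flat_other).abs() * (1.0 + kl)

    # 2. Dynamic thresholding: median + 1.5 * IQR of the importance scores.
    q1, med, q3 = torch.quantile(importance, torch.tensor([0.25, 0.5, 0.75]))
    threshold = med + 1.5 * (q3 - q1)

    # 3. Selective integration: elements above the threshold come from the
    #    other model; everything else keeps the base model's value.
    mask = (importance > threshold).reshape(base.shape)
    return torch.where(mask, other, base)
```

Applied tensor by tensor across two same-architecture checkpoints, this keeps the bulk of the base model intact while pulling in only the strongest deltas.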

Arcee Fusion avoids the pitfalls of over-updating that can occur with simple averaging, providing a more refined and controlled merging experience. You can activate this powerful new method by specifying merge_method: arcee_fusion in your merge configuration file.
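In a MergeKit YAML configuration, that might look like the following (the model identifiers are placeholders chosen only to show the shape of the file):

```yaml
# Fuse a fine-tuned model into its base. Model names are placeholders.
merge_method: arcee_fusion
base_model: meta-llama/Llama-3.1-8B-Instruct
models:
  - model: example-org/llama-3.1-8b-finetune   # hypothetical fine-tune
dtype: bfloat16
```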


Multi-GPU Execution

MergeKit v0.1 introduces a new --parallel flag for multi-GPU execution. If you have access to a multi-GPU environment, this flag will unlock a near-linear speedup for your merge operations. The --parallel flag is compatible with all merge methods and model types, significantly reducing merge times and boosting your productivity.
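For example, with the standard mergekit-yaml entrypoint, a multi-GPU merge of a configuration like the one above might look like this (the config filename is illustrative):

```bash
# Merge using all available GPUs; omit --parallel for the single-GPU path.
mergekit-yaml fusion-config.yaml ./merged-model --parallel
```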


Licensing: Balancing Open Access with Continued Development

Finally, we want to address a change to our licensing model. While we are committed to open access, we also need to ensure the long-term sustainability of MergeKit's development. Therefore, we are transitioning to a Business Source License (BSL).

Why are we doing this?
The techniques and methods we've developed are valuable and unique. We believe in the power of open source and want the community to benefit from our work. However, unrestricted commercial use by large entities could jeopardize our ability to continue developing and improving MergeKit. The BSL is a balanced approach that allows us to share our innovations while protecting our long-term viability.

What does this mean for you?
For the vast majority of users (personal, research, and non-commercial), nothing changes. You retain unrestricted access to MergeKit. Even most commercial users will likely be unaffected. The BSL primarily applies to large corporations and highly successful startups using MergeKit in a production setting. If this applies to you, we'll simply need to discuss a commercial license (which includes direct access to Charles Goddard and the MergeKit development team).

We believe this approach strikes the right balance between fostering open innovation and ensuring the continued growth and development of MergeKit. To be clear: we want you to use MergeKit! This licensing change is about ensuring we can continue to provide you with the best possible tool for model merging. You can find MergeKit v0.1 on GitHub at github.com/arcee-ai/mergekit, and to learn more about the BSL licensing, drop a note to our team at licensing@arcee.ai.

