Open-Source SLMs
Here at Arcee AI, we're going beyond the hype and speculation surrounding the DeepSeek-R1 release. We're doing what we always do: working hard on training models. We'll soon deliver distillations of R1, and in the meantime, we're bringing you two distilled versions (10B and 32B) of DeepSeek-V3.
Over the past few days, the reactions to DeepSeek-R1 have been dramatic—some welcoming it, others resisting it, and many speculating wildly.
At Arcee AI, we’re not here to debate those reactions; we’re here to deliver something tangible.
Today, we’re excited to share two new small language models (SLMs) we’ve been working on that reflect our core commitment: turning top-tier research into reliable, effective AI tools for everyone. Today’s releases are both distilled versions of DeepSeek-V3, and DeepSeek-R1 distillations are coming very soon.
Virtuoso-Lite is a brand-new 10B-parameter model built on top of TII’s Falcon architecture and distilled from DeepSeek-V3 using the logit-level pipeline described below.
The result is a streamlined 10B model that holds its ground against our original 32B Virtuoso-Medium. Virtuoso-Lite is released under the Apache-2.0 license, offering you the freedom to integrate it into your projects and workflows without restrictions.
We’re also releasing Virtuoso-Medium-v2, our 32B distillation of DeepSeek-V3, which has delivered our highest scores yet on public benchmarks.
In fact, it surpasses our original Arcee-Nova 72B model (released in 2024) on every benchmark we measured.
Like Virtuoso-Lite, Virtuoso-Medium-v2 is available under the Apache-2.0 license. Whether you’re developing retrieval pipelines, enterprise solutions, or generating data for new applications, this model is designed to provide reliable, high-quality outputs.
Unlike models built with simple supervised fine-tuning, both of these models were trained with a genuine logit-level distillation pipeline. We use a proprietary “fusion merging” technique combined with meticulous “tokenizer surgery” to achieve cross-architecture distillation. This careful approach pays off especially on math and code tasks, where performance can suffer under more conventional methods.
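To make “logit-level” concrete, here is a minimal sketch of the kind of loss such a pipeline optimizes: a temperature-scaled KL divergence between the teacher’s and student’s next-token distributions. This is illustrative only, not our production code; it assumes the teacher and student vocabularies have already been aligned (the role of the tokenizer surgery mentioned above), and the fusion-merging step is not shown.

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits: torch.Tensor,
                            teacher_logits: torch.Tensor,
                            temperature: float = 2.0) -> torch.Tensor:
    """Temperature-scaled KL divergence between teacher and student.

    Both tensors have shape (batch, seq_len, vocab) and are assumed to
    index the *same* vocabulary, i.e. tokenizer alignment has already
    been done.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Flatten to (tokens, vocab); "batchmean" averages over tokens.
    loss = F.kl_div(
        student_log_probs.reshape(-1, student_log_probs.size(-1)),
        teacher_probs.reshape(-1, teacher_probs.size(-1)),
        reduction="batchmean",
    )
    # The t**2 factor keeps gradient magnitudes comparable across
    # temperatures (standard practice since Hinton et al., 2015).
    return loss * t ** 2
```

Unlike supervised fine-tuning on teacher-generated text, the student here sees the full probability the teacher assigns to every candidate token, which is where the gains on math and code tend to come from.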
With previous data pipelines limited to around 100M tokens of Llama-405B logits, we’ve now expanded to over 5B tokens from DeepSeek-V3. And that’s just the beginning: we’re already working on extracting billions more from DeepSeek-R1.
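Storing full-vocabulary logits for billions of tokens is expensive, so pipelines at this scale typically keep only the top-k logits per position. The sketch below shows that idea; it’s a common engineering pattern rather than a description of our internal tooling, and the vocabulary size is just an example.

```python
import torch

VOCAB_SIZE = 129_280  # roughly DeepSeek-V3's vocabulary size; illustrative

def compress_logits(logits: torch.Tensor, k: int = 64):
    """Keep only the top-k logits and their token ids per position."""
    values, indices = torch.topk(logits, k, dim=-1)
    return values.half(), indices.to(torch.int32)

def expand_logits(values: torch.Tensor, indices: torch.Tensor,
                  fill: float = -1e4) -> torch.Tensor:
    """Rebuild a dense logit tensor for the distillation loss, filling
    discarded entries with a large negative value so they carry
    near-zero probability after the softmax."""
    dense = torch.full((*values.shape[:-1], VOCAB_SIZE), fill)
    dense.scatter_(-1, indices.long(), values.float())
    return dense
```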
Here’s a quick look at how Virtuoso-Lite and Virtuoso-Medium-v2 stack up across several well-known benchmarks: Virtuoso-Lite (10B) compares favorably to other models in its class, while Virtuoso-Medium-v2 (32B) pushes performance closer to the top of the leaderboard, even challenging high-quality 70B models such as Meta’s. These results demonstrate that carefully orchestrated distillation can produce leaner models without sacrificing accuracy.
We plan to openly release two distillations of DeepSeek-R1 into Qwen2.5-7B and Qwen2.5-14B (both based on the standard-context variants, not the 1M-token context versions). Our 32B and 72B R1 distillations will also be available through Arcee AI’s Model Engine. If you’re looking for more power at scale, stay tuned—there’s more on the way.
Virtuoso-Lite and Virtuoso-Medium-v2 are both available on Hugging Face, and they’ll soon be on Arcee’s inference platform for easy integration and API access. We encourage the community to experiment, build, and push these models to their limits. With the Apache-2.0 license, you have the flexibility to apply these models in a broad range of commercial and non-commercial projects.
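Getting started is a standard transformers workflow. Here’s a minimal sketch (the repo id below is assumed; check the arcee-ai organization page on Hugging Face for the exact identifiers):

```python
# Minimal sketch of loading Virtuoso-Lite with Hugging Face transformers.
# The repo id is an assumption; see the arcee-ai org page for exact names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Virtuoso-Lite"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user",
             "content": "Explain model distillation in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```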
If you have any questions or would like to share your experience, reach out to us on X or LinkedIn. If you’d like to understand how Arcee AI can help your organization build scalable and cost-efficient AI solutions, please get in touch at sales@arcee.ai or by booking a demo here. We can’t wait to see what you build with Virtuoso-Lite and Virtuoso-Medium-v2—and stay tuned for R1 distillations.👐