

Product · 17 Mar 2025 · 6 min read

Work with the Right AI Model, Every Time: Introducing Intelligent Model Routing with Arcee Conductor

Our intelligent model routing platform, Arcee Conductor, transforms your AI experience by ensuring you always get superior results at the lowest possible cost – and always tailored to each unique prompt or query.

Mark McQuade, Fernando Fernandes Neto, Mary MacCarthy

You’re a coder who’s a power user of Claude at work AND as a hobbyist. Or you’re a data analyst who has incorporated GPT-4o so thoroughly into your workflows that on some days it feels like an extension of your own brain. Or perhaps you’re a marketer who would like to switch from one LLM to another depending on the task – but you stick to just one, because in your current configuration switching models is extremely time-consuming or even impossible.

Regardless of why you use AI, or the habits you've developed along the way, we all have one thing in common: we rarely know for certain that the model we're using is the one that will give the most precise answer at the best cost. LLMs are an incredible addition to how we work, and for many of us they have transformed how we approach our key tasks. Yet the vast majority of us work without knowing whether the model we've chosen is actually the best one for the job, or whether we're paying too much for it.

Even if we’ve mastered how to expertly prompt our model of choice, we’re aware that there might be a model that could offer a better response, and maybe even at a lower price. Let’s call it sort of an underlying FOMO – a nagging feeling that another AI model might do this better. But what can we do? Subscribe to ten different AI products, and ask ten different models the same question? We all know that’s neither practical nor cost-efficient.

From SLM Pioneers to Model Routing Innovators

We made our name as the pioneers of SLMs, and over the past year we have delivered dozens of small (72B or less) but powerful SLMs – models whose performance rivals and sometimes beats that of the leading LLMs, at a significantly lower price point.

We’ve delivered these models to our customers and also shared many of them with the open-source community (they’ve racked up more than 570K downloads on Hugging Face). The customers who use our SLMs have seen their AI costs diminish, while getting the same accuracy and nuance they had become accustomed to with leading closed-source LLMs.

Feedback from customers and from the community made us realize that what most AI users want is not necessarily an LLM, and sometimes – we had to admit – not even one of our SLMs. Rather, they want choice. They wish they could choose between different models depending on the task, the team, and other requirements that often evolve from one project to the next.

For some use cases, or for some teams, using an Arcee AI SLM is always the best solution: users know they’ll get the best answers, and that compute costs will be much lower versus using a third-party LLM. But for certain departments, or certain use cases, it’s imperative to also have access to a third-party LLM.

We became aware of this desire for access to multiple models, and we took it a step further – building a platform that offers not just easy access to multiple models, but that also provides an intelligent router that evaluates which of the available SLMs and LLMs is best-suited for each of your queries.

The Technology Behind Arcee Conductor's AI Model Routing

Using our renowned model training pipeline and innovative frameworks including HuggingGPT, ModernBERT, and Mixture of Domain Expert Models (MoDEM), we built an ultra-lightweight (150M-parameter), low-latency AI-based routing model that can deeply understand your input.

This routing model evaluates each query sent by the user and sends it to the appropriate model, which answers the question accurately while consuming the least amount of resources (cost + tokens).

The router consists of several sub-components, including:

  • Domain classification
  • Task recognition
  • Complexity prediction
  • Language detection
  • Tool/function calling

We trained (where required) and evaluated each component against state-of-the-art datasets and benchmarks. The table below shows some of the metrics for some of the key sub-components. 
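To make the routing idea concrete, here is a minimal sketch of such a pipeline. Everything in it is illustrative: the classification heuristics, thresholds, and model names are stand-ins, not Arcee Conductor's actual internals, which use a trained 150M-parameter model rather than hand-written rules.

```python
# Hypothetical sketch of a query-routing pipeline. The heuristics and
# model names below are illustrative stand-ins, not Conductor's internals.
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    domain: str
    task: str
    complexity: float  # 0.0 (trivial) to 1.0 (hard)
    model: str


def classify_domain(query: str) -> str:
    # Stand-in for the domain classification sub-component.
    return "marketing" if "SEO" in query else "general"


def recognize_task(query: str) -> str:
    # Stand-in for the task recognition sub-component.
    return "rewrite" if "optimize" in query.lower() else "qa"


def predict_complexity(query: str) -> float:
    # Stand-in for the complexity predictor: longer queries score higher.
    return min(len(query.split()) / 200, 1.0)


def route(query: str) -> RoutingDecision:
    domain = classify_domain(query)
    task = recognize_task(query)
    complexity = predict_complexity(query)
    # Cheap specialized SLM for low-complexity queries, frontier LLM otherwise.
    model = "arcee-blitz" if complexity < 0.5 else "claude-3-5-sonnet"
    return RoutingDecision(domain, task, complexity, model)
```

In the real system, each of these stand-in functions corresponds to a trained sub-component evaluated against the benchmarks described above.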


For key benchmarks, we know the ground-truth answer – and we compare it to 1) the answer from the model selected by our router, and 2) the answer from Claude 3.5 Sonnet. Our results show that routing queries to SLMs – which are theoretically less powerful but highly specialized and significantly cheaper – generally provides the same level of accuracy as Sonnet.

The Business Case for Intelligent AI Model Routing

Since the advent of easily accessible generative AI in late 2022, businesses that have adopted the technology have been incurring unnecessary costs by defaulting to large-scale, premium models – even for tasks that smaller, specialized models could handle more efficiently.

Arcee Conductor puts an end to this wasteful spending. Users of models like Sonnet and GPT-4o can expect to immediately see savings of about 65%, with the cost-per-million-tokens reduced by $11 versus a single-model approach.
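As a back-of-the-envelope check, those two figures (a ~65% saving and $11 less per million tokens) together imply the approximate blended costs involved. This is simple arithmetic on the numbers quoted in this post, not pricing data from Conductor itself:

```python
# Back-of-the-envelope check using only the figures quoted above:
# a ~65% saving that works out to $11 less per million tokens.
savings_fraction = 0.65
reduction_per_million = 11.00  # USD saved per 1M tokens

# If $11 is 65% of the single-model cost, that cost is about $16.92/1M tokens,
# and the routed cost is about $5.92/1M tokens.
single_model_cost = reduction_per_million / savings_fraction
routed_cost = single_model_cost - reduction_per_million
```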

When you also consider that Conductor is easy to implement (via the Conductor UI or API), making the switch becomes a no-brainer.
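For a feel of what the API route might look like, here is a hypothetical sketch that builds a request against an OpenAI-compatible chat-completions endpoint. The base URL and the "auto" model alias are assumptions made for illustration – check the Conductor documentation for the actual endpoint, authentication, and model names.

```python
# Hypothetical sketch of calling Conductor via an OpenAI-compatible
# chat-completions API. The URL and "auto" alias are assumptions;
# consult the Conductor docs for the real values.
import json
import urllib.request


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    payload = {
        "model": "auto",  # assumed alias letting the router pick the model
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://conductor.arcee.ai/v1/chat/completions",  # assumed endpoint
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# To actually send the request:
# resp = urllib.request.urlopen(build_request("Rewrite this blog for SEO", api_key))
```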

Read more about how Arcee Conductor can impact your business and bottom line here.



Experience Intelligent Model Routing Today – On Us

Arcee Conductor is more than just intelligent routing – it represents a paradigm shift in AI usage, emphasizing smarter, more strategic interactions with the language models we’ve grown to love and depend on.

Get started with Arcee Conductor today and we’ll cover your first 400 million tokens. Once you’re in, try the “Compare” feature to judge for yourself how well the routing model evaluates your queries. Most importantly, send us your feedback!

You can get in touch with us on X or LinkedIn, or drop us a note directly in the Support window in Conductor. We want to hear what you like, what we can improve, and what features you’d like us to add. Happy prompting!




How We Used Arcee Conductor to Optimize this Blog

We used Conductor to do an SEO optimization of the blog post you’re reading right now. (SEO optimization is a common task for marketing professionals – one that can be very time-consuming and that, prior to generative AI, required advanced SEO expertise.)

Conductor routed the query to the Arcee AI SLM called “Blitz,” providing the response (a full SEO-optimized blog) in 11.72 seconds. As you can see below, Conductor provides lots of other information about our query:
• the number of input tokens
• the number of output tokens
• the cost
• the task type, domain, and level of complexity as determined by our routing model.

It’s hard to get a sense of the cost without anything to compare it to, so we used the “Compare” feature in Conductor, which lets you submit the same prompt to different models. We sent the same prompt as above, this time directing Conductor to have both Claude and Blitz process it. Both models produced output of similar quality, but the cost difference was staggering: Claude cost 212 times more!

Of course, cost isn’t everything. What about quality? The main difference between the two models’ output was formatting, with Claude producing a very nice layout with formalized titles, bullet points, and so on. For us as marketers, that’s not a difference worth paying 212x more for!
