Here at Arcee AI, we pioneered the training of performant, efficient LLMs with Model Merging... and now we bring you *yet another* cutting-edge technique that dramatically optimizes training and improves your models.
We're committed to making the models trained on our platform as efficient and effective as possible, and one significant innovation we've integrated into our pipeline is Spectrum, a method for optimizing the training of large language models (LLMs). Here, we discuss what Spectrum is, how it works, and how it has evolved our model training methodology.
Spectrum is a novel training methodology that optimizes the training of LLMs by selectively training specific layers based on their signal-to-noise ratio (SNR). The core concept is straightforward yet highly effective: instead of updating every layer of the model during training, Spectrum identifies and prioritizes the layers that contribute most to performance improvements (high SNR), while layers with low SNR remain frozen.
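To make this concrete, here is a minimal sketch of one way to estimate a layer's SNR from the singular values of its weight matrix, using a random-matrix-theory noise threshold in the spirit of the Spectrum paper. The noise-scale estimate and thresholding below are our own simplifications, not the reference implementation:

```python
import torch

def mp_threshold(sigma: float, n_rows: int, n_cols: int) -> float:
    # Largest singular value expected from a pure-noise matrix with i.i.d.
    # entries of standard deviation sigma (random matrix theory estimate).
    return sigma * (n_rows ** 0.5 + n_cols ** 0.5)

def layer_snr(weight: torch.Tensor) -> float:
    # Split singular values into "signal" (above the noise threshold) and
    # "noise" (below it), and return the ratio of their total energies.
    w = weight.detach().float()
    s = torch.linalg.svdvals(w)
    sigma = w.std().item()  # crude noise-scale estimate (assumption)
    thresh = mp_threshold(sigma, *w.shape)
    signal, noise = s[s > thresh], s[s <= thresh]
    if noise.numel() == 0:
        return float("inf")
    return (signal.sum() / noise.sum()).item() if signal.numel() else 0.0
```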
This targeted training approach offers several key advantages:
- Lower memory usage, since gradients and optimizer states only need to be kept for the layers being trained.
- Faster training, because far fewer parameters are updated at each step.
- Comparable, and sometimes better, downstream performance, since training is focused on the highest-impact layers.
- The ability to train very large models on much more modest hardware.
The Spectrum methodology can be broken down into the following steps (a minimal sketch follows the list):
1. Compute the SNR of each layer's weight matrices, using random matrix theory to separate signal from noise.
2. Rank all layers by their SNR.
3. Select the top percentage of layers (e.g., the top 25%) for training.
4. Freeze the remaining layers and train the model as usual.
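Here is a minimal sketch of these steps, building on the `layer_snr` helper above; the 25% `top_fraction` is illustrative, not a recommendation:

```python
def score_modules(model):
    # Step 1: score every 2-D weight matrix in the model.
    return {
        name: layer_snr(module.weight)
        for name, module in model.named_modules()
        if getattr(module, "weight", None) is not None and module.weight.dim() == 2
    }

def freeze_all_but_top_snr(model, snr_by_module: dict, top_fraction: float = 0.25):
    # Step 2: rank scored modules by SNR, highest first.
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    # Step 3: keep only the top fraction of modules trainable.
    keep = set(ranked[: max(1, int(len(ranked) * top_fraction))])
    # Step 4: freeze every parameter outside the selected modules
    # (parameter names are "<module name>.weight", "<module name>.bias", etc.).
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(m + ".") for m in keep)
    return keep
```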
At Arcee AI, we have integrated Spectrum into our model training pipeline to optimize both the Continual Pre-Training (CPT) and Supervised Fine-Tuning (SFT) phases. Here's how Spectrum has transformed our training process:
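Continuing the sketch above, a Spectrum-frozen model drops into a standard Hugging Face `Trainer` with no special handling, since gradients and optimizer state are simply never allocated for frozen parameters (the `model`, `train_dataset`, and output path here are placeholders):

```python
from transformers import Trainer, TrainingArguments

# Report how much of the model is actually being trained.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable / total:.1%} of {total / 1e9:.2f}B parameters")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="spectrum-run", bf16=True),
    train_dataset=train_dataset,  # placeholder: your CPT or SFT dataset
)
trainer.train()
```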
One of the most compelling demonstrations of Spectrum's capabilities at Arcee AI is its application in the Continual Pre-Training of massive models like Qwen2-72B and Llama-3-70B on a single H100 node.
Traditionally, training models of this size on a single node would necessitate significant performance trade-offs. With Spectrum, however, freezing the low-SNR layers shrinks the gradient and optimizer-state footprint enough to fit, and we have achieved this while maintaining performance. The back-of-the-envelope arithmetic below shows why.
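Here is our own illustrative arithmetic for a 70B model (it ignores activation memory and parallelism overheads, and the 25% trained fraction is an assumption made for the sake of the example):

```python
params = 70e9                # Llama-3-70B parameter count
weights = params * 2         # bf16 weights, in bytes
grads   = params * 2         # bf16 gradients, in bytes
adam    = params * 8         # fp32 AdamW moments (m and v), in bytes

full_ft = (weights + grads + adam) / 1e9
print(f"Full fine-tuning: ~{full_ft:.0f} GB")  # ~840 GB, more than an 8x80 GB node

frac = 0.25                  # assumption: Spectrum trains ~25% of the layers
spectrum = (weights + (grads + adam) * frac) / 1e9
print(f"Spectrum:         ~{spectrum:.0f} GB")  # ~315 GB, fits in 640 GB
```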
To quantify the impact of Spectrum, we conducted extensive evaluations across various metrics. Here are some highlights:
- **Performance metrics:** Despite the reduced training time and memory usage, models trained with Spectrum showed no significant degradation in performance, and some even improved thanks to the targeted training of high-impact layers. Our Arcee-Spark and Arcee-Agent models were trained entirely on our platform, using Spectrum to optimize training.
In their paper, the authors behind Spectrum (including Arcee's own Lucas Atkins and Fernando Fernandes Neto) compared Spectrum against QLoRA and full fine-tuning. We have found that their findings extend to Continual Pre-Training as well.
Spectrum will remain a vital component of our toolkit, empowering us to deliver top-tier models that meet our clients' needs. This optimization has streamlined our current training processes and paved the way for future advancements, ensuring we remain at the cutting edge of LLM training.