Arcee AI and Intel Gaudi2 make for a powerful combination when it comes to advancing financial insights via LLMs. Learn how the Arcee AI team used Intel's Habana Gaudi2 technology to train two advanced models with 10 billion tokens of financial data, leading to nuanced insights for analysts, investors, and other stakeholders.
Gaudi2’s purpose-built hardware excels at deep learning tasks, particularly when handling models that require immense processing power. Whether training with Llama 3.0 (known for its robustness and efficiency) or Qwen2 (a model optimized for diverse data patterns), Gaudi2 accelerates training, optimizes memory management, and enables efficient scaling, even when managing billions of parameters.
By supporting both models, Gaudi2 provided Arcee AI with the flexibility to build powerful language models tailored for financial insights.
To train your models using Intel Gaudi2 accelerators as we did, you can leverage the optimized deep learning frameworks and tools designed for the Gaudi2 architecture. These accelerators are tailored to enhance training performance across a wide range of machine learning workloads.
To get started, please refer to the official Intel Gaudi documentation, which provides detailed instructions on setting up your environment using Docker containers.
If you’re interested in training Llama 3.0 or Qwen2, the tutorial below has been verified to work with both models.
To set up the Docker container we will use to run our training, execute the command below (it targets the Gaudi 1.17.1 image, which is compatible with the training runs in this tutorial).
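A typical launch, following the pattern from the Intel Gaudi documentation, looks like the sketch below; the exact image tag (the PyTorch build paired with the 1.17.1 release) is an assumption here and should be checked against the Gaudi release notes.

```bash
# Start an interactive Gaudi PyTorch container with all HPUs visible
# and the current directory mounted as the working directory.
docker run -it --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice \
  --net=host --ipc=host \
  -v $(pwd):/workspace -w /workspace \
  vault.habana.ai/gaudi-docker/1.17.1/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
```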
To set up the environment, we started by cloning the optimum-habana repository, which includes Habana-specific implementations for Hugging Face’s transformers library. This repository simplifies deploying Llama and other large models on Gaudi2, with scripts tailored for different tasks.
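A minimal version of that setup, assuming the public Hugging Face repository and Habana's DeepSpeed fork (whose branch should match your installed SynapseAI release), looks like this:

```bash
# Clone optimum-habana and install it in editable mode inside the container.
git clone https://github.com/huggingface/optimum-habana.git
cd optimum-habana
pip install -e .

# DeepSpeed for Gaudi comes from Habana's fork; the branch shown here is an
# assumption and should match the Gaudi software release you are running.
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0
```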
Open Source Training Datasets
The Arcee AI team compiled a list of open-source financial datasets, which can be found below. These datasets were mixed with Arcee AI's SEC datasets to create our final training set.
Gaudi2 requires specific configurations to maximize its performance; the key parameters we relied on are described below.
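As one piece of that setup, the DeepSpeed side of the configuration can be sketched as a minimal ZeRO-3 file like the one below; the file name and values are illustrative assumptions rather than Arcee AI's exact settings.

```bash
# Write an illustrative DeepSpeed ZeRO-3 config (placeholder values to tune for your run).
cat > ds_zero3.json <<'EOF'
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": 1.0,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
EOF
```

ZeRO-3 shards optimizer state, gradients, and parameters across devices, which is what keeps per-HPU memory manageable at this model scale.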
Using the following command, we launched the training script with essential parameters to train Llama 3.0. This command initializes Gaudi2 with DeepSpeed Zero-3, gradient checkpointing, and HPU graphs for efficient inference.
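A representative launch using optimum-habana's gaudi_spawn.py and run_clm.py example scripts on eight HPUs is sketched below; the model ID, data file, Gaudi config name, and hyperparameter values are assumptions to adapt to your own run.

```bash
# From the optimum-habana checkout, with ds_zero3.json (see above) copied next to run_clm.py.
cd examples/language-modeling

# 8-HPU DeepSpeed ZeRO-3 launch for Llama 3 (paths and values are illustrative).
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_clm.py \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --train_file ./financial_corpus.json \
  --do_train \
  --bf16 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing \
  --use_habana \
  --use_lazy_mode \
  --use_hpu_graphs_for_inference \
  --gaudi_config_name Habana/llama \
  --deepspeed ds_zero3.json \
  --output_dir ./llama3-financial
```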
Running an 8B-parameter model on a single Gaudi2 instance produces about 49 tokens per second, comparable to a g6e.2xlarge on AWS but at about half the price, showing that Gaudi2 is competitive for running small models in the 8B-parameter range.
Using the following command, we launched the training script with essential parameters to train Qwen2. This command initializes Gaudi2 with DeepSpeed Zero-3, gradient checkpointing, and HPU graphs for efficient inference, optimizing performance for large-scale financial model training.
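A comparable sketch for Qwen2 changes only the model ID, Gaudi configuration, and output directory relative to the Llama example above (again, the specific names and values are assumptions).

```bash
# From optimum-habana/examples/language-modeling, as before, with ds_zero3.json alongside run_clm.py.
# Same 8-HPU DeepSpeed ZeRO-3 recipe, pointed at Qwen2 (names and values are illustrative).
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_clm.py \
  --model_name_or_path Qwen/Qwen2-7B \
  --train_file ./financial_corpus.json \
  --do_train \
  --bf16 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing \
  --use_habana \
  --use_lazy_mode \
  --use_hpu_graphs_for_inference \
  --gaudi_config_name Habana/gpt2 \
  --deepspeed ds_zero3.json \
  --output_dir ./qwen2-financial
```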
Running a 7B-parameter model on a single Gaudi2 instance also produces about 49 tokens per second, comparable to a g6e.2xlarge on AWS at about half the price. Just as with the Llama training, this shows that Gaudi2 is competitive for running small language models (SLMs) in the 7B-parameter range.
After training with this setup, Arcee AI successfully built a model capable of delivering insights about financial documents, answering questions, and providing explanations tailored for both novice and advanced users. The model demonstrated strong accuracy in comprehending and summarizing financial statements, which can dramatically enhance accessibility to financial literacy for a variety of stakeholders.
Using Gaudi2 allowed us to scale our model training without sacrificing speed or efficiency. This project highlights the synergy between advanced hardware, optimized configurations, and high-quality data, underscoring how powerful these models can be in the finance domain. As we continue to push the boundaries of language models and guide our clients on their AI journey, our partnership with the Intel and Gaudi2/3 teams will become even more impactful and far-reaching.
In the quest to unlock powerful financial insights through large language models (LLMs), Arcee AI leveraged Habana Gaudi2 technology to train two advanced LLMs: Llama 3.0 and Qwen2. Using an impressive 10 billion tokens of financial and SEC data, the team developed models primed for financial literacy and investment insights.
This article has detailed the steps and configurations we used to harness Gaudi2’s capabilities for large-scale model training, offering an overview for anyone interested in training state-of-the-art models.