Community

03 Jan 2025 - 6 min read

Arcee AI & Intel Gaudi2: Training a Financial Insights Model Using Open Source Models

Arcee AI and Intel Gaudi2 make for a powerful combination when it comes to advancing financial insights via LLMs. Learn how the Arcee AI team used Intel's Habana Gaudi2 technology to train two advanced models with 10 billion tokens of financial data, leading to nuanced insights for analysts, investors, and other stakeholders.

Tyler Odenthal, Mark McQuade

In the quest to unlock powerful financial insights through large language models (LLMs), Arcee AI leveraged Habana Gaudi2 technology to train two advanced LLMs: Llama 3.0 and Qwen2. Using an impressive 10 billion tokens of financial and SEC data, the team developed models primed for financial literacy and investment insights.

This article details the steps and configurations we used to harness Gaudi2’s capabilities for large-scale model training, offering an overview for anyone interested in training state-of-the-art models.

Why Gaudi2, Llama 3.0, and Qwen2?

Gaudi2’s purpose-built hardware excels at deep learning tasks, particularly when handling models that require immense processing power. Whether training with Llama 3.0 (known for its robustness and efficiency) or Qwen2 (a model optimized for diverse data patterns), Gaudi2 accelerates training, optimizes memory management, and enables efficient scaling, even when managing billions of parameters.

By supporting both models, Gaudi2 provided Arcee AI with the flexibility to build powerful language models tailored for financial insights.



Intel Reference Documentation

To train your models using Intel Gaudi2 accelerators like we did, you can leverage the optimized deep learning frameworks and tools designed for the Gaudi2 architecture. These accelerators are tailored to enhance training performance for various machine learning workloads.

To get started, please refer to the official Intel Gaudi documentation, which provides detailed instructions on setting up your environment using Docker containers.

If you’re interested in training Llama 3.0 or Qwen2, the tutorial below is confirmed to work for both models.

Setting Up the Docker Container

To set up the Docker container we'll use to execute our training, run the command below (it's configured to target the Gaudi2 1.17.1 image, which is compatible with the training runs in this tutorial).
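
A minimal sketch of the standard invocation from the Intel Gaudi docs follows; verify the exact image tag (and the PyTorch version baked into it) against the documentation for your SynapseAI release.

```bash
# Launch the Gaudi 1.17.1 PyTorch container (sketch; confirm the image tag
# against the current Intel Gaudi documentation for your release).
docker run -it --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice --net=host --ipc=host \
  vault.habana.ai/gaudi-docker/1.17.1/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
```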

Setting Up the Training Environment

To set up the environment, we started by cloning the optimum-habana repository, which includes Habana-specific implementations for Hugging Face’s transformers library. This repository simplifies deploying Llama and other large models on Gaudi2, with scripts tailored for different tasks.
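
A minimal sketch of that setup, assuming the standard optimum-habana layout (the DeepSpeed fork version pin is illustrative and should match your SynapseAI release):

```bash
# Clone optimum-habana, install it, then install Habana's DeepSpeed fork
# (the version pin is illustrative; match it to your SynapseAI release).
git clone https://github.com/huggingface/optimum-habana.git
cd optimum-habana
pip install -e .
pip install -r examples/language-modeling/requirements.txt
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.1
```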

Open Source Training Datasets

The Arcee AI team compiled a list of open source financial datasets which can be found below. These datasets were mixed with Arcee AI’s SEC datasets to create our final training set.

Configuring Gaudi-Specific Settings

Gaudi2 requires specific configurations to maximize its performance. We used these key parameters (set as shown in the sketch after the list):

  • PT_HPU_MAX_COMPOUND_OP_SIZE=10: Manages operation sizes to enhance memory efficiency.
  • DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1: Optimizes deep learning workloads by synchronizing DeepSpeed ZeRO (Zero Redundancy Optimizer) Stage 3 steps.
  • --use_lazy_mode: Reduces the computational load by deferring evaluations until they’re necessary.
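
In shell form, the two environment variables can be exported before launching training; lazy mode is passed as a flag to the training script itself:

```bash
# Gaudi-specific environment settings used for the runs described above.
export PT_HPU_MAX_COMPOUND_OP_SIZE=10
export DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1
# Lazy mode is enabled with the --use_lazy_mode flag on the training
# script rather than an environment variable (see the commands below).
```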

Training Execution Command - Llama 3.0 Model

Using the following command, we launched the training script with essential parameters to train Llama 3.0. This command initializes Gaudi2 with DeepSpeed ZeRO-3, gradient checkpointing, and HPU graphs for efficient inference.

Key Model Training Parameters

  • Model: We used Llama 3.0 (8B parameters), which is ideal for generating accurate responses in specialized domains.
  • Data: The Arcee AI SEC Datatrove dataset provided a rich collection of SEC filings and financial data, perfect for training a model on financial and investment-focused language.
  • Batch Size: Adjusted for Gaudi2’s memory capabilities, we set batch sizes to 8 and 4 for training and evaluation, respectively.
  • Gradient Checkpointing: Enabled to reduce memory footprint, allowing larger batch sizes and more complex model architectures.
  • DeepSpeed ZeRO-3: Critical for memory efficiency in distributed environments, enabling us to fully leverage Gaudi2’s capabilities across multiple nodes.

Training Execution - Llama 3.0 Model - 8B - 8x Gaudi2
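
A minimal sketch of such a launch, built on optimum-habana’s run_clm.py example and its gaudi_spawn.py multi-card launcher, wired up with the parameters described above; the dataset name, DeepSpeed ZeRO-3 config path, and output directory are placeholders rather than our exact values:

```bash
# Sketch: 8-card Gaudi2 causal-LM training with DeepSpeed ZeRO-3.
# <your-financial-dataset>, ds_zero3.json, and the output dir are placeholders.
cd optimum-habana/examples/language-modeling
PT_HPU_MAX_COMPOUND_OP_SIZE=10 \
DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 \
python ../gaudi_spawn.py --world_size 8 --use_deepspeed run_clm.py \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset_name <your-financial-dataset> \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 4 \
  --gradient_checkpointing \
  --do_train \
  --do_eval \
  --bf16 \
  --use_habana \
  --use_lazy_mode \
  --use_hpu_graphs_for_inference \
  --gaudi_config_name Habana/llama \
  --deepspeed ds_zero3.json \
  --output_dir ./llama3-8b-sec
```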

Inference Execution - Llama 3.0 Model - 8B - Single Gaudi2

Running an 8B parameter model on a single Gaudi2 instance produces about 49 tokens per second, comparable to a g6e.2xlarge on AWS but at about half the price, proving that Gaudi2 is competitive for running small models in the 8B parameter range.

Python Execution Command
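
A minimal sketch using optimum-habana’s text-generation example; the checkpoint path and prompt are placeholders:

```bash
# Sketch: single-Gaudi2 generation with HPU graphs and a KV cache.
# The checkpoint path and prompt are placeholders.
cd optimum-habana/examples/text-generation
python run_generation.py \
  --model_name_or_path ./llama3-8b-sec \
  --use_hpu_graphs \
  --use_kv_cache \
  --bf16 \
  --max_new_tokens 256 \
  --prompt "Summarize the main risk factors disclosed in this 10-K excerpt:"
```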

Training Execution Command - Qwen2 - 7B

Using the following command, we launched the training script with essential parameters to train Qwen2. This command initializes Gaudi2 with DeepSpeed ZeRO-3, gradient checkpointing, and HPU graphs for efficient inference, optimizing performance for large-scale financial model training (a sketch of the launch follows the parameter list below).

Key Model Training Parameters

  • Model: We used Qwen2 (7B parameters), an efficient and versatile model known for its ability to handle diverse language tasks, including specialized financial datasets.
  • Data: We used the Arcee AI SEC Datatrove dataset, which is rich with SEC filings and financial data, making it ideal for fine-tuning the model to deliver insights in the financial and investment domain.
  • Batch Size: Optimized for Gaudi2’s memory capabilities, with batch sizes set to 8 for training and 4 for evaluation to ensure efficient use of resources.
  • Gradient Checkpointing: Enabled to reduce memory usage, allowing the model to handle larger batch sizes and more complex computations during training.
  • DeepSpeed ZeRO-3: Crucial for memory efficiency in distributed training, DeepSpeed ZeRO-3 allowed us to fully leverage Gaudi2’s performance across multiple nodes for effective large-scale training.
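
The launch mirrors the Llama 3.0 command with the model swapped; this is again a sketch with placeholder paths, and the Gaudi configuration reference is an assumption to be checked against your setup:

```bash
# Sketch: same 8-card ZeRO-3 launch as the Llama 3.0 run, with Qwen2-7B.
# <gaudi-config-for-qwen2> is an assumption: point it at whichever
# gaudi_config you use for Qwen2. Other placeholders as in the Llama run.
python ../gaudi_spawn.py --world_size 8 --use_deepspeed run_clm.py \
  --model_name_or_path Qwen/Qwen2-7B \
  --dataset_name <your-financial-dataset> \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 4 \
  --gradient_checkpointing \
  --do_train \
  --do_eval \
  --bf16 \
  --use_habana \
  --use_lazy_mode \
  --gaudi_config_name <gaudi-config-for-qwen2> \
  --deepspeed ds_zero3.json \
  --output_dir ./qwen2-7b-sec
```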

Inference Execution - Qwen2 - 7B - Single Gaudi2

Running a 7B parameter model on a single Gaudi2 instance produces about 49 tokens per second and is comparable to a g6e.2xlarge on AWS, but at about half the price. Just like with the Llama training, this again proves that the Gaudi2 is competitive for running small language models (SLMs) in the 7B parameter range.

Python Execution Command
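
As with the Llama model, single-card generation follows the same pattern (sketch; the checkpoint path is a placeholder):

```bash
# Sketch: single-Gaudi2 generation with the fine-tuned Qwen2 checkpoint.
python run_generation.py \
  --model_name_or_path ./qwen2-7b-sec \
  --use_hpu_graphs \
  --use_kv_cache \
  --bf16 \
  --max_new_tokens 256
```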

Results and Outcomes

After training with this setup, Arcee AI successfully built a model capable of delivering insights about financial documents, answering questions, and providing explanations tailored for both novice and advanced users. The model demonstrated strong accuracy in comprehending and summarizing financial statements, which can dramatically enhance accessibility to financial literacy for a variety of stakeholders.


Final Thoughts: With Gaudi2, Scale Your Training Without Compromises

Using Gaudi2 allowed us to scale our model training without sacrificing speed or efficiency. This project highlights the synergy between advanced hardware, optimized configurations, and high-quality data, underscoring how powerful these models can be in the finance domain. As we continue to push the boundaries of language models and guide our clients on their AI journey, our partnership with Intel and the Gaudi2/3 teams will become even more impactful and far-reaching.


