Community

03 Jan 2025 - 6 min read

Arcee AI & Intel Gaudi2: Training a Financial Insights Model Using Open Source Models

Arcee AI and Intel Gaudi2 make for a powerful combination when it comes to advancing financial insights via LLMs. Learn how the Arcee AI team used Intel's Habana Gaudi2 technology to train two advanced models with 10 billion tokens of financial data, leading to nuanced insights for analysts, investors, and other stakeholders.

Tyler Odenthal, Mark McQuade

In the quest to unlock powerful financial insights through large language models (LLMs), Arcee AI leveraged Habana Gaudi2 technology to train two advanced LLMs: Llama 3.0 and Qwen2. Using an impressive 10 billion tokens of financial and SEC data, the team developed models primed for financial literacy and investment insights.

This article details the steps and configurations we used to harness Gaudi2’s capabilities for large-scale model training, offering an overview for anyone interested in training state-of-the-art models.

Why Gaudi2, Llama 3.0, and Qwen2?

Gaudi2’s purpose-built hardware excels at deep learning tasks, particularly when handling models that require immense processing power. Whether training with Llama 3.0 (known for its robustness and efficiency) or Qwen2 (a model optimized for diverse data patterns), Gaudi2 accelerates training, optimizes memory management, and enables efficient scaling, even when managing billions of parameters.

By supporting both models, Gaudi2 provided Arcee AI with the flexibility to build powerful language models tailored for financial insights.



Intel Reference Documentation

To train your models using Intel Gaudi2 accelerators like we did, you can leverage the optimized deep learning frameworks and tools designed for the Gaudi2 architecture. These accelerators are tailored to enhance training performance for various machine learning workloads.

To get started, please refer to the official Intel Gaudi documentation, which provides detailed instructions on setting up your environment using Docker containers.

If you’re interested in training Llama 3.0 or Qwen2, the tutorial below is confirmed to work for both models.

Setting Up the Docker Container

To set up the Docker container we'll use to execute our training, run the command below (it's configured to target the Gaudi2 1.17.1 image, which is compatible with the training runs in this tutorial).
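
A minimal sketch of the standard invocation from the Intel Gaudi docs follows; verify the exact image tag (and the PyTorch version baked into it) against the documentation for your SynapseAI release.

```bash
# Launch the Gaudi 1.17.1 PyTorch container (sketch; confirm the image tag
# against the current Intel Gaudi documentation for your release).
docker run -it --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice --net=host --ipc=host \
  vault.habana.ai/gaudi-docker/1.17.1/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest
```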

Setting Up the Training Environment

To set up the environment, we started by cloning the optimum-habana repository, which includes Habana-specific implementations for Hugging Face’s transformers library. This repository simplifies deploying Llama and other large models on Gaudi2, with scripts tailored for different tasks.
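
A minimal sketch of that setup, assuming the standard optimum-habana layout (the DeepSpeed fork version pin is illustrative and should match your SynapseAI release):

```bash
# Clone optimum-habana, install it, then install Habana's DeepSpeed fork
# (the version pin is illustrative; match it to your SynapseAI release).
git clone https://github.com/huggingface/optimum-habana.git
cd optimum-habana
pip install -e .
pip install -r examples/language-modeling/requirements.txt
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.1
```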

Open Source Training Datasets

The Arcee AI team compiled a list of open source financial datasets which can be found below. These datasets were mixed with Arcee AI’s SEC datasets to create our final training set.

Configuring Gaudi-Specific Settings

Gaudi2 requires specific configurations to maximize its performance. We used these key parameters (set as shown in the sketch after the list):

  • PT_HPU_MAX_COMPOUND_OP_SIZE=10: Manages operation sizes to enhance memory efficiency.
  • DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1: Optimizes deep learning workloads by synchronizing DeepSpeed ZeRO (Zero Redundancy Optimizer) Stage 3 steps.
  • --use_lazy_mode: Reduces the computational load by deferring evaluations until they’re necessary.
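
In shell form, the two environment variables can be exported before launching training; lazy mode is passed as a flag to the training script itself:

```bash
# Gaudi-specific environment settings used for the runs described above.
export PT_HPU_MAX_COMPOUND_OP_SIZE=10
export DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1
# Lazy mode is enabled with the --use_lazy_mode flag on the training
# script rather than an environment variable (see the commands below).
```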

Training Execution Command - Llama 3.0 Model

Using the following command, we launched the training script with essential parameters to train Llama 3.0. This command initializes Gaudi2 with DeepSpeed ZeRO-3, gradient checkpointing, and HPU graphs for efficient inference.

Key Model Training Parameters

  • Model: We used Llama 3.0 (8B parameters), which is ideal for generating accurate responses in specialized domains.
  • Data: The Arcee AI SEC Datatrove dataset provided a rich collection of SEC filings and financial data, perfect for training a model on financial and investment-focused language.
  • Batch Size: Adjusted for Gaudi2’s memory capabilities, we set batch sizes to 8 and 4 for training and evaluation, respectively.
  • Gradient Checkpointing: Enabled to reduce memory footprint, allowing larger batch sizes and more complex model architectures.
  • DeepSpeed ZeRO-3: Critical for memory efficiency in distributed environments, enabling us to fully leverage Gaudi2’s capabilities across multiple nodes.

Training Execution - Llama 3.0 Model - 8B - 8x Gaudi2
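
A minimal sketch of such a launch, built on optimum-habana’s run_clm.py example and its gaudi_spawn.py multi-card launcher, wired up with the parameters described above; the dataset name, DeepSpeed ZeRO-3 config path, and output directory are placeholders rather than our exact values:

```bash
# Sketch: 8-card Gaudi2 causal-LM training with DeepSpeed ZeRO-3.
# <your-financial-dataset>, ds_zero3.json, and the output dir are placeholders.
cd optimum-habana/examples/language-modeling
PT_HPU_MAX_COMPOUND_OP_SIZE=10 \
DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED=1 \
python ../gaudi_spawn.py --world_size 8 --use_deepspeed run_clm.py \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset_name <your-financial-dataset> \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 4 \
  --gradient_checkpointing \
  --do_train \
  --do_eval \
  --bf16 \
  --use_habana \
  --use_lazy_mode \
  --use_hpu_graphs_for_inference \
  --gaudi_config_name Habana/llama \
  --deepspeed ds_zero3.json \
  --output_dir ./llama3-8b-sec
```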

Inference Execution - Llama 3.0 Model - 8B - Single Gaudi2

Running an 8B parameter model on a single Gaudi2 instance produces about 49 tokens per second, comparable to a g6e.2xlarge on AWS but at about half the price, proving that Gaudi2 is competitive for running small models in the 8B parameter range.

Python Execution Command
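
A minimal sketch using optimum-habana’s text-generation example; the checkpoint path and prompt are placeholders:

```bash
# Sketch: single-Gaudi2 generation with HPU graphs and a KV cache.
# The checkpoint path and prompt are placeholders.
cd optimum-habana/examples/text-generation
python run_generation.py \
  --model_name_or_path ./llama3-8b-sec \
  --use_hpu_graphs \
  --use_kv_cache \
  --bf16 \
  --max_new_tokens 256 \
  --prompt "Summarize the main risk factors disclosed in this 10-K excerpt:"
```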

Training Execution Command - Qwen2 - 7B

Using the following command, we launched the training script with essential parameters to train Qwen2. This command initializes Gaudi2 with DeepSpeed ZeRO-3, gradient checkpointing, and HPU graphs for efficient inference, optimizing performance for large-scale financial model training (a sketch of the launch follows the parameter list below).

Key Model Training Parameters

  • Model: We used Qwen2 (7B parameters), an efficient and versatile model known for its ability to handle diverse language tasks, including specialized financial datasets.
  • Data: We used the Arcee AI SEC Datatrove dataset, which is rich with SEC filings and financial data, making it ideal for fine-tuning the model to deliver insights in the financial and investment domain.
  • Batch Size: Optimized for Gaudi2’s memory capabilities, with batch sizes set to 8 for training and 4 for evaluation to ensure efficient use of resources.
  • Gradient Checkpointing: Enabled to reduce memory usage, allowing the model to handle larger batch sizes and more complex computations during training.
  • DeepSpeed ZeRO-3: Crucial for memory efficiency in distributed training, DeepSpeed ZeRO-3 allowed us to fully leverage Gaudi2’s performance across multiple nodes for effective large-scale training.
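
The launch mirrors the Llama 3.0 command with the model swapped; this is again a sketch with placeholder paths, and the Gaudi configuration reference is an assumption to be checked against your setup:

```bash
# Sketch: same 8-card ZeRO-3 launch as the Llama 3.0 run, with Qwen2-7B.
# <gaudi-config-for-qwen2> is an assumption: point it at whichever
# gaudi_config you use for Qwen2. Other placeholders as in the Llama run.
python ../gaudi_spawn.py --world_size 8 --use_deepspeed run_clm.py \
  --model_name_or_path Qwen/Qwen2-7B \
  --dataset_name <your-financial-dataset> \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 4 \
  --gradient_checkpointing \
  --do_train \
  --do_eval \
  --bf16 \
  --use_habana \
  --use_lazy_mode \
  --gaudi_config_name <gaudi-config-for-qwen2> \
  --deepspeed ds_zero3.json \
  --output_dir ./qwen2-7b-sec
```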

Inference Execution - Qwen2 - 7B - Single Gaudi2

Running a 7B parameter model on a single Gaudi2 instance produces about 49 tokens per second and is comparable to a g6e.2xlarge on AWS, but at about half the price. Just like with the Llama training, this again proves that the Gaudi2 is competitive for running small language models (SLMs) in the 7B parameter range.

Python Execution Command
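
As with the Llama model, single-card generation follows the same pattern (sketch; the checkpoint path is a placeholder):

```bash
# Sketch: single-Gaudi2 generation with the fine-tuned Qwen2 checkpoint.
python run_generation.py \
  --model_name_or_path ./qwen2-7b-sec \
  --use_hpu_graphs \
  --use_kv_cache \
  --bf16 \
  --max_new_tokens 256
```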

Results and Outcomes

After training with this setup, Arcee AI successfully built a model capable of delivering insights about financial documents, answering questions, and providing explanations tailored for both novice and advanced users. The model demonstrated strong accuracy in comprehending and summarizing financial statements, which can dramatically enhance accessibility to financial literacy for a variety of stakeholders.


Final Thoughts: With Gaudi2, Scale Your Training Without Compromises

Using Gaudi2 allowed us to scale our model training without sacrificing speed or efficiency. This project highlights the synergy between advanced hardware, optimized configurations, and high-quality data, underscoring how powerful these models can be in the finance domain. As we continue to push the boundaries of language models and guide our clients on their AI journey, our partnership with Intel and the Gaudi2/3 teams will become even more impactful and far-reaching.


