Return to blog

Open-Source Toolkits

22
Jul
2024
-
1
min read

Arcee-AI Releases Two Open Datasets

Lucas Atkins
,

Today, we have made two important datasets publicly available:

  1. Agent Data: This dataset was instrumental in training Arcee-Agent. It contains Salesforce-xlam, agent-flan, and a custom version of Glaive-FC2 with 20k extended samples that call for the model to do tool use sequentially within the same response, along with Magpie-Pro for the sake of maintaining general capabilities and to prevent catastrophic forgetting.
  2. Tome Dataset: Used for training both Arcee-Spark and Arcee-Nova, this dataset comprises 1.75 million samples of highly filtered data for use in training generalist AI assistance.

These releases align with our commitment to transparency and collaborative advancement in AI research. By making these datasets accessible, we aim to facilitate further developments.

Researchers and developers interested in exploring these datasets can access them here.

We encourage the community to utilize these resources responsibly and look forward to seeing the innovative applications and insights that may arise from this data.

Give Arcee a Try

Lorem ipsum dolor sit amet consectetur. Vitae enim libero lectus urna blandit sapien. In egestas ac dolor dictum.
Book a Demo

Sign up for the Arcee AI newsletter

Subscribe to get the latest news and insights on SLM-powered AI agents

Thank you!

We will get back
to you soon.
Oops! Something went wrong while submitting the form.