Simplifying Our Jurassic-2 Offering

Three months ago, we announced an exciting milestone for AI21 Labs – the launch of Jurassic-2 (J2), our next generation foundation models. These models include instruct capabilities, allowing them to be steered with natural language instruction, also known as zero-shot instruction-following.

What we’ve learned:

We’ve spent the last three months gathering user feedback and, as always, we are on the lookout for new ways to improve both our technology and its ease of use for our customers.

The most common issue we’ve found that our users face is deciding which language model they need for their specific use case.

First, having five different foundation models made it difficult for users to know which one to choose. We offered both base and instruct versions of the same models to provide maximum flexibility, but we found that this caused confusion instead.

Second, the model names Large, Grande and Jumbo all simply mean ‘large’, which is true of any Large Language Model. Our users needed an easier way to tell the models apart by their relative sizes and capabilities.

Today, we’d like to change that.

We are excited to announce that, based on what we’ve learned, we are making some adjustments to our Jurassic-2 offering to make the decision-making process simpler and more intuitive for our users.

1. Narrowing it down to just three models

We are now offering three foundation models instead of the original five, and all of them include instruct capabilities, allowing for zero-shot prompting as well as few-shot prompting. In tests we conducted on Stanford’s Holistic Evaluation of Language Models (HELM) and various few-shot datasets, our instruct models performed as well as, or better than, our non-instruct models for both zero-shot and few-shot prompting. This allows us to support both prompt types within one model.

2. New model names

The new names are intended to help users easily understand the relative magnitude and attributes of each model.

Our new sizes, in ascending order, are Light, Mid and Ultra, replacing Large, Grande and Jumbo, respectively.
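For teams updating internal documentation or configuration, the renaming above can be sketched as a simple lookup. This is a minimal illustration, not part of the AI21 API: the `OLD_TO_NEW` table and the `new_size_name` helper are hypothetical names, and they work on the display names used in this post rather than on any API model identifiers.

```python
# Hypothetical helper (not an AI21 API): maps the old Jurassic-2 size
# names from this post to the new ones.
OLD_TO_NEW = {"Large": "Light", "Grande": "Mid", "Jumbo": "Ultra"}


def new_size_name(name: str) -> str:
    """Return the new size name; names that are already new pass through."""
    normalized = name.strip().capitalize()
    if normalized in OLD_TO_NEW:
        return OLD_TO_NEW[normalized]
    if normalized in OLD_TO_NEW.values():
        return normalized
    raise ValueError(f"Unknown Jurassic-2 size name: {name!r}")


if __name__ == "__main__":
    for old in ("Large", "Grande", "Jumbo"):
        print(f"{old} -> {new_size_name(old)}")
```

Accepting both old and new names keeps the helper safe to run more than once over the same configuration.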

Ultra: Jurassic-2 Ultra is our largest and most powerful foundation model, producing our highest-quality output for any language comprehension or generation task. According to our internal evaluations on HELM, the leading benchmark for language models, Jurassic-2 Ultra scores a win-rate of 86.8%, solidifying it as a leader in the LLM space. It is also our most costly model, with the highest latency, but it is the most capable of carrying out complex generation and comprehension tasks.
Mid: Jurassic-2 Mid is our mid-sized model that is carefully designed to strike the right balance between exceptional quality and affordability. It lets you easily scale any language comprehension or generation task such as question answering, summarization, copy generation, advanced information extraction and many others.
Light: Jurassic-2 Light is our smallest, fastest and most cost-efficient LLM. This model is ideal for simple tasks such as keyword extraction, sentence classification, named entity recognition (NER), short-form copy generation and sentiment analysis.

The diagram below shows an overview of the tradeoff between size, cost and latency of each model.

By streamlining our model offering, we hope our users can hit the ground running faster. Jurassic-2 Ultra, Mid and Light are continuously undergoing improvements as we learn more, so stay tuned!

Note: AI21 Studio users are not required to take any immediate action in response to these changes. Click here to learn more about updates in the API, including automatic rerouting.
