Announcing Jurassic-2 and Task-Specific APIs

Announcing the launch of Jurassic-2, the latest generation of AI21 Studio’s foundation models, a game-changer in the field of AI, with top-tier quality and new capabilities. And that's not all - we're also releasing our task-specific APIs, with plug-and-play reading and writing capabilities that outperform competitors.

Our focus at AI21 Studio is to help developers and businesses leverage reading and writing AI to build real-world products with tangible value. Today marks two important milestones with the release of Jurassic-2 and Task-Specific APIs, empowering you to bring generative AI to production.

Jurassic-2 (or J2, as we like to call it) is the next generation of our foundation models with significant improvements in quality and new capabilities including zero-shot instruction-following, reduced latency, and multi-language support.

Task-specific APIs provide developers with industry-leading APIs that perform specialized reading and writing tasks out of the box.

Read on for an in-depth look at each.

Jurassic-2

We’re proud to present our brand new family of state-of-the-art Large Language Models. J2 not only improves upon Jurassic-1 (our previous generation models) in every aspect, but it also offers new features and capabilities that put it in a league of its own.

The Jurassic-2 family includes base language models in three different sizes: Large, Grande and Jumbo, alongside instruction-tuned language models for Jumbo and Grande.


Jurassic-2 is already making waves on Stanford’s Holistic Evaluation of Language Models (HELM), the leading benchmark for language models. Currently, J2 Jumbo ranks second (and climbing) according to an evaluation we conducted using HELM’s official repository. No less important, our mid-sized model (Grande) ranks significantly higher than models up to 30x larger in size, enabling users to optimize production costs and speed without sacrificing quality.

What's new compared to Jurassic-1?

Improved quality

With cutting-edge pre-training methods combined with the latest data (current up to mid-2022), J2’s Jumbo model has scored an 86.8% win-rate on HELM by our internal evaluations, solidifying it as a top-tier option in the LLM space.

Instruct capabilities

J2’s best-in-class models offer zero-shot instruction capabilities, allowing them to be steered with natural language without the use of examples. J2’s Jumbo and Grande models have been adapted to include these capabilities.
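To make the difference concrete, here is a minimal sketch contrasting a few-shot prompt (the usual way to steer a base model) with a zero-shot instruction prompt. The helper names and the `Input:`/`Output:` layout are assumptions made for illustration; they are not part of the Jurassic-2 API, and the code only builds prompt strings without calling any model.

```python
def few_shot_prompt(examples, query):
    """Steer a base model by showing worked input/output pairs first."""
    blocks = [f"Input: {source}\nOutput: {target}" for source, target in examples]
    # The final block is left open so the model completes the output.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def zero_shot_prompt(instruction, query):
    """Steer an instruction-tuned model with a plain-language instruction only."""
    return f"{instruction}\n\n{query}"


# Hypothetical examples, written only for this sketch.
examples = [
    ("The store opens at nine.", "When does the store open?"),
    ("The train leaves from platform two.", "Where does the train leave from?"),
]
query = "The report is due on Friday."

print(few_shot_prompt(examples, query))
print("---")
print(zero_shot_prompt("Write a question that the following sentence answers.", query))
```

The zero-shot version carries no examples at all, so the prompt is shorter and there is nothing to curate per task.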

Multilingual support

J2 supports several non-English languages, including Spanish, French, German, Portuguese, Italian and Dutch.

Performance

In terms of latency, J2’s models can perform up to 30% faster than our previous models.

Take it for a spin

Jurassic-2 will be available for free until May 1st, 2023. In addition, Jurassic-2 and Jurassic-1 models are now offered under our new reduced and simplified pricing model, based on the total length of text (input + output).

All Jurassic-2 models are now available for you on our playground and API. To help you get started, we’ve collected some tips and tricks for working with the new Instruct models here.


Task-Specific APIs

Today, AI21 Labs is also proud to announce our new line of Task-Specific APIs, with the launch of the Wordtune API set, giving developers access to the language models behind our massively popular consumer-facing reading and writing apps.

Why do we need Task-Specific APIs?

General Large Language Models are incredibly powerful, and many of our customers have successfully customized them to power their applications. However, we’ve also seen that certain use-cases recur frequently among many users.

Task-specific APIs let developers skip much of the model training and fine-tuning that would otherwise be needed, and take full advantage of our ready-made, best-in-class language processing solutions.

Wordtune and Wordtune Read both use cutting-edge AI to assist users with writing and reading tasks – all while saving time and improving performance. With the release of Wordtune API, we’re giving developers access to the AI engine behind this award-winning line of applications, allowing them to take full advantage of Wordtune’s capabilities and integrate them into their own apps:

Paraphrase - Reword texts to fit any tone, length, or meaning.
Summarize - Condense lengthy texts into easy-to-read bite-sized summaries.
Grammatical Error Correction (GEC) - Catch and fix grammatical errors and typos on the fly.
Text Improvements - Get recommendations to increase text fluency, enhance vocabulary, and improve clarity.
Text Segmentation - Break down long pieces of text into paragraphs segmented by distinct topic.

Outperforming the Competition

When it comes to paraphrasing and summarizing capabilities, Wordtune API is truly a best-in-class performer.

Summarize API

Faithfulness rates measure how factually consistent a summary is with the original text. Our new Summarize API has reached a faithfulness rate that outperforms OpenAI’s Davinci-003 by 19%.

Acceptance rates measure how satisfied human evaluators are with the quality of generated summaries, and we’re proud to say that our Summarize API has achieved an acceptance rate that is 18% higher than OpenAI’s.

Paraphrase API

Our Paraphrase API’s latency is approximately one-third of OpenAI’s.

Our Paraphrase API outperforms OpenAI in both diversity of results (33%) and meaning preservation (8%).

QQP benchmark:

STS-B benchmark:

The new releases of the Jurassic models and Task-Specific APIs both demonstrate our commitment to providing cutting-edge technology that enables our customers to build better language processing applications with ease, and deploy them into production in minutes.

MRKL Whitepaper

Paper: Standing on the Shoulders of Giant Language Models

What is a MRKL system?

In August 2021 we released Jurassic-1, a 178B-parameter autoregressive language model. We’re thankful for the reception it got – over 10,000 developers signed up, and hundreds of commercial applications are in various stages of development. Mega models such as Jurassic-1, GPT-3 and others are indeed amazing, and open up exciting opportunities. But these models are also inherently limited. They can’t access your company database, don’t have access to current information (for example, the latest COVID numbers or the dollar-euro exchange rate), can’t reason (for example, their arithmetic capabilities don’t come close to those of an HP calculator from the 1970s), and are prohibitively expensive to update.
A MRKL system such as Jurassic-X enjoys all the advantages of mega language models, with none of these disadvantages. Here’s how it works.

Compositive multi-expert problem: the list of “Green energy companies” is routed to Wiki API, “last month” dates are extracted from the calendar and “share prices” from the database. The “largest increase” is computed by the calculator and finally, the answer is formatted by the language model.
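The routing idea in this caption can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the expert functions are stubs, and the keyword table stands in for whatever router a real MRKL system uses to pick a module; none of the names come from Jurassic-X itself.

```python
from typing import Callable, List, Tuple

# Stub experts: each takes a text query and returns a text answer.
def wiki_expert(query: str) -> str:
    return f"[wiki lookup: {query}]"

def calendar_expert(query: str) -> str:
    return f"[date range: {query}]"

def database_expert(query: str) -> str:
    return f"[database rows: {query}]"

def calculator_expert(query: str) -> str:
    return f"[computed: {query}]"

def language_model(query: str) -> str:
    return f"[free-text answer: {query}]"

# Hypothetical routing table: (trigger keywords, expert name, expert function).
ROUTES: List[Tuple[Tuple[str, ...], str, Callable[[str], str]]] = [
    (("companies",), "wiki", wiki_expert),
    (("last month",), "calendar", calendar_expert),
    (("share price",), "database", database_expert),
    (("largest", "increase"), "calculator", calculator_expert),
]

def route(query: str) -> Tuple[str, str]:
    """Return (expert name, answer); fall back to the language model."""
    lowered = query.lower()
    for keywords, name, expert in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return name, expert(query)
    return "language_model", language_model(query)

for part in ["Green energy companies", "last month", "share prices", "largest increase"]:
    print(route(part))
```

Returning the expert name alongside the answer is a deliberate choice in this sketch: it lets the caller show which module handled which part of the question.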

There are of course many details and challenges in making all this work - training the discrete experts, smoothing the interface between them and the neural network, routing among the different modules, and more. To get a deeper sense for MRKL systems, how they fit in the technology landscape, and some of the technical challenges in implementing them, see our MRKL paper. For a deeper technical look at how to handle one of the implementation challenges, namely avoiding model explosion, see our paper on leveraging frozen mega LMs.

A further look at the advantages of Jurassic-X

Even without diving into technical details, it’s easy to get a sense for the advantages of Jurassic-X. Here are some of the capabilities it offers, and how these can be used for practical applications.

Reading and updating your databases in free language
AI-assisted content generation on current affairs
Performing simple and complex math operations
Decomposing multi-step problems
Access to continuously changing information (weather, currency exchange rates)
Transparency and trust

Reading and updating your database in free language

Language models are closed boxes which you can use, but not change. However, in many practical cases you would want to use the power of a language model to analyze information you possess - the supplies in your store, your company’s payroll, the grades in your school and more. Jurassic-X can connect to your databases so that you can ‘talk’ to your data to explore what you need: “Find the cheapest Shampoo that has a rosy smell”, “Which computing stock increased the most in the last week?” and more. Furthermore, our system also enables joining several databases, and has the ability to update your database using free language (see figure below).

Jurassic-X enables you to plug in YOUR company's database (inventories, salary sheets, etc.) and extract information using free language
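As a rough illustration of what 'talking' to a database could look like underneath, the sketch below uses Python's built-in `sqlite3` module. The table, its rows, and the SQL statements are all made up for this example; the SQL is written by hand as the kind of query a language model would be expected to produce from the free-language request, and no translation step is actually implemented.

```python
import sqlite3

# Hypothetical in-memory inventory, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, scent TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("Petal Fresh", "shampoo", "rose", 6.50),
        ("Rose Garden", "shampoo", "rose", 4.25),
        ("Ocean Mist", "shampoo", "sea breeze", 3.10),
        ("Rose Soap", "soap", "rose", 1.99),
    ],
)

# "Find the cheapest Shampoo that has a rosy smell" as a parameterized query.
cheapest = conn.execute(
    "SELECT name, price FROM products "
    "WHERE category = ? AND scent = ? ORDER BY price ASC LIMIT 1",
    ("shampoo", "rose"),
).fetchone()
print(cheapest)

# A free-language update such as "Lower the price of Rose Garden to 3.99".
conn.execute("UPDATE products SET price = ? WHERE name = ?", (3.99, "Rose Garden"))
updated_price = conn.execute(
    "SELECT price FROM products WHERE name = ?", ("Rose Garden",)
).fetchone()[0]
print(updated_price)
```

Using parameterized queries keeps values extracted from the user's text out of the SQL string itself, which matters when the text comes from free language.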

AI-assisted text generation on current affairs

Language models can generate text, yet cannot be used to create text on current affairs, because their vast knowledge (historic dates, world leaders and more) represents the world as it was when they were trained. This is clearly (and somewhat embarrassingly) demonstrated when three of the world’s leading language models (including our own Jurassic-1) still claim Donald Trump is the US president more than a year after Joe Biden was sworn into office.
Jurassic-X solves this problem by simply plugging into resources such as Wikidata, providing it with continuous access to up-to-date knowledge. This opens up a new avenue for AI-assisted text generation on current affairs.

Who is the president of the United States?

(unlabeled model): Donald Trump
GPT-3: Donald Trump
Jurassic-1: Donald Trump
Google: Joe Biden
Jurassic-X: Joe Biden is the 46th and current president

Jurassic-X can assist in text generation on up-to-date events by combining a powerful language model with access to Wikidata

Performing math operations

A 6-year-old child learns math from rules, not only by memorizing examples. In contrast, language models are designed to learn from examples, and consequently are able to solve very basic math like 1-, 2-, and possibly 3-digit addition, but struggle with anything more complex. With increased training time, better data and larger models, the performance will improve, but will not reach the robustness of an HP calculator from the 1970s. Jurassic-X takes a different approach and calls upon a calculator whenever a math problem is identified by the router. The problem can be phrased in natural language and is converted by the language model to the format required by the calculator (numbers and math operations). The computation is performed and the answer is converted back into free language.
Importantly (see example below) the process is made transparent to the user by revealing the computation performed, thus increasing the trust in the system. In contrast, language models provide answers which might seem reasonable, but are wrong, making them impractical to use.

The company had 655400 shares which they divided equally among 94 employees. How many did each employee get?

(unlabeled model): 94 employees.
GPT-3: Each employee got 7000 stocks
Jurassic-1: 1.5
Google: (No answer provided)
Jurassic-X: 6972.3
X= 655400/94

Jurassic-X can answer non-trivial math operations which are phrased in natural language, made possible by the combination of a language model and a calculator
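The calculator step can be sketched as follows. This assumes the language model has already turned the question into the expression `655400/94`; that conversion is not shown. The function names are hypothetical, and the evaluator is a small hand-written one built on Python's `ast` module rather than anything taken from Jurassic-X.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression: str) -> float:
    """Evaluate an arithmetic expression without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)

def answer_with_trace(expression: str) -> str:
    """Return the result together with the computation that produced it."""
    return f"{evaluate(expression):.1f}\nX = {expression}"

print(answer_with_trace("655400/94"))
```

Showing the expression next to the result mirrors the transparency point above: the user can check what was computed, not just what came out.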

Compositionality

Solving simple questions might require multiple steps. For example, “Do more people live in Tel Aviv or in Berlin?” requires answering: i. What is the population of Tel Aviv? ii. What is the population of Berlin? iii. Which is larger? This is a highly non-trivial process for a language model, and language models fail to answer this question (see example). Moreover, the user can’t know the process leading to the answers, hence is unable to trust them. Jurassic-X can decompose such problems into the basic questions, route each to the relevant expert, and put together an answer in free language. Importantly, Jurassic-X not only provides the correct answer but also displays the steps taken to reach it, increasing the trust in the system.

Do more people live in Tel Aviv or in Berlin?

(unlabeled model): Berlin
GPT-3: There are more people living in Tel Aviv than in Berlin.
Jurassic-1: Berlin and Tel Aviv are roughly the same size
Google: (First hit is a comparison between Tel Aviv and Berlin)
Jurassic-X: More people live in Berlin than in Tel-Aviv

[‘Return population of Tel Aviv’; ‘Return population of Berlin’; ‘Return which is bigger between #1 and #2’]
Step 1: Population of Tel Aviv. Result - 451523.
Step 2: Population of Berlin. Result - 3664088.
Step 3: Which is bigger, #1 or #2. Result - Berlin.

Jurassic-X breaks down compositional questions, answers the basic sub-questions, and puts together the answer. Importantly, this process is transparent to the user, greatly increasing the trust in the system
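A decomposed plan like the one in this example can be executed with a very small loop. The sketch below is an illustration under stated assumptions: the plan is written by hand (a real system would generate it), the population figures are the ones quoted in the example, and `execute` and `lookup_population` are hypothetical names.

```python
# Population figures as quoted in the example above.
POPULATION = {"Tel Aviv": 451523, "Berlin": 3664088}

def lookup_population(city: str) -> int:
    """Stand-in for a knowledge-base expert."""
    return POPULATION[city]

def execute(plan):
    """Run each step in order; a step may read earlier results (#1, #2, ...)."""
    results = []
    trace = []
    for number, (description, action) in enumerate(plan, start=1):
        result = action(results)
        results.append(result)
        trace.append(f"Step {number}: {description}. Result - {result}.")
    return results[-1], trace

# Hand-written plan; each action receives the list of previous results.
plan = [
    ("Population of Tel Aviv", lambda previous: lookup_population("Tel Aviv")),
    ("Population of Berlin", lambda previous: lookup_population("Berlin")),
    ("Which is bigger, #1 or #2",
     lambda previous: "Tel Aviv" if previous[0] > previous[1] else "Berlin"),
]

answer, trace = execute(plan)
print("\n".join(trace))
print(f"More people live in {answer}.")
```

Keeping the trace as a by-product of execution means the displayed steps cannot drift from what was actually computed.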

Dynamic information (like weather and currency exchange rates)

Certain types of information change continuously - weather, currency exchange rates, share values and more. Such information will never be captured by language models, yet can easily be handled by Jurassic-X by integrating it with a reliable source of information. We performed a proof-of-concept on two such features - weather and currency exchange rates, and the design enables quick integration with more sources to solve your use-case.
Weather - a loosely phrased question about the weather elicits an answer from every language model, but that answer is always the same regardless of when the question was asked (funny, right?). Jurassic-X, in contrast, provides an answer based on the actual weather forecast.

I’m going to be in New-York in 3 days. Should I pack my umbrella?

GPT-3: Yes, you should pack your umbrella.
Jurassic-1: Yes, you should. The weather forecast is rain.
Google: (Links to weather websites)
Jurassic-X: Yes, you should pack your umbrella, because in New York in 3 days there will be broken clouds and the temperature will be -2 degrees.

Currency exchange rates change much faster than weather predictions, yet the Jurassic-X concept - a language model connected to a reliable source of information - easily solves this problem as well.

How much Moroccan money will I get for 100 bucks?

(unlabeled model): 125 dirhams
GPT-3: 100 moroccan dirhams is about 27$.
Jurassic-1: How much is 100 dollars in moroccan money?
Google: 100 dirhams = 10.75 dollars
Jurassic-X: 100 USD = 934.003 MAD

Jurassic-X combines a language model with access to APIs with continuously changing information. This is demonstrated for weather forecasts and currency exchange rates, and can easily be extended to other information sources
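The key design point here is that the rate is fetched when the question is asked rather than memorized at training time. A minimal sketch, assuming a pluggable rate source: `static_rate_source` is a stand-in for a live exchange-rate API (no real service is called), its rate is derived from the example above, and all names are hypothetical.

```python
from decimal import Decimal
from typing import Callable

# A rate source maps (base currency, quote currency) to the current rate.
RateSource = Callable[[str, str], Decimal]

def static_rate_source(base: str, quote: str) -> Decimal:
    """Stand-in for a live API; the rate matches the example above."""
    rates = {("USD", "MAD"): Decimal("9.34003")}
    return rates[(base, quote)]

def convert(amount: int, base: str, quote: str, source: RateSource) -> str:
    """Look up the rate at question time and format the answer."""
    rate = source(base, quote)
    total = (Decimal(amount) * rate).quantize(Decimal("0.001"))
    return f"{amount} {base} = {total} {quote}"

print(convert(100, "USD", "MAD", static_rate_source))

# Swapping the source changes the answer without touching the language model.
print(convert(100, "USD", "MAD", lambda base, quote: Decimal("10")))
```

The same shape would apply to a weather source: the expert is a function the system calls on demand, so adding a new information source means adding a new function, not retraining.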

Transparency and trust

Transparency is a critical element that is lacking in language models, preventing a much wider adoption of these models. This lack of transparency is demonstrated by the answers to the question “Was Clinton ever elected as president of the United States?”. The answer, of course, depends on which Clinton you have in mind, and only Jurassic-X makes this clear, thanks to its disambiguation component. More examples of Jurassic-X’s transparency were demonstrated above: displaying the math operation performed to the user, and the answers to the simple sub-questions in the multi-step setting.

Was Clinton ever elected president of the United States?

(unlabeled model): Yes
GPT-3: No, Clinton was never elected as president of the United States.
Jurassic-1: (no answer shown)
Google: Clinton was elected president in the 1992 presidential elections…
Jurassic-X: Bill Clinton was elected president.

Jurassic-X is designed to be more transparent by displaying which expert answered which part of the question, and by presenting the intermediate steps taken and not just the black-box response
