
Transforming the Future of AI with Long Context Jamba-Instruct

Yaniv Markovski, Head of Developer Relations
July 3, 2024

Our debut Jamba hackathon, in partnership with AGI House, brought together talented developers to explore use cases using Jamba’s 256K context window. Here are some highlights.

AI21 partnered with AGI House last week to host our debut Jamba hackathon, bringing together over 100 Bay Area developers to build use cases with Jamba’s 256K context window. The energy at the venue was palpable, with attendees diving into intense coding sessions, collaborative discussions, and inspiring presentations over the course of a 12-hour in-person hack day.

Setting the Stage

The hackathon kicked off with insightful speaker sessions.

Mike Knoop, co-founder of Zapier and co-creator of the Arc Prize challenge, set the tone with an engaging opening talk about Artificial General Intelligence and its challenges. Or Dagan, AI21’s VP of Foundation Models, followed with an excellent overview of Jamba's development journey: why we decided to build using a hybrid architecture of Transformer + Mamba, and how we designed the model to optimize for long context use cases. The session concluded with a fantastic demo by Sebastian Leks, Principal Solution Architect at AI21, who showcased a term sheet generator built on Jamba-Instruct that takes full advantage of its long context.

After the presentations, the real action began. 

Participants formed groups and delved into hours of focused coding. The atmosphere was a blend of quiet concentration and dynamic collaboration as developers explored the possibilities of long context and other AI advancements. The AI21 team of solution architects and product leads was actively involved, answering questions about Jamba’s capabilities, demonstrating how to build with the AI21 Studio developer platform, and engaging in deep technical discussions about how to get the most out of Jamba. For us, it was an incredible learning opportunity to hear from some of the brightest minds in the Bay Area’s AI community and better understand how developers are thinking about building with LLMs.

The Winning Projects

As the night progressed, the excitement culminated in the announcement of the winners. Each project showcased innovative uses of AI, with long context being a key differentiator.

First Place:
FBI AGI – Real-Time Bad Actors Incrimination on Social Media by
Alex Sima and Apurva Mishra

The FBI AGI team stole the show with their groundbreaking project: FBI AGI. 

Their AI conversational agent, Emma, is designed to blend seamlessly into Discord servers, addressing the pressing issue of online harassment. Inspired by the quest for Artificial General Intelligence (AGI), the team created an AI system that learns about its environment and naturally injects itself into conversations. Emma is designed to blend in, making her appearance in the channel and her conversations indistinguishable from those with real humans. Behind Emma is a sophisticated support system of many small components, each doing its job without being caught.

Emma's Capabilities

  • System Prompt: Defines Emma's identity as a 14-year-old girl on Discord, along with her rules of communication. For example, it tones down the heavy punctuation typical of LLM output, which doesn’t fit social media chats, and it directs Emma to gather information about the bad actors she is chatting with.
  • User Prompts: Continuously refine Emma's responses based on ongoing chats.
  • Server Agent: Initiates friend requests by responding to about 20% of server messages.
  • DM Agent: Engages in one-on-one conversations.
  • Information Agent: Gathers and identifies important user characteristics from chats.
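The team’s code was not shared, so the following is only a hypothetical sketch of how agents like these could fit together; all names, structure, and the stubbed `generate_reply` call are illustrative assumptions, with the LLM call left as a placeholder.

```python
import random
from typing import Optional

# Hypothetical sketch of Emma's agent components. The system prompt, agent
# split, and 20% reply rate mirror the description above; everything else
# is an assumption.

SYSTEM_PROMPT = (
    "You are Emma, a 14-year-old girl chatting on Discord. "
    "Write casually: short messages, minimal punctuation. "
    "Quietly note identifying details about the people you talk to."
)

def generate_reply(system: str, user: str) -> str:
    # Placeholder for a call to an LLM API such as Jamba-Instruct.
    return "hey whats up"

def server_agent(message: str, reply_rate: float = 0.2) -> Optional[str]:
    """Respond to roughly 20% of server messages to appear active."""
    if random.random() < reply_rate:
        return generate_reply(SYSTEM_PROMPT, message)
    return None

def dm_agent(history: list, new_message: str) -> str:
    """Handle one-on-one conversations, carrying the full chat history."""
    context = "\n".join(history + [new_message])
    return generate_reply(SYSTEM_PROMPT, context)

def information_agent(history: list) -> dict:
    """Extract user characteristics from chats (stubbed record here)."""
    # In the real project this would be another LLM call over the long
    # conversation context; we only return a trivial summary.
    return {"messages_seen": len(history)}
```

In this sketch, the long conversation history passed to `dm_agent` and `information_agent` is where a 256K context window would matter: the full channel history can travel in a single prompt.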

Jamba’s long context is extremely helpful in two places. First, before Emma injects herself into a conversation, she studies it and needs as much context as she can get; the conversations can be long and rich. Second, the Information Agent captures data and metadata from the conversations between Emma and the bad actors.

The demo phase revealed Emma's ability to connect with various chat channels, highlighting the project's potential to identify and incriminate predators. The team’s approach to moderation and verification ensures Emma's responses meet standards, flagging harmful interactions for review. While tackling a disturbing problem, the FBI AGI team’s project was also fascinating to watch, especially during the live demo, when the examples, unfortunately, kept coming.

Real World Impact

Law enforcement agencies could use Emma to flag potential harassers and bad actors in real time. School districts and parents could gain better visibility into the environments their kids participate in. Emma, however, is only one agent; FBI AGI’s goal is to be character agnostic, deploying different agents with different characteristics to maximize their chances of being effective. After the hackathon I had a fascinating discussion with Apurva and Alex about how Emma, or any other AI agent, gets a mission: this time it was finding predators; next it could be finding bots or nation-state actors who push propaganda through social channels. Mind blowing.

Second Place:
Excel Mamba – AI Agent Task Force for Financial Spreadsheet Analysis by
Aniket Shirke, Bhavya Bahl, and Rohit Jena 

Excel Mamba impressed the judges with an AI-powered tool designed to revolutionize financial analysis. Recognizing the inefficiencies of manual financial analysis, the team aimed to automate spreadsheet analysis, saving time and reducing errors for analysts and investors. The breakthrough was getting the AI to understand the spreadsheet’s properties and data well enough to accurately translate analysis questions into formulas.

Key Features of Excel Mamba

  • Data Input: Users upload data from their portfolio management sheets.
  • Data Processing: The data is converted into JSON format.
  • AI Integration: AI agents generate necessary Excel formulas based on specified analysis.
  • Output: Structured outputs in the form of Excel formulas ready to be applied to the data.
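The team’s implementation was not published, so here is a minimal sketch of the pipeline above under stated assumptions: the function names, JSON shape, and prompt wording are all hypothetical, and the actual model call is omitted.

```python
import json

# Hypothetical sketch of the Excel Mamba flow: spreadsheet rows -> JSON ->
# a prompt asking the model for an Excel formula rather than a free-text
# answer. Names and prompt format are assumptions.

def sheet_to_json(rows: list) -> str:
    """Steps 1-2: serialize uploaded spreadsheet rows into JSON."""
    return json.dumps({"columns": list(rows[0].keys()), "rows": rows})

def build_formula_prompt(sheet_json: str, question: str) -> str:
    """Step 3: constrain the model to return an Excel formula."""
    return (
        "Given this spreadsheet as JSON:\n" + sheet_json +
        "\nAnswer the question by returning only an Excel formula.\n"
        "Question: " + question
    )

rows = [
    {"Company": "Acme", "Buy": 120.0, "Sell": 150.0},
    {"Company": "Globex", "Buy": 80.0, "Sell": 75.0},
]
prompt = build_formula_prompt(sheet_to_json(rows),
                              "What is the net P&L for each company?")
# A long-context model sees the entire sheet in one prompt and might
# return, e.g., a formula like "=C2-B2" for the user to verify and apply.
```

Returning a formula instead of a computed number is what makes step 4 verifiable: the analyst can inspect the formula before applying it to the data.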

During their demo, the team showcased the tool's ability to quickly understand the question "What is the net P&L for each company in the spreadsheet?" and generate insights into new columns. The demo was based on real financial data the team obtained. The process involved the AI understanding the semantic structure of the Excel spreadsheet and providing computation snippets for user verification. This approach ensures accuracy and reliability, addressing the common challenges faced by financial analysts.

Real World Impact

Financial analysts can use Excel Mamba to automate onerous, manual analysis tasks and free up time for more strategic projects.

Next Steps

Both winning teams demonstrated the transformative potential of AI in addressing real-world problems. While their projects may appear simple at first glance, they showcase innovative thinking and a deep understanding of artificial intelligence, key strengths that we have noticed throughout the day when we connected with the developer community at AGI House. We eagerly anticipate how these projects will evolve and continue to impact their fields.

The AGI House: A Hub for Innovation

The AGI House, our partner in this event, serves as a hub for developers and innovators. It fosters a collaborative environment where brilliant minds come together to push the boundaries of AI. This hackathon is just one example of the exciting work happening at AGI House, and we are thrilled to be part of this vibrant community.

Stay tuned for more exciting updates and join us in our next developer event!

If you build with AI, we would love to hear from you. Check out AI21's Overview Docs, API reference, try Jamba and our other products, or participate in discussions in our Discord server.

Discover more

What is a MRKL system?

In August 2021 we released Jurassic-1, a 178B-parameter autoregressive language model. We’re thankful for the reception it got – over 10,000 developers signed up, and hundreds of commercial applications are in various stages of development. Mega models such as Jurassic-1, GPT-3 and others are indeed amazing, and open up exciting opportunities. But these models are also inherently limited. They can’t access your company database, don’t have access to current information (for example, latest COVID numbers or dollar-euro exchange rate), can’t reason (for example, their arithmetic capabilities don’t come close to that of an HP calculator from the 1970s), and are prohibitively expensive to update.
A MRKL system such as Jurassic-X enjoys all the advantages of mega language models, with none of these disadvantages. Here’s how it works.

Composite multi-expert problem: the list of "green energy companies" is routed to the Wiki API, "last month" dates are extracted from the calendar, and "share prices" from the database. The "largest increase" is computed by the calculator and, finally, the answer is formatted by the language model.

There are of course many details and challenges in making all this work - training the discrete experts, smoothing the interface between them and the neural network, routing among the different modules, and more. To get a deeper sense for MRKL systems, how they fit in the technology landscape, and some of the technical challenges in implementing them, see our MRKL paper. For a deeper technical look at how to handle one of the implementation challenges, namely avoiding model explosion, see our paper on leveraging frozen mega LMs.
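The routing idea can be made concrete with a toy sketch. This is not Jurassic-X’s actual router (which is learned, not rule-based); the keyword dispatch and expert functions below are illustrative assumptions only.

```python
# Minimal sketch of MRKL-style routing: a router inspects the query and
# dispatches it to a discrete expert, so the language model never has to
# do arithmetic or database lookups itself.

def calculator_expert(expression: str) -> float:
    # A real system would parse the model's structured output; eval on
    # untrusted input is unsafe and is used here only for brevity.
    return eval(expression, {"__builtins__": {}})

def database_expert(query: str) -> str:
    # Stand-in for a database lookup expert.
    return f"[rows matching: {query}]"

def route(query: str) -> str:
    """Toy router: keyword rules standing in for a learned routing module."""
    if any(op in query for op in "+-*/"):
        return str(calculator_expert(query))
    return database_expert(query)
```

For example, `route("655400/94")` dispatches to the calculator expert, while a free-text query falls through to the database expert.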

A further look at the advantages of Jurassic-X

Even without diving into technical details, it’s easy to get a sense for the advantages of Jurassic-X. Here are some of the capabilities it offers, and how these can be used for practical applications.

Reading and updating your database in free language

Language models are closed boxes which you can use but not change. However, in many practical cases you would want to use the power of a language model to analyze information you possess: the supplies in your store, your company’s payroll, the grades in your school, and more. Jurassic-X can connect to your databases so that you can ‘talk’ to your data and explore what you need: "Find the cheapest shampoo that has a rosy smell", "Which computing stock increased the most in the last week?", and more. Furthermore, our system can join several databases, and can update your database using free language (see figure below).

Jurassic-X enables you to plug in YOUR company's database (inventories, salary sheets, etc.) and extract information using free language
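To make the "talk to your data" idea concrete, here is a minimal sketch under explicit assumptions: the language model’s job is to turn the free-language question into SQL, but since the actual translation layer is not public, the generated SQL is hard-coded and the table is a toy in-memory example.

```python
import sqlite3

# Toy product table standing in for "YOUR company's database".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, scent TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    ("SilkWash", "rose", 4.99),
    ("BudgetClean", "rose", 2.49),
    ("PineFresh", "pine", 1.99),
])

# "Find the cheapest shampoo that has a rosy smell" -> SQL (in Jurassic-X
# this translation would be produced by the language model).
sql = "SELECT name FROM products WHERE scent = 'rose' ORDER BY price LIMIT 1"
cheapest = conn.execute(sql).fetchone()[0]
print(cheapest)  # → BudgetClean
```

The same pattern extends to updates: the model emits an `UPDATE` or `INSERT` statement instead of a `SELECT`, which is how free-language database updates become possible.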

AI-assisted text generation on current affairs

Language models can generate text, yet cannot be used to create text on current affairs, because their vast knowledge (historic dates, world leaders and more) represents the world as it was when they were trained. This is clearly (and somewhat embarrassingly) demonstrated when three of the world’s leading language models (including our own Jurassic-1) still claim Donald Trump is the US president more than a year after Joe Biden was sworn into office.
Jurassic-X solves this problem by simply plugging into resources such as Wikidata, providing it with continuous access to up-to-date knowledge. This opens up a new avenue for AI-assisted text generation on current affairs.

Who is the president of the United States?

  • T0: Donald Trump
  • GPT-3: Donald Trump
  • Jurassic-1: Donald Trump
  • Google: Joe Biden
  • Jurassic-X: Joe Biden is the 46th and current president

Jurassic-X can assist in text generation on up-to-date events by combining a powerful language model with access to Wikidata.

Performing math operations

A 6-year-old child learns math from rules, not only by memorizing examples. In contrast, language models are designed to learn from examples, and consequently are able to solve very basic math like 1-, 2-, and possibly 3-digit addition, but struggle with anything more complex. With more training time, better data, and larger models, performance will improve, but it will not reach the robustness of an HP calculator from the 1970s. Jurassic-X takes a different approach and calls upon a calculator whenever the router identifies a math problem. The problem can be phrased in natural language and is converted by the language model into the format required by the calculator (numbers and math operations). The computation is performed and the answer is converted back into free language.
Importantly (see example below), the process is made transparent to the user by revealing the computation performed, thus increasing trust in the system. In contrast, language models provide answers which might seem reasonable but are wrong, making them impractical to use.
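The calculator flow can be sketched in a few lines. This is not Jurassic-X’s implementation: the regex extraction below is a crude stand-in for the language model’s conversion of "divided equally among" into a division, and only handles this one question shape.

```python
import re

# Sketch of the calculator expert flow: extract the numbers, perform the
# division, and surface the computation alongside the answer so the user
# can see how it was reached.

def solve_division_question(question: str) -> str:
    nums = [int(n) for n in re.findall(r"\d+", question)]
    total, parts = nums[0], nums[1]
    result = round(total / parts, 1)
    # Revealing "X = 655400/94" is the transparency the text describes.
    return f"{result} (X = {total}/{parts})"

q = ("The company had 655400 shares which they divided equally among "
     "94 employees. How many did each employee get?")
print(solve_division_question(q))  # → 6972.3 (X = 655400/94)
```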

The company had 655400 shares which they divided equally among 94 employees. How many did each employee get?

  • T0: 94 employees.
  • GPT-3: Each employee got 7000 stocks
  • Jurassic-1: 1.5
  • Google: (No answer provided)
  • Jurassic-X: 6972.3 (X = 655400/94)

Jurassic-X can answer non-trivial math operations which are phrased in natural language, made possible by the combination of a language model and a calculator.

Compositionality

Solving simple questions might require multiple steps. For example, "Do more people live in Tel Aviv or in Berlin?" requires answering: i. What is the population of Tel Aviv? ii. What is the population of Berlin? iii. Which is larger? This is a highly non-trivial process for a language model, and indeed language models fail on this question (see example). Moreover, the user can’t see the process leading to the answers, and hence can’t trust them. Jurassic-X can decompose such problems into basic sub-questions, route each to the relevant expert, and put together an answer in free language. Importantly, Jurassic-X not only provides the correct answer but also displays the steps taken to reach it, increasing trust in the system.
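A toy version of this decomposition, under explicit assumptions: the sub-question answers are hard-coded to the populations shown in the example, whereas in Jurassic-X a knowledge expert would answer each sub-question.

```python
# Sketch of compositional question answering: decompose, answer each
# sub-question, compare, and return the step trace for transparency.

POPULATION = {"Tel Aviv": 451_523, "Berlin": 3_664_088}  # from the example

def compare_populations(city_a: str, city_b: str):
    steps = []
    pop_a = POPULATION[city_a]
    steps.append(f"Step 1: Population of {city_a}. Result - {pop_a}.")
    pop_b = POPULATION[city_b]
    steps.append(f"Step 2: Population of {city_b}. Result - {pop_b}.")
    winner = city_a if pop_a > pop_b else city_b
    steps.append(f"Step 3: Which is bigger? Result - {winner}.")
    # Returning the steps alongside the answer is what lets the user
    # verify how the answer was reached.
    return winner, steps

answer, trace = compare_populations("Tel Aviv", "Berlin")
print(answer)  # → Berlin
```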

Do more people live in Tel Aviv or in Berlin?

  • T0: Berlin
  • GPT-3: There are more people living in Tel Aviv than in Berlin.
  • Jurassic-1: Berlin and Tel Aviv are roughly the same size
  • Google: (First hit is a comparison between Tel Aviv and Berlin)
  • Jurassic-X: More people live in Berlin than in Tel Aviv

[‘Return population of Tel Aviv’; ‘Return population of Berlin’; ‘Return which is bigger between #1 and #2’]
Step 1: Population of Tel Aviv. Result - 451523.
Step 2: Population of Berlin. Result - 3664088.
Step 3: Which is bigger, #1 or #2? Result - Berlin.

Jurassic-X breaks down compositional questions, answers the basic sub-questions, and puts together the answer. Importantly, this process is transparent to the user, greatly increasing trust in the system.

Dynamic information (like weather and currency exchange rates)

Certain types of information change continuously: weather, currency exchange rates, share values, and more. Such information will never be captured by language models, yet can easily be handled by Jurassic-X by integrating it with a reliable source of information. We performed a proof of concept on two such features, weather and currency exchange rates, and the design enables quick integration with more sources to solve your use case.
Weather: a loosely phrased question about the weather elicits an answer from every language model, but a language model always returns the same answer regardless of when the question is asked (funny, right?), while Jurassic-X answers based on the actual weather forecast.

I’m going to be in New York in 3 days. Should I pack my umbrella?

  • T0: No
  • GPT-3: Yes, you should pack your umbrella.
  • Jurassic-1: Yes, you should. The weather forecast is rain.
  • Google: (Links to weather websites)
  • Jurassic-X: Yes, you should pack your umbrella, because in New York in 3 days there will be broken clouds and the temperature will be -2 degrees.
Currency exchange rates change much faster than weather predictions, yet the Jurassic-X concept - a language model connected to a reliable source of information - easily solves this problem as well.
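A currency expert in this style might look like the following sketch. The rate table is hard-coded to the value in the example below because this post predates the current rate; a deployed expert would call a live exchange-rate API, and `fetch_rate` is a hypothetical stand-in for that call.

```python
# Sketch of a currency-exchange expert: the language model extracts
# (amount, base, quote) from the question, a rate source supplies the
# number, and the result is formatted back into free language.

def fetch_rate(base: str, quote: str) -> float:
    # Placeholder for a live exchange-rate API call.
    rates = {("USD", "MAD"): 9.34003}
    return rates[(base, quote)]

def convert(amount: float, base: str, quote: str) -> str:
    value = amount * fetch_rate(base, quote)
    return f"{amount:g} {base} = {value:.3f} {quote}"

print(convert(100, "USD", "MAD"))  # → 100 USD = 934.003 MAD
```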

How much Moroccan money will I get for 100 bucks?

  • T0: 125 dirhams
  • GPT-3: 100 moroccan dirhams is about 27$.
  • Jurassic-1: How much is 100 dollars in moroccan money?
  • Google: 100 dirhams = 10.75 dollars
  • Jurassic-X: 100 USD = 934.003 MAD

Jurassic-X combines a language model with access to APIs with continuously changing information. This is demonstrated for weather forecasts and currency exchange rates, and can easily be extended to other information sources.

Transparency and trust

Transparency is a critical element that language models lack, preventing much wider adoption of these models. This lack of transparency is demonstrated by the answers to the question "Was Clinton ever elected president of the United States?". The answer, of course, depends on which Clinton you have in mind, which only Jurassic-X makes clear, thanks to its disambiguation component. More examples of Jurassic-X’s transparency were demonstrated above: displaying the math operation performed to the user, and the answers to the simple sub-questions in the multi-step setting.

Was Clinton ever elected president of the United States?

  • T0: Yes
  • GPT-3: No, Clinton was never elected as president of the United States.
  • Jurassic-1: No
  • Google: Clinton was elected president in the 1992 presidential elections…
  • Jurassic-X: Bill Clinton was elected president.

Jurassic-X is designed to be more transparent, displaying which expert answered which part of the question and presenting the intermediate steps taken, not just a black-box response.

Your Turn

That's it, you get the picture. The use cases above give you a sense of some things you could do with Jurassic-X, but now it's your turn. A MRKL system such as Jurassic-X is as flexible as your imagination. What do you want to accomplish? Contact us for early access.
