How Latitude scaled production of their gaming worlds while reducing costs
AI21 Labs built enterprise-level production models that serve Latitude’s players, elevating the gaming experience while reducing costs.
Latitude is a pioneer in the AI gaming space, offering powerful AI-generated digital gamescapes for its players.
Latitude is best known for AI Dungeon, a text-based adventure game that uses artificial intelligence to generate unique and unpredictable stories, depending on the choices that a player makes. In the past, computer adventure games were limited in the number of actions and scenarios they offered. By using large language models, AI Dungeon grants players the flexibility to perform virtually any action and have the game respond to that action.
Latitude got good traction with its early implementation of AI Dungeon on top of GPT-3, but when planning their future roadmap, they quickly realized that they needed an LLM partner who shared their vision of immersive AI-driven experiences. They found an eager partner in AI21 Labs, who worked collaboratively with Latitude to discover roadblocks and brainstorm solutions. This resulted in improvements to issues such as latency, flexibility, and user satisfaction, to name just a few.
So, how did they do it?
AI21 Labs managed to build enterprise-level production models which served players at Latitude to elevate the gaming experience while at the same time improving cost efficiency. Latitude found AI21 Labs to be a collaborative LLM partner, which wasn’t just a plug-and-play solution to a scaling problem, but rather a trusted partner with whom they could collaborate on AI solutions.
But what were the exact problems preventing Latitude from realizing its vision? And how did AI21 Labs overcome them? Read on for the full story.
Who is Latitude?
Latitude was never envisioned as a gaming company. It started as a hackathon project that turned into a Dungeons and Dragons plugin. The team decided to go to market with the plugin, and within the first week they had roughly 100,000 customers, a number that quickly grew to a million within the first month.
The verdict was in: there was a creative use case for large language models for game-like experiences and writing.
Searching for a collaborative solution
Latitude wanted to offer a stronger game experience, with richer text and better AI responses. For this, they partnered with OpenAI, the company behind GPT-3.
But with rapidly evolving technology came growing pains, and OpenAI was not aligned with Latitude's vision, which put the gaming experience Latitude had in mind out of reach. It became clear to Latitude that they needed a new tech partner who was interested in solving the use case of an AI-driven adventure role-playing game.
Switching from OpenAI to AI21 Labs
Latitude booked a call with AI21 Labs just after AI21 Labs had launched its Jurassic-1 model. A Jurassic prototype was set up for Latitude, and they started experimenting with the features and the accuracy of results. The final model proved a strong fit for Latitude’s AI Dungeon use case. Soon after that, the team released it to their audience as an alpha model.
In order to get to that stage, the teams had to overcome two massive roadblocks.
The first roadblock: creative restrictions
OpenAI’s policy of non-violence restricted integral elements of Latitude’s AI Dungeon adventures: quests, swords, and fights. It was impossible to have a role-playing expedition without weapons.
“We were interested in a partner who understands the nuances of the creative use case,” says Ryan Seamons, VP of Product at Latitude.
Latitude prioritized finding a partner who was not only aligned philosophically but willing to become a true partner and collaborator in building a quality AI-driven role-play experience.
“We need to offer AI-assisted role-play at scale to thousands of daily players. We need models that are fast, reliable, and creative,” says Seamons.
The collaborative opportunity: agility
AI21 Labs’ team of developers, creators, and stakeholders were analytical problem-solvers who created reliable production models for Latitude.
A precise production model is critical for a gaming company like Latitude. It defines the customer experience and determines whether users will come back. So, it needs to be hyper-accurate and reliable. To achieve this, there has to be excellent communication during prototyping.
Latitude and AI21 Labs had a system for rapid back-and-forth between teams and numerous collaborative checkpoints. As a result, they got to the production stage quickly.
Even after launching the model, the AI21 Labs team worked tirelessly to ship updates, create new features, and build on feedback.
“We’ve experimented with other tools, but they have been demanding and daunting with direct orders and no problem-solving. But not AI21 (Labs). They’re enthusiastic, collaborative, and innovative, a true gaming visionary,” says Seamons.
The second problem: exponential costs
Latitude offers AI-assisted role-play at scale to thousands of daily players.
Not only did Latitude want to offer existing models to new players, they also wanted to create more immersive and realistic worlds.
They wanted an LLM partner who offered customizable pricing because gaming models are extremely expensive to run at scale.
“AI costs are one of our most significant ongoing expenses, and so finding ways to reduce our AI costs help us to operate sustainably as a business.”
The collaborative opportunity: cost-effectiveness through optimization
AI21 Labs put in hours of skilled talent to make the large language model efficient enough to be cost-effective. They helped Latitude transition from Jurassic-1 Jumbo to Jurassic-1 Grande to scale performance at the right cost.
Jurassic-1 Jumbo, AI21’s biggest model, has 178 billion parameters. But after running multiple experiments, the team concluded that Latitude’s game did not need a model of that size. They switched from Jumbo to Grande (17 billion parameters), which hit the sweet spot: a compelling experience at a sustainable price, with faster response times. The cost savings matter to Latitude, and players benefit directly from the faster responses. The lower costs also let Latitude experiment with additional ways to improve output quality, such as Hydra mode, which generates multiple responses and selects the best one for a player.
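The idea of generating several responses and keeping the best one can be illustrated with a short sketch. This is not Latitude's Hydra implementation, whose details the article does not give; the function name `best_of_n`, the two stand-in generators, and the toy scoring heuristic are all assumptions made for illustration. In practice the generators would be model calls and the scorer would be a learned or hand-tuned quality measure.

```python
from typing import Callable, Sequence


def best_of_n(prompt: str,
              generators: Sequence[Callable[[str], str]],
              score: Callable[[str, str], float]) -> str:
    """Ask every generator for a continuation and return the highest-scoring one."""
    candidates = [generate(prompt) for generate in generators]
    return max(candidates, key=lambda text: score(prompt, text))


# Hypothetical stand-ins for model calls (assumption: real code would query an LLM).
def terse(prompt: str) -> str:
    return "You draw your sword."


def vivid(prompt: str) -> str:
    return "You draw your sword as the cave mouth exhales a cold, damp wind."


def toy_score(prompt: str, text: str) -> float:
    # Toy heuristic only: prefer continuations with more distinct words.
    return float(len(set(text.lower().split())))


story = best_of_n("You enter the cave.", [terse, vivid], toy_score)
print(story)
```

Because every candidate costs one generation, an approach like this multiplies the per-turn cost by the number of candidates, which is presumably why a cheaper base model makes it practical.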
Being able to experiment with new data sets more quickly and affordably also yielded positive results and will be a major part of Latitude’s AI strategy this year.
“The biggest benefit of working with AI21 (Labs) is for agile companies who want more hands-on attention in service. What I’ve been most impressed with is the proactiveness in optimizing costs for multiple use cases. It’s been a breath of fresh air compared to competitors,” says Seamons.
The final results
Latitude discovered that the Grande model struck the right balance of cost vs. performance, all while generating higher quality results than small models such as GPT-J or GPT-3 Curie.
By using an ensemble model approach called Hydra, Latitude was able to achieve quality similar to that of larger models.
At every stage, the team at AI21 Labs devised innovative solutions to roadblocks and proactively and continuously improved cost efficiency.
Whether through large-scale prototyping or improving production after release, Latitude found AI21 Labs to be a creative, efficient, and reliable partner on every level.
In August 2021 we released Jurassic-1, a 178B-parameter autoregressive language model. We’re thankful for the reception it got – over 10,000 developers signed up, and hundreds of commercial applications are in various stages of development. Mega models such as Jurassic-1, GPT-3 and others are indeed amazing, and open up exciting opportunities. But these models are also inherently limited. They can’t access your company database, don’t have access to current information (for example, latest COVID numbers or dollar-euro exchange rate), can’t reason (for example, their arithmetic capabilities don’t come close to that of an HP calculator from the 1970s), and are prohibitively expensive to update. A MRKL system such as Jurassic-X enjoys all the advantages of mega language models, with none of these disadvantages. Here’s how it works.
Compositional multi-expert problem: the list of “Green energy companies” is routed to Wiki API, “last month” dates are extracted from the calendar and “share prices” from the database. The “largest increase” is computed by the calculator and finally, the answer is formatted by the language model.
There are of course many details and challenges in making all this work - training the discrete experts, smoothing the interface between them and the neural network, routing among the different modules, and more. To get a deeper sense for MRKL systems, how they fit in the technology landscape, and some of the technical challenges in implementing them, see our MRKL paper. For a deeper technical look at how to handle one of the implementation challenges, namely avoiding model explosion, see our paper on leveraging frozen mega LMs.
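As a rough illustration of routing among modules, the sketch below sends a query to the first expert whose rule matches and falls back to a language model otherwise. It is a toy under stated assumptions, not the Jurassic-X router: the expert functions, the regular-expression rules, and the `route` function are all hypothetical, and a real router would be learned rather than hand-written.

```python
import re
from datetime import date
from typing import Callable, List, Tuple


def calculator_expert(query: str) -> str:
    # Handles only "<number> <op> <number>"; a real expert would be far more general.
    a, op, b = re.search(r"(\d+(?:\.\d+)?)\s*([-+*/])\s*(\d+(?:\.\d+)?)", query).groups()
    x, y = float(a), float(b)
    result = {"+": x + y, "-": x - y, "*": x * y, "/": x / y}[op]
    return f"{result:g}"


def calendar_expert(query: str) -> str:
    return date.today().isoformat()


def language_model_expert(query: str) -> str:
    # Placeholder for a call to a large language model (assumption).
    return f"[language model would answer: {query}]"


# Each route pairs a matching rule with the expert that handles it.
ROUTES: List[Tuple[Callable[[str], bool], Callable[[str], str]]] = [
    (lambda q: re.search(r"\d\s*[-+*/]\s*\d", q) is not None, calculator_expert),
    (lambda q: "today" in q.lower() and "date" in q.lower(), calendar_expert),
]


def route(query: str) -> Tuple[str, str]:
    """Return (expert name, answer) so the caller can see which expert answered."""
    for matches, expert in ROUTES:
        if matches(query):
            return expert.__name__, expert(query)
    return language_model_expert.__name__, language_model_expert(query)


print(route("What is 12 * 7?"))
print(route("Tell me a story about a dragon."))
```

Returning the expert's name alongside the answer is a small design choice that mirrors the transparency theme of the post: the caller can always show who produced the result.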
A further look at the advantages of Jurassic-X
Even without diving into technical details, it’s easy to get a sense for the advantages of Jurassic-X. Here are some of the capabilities it offers, and how these can be used for practical applications.
Reading and updating your database in free language
Language models are closed boxes which you can use, but not change. However, in many practical cases you would want to use the power of a language model to analyze information you possess: the supplies in your store, your company’s payroll, the grades in your school and more. Jurassic-X can connect to your databases so that you can ‘talk’ to your data to explore what you need, for example “Find the cheapest shampoo that has a rosy smell” or “Which computing stock increased the most in the last week?”. Furthermore, our system also enables joining several databases, and has the ability to update your database using free language (see figure below).
Jurassic-X enables you to plug in YOUR company's database (inventories, salary sheets, etc.) and extract information using free language
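To make the "talk to your data" idea concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table, its rows, and the `to_sql` function are invented for illustration; in particular, `to_sql` is a hard-coded stand-in for the language model's translation from free language to a query, which is the hard part and is not shown here.

```python
import sqlite3

# Invented toy inventory (assumption: stands in for "YOUR company's database").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, scent TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [("Petal Wash", "shampoo", "rose", 6.50),
     ("Rose Luxe", "shampoo", "rose", 11.00),
     ("Mint Fresh", "shampoo", "mint", 4.25)],
)


def to_sql(question: str):
    # Stand-in for the language model's translation step (hard-coded here).
    if question == "Find the cheapest shampoo that has a rosy smell":
        return ("SELECT name, price FROM products "
                "WHERE category = ? AND scent = ? ORDER BY price ASC LIMIT 1",
                ("shampoo", "rose"))
    raise ValueError("question not covered by this toy translator")


def ask(question: str):
    sql, params = to_sql(question)
    # Parameterized query: values are passed separately from the SQL text.
    return conn.execute(sql, params).fetchone()


print(ask("Find the cheapest shampoo that has a rosy smell"))
```

Keeping the generated SQL parameterized, rather than pasting model output straight into the query string, is one way to limit the damage if the translation step produces something unexpected.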
AI-assisted text generation on current affairs
Language models can generate text, yet cannot be used to create text on current affairs, because their vast knowledge (historic dates, world leaders and more) represents the world as it was when they were trained. This is clearly (and somewhat embarrassingly) demonstrated when three of the world’s leading language models (including our own Jurassic-1) still claim Donald Trump is the US president more than a year after Joe Biden was sworn into office. Jurassic-X solves this problem by simply plugging into resources such as Wikidata, providing it with continuous access to up-to-date knowledge. This opens up a new avenue for AI-assisted text generation on current affairs.
Who is the president of the United States?
Joe Biden is the 46th and current president
Jurassic-X can assist in text generation on up-to-date events by combining a powerful language model with access to Wikidata
Performing math operations
A six-year-old child learns math from rules, not only by memorizing examples. In contrast, language models are designed to learn from examples, and consequently are able to solve very basic math like 1-, 2-, and possibly 3-digit addition, but struggle with anything more complex. With increased training time, better data and larger models, the performance will improve, but will not reach the robustness of an HP calculator from the 1970s. Jurassic-X takes a different approach and calls upon a calculator whenever a math problem is identified by the router. The problem can be phrased in natural language and is converted by the language model to the format required by the calculator (numbers and math operations). The computation is performed and the answer is converted back into free language. Importantly (see example below), the process is made transparent to the user by revealing the computation performed, thus increasing trust in the system. In contrast, language models provide answers which might seem reasonable but are wrong, making them impractical to use.
The company had 655400 shares which they divided equally among 94 employees. How many did each employee get?
Each employee got 7000 stocks
(No answer provided)
6972.3 (X = 655400/94)
Jurassic-X can answer non-trivial math operations which are phrased in natural language, made possible by the combination of a language model and a calculator
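The calculator step can be sketched as follows, assuming the language model has already turned the word problem into the expression `655400/94`. The `evaluate` and `answer_with_trace` names are hypothetical; the evaluator walks Python's `ast` tree and accepts only numbers and the four basic operations, so it never calls `eval()` on untrusted text.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


def answer_with_trace(expression: str) -> str:
    # Show the computation next to the result, as in the example above.
    value = evaluate(expression)
    return f"{value:.1f} (X = {expression})"


print(answer_with_trace("655400/94"))  # prints "6972.3 (X = 655400/94)"
```

Returning the expression together with the value is what lets a user check the answer instead of taking it on faith.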
Solving simple questions might require multiple steps, for example - “Do more people live in Tel Aviv or in Berlin?” requires answering: i. What is the population of Tel-Aviv? ii. What is the population of Berlin? iii. Which is larger? This is a highly non-trivial process for a language model, and language models fail to answer this question (see example). Moreover, the user can’t know the process leading to the answers, hence is unable to trust them. Jurassic-X can decompose such problems into the basic questions, route each to the relevant expert, and put together an answer in free language. Importantly, Jurassic-X not only provides the correct answer but also displays the steps taken to reach it, increasing the trust in the system.
Do more people live in Tel Aviv or in Berlin?
There are more people living in Tel Aviv than in Berlin.
Berlin and Tel Aviv are roughly the same size
(First hit is a comparison between Tel Aviv and Berlin)
More people live in Berlin than in Tel-Aviv
[‘Return population of Tel Aviv’; ‘Return population of Berlin’; ‘Return which is bigger between #1 and #2’] Step 1: Population of Tel Aviv. Result - 451523. Step 2: Population of Berlin. Result - 3664088. Step 3: Which is bigger, #1 or #2. Result - Berlin.
Jurassic-X breaks down compositional questions, answers the basic sub-questions, and puts together the answer. Importantly, this process is transparent to the user, greatly increasing trust in the system
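A decomposed plan like the one in the example can be executed step by step, with later steps referring to earlier results as #1, #2 and so on. The sketch below assumes the decomposition has already been produced; `run_plan` and the `POPULATION` lookup table are hypothetical, and the two population figures are simply the ones quoted in the example.

```python
import re
from typing import Dict, List, Tuple

# Figures taken from the example above; a real system would query a knowledge source.
POPULATION: Dict[str, int] = {"Tel Aviv": 451523, "Berlin": 3664088}


def run_plan(steps: List[str]) -> List[Tuple[str, int]]:
    """Execute each step in order and keep every intermediate result."""
    results: List[Tuple[str, int]] = []
    for step in steps:
        lookup = re.fullmatch(r"Return population of (.+)", step)
        compare = re.fullmatch(r"Return which is bigger between #(\d+) and #(\d+)", step)
        if lookup:
            city = lookup.group(1)
            results.append((city, POPULATION[city]))
        elif compare:
            # "#n" refers to the result of step n (1-based).
            first = results[int(compare.group(1)) - 1]
            second = results[int(compare.group(2)) - 1]
            results.append(max(first, second, key=lambda item: item[1]))
        else:
            raise ValueError(f"unknown step: {step}")
    return results


plan = ["Return population of Tel Aviv",
        "Return population of Berlin",
        "Return which is bigger between #1 and #2"]
trace = run_plan(plan)
for number, (label, value) in enumerate(trace, start=1):
    print(f"Step {number}: {label} ({value})")
```

Keeping the whole trace, rather than only the final answer, is what makes it possible to display the intermediate steps to the user.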
Dynamic information (like weather and currency exchange rates)
Certain types of information change continuously: weather, currency exchange rates, share values and more. Such information will never be captured by language models, yet Jurassic-X handles it easily by integrating with a reliable source of information. We performed a proof of concept on two such features, weather and currency exchange rates, and the design enables quick integration with more sources to solve your use case. Take weather: a loosely phrased question about the weather elicits an answer from every language model, but it is always the same answer, regardless of when the question was asked (funny, right?). Jurassic-X, by contrast, bases its answer on the actual weather forecast.
I’m going to be in New-York in 3 days. Should I pack my umbrella?
Yes, you should pack your umbrella.
Yes, you should. The weather forecast is rain.
(Links to weather websites)
Yes, you should pack your umbrella, because in New York in 3 days there will be broken clouds and the temperature will be -2 degrees.
Currency exchange rates change much faster than weather predictions, yet the Jurassic-X concept - a language model connected to a reliable source of information - easily solves this problem as well.
How much Moroccan money will I get for 100 bucks?
100 moroccan dirhams is about 27$.
How much is 100 dollars in moroccan money?
100 dirhams = 10.75 dollars
100 USD = 934.003 MAD
Jurassic-X combines a language model with access to APIs with continuously changing information. This is demonstrated for weather forecasts and currency exchange rates, and can easily be extended to other information sources
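The pattern of pairing a language model with a live data source can be sketched with the currency example. Here `fixed_rates` is a hypothetical stand-in for a real exchange-rate API, and its single rate is derived from the figure quoted above (100 USD = 934.003 MAD); `convert` is likewise an illustrative name. The point is only that the rate is looked up when the question is asked rather than recalled from training data.

```python
from decimal import Decimal
from typing import Callable


def fixed_rates(base: str, quote: str) -> Decimal:
    # Stand-in for a live rate service (assumption); rate taken from the example above.
    table = {("USD", "MAD"): Decimal("9.34003")}
    return table[(base, quote)]


def convert(amount: Decimal, base: str, quote: str,
            get_rate: Callable[[str, str], Decimal]) -> str:
    rate = get_rate(base, quote)  # fetched at question time, never memorized
    total = (amount * rate).quantize(Decimal("0.001"))
    return f"{amount} {base} = {total} {quote}"


print(convert(Decimal("100"), "USD", "MAD", fixed_rates))  # prints "100 USD = 934.003 MAD"
```

Passing the rate provider in as a parameter keeps the conversion logic independent of any particular data source, so a weather or share-price provider could follow the same shape.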
Transparency and trust
Transparency is a critical element that is lacking in language models, and its absence prevents a much wider adoption of these models. This lack of transparency is demonstrated by the answers to the question “Was Clinton ever elected as president of the United States?”. The answer, of course, depends on which Clinton you have in mind, and only Jurassic-X, which has a component for disambiguation, makes that clear. More examples of Jurassic-X’s transparency were demonstrated above: displaying the math operation performed to the user, and showing the answers to the simple sub-questions in the multi-step setting.
Was Clinton ever elected president of the United States?
No, Clinton was never elected as president of the United States.
Clinton was elected president in the 1992 presidential elections…
Bill Clinton was elected president.
Jurassic-X is designed to be more transparent by displaying which expert answered which part of the question, and by presenting the intermediate steps taken and not just the black-box response
That's it, you get the picture. The use cases above give you a sense of some of the things you could do with Jurassic-X, but now it's your turn. A MRKL system such as Jurassic-X is as flexible as your imagination. What do you want to accomplish? Contact us for early access.