
2024 Enterprise AI Forecast: From Pilot to Production

January 9, 2024

2024 will be a pivotal year as enterprises shift their focus from generative AI experimentation to practical implementation. There will be growing pains, but those who navigate them successfully will gain a real competitive advantage. 

After a year of heavy experimentation and inflated expectations, 2024 will be the year enterprises move generative AI from pilots and prototypes into full-scale production systems. This shift from exploration to implementation will force enterprises to tackle thorny challenges around delivering ROI, managing costs, and ensuring responsible AI practices.

At AI21, we foresaw these implementation challenges on the horizon. Rather than get caught up in the hype, we've been diligently building the solutions enterprises need to successfully deploy AI across their organizations. Our suite of practical, optimized models provides the impetus enterprises need to drive business value from AI. And our task-specific models ensure new capabilities translate quickly into real-world impact.

The hype around generative AI's potential has collided with the practical realities of deploying these systems at scale. As our leadership predicts below, some companies may opt for open-source models and in-house development, while many will seek turnkey solutions that target specific business needs right out of the box. While we’ve seen successful use cases emerge, taking them into production requires overcoming hurdles with unit economics, compliance, privacy, and security.


Here are our predictions for the upcoming year:

Ori Goshen, CEO & Co-Founder

From massive experimentation to production deployments

In 2024, enterprises will increasingly prioritize production-grade AI solutions, with a focus on business value and total cost of ownership. The focus will shift from flashy demos to reliable solutions in production. Robust evaluation frameworks and built-in verification mechanisms will raise overall quality and reduce hallucinations.

From LLMs to AI Systems

LLMs are necessary but not sufficient. They will never be as reliable as the enterprise demands without a system around them. 2024 will be the year where enterprises build and adopt AI systems that go beyond LLMs with retrievers (RAG), tools and other elements.
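As a deliberately simplified illustration of an AI system that goes beyond a bare LLM, here is a Python sketch of the retrieval step in a RAG pipeline. The keyword-overlap retriever, the corpus, and all names below are toy assumptions for illustration, not any particular vendor's implementation:

```python
# Toy sketch of an "AI system" around an LLM: a retriever grounds the model
# in enterprise data before the prompt reaches it. The keyword-overlap
# scoring and the corpus below are illustrative stand-ins only.

def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def build_prompt(query, docs):
    """Assemble the grounded prompt an LLM call would receive."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "The 2023 audit found cloud spend grew 40% year over year.",
    "Policy requires PII to be masked before model calls.",
    "The support team handles refunds within 5 business days.",
]
docs = retrieve("How fast did cloud spend grow?", corpus)
prompt = build_prompt("How fast did cloud spend grow?", docs)
print(prompt)
```

A production system would replace the overlap scorer with an embedding index and add verification of the model's answer against the retrieved context, but the shape stays the same: retrieve, ground, generate.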

The beginning of the Agentic era

We'll see systems that are capable of continuous learning, reasoning, decision-making and handling complex tasks. These agent systems will be deeply embedded within our workflows and will empower the workforce in surprising ways.

Shanen Boettcher, Chief AI Policy Officer

From exploration to deployment

2024 will be the year of deployment for AI in enterprises. 2023 was the year of exploration and evaluation. This year, companies have crafted their budgets with AI in mind. For many companies this will look like supercharging productivity of their knowledge workers with reading and writing assistants and selecting an important line-of-business application where they want to tailor a language model to their corporate knowledge base, brand, tone and manner. This could be a key customer-facing application in sales, marketing and/or customer service. The important difference about 2024 is that these technologies will leave the realm of trials and pilots and become part of the everyday workflow for employees and customers.

The pivot to becoming AI suppliers

In 2024 companies will make important decisions about the role AI will play in their business strategies. Specifically, they will be considering if they will be a consumer of AI technology and/or a supplier of AI for other companies. Most companies have developed and maintain proprietary data and best practices that are material to their company strategy. Many of these companies are interested in training language models with these company assets to create scale and efficiency internally. There is also an opportunity for a new business line for these companies to offer their customized language models as a SaaS product in the marketplace.

For example, a financial services firm that has developed leading market analysis capabilities and a supporting knowledge base can choose to train a language model with those abilities and use it as an internal tool available for their financial advisors. In addition, they could decide to offer the customized model as a SaaS service to other companies in the financial services industry - becoming a supplier of AI services. This opens a new high margin and scalable business line as a technology provider. These companies can build significant competitive differentiation by being early to market because accuracy and utility grows geometrically with LLM usage, data and feedback.

Pankaj Dugar, SVP, GM North America

AI investments require tangible ROI

Enterprises will become increasingly vigilant about which POCs to conduct and will demand a clear line of sight to business value before committing money and resources, rather than acting on gut feel alone.

Understanding the true cost of enterprise AI

Enterprises will start to think much more actively about production use cases (vs. POCs), and with that shift they will dig deep into the Total Cost of Ownership (TCO) of LLMs in production. By TCO, I mean not just the cost paid to the LLM provider, but also the resources needed to deploy the LLMs and maintain their efficacy in production.

Leeor Moses-Voronov, VP BD & Alliances

Focus on production instead of experimentation

With an estimated investment of $2-3 billion, 2023 was a year of GenAI hype and exploration. In 2024, AI will be adopted at much larger scale in production. Employees and users will experience GenAI more regularly through everyday interfaces - banking apps, eCommerce websites, insurance inquiries, and more - accelerating adoption in the market.

Focus on developing an in-house experience

GenAI demand will shift from the GenAI-enabled products favored by the market in 2023 toward deeper, more customizable tech, such as LLMs with agentic capabilities. This will result in two complementary motions: (1) companies will develop in-house skills to build tailored GenAI use cases and establish their own approach to GenAI adoption; (2) agentic capabilities will mature, making it more practical for organizations to develop their own agents instead of relying on external applications.

Dan Padnos, VP Platform

The hard truth about AI ROI

Enterprises will see that getting ROI on generative AI is harder than it seems. In particular, it requires more than doing a quick evaluation of model capabilities, picking the best one and giving their developers some API credits. There will be a shift towards end-to-end solutions that target specific workflows and tasks.

Buy vs Build

Some enterprises will still want to use open source models and develop a lot of tech in-house on top of open source. For most enterprises, it will be too hard to support this in-house development at a pace that keeps up with innovation happening in the market, so buyers will dominate while builders remain the minority.

Enterprise deployment challenges when scaling

Winning use-cases will emerge and enterprises will want to take them to production at scale. At this point they'll encounter challenges around unit economics, compliance, privacy and security. 

If you're experimenting and ready to move your AI pilots into production, speak with our experts today about scaling AI for real business impact.

Discover more

What is a MRKL system?

In August 2021 we released Jurassic-1, a 178B-parameter autoregressive language model. We’re thankful for the reception it got – over 10,000 developers signed up, and hundreds of commercial applications are in various stages of development. Mega models such as Jurassic-1, GPT-3 and others are indeed amazing, and open up exciting opportunities. But these models are also inherently limited. They can’t access your company database, don’t have access to current information (for example, latest COVID numbers or dollar-euro exchange rate), can’t reason (for example, their arithmetic capabilities don’t come close to that of an HP calculator from the 1970s), and are prohibitively expensive to update.
A MRKL system such as Jurassic-X enjoys all the advantages of mega language models, with none of these disadvantages. Here’s how it works.

A composite multi-expert problem: the list of “green energy companies” is routed to the Wiki API, “last month” dates are extracted from the calendar, and “share prices” come from the database. The “largest increase” is computed by the calculator; finally, the answer is formatted by the language model.

There are of course many details and challenges in making all this work - training the discrete experts, smoothing the interface between them and the neural network, routing among the different modules, and more. To get a deeper sense for MRKL systems, how they fit in the technology landscape, and some of the technical challenges in implementing them, see our MRKL paper. For a deeper technical look at how to handle one of the implementation challenges, namely avoiding model explosion, see our paper on leveraging frozen mega LMs.
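To make the routing idea concrete, here is a minimal Python sketch of an MRKL-style router. The regex-based dispatch, the fact table, and every name below are illustrative assumptions, not Jurassic-X's actual routing logic:

```python
# Toy MRKL-style router: pattern rules send each query to a discrete expert;
# unmatched queries fall through to the language model. Dispatch rules and
# the fact table are illustrative stand-ins only.
import re

def calculator_expert(query):
    """Extract an arithmetic expression from the query and evaluate it."""
    expr = re.search(r"[\d][\d\s+\-*/().]*", query).group()
    return eval(expr)  # fine for a demo; never eval untrusted input in production

def lookup_expert(query, facts):
    """Stand-in for a Wikidata/database expert: match known entities."""
    for key, value in facts.items():
        if key in query.lower():
            return value
    return None

def route(query, facts):
    """Send arithmetic to the calculator, known entities to lookup, else to the LM."""
    if re.search(r"\d\s*[+\-*/]\s*\d", query):
        return calculator_expert(query)
    hit = lookup_expert(query, facts)
    return hit if hit is not None else "(fall back to language model)"

facts = {"population of berlin": 3664088}
print(route("What is 655400 / 94?", facts))   # routed to the calculator
print(route("population of Berlin?", facts))  # routed to the knowledge lookup
```

The hard parts named above - training the experts, smoothing the neural/discrete interface, and learning the routing itself rather than hand-coding it - are exactly what this sketch glosses over.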

A further look at the advantages of Jurassic-X

Even without diving into technical details, it’s easy to get a sense for the advantages of Jurassic-X. Here are some of the capabilities it offers, and how these can be used for practical applications.

Reading and updating your database in free language

Language models are closed boxes which you can use, but not change. However, in many practical cases you would want to use the power of a language model to analyze information you possess - the supplies in your store, your company's payroll, the grades in your school, and more. Jurassic-X can connect to your databases so that you can ‘talk’ to your data and explore what you need - “Find the cheapest shampoo that has a rosy smell”, “Which computing stock increased the most in the last week?”, and more. Furthermore, our system enables joining several databases, and can update your database using free language (see figure below).

Jurassic-X enables you to plug in YOUR company's database (inventories, salary sheets, etc.) and extract information using free language

AI-assisted text generation on current affairs

Language models can generate text, yet they cannot be used to create text on current affairs, because their vast knowledge (historic dates, world leaders and more) represents the world as it was when they were trained. This is clearly (and somewhat embarrassingly) demonstrated when three of the world’s leading language models (including our own Jurassic-1) still claim Donald Trump is the US president more than a year after Joe Biden was sworn into office.
Jurassic-X solves this problem by simply plugging into resources such as Wikidata, providing it with continuous access to up-to-date knowledge. This opens up a new avenue for AI-assisted text generation on current affairs.

Who is the president of the United States?

T0: Donald Trump
GPT-3: Donald Trump
Jurassic-1: Donald Trump
Google: Joe Biden
Jurassic-X: Joe Biden is the 46th and current president
Jurassic-X can assist in text generation on up-to-date events by combining a powerful language model with access to Wikidata

Performing math operations

A 6-year-old child learns math from rules, not only by memorizing examples. In contrast, language models are designed to learn from examples, and consequently can solve very basic math like 1-, 2-, and possibly 3-digit addition, but struggle with anything more complex. With more training time, better data, and larger models, performance will improve, but it will not reach the robustness of an HP calculator from the 1970s. Jurassic-X takes a different approach and calls upon a calculator whenever the router identifies a math problem. The problem can be phrased in natural language and is converted by the language model into the format the calculator requires (numbers and math operations). The computation is performed, and the answer is converted back into free language.
Importantly (see the example below), the process is made transparent to the user by revealing the computation performed, increasing trust in the system. In contrast, language models provide answers that might seem reasonable but are wrong, making them impractical to use.
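The transparent-calculator flow can be sketched in a few lines of Python. The phrase-to-operation mapping below is a toy assumption that only covers the division phrasing of the example; the point is that the extracted expression is returned alongside the result so the user can see what was computed:

```python
# Hedged sketch of a transparent calculator step: return the computation
# string together with the numeric answer. The "divided" heuristic is an
# illustrative assumption, not a general natural-language math parser.
import re

def solve_with_trace(question):
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", question)]
    # Assume "divided ... among/by" phrasing maps to division, as in the example.
    if "divid" in question.lower() and len(numbers) == 2:
        result = numbers[0] / numbers[1]
        trace = f"X = {numbers[0]:g} / {numbers[1]:g}"
        return round(result, 1), trace
    raise ValueError("no supported operation found")

answer, trace = solve_with_trace(
    "The company had 655400 shares which they divided equally among 94 employees."
)
print(answer, "|", trace)  # 6972.3 | X = 655400 / 94
```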

The company had 655400 shares which they divided equally among 94 employees. How many did each employee get?

T0: 94 employees.
GPT-3: Each employee got 7000 stocks
Jurassic-1: 1.5
Google: (No answer provided)
Jurassic-X: 6972.3 (X = 655400 / 94)
Jurassic-X can answer non-trivial math operations which are phrased in natural language, made possible by the combination of a language model and a calculator

Compositionality

Solving even simple questions might require multiple steps. For example, “Do more people live in Tel Aviv or in Berlin?” requires answering: i. What is the population of Tel Aviv? ii. What is the population of Berlin? iii. Which is larger? This is a highly non-trivial process for a language model, and language models fail to answer this question (see example below). Moreover, the user can’t see the process leading to the answer, and hence is unable to trust it. Jurassic-X can decompose such problems into the basic sub-questions, route each to the relevant expert, and put together an answer in free language. Importantly, Jurassic-X not only provides the correct answer but also displays the steps taken to reach it, increasing trust in the system.

Do more people live in Tel Aviv or in Berlin?

T0: Berlin
GPT-3: There are more people living in Tel Aviv than in Berlin.
Jurassic-1: Berlin and Tel Aviv are roughly the same size
Google: (First hit is a comparison between Tel Aviv and Berlin)
Jurassic-X: More people live in Berlin than in Tel-Aviv

Plan: [‘Return population of Tel Aviv’; ‘Return population of Berlin’; ‘Return which is bigger between #1 and #2’]
Step 1: Population of Tel Aviv. Result - 451523.
Step 2: Population of Berlin. Result - 3664088.
Step 3: Which is bigger, #1 or #2. Result - Berlin.

Jurassic-X breaks down compositional questions, answers the basic sub-questions, and puts together the answer. Importantly, this process is transparent to the user greatly increasing the trust in the system
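The decomposition can be sketched as a small plan executor. Here the plan is hard-coded and the population figures are taken from the example above, whereas the real system would generate the plan with a language model; every name in this sketch is an illustrative assumption:

```python
# Toy plan executor: each step may reference earlier results by #index, and
# every intermediate result is kept so it can be shown to the user.
# The hard-coded plan stands in for an LM-generated decomposition.

populations = {"Tel Aviv": 451523, "Berlin": 3664088}  # figures from the example

def run_plan(steps):
    results = []
    for op, arg in steps:
        if op == "population":
            results.append((f"Population of {arg}", populations[arg]))
        elif op == "bigger":
            i, j = arg
            a, b = results[i - 1], results[j - 1]  # resolve #i / #j references
            winner = a if a[1] > b[1] else b
            results.append((f"Which is bigger, #{i} or #{j}",
                            winner[0].replace("Population of ", "")))
    return results

plan = [("population", "Tel Aviv"), ("population", "Berlin"), ("bigger", (1, 2))]
for n, (desc, value) in enumerate(run_plan(plan), 1):
    print(f"Step {n}: {desc}. Result - {value}")
```

Keeping the full results list, rather than only the final answer, is what makes the step-by-step trace available for display.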

Dynamic information (like weather and currency exchange rates)

Certain types of information change continuously - weather, currency exchange rates, share values and more. Such information will never be captured by language models, yet can easily be handled by Jurassic-X by integrating it with a reliable source of information. We performed a proof-of-concept on two such features - weather and currency exchange rates, and the design enables quick integration with more sources to solve your use-case.
Weather - a loosely phrased question about the weather elicits an answer from every language model, but a language model always returns the same answer regardless of when the question is asked (funny, right?), while Jurassic-X provides an answer based on the actual weather forecast.

I’m going to be in New-York in 3 days. Should I pack my umbrella?

T0: No
GPT-3: Yes, you should pack your umbrella.
Jurassic-1: Yes, you should. The weather forecast is rain.
Google: (Links to weather websites)
Jurassic-X: Yes, you should pack your umbrella, because in New York in 3 days there will be broken clouds and the temperature will be -2 degrees.

Currency exchange rates change much faster than weather predictions, yet the Jurassic-X concept - a language model connected to a reliable source of information - easily solves this problem as well.

How much Moroccan money will I get for 100 bucks?

T0: 125 dirhams
GPT-3: 100 moroccan dirhams is about 27$.
Jurassic-1: How much is 100 dollars in moroccan money?
Google: 100 dirhams = 10.75 dollars
Jurassic-X: 100 USD = 934.003 MAD
Jurassic-X combines a language model with access to APIs with continuously changing information. This is demonstrated for weather forecasts and currency exchange rates, and can easily be extended to other information sources
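The integration pattern for continuously changing data can be sketched as follows. The fetcher here is a stub with a placeholder rate table (the 9.34003 figure is back-derived from the example answer, not a real quote); a production system would call a live rates or weather API at question time:

```python
# Pattern sketch: the model formats the request, an external source supplies
# the live value, and the result is rendered back into free language.
# The stub table below is a placeholder, not real exchange-rate data.

def answer_fx_question(amount, base, quote, fetch_rate):
    """Convert an amount using a rate fetched at question time."""
    rate = fetch_rate(base, quote)  # live API call in production; stub here
    return f"{amount} {base} = {amount * rate:.3f} {quote}"

def stub_rate(base, quote):
    # Placeholder table; real code would hit a rates API and cache briefly.
    return {("USD", "MAD"): 9.34003}[(base, quote)]

print(answer_fx_question(100, "USD", "MAD", stub_rate))  # 100 USD = 934.003 MAD
```

Because the data source is injected as a function, swapping in a weather service or a share-price feed changes the fetcher, not the surrounding logic.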

Transparency and trust

Transparency is a critical element that is lacking in language models, and its absence prevents much wider adoption of these models. This lack of transparency is demonstrated by the answers to the question “Was Clinton ever elected as president of the United States?”. The answer, of course, depends on which Clinton you have in mind - a distinction only Jurassic-X makes clear, thanks to its disambiguation component. More examples of Jurassic-X’s transparency were demonstrated above - displaying the math operation performed to the user, and the answers to the simple sub-questions in the multi-step setting.

Was Clinton ever elected president of the United States?

T0: Yes
GPT-3: No, Clinton was never elected as president of the United States.
Jurassic-1: No
Google: Clinton was elected president in the 1992 presidential elections…
Jurassic-X: Bill Clinton was elected president.
Jurassic-X is designed to be more transparent by displaying which expert answered which part of the question, and by presenting the intermediate steps taken and not just the black-box response

Your Turn

That's it - you get the picture. The use cases above give you a sense of some things you could do with Jurassic-X, but now it's your turn. A MRKL system such as Jurassic-X is as flexible as your imagination. What do you want to accomplish? Contact us for early access.
