
Enterprise GenAI: Definition, Challenges, and Solutions

April 3, 2024

Learn about the concerns that organizations are facing today as AI adoption increases, and how enterprise AI is evolving to meet them.

Little more than a year after going mainstream, generative AI is already this decade’s major disruptor: the International Monetary Fund predicts that AI will affect nearly 40% of all jobs, while Meta has courted significant controversy with its plans to build an open-source general intelligence model. While 2024 promises to be the year of AI adoption for those pushing for a competitive edge, the sheer variety of use cases can make general-purpose AI difficult to visualize and control - not to mention more prone to error.

This article will outline some of the concerns facing organizations today, and how enterprise AI is evolving to meet them.

How Small-Scale Models Are Defining Enterprise AI 

Enterprise AI is the focused application of artificial intelligence to improve efficiency, enhance performance, and drive innovation across teams. In the backdrop of general-purpose AI’s large-scale models, enterprise AI is increasingly pulling ahead in favor of hyper-focused, task-specific models (TSMs). 

General-purpose AI, in this context, refers to the mainstream large-scale models that have been dominating headlines since OpenAI’s ChatGPT became the breakout industry leader. OpenAI’s foundational model works by analyzing the words in its input and sequentially predicting the words that follow - in this way, complete responses are created. Trained on an extensive array of Wikipedia entries, books, and internet resources, these types of LLMs are able to break language down into a series of probability exercises. This technique involves an immense volume of training data, exemplified by Google's latest AI model using nearly 3.6 trillion tokens.
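To make the "probability exercise" concrete, here is a minimal sketch of next-word prediction. It is a toy word-pair counter written for illustration, not how OpenAI's or any production model is built; the corpus and the function names (`train_bigram`, `next_word_probs`, `generate`) are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probs(follows: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn the raw follow-counts for one word into probabilities."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(follows: dict[str, Counter], start: str, max_words: int = 4) -> str:
    """Greedy generation: repeatedly append the most likely next word."""
    out = [start]
    for _ in range(max_words):
        probs = next_word_probs(follows, out[-1])
        if not probs:
            break  # the last word was never followed by anything in training
        out.append(max(probs, key=probs.get))
    return " ".join(out)


# Tiny illustrative corpus (an assumption for this sketch).
corpus = "the cat sat on the mat and the cat sat on the sofa"
follows = train_bigram(corpus)

print(next_word_probs(follows, "the"))  # {'cat': 0.5, 'mat': 0.25, 'sofa': 0.25}
print(generate(follows, "the"))         # the cat sat on the
```

A real LLM replaces the counting table with a neural network over billions of parameters, but the loop is the same: score the candidates, pick one, repeat.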

All this data is necessary for general-purpose AI models as they operate as probability engines, assigning likelihoods to potential responses. However, if the ingested information is biased, incomplete, or problematic, the outputs can be unreliable at best, and offensive at worst. These hallucinations - fabricated data that appear authentic - occur because LLMs lack any internal understanding of the world they’re describing. As the datasets of LLMs increase, so too does their capacity to act in unusual ways. Microsoft's Bing, powered by GPT-4, not only utilizes the LLM but also integrates search engine results. Within this high-input approach, it’s been found that LLMs can transcend their initial programming and generate content in languages they weren't explicitly trained on. This emergent behavior, while not fully understood, highlights the inherent unpredictability of large-scale, unrefined AI engines.

The high-volume, spray-and-pray approach of general-purpose AI has become a source of real disappointment. Faced with underwhelming ROI and an unacceptably high risk of inaccuracies, organizations are increasingly narrowing their approach to more tightly focused datasets. Enter enterprise AI.

Instead of a single jack-of-all-trades, enterprise AI offers task-specific models that identify the natural language capability required - for example, context-rich answers - and the team it will be deployed to, such as customer support. This core focus is then supported by further verification mechanisms. By segmenting NLP into its core tasks and relying on internal datasets, teams are able to build their own customized integration of TSMs that provide reliable and grounded results - at a fraction of the cost. 

General Purpose AI vs. Enterprise AI: A Direct Comparison

Enterprise AI is increasingly proving itself to be the high-impact solution that regular AI promised to be. To understand why, both approaches deserve a breakdown of scope, deployment, and integration potential.

| | General-Purpose AI | Enterprise AI |
|---|---|---|
| Goal | Models are built to extract and utilize as much information as possible, encompassing research, writing, mathematics, translation, and coding. | Models are built to target specific, high-value use cases within a team or project. |
| Data sources and transparency | Web scrapers incorporate the entirety of text and images on the internet; other sources include books and articles. Generative AI is increasingly being used to create synthetic data to train other models. Minimal transparency for end-users. | Models are trained from internal documents and therefore hyper-focused on relevant context. Complete transparency when inadequate data is provided, rather than guesswork. |
| Implementation within a team/project | A team must self-select their own issues, before spending significant time fine-tuning answers with prompt engineering. | Recognizes that organizations require more than raw AI engines. Support is offered to identify the highest-ROI use case in your organization. |
| Time to production | Inhouse projects must go through a lengthy five-stage process of Problem Scoping, Data Acquisition, Data Exploration, Modeling, and Evaluation. Only afterward can the project start realizing ROI. | Pre-tuned solutions allow for rapid integration and testing, skipping the headaches of production and data handling. Executive buy-in is immediately on the table. |
| Security | Input data is either handled by a fully in-house model - or, in the case of completely third-party tools, submitted to a model with no transparency. Users have no idea whether sensitive data may reappear as outputs. | Data is kept internal to the organization and only accessed on an as-needed basis. |
| Integrations | Demands a degree of technical proficiency and understanding of your enterprise’s backend. The risk of infrastructural disruption is high. As a result, the integration process can suffer from significant delays in complex organizational structures. | Offers deep integration with existing enterprise systems such as GCP and AWS. Plug-and-play APIs interact seamlessly with other components of a tech stack. |

Revealing LLMs’ Challenges - and Enterprise AI’s Solutions

Without the further streamlining of enterprise-specific training, foundational AI models are left woefully unprepared for real-world applications. In 2023, the now-infamous Mata v. Avianca legal case took a turn for the unexpected when the claimant’s lawyer - tasked with proving his client’s injury during an Avianca flight - used ChatGPT in his legal research. As a result, the court was presented with some of the following judicial decisions: Varghese v. China Southern Airlines; Shaboon v. EgyptAir; and Estate of Durden v. KLM Royal Dutch Airlines.

Not only did OpenAI’s LLM fabricate all of these, but it further provided fake internal citations and quotes - even claiming, upon further questioning, that they were available in major legal databases. Partly because the output was formatted so convincingly, the lawyer responsible was completely unaware of generative AI’s occasional habit of producing false content.

Below, we take a look at several AI-related challenges facing organizations, and the solutions that enterprise AI has to offer. 

The Challenge of Inaccurate Outputs

AI’s promise of human-like communication comes at an incredible cost: trust. Ultimately, a successful AI project hinges on retaining trust - both within a company and throughout its wider customer base. That trust is already fragile: according to a 2023 KPMG study, approximately 73% of individuals worldwide express apprehension about the potential dangers associated with AI. Because the continued acceptance of AI is fundamentally dependent on trust, hallucinations threaten to completely eclipse this early optimism.

AI models, despite being trained on large datasets, are not always able to encompass every conceivable real-world situation. To address this, a process called grounding can enhance the model's training with extra, context-relevant information. This approach broadens the model's comprehension and boosts its capability to perform efficiently in a variety of practical scenarios. However, grounding the entirety of a foundational LLM would incur eye-watering costs.

This is how task-specific models can offer high accuracy at minimal cost - by narrowing a model’s generative power down to a specific task, it becomes possible to provide enough external context to close knowledge gaps. LLMs supported by Retrieval Augmented Generation (RAG) further allow organizations to provide more context in the form of their own internal documentation. The retrieval engine can then search through the provided knowledge base, identify the passages most similar to a user's query, and provide answers of superior accuracy.
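As a rough illustration of the retrieval step, the sketch below ranks a small knowledge base against a query using bag-of-words cosine similarity. This is a deliberately simple stand-in: production retrieval engines typically use learned embeddings, and the knowledge base, function names, and scoring method here are all assumptions for the example rather than any vendor's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[tuple[float, str]]:
    """Return the top_k (score, passage) pairs most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical internal documentation.
knowledge_base = [
    "Refunds are issued within 14 days of a returned purchase.",
    "Support is available by chat on weekdays from 9am to 5pm.",
    "Shipping to Europe takes three to five business days.",
]

best_score, best_passage = retrieve("How many days do refunds take?", knowledge_base)[0]
print(best_passage)  # Refunds are issued within 14 days of a returned purchase.
```

The retrieved passage is then handed to the language model as context, so the answer is drawn from the organization's own documents rather than from whatever the model absorbed during training.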

Combating a Lack of Specialized Expertise

The arms race for AI talent has seen a stubborn gap remain between enterprises’ AI needs and the current capabilities of their workforce. While long-term strategies are nice, they do not assist organizations hoping to realize their AI potential today. This is where the AI-as-a-service model is drastically accelerating adoption. 

One of the primary ways that task-specific models reduce technical complexity is by transforming raw AI engines into customer-centric, task-focused language models. That way - instead of having a taskforce build your organization’s own foundational model on SageMaker - a pre-existing model is not only already available, but deployable with minimal complexity.

For example, some common LLM tasks that promise the highest ROI are:

  • Retrieving Contextual Answers
  • Summarizing
  • Paraphrasing
  • Grammatical Error Correction (GEC)
  • Semantic search

Each of these can be reduced in complexity - enterprise AI addresses this by streamlining the data pipeline. With contextual answers, for instance, the input parameters can be whittled down to two: ‘context’, for the relevant internal document, and ‘question’, for the user’s query. This way, users aren’t required to be prompt engineers. Enterprise AI is built to factor in the background processing, letting the user simply provide their context and question.
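One way to picture the two-parameter interface is as a thin wrapper that owns the prompt so the user does not have to. The sketch below is an assumption-laden illustration: `ContextualAnswerRequest` and `build_prompt` are hypothetical names, the prompt wording is invented, and none of it is taken from a real provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualAnswerRequest:
    """The only two inputs the user supplies (hypothetical shape)."""
    context: str   # the relevant internal document
    question: str  # what the user wants to know


def build_prompt(request: ContextualAnswerRequest) -> str:
    """Background processing: wrap the two inputs in a fixed instruction template."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{request.context.strip()}\n\n"
        f"Question: {request.question.strip()}\n"
        "Answer:"
    )


request = ContextualAnswerRequest(
    context="Employees accrue 1.5 vacation days per month worked.",
    question="How many vacation days are accrued each month?",
)
prompt = build_prompt(request)
print(prompt)
```

Keeping the template inside the service means it can be tuned once, centrally, while every caller continues to send the same two fields.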

It’s critical to actively reduce the burden on your AI team where you can. Market-ready solutions drastically reduce it by providing access to pre-trained models via plug-and-play APIs.

Safety Concerns

Just as inaccurate or harmful outputs significantly degrade end-user trust, AI is deeply entrenched in a wider conversation around security. The biggest privacy concern keeping most companies awake at night is the accidental disclosure of sensitive information through prompts.

To ensure safety in enterprise LLMs, rigorous measures are implemented to reduce instances where benign inputs lead to detrimental outputs. While such outputs are rare, preventing them remains a core focus. This is achieved by continuously training the model to differentiate between acceptable and inappropriate content across various contexts, such as fantasy literature versus real-life situations. The goal is to minimize the risk of harmful outputs and promote responsible model usage.

With Task-Specific Models made available through API, AI safety can also slot into the API security already in place for established organizations. This further allows for data security to be handled internally - for customer-focused applications, this can be as simple as giving no response to any question not answered by the associated documents.
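The "no response" policy can be sketched as a simple gate in front of the model. The example assumes that some retrieval step has already produced `(score, passage)` pairs for the question; the function name `grounded_reply` and the `min_score` threshold of 0.2 are illustrative assumptions, not values from any real product.

```python
from typing import Optional, Sequence, Tuple


def grounded_reply(
    scored_passages: Sequence[Tuple[float, str]],
    min_score: float = 0.2,
) -> Optional[str]:
    """Return the best-supported passage, or None when nothing clears the threshold.

    Returning None signals the application to give no answer at all,
    rather than letting the model guess.
    """
    if not scored_passages:
        return None
    score, passage = max(scored_passages, key=lambda pair: pair[0])
    return passage if score >= min_score else None


# A question well covered by the documents gets its supporting passage...
print(grounded_reply([(0.61, "Refunds are issued within 14 days."), (0.08, "Office hours are 9 to 5.")]))
# ...while an off-topic question gets no response.
print(grounded_reply([(0.05, "Refunds are issued within 14 days.")]))
```

In practice the threshold would be tuned against real queries, since a value that is too strict suppresses legitimate answers and one that is too loose lets unsupported ones through.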

Enterprise AI is Already Here: A Cross-Section of Use Cases

Enough philosophizing - enterprise AI is laser-focused on practical applications. As such, it’s beneficial to outline a few of the transformations occurring today. These examples come straight from the userbase of AI21, one of the highest-ROI enterprise AI providers. Throughout, the task-by-task approach is already demonstrating its ability to maintain a rapid pace of change, while keeping data within the safe confines of your organization.

In Banking

To keep all bankers equipped with up-to-date stock information, term sheets require the rapid summarization of calls. For one global financial service provider, this was once a labor-intensive job that demanded manual data entry. AI21’s retrieval solution addressed this by skillfully extracting details via algorithms that analyze document relevance and user context. This data was then restructured into the appropriate spreadsheet formats. 

The results for this use case were significant: our solution achieved a 98% accuracy rate in the term sheets during testing. This high accuracy not only impressed clients by identifying and correcting an original mistake but also resulted in considerable time savings for bankers. By eliminating much of the manual effort previously required, bankers are now able to focus on more strategic tasks, thereby enhancing overall efficiency and productivity. 

In the Retail Space

While enterprise AI is streamlining financial data, it’s also making significant changes within the retail sector: one European sports retailer experienced a significant sales uplift by integrating our generative AI within product descriptions for their third-party marketplace. This went much deeper than simple LLM requests: their major roadblock to AI adoption was unreliable seller data. 

With white-glove support and a data enrichment strategy that sourced information from multiple datasets, AI21’s description model was able to rapidly produce new product descriptions in several languages. This not only increased the conversion rate of online sales, but saw a more than 25% reduction in the time taken to onboard new products, streamlining the process and significantly enhancing operational efficiency.

In the Medical Field

Whether retrieving and summarizing data or generating accurate descriptions, the rapid pace of AI21 tooling makes it keenly suited for medical queries, where time is of the essence. In this use case, a global pharmaceutical company enhanced its digital assistant by equipping it with the ability to provide grounded, safe medical responses in real time for patient inquiries. The challenge was to deliver accurate and reliable medical information on demand. AI21’s Contextual Answers was able to address this need, enabling the generation of complex medical content within an impressive span of eight weeks.

The results were notable: AI21 is uniquely capable of managing medical content, and the pharmaceutical company was able to maintain its security standards while successfully providing patients with grounded and safe medical answers. As a result, patients were given critical health information in a more personalized, approachable manner.

Explore Enterprise AI’s Potential Today

This year will see generative AI evolve from pilots and prototypes into full-scale production. As adoption increases, there’s a considerable chance that the language models - trained off company assets - become their own streams of revenue as companies begin to offer their customized language models as SaaS products. The companies with dialed-in enterprise AI look to benefit from a sizeable head start.

But first, build up trust and adoption of your AI pilots with AI21 Labs. Our state-of-the-art LLMs are only the foundation: hyper-customizable TSMs fit quickly and securely into the workflows and customer funnels your teams have already worked so hard to build. With a baseline of reliability and accuracy, your AI production can keep a close line-of-sight to business value. 

If you're exploring a solution to your team’s time sinks or customers’ pain points, our white-glove support can help identify your highest-ROI use cases - and accelerate your AI adoption within weeks. Reach out today to jumpstart your enterprise AI journey.

Discover more

What is a MRKL system?

In August 2021 we released Jurassic-1, a 178B-parameter autoregressive language model. We’re thankful for the reception it got – over 10,000 developers signed up, and hundreds of commercial applications are in various stages of development. Mega models such as Jurassic-1, GPT-3 and others are indeed amazing, and open up exciting opportunities. But these models are also inherently limited. They can’t access your company database, don’t have access to current information (for example, latest COVID numbers or dollar-euro exchange rate), can’t reason (for example, their arithmetic capabilities don’t come close to that of an HP calculator from the 1970s), and are prohibitively expensive to update.
A MRKL system such as Jurassic-X enjoys all the advantages of mega language models, with none of these disadvantages. Here’s how it works.

Composite multi-expert problem: the list of “Green energy companies” is routed to Wiki API, “last month” dates are extracted from the calendar and “share prices” from the database. The “largest increase” is computed by the calculator and finally, the answer is formatted by the language model.

There are of course many details and challenges in making all this work - training the discrete experts, smoothing the interface between them and the neural network, routing among the different modules, and more. To get a deeper sense for MRKL systems, how they fit in the technology landscape, and some of the technical challenges in implementing them, see our MRKL paper. For a deeper technical look at how to handle one of the implementation challenges, namely avoiding model explosion, see our paper on leveraging frozen mega LMs.
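To give a feel for the routing idea, here is a toy router that sends each fragment of the composite question above to an expert. It is only a keyword-matching stand-in: the routing in an actual MRKL system is learned rather than rule-based, and the expert names and keyword lists below are assumptions invented for this sketch.

```python
# (keywords, expert) pairs, checked in order. All names are hypothetical.
ROUTES = [
    (("companies",), "wiki_api"),
    (("last month", "yesterday", "today"), "calendar"),
    (("share price",), "database"),
    (("largest", "increase"), "calculator"),
]


def route(fragment: str) -> str:
    """Pick the first expert whose keywords appear in the fragment.

    Anything unmatched falls through to the general language model.
    """
    text = fragment.lower()
    for keywords, expert in ROUTES:
        if any(keyword in text for keyword in keywords):
            return expert
    return "language_model"


fragments = [
    "Green energy companies",
    "last month",
    "share prices",
    "largest increase",
    "format the answer",
]
for fragment in fragments:
    print(f"{fragment!r} -> {route(fragment)}")
```

The fallback branch mirrors the design described in the text: discrete experts handle what they are good at, and the language model covers everything else.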

A further look at the advantages of Jurassic-X

Even without diving into technical details, it’s easy to get a sense for the advantages of Jurassic-X. Here are some of the capabilities it offers, and how these can be used for practical applications.

Reading and updating your database in free language

Language models are closed boxes which you can use, but not change. However, in many practical cases you would want to use the power of a language model to analyze information you possess - the supplies in your store, your company’s payroll, the grades in your school and more. Jurassic-X can connect to your databases so that you can ‘talk’ to your data to explore what you need: “Find the cheapest shampoo that has a rosy smell”, “Which computing stock increased the most in the last week?” and more. Furthermore, our system also enables joining several databases, and has the ability to update your database using free language (see figure below).

Jurassic-X enables you to plug in YOUR company's database (inventories, salary sheets, etc.) and extract information using free language
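The database side of this can be sketched with Python's built-in `sqlite3` module. The hard part, translating free language into a query, is not shown: the SQL below is hand-written as a stand-in for what a language model might produce, and the `products` table, its columns, and its rows are all invented for the example.

```python
import sqlite3

# In-memory inventory with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, scent TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("Rose Garden", "shampoo", "rose", 6.50),
        ("Petal Soft", "shampoo", "rose", 4.25),
        ("Ocean Mist", "shampoo", "sea salt", 3.10),
        ("Rose Bar", "soap", "rose", 1.99),
    ],
)

# Stand-in for the query generated from:
# "Find the cheapest shampoo that has a rosy smell"
read_sql = """
    SELECT name, price FROM products
    WHERE category = ? AND scent = ?
    ORDER BY price ASC
    LIMIT 1
"""
cheapest = conn.execute(read_sql, ("shampoo", "rose")).fetchone()
print(cheapest)  # ('Petal Soft', 4.25)

# Stand-in for a free-language update such as:
# "Lower the price of Petal Soft to 3.99"
conn.execute("UPDATE products SET price = ? WHERE name = ?", (3.99, "Petal Soft"))
new_price = conn.execute(
    "SELECT price FROM products WHERE name = ?", ("Petal Soft",)
).fetchone()[0]
print(new_price)  # 3.99
```

Parameterized queries (the `?` placeholders) matter here: when a model is generating values from user text, they keep that text from being interpreted as SQL.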

AI-assisted text generation on current affairs

Language models can generate text, yet cannot be used to create text on current affairs, because their vast knowledge (historic dates, world leaders and more) represents the world as it was when they were trained. This is clearly (and somewhat embarrassingly) demonstrated when three of the world’s leading language models (including our own Jurassic-1) still claim Donald Trump is the US president more than a year after Joe Biden was sworn into office.
Jurassic-X solves this problem by simply plugging into resources such as Wikidata, providing it with continuous access to up-to-date knowledge. This opens up a new avenue for AI-assisted text generation on current affairs.

Question: “Who is the president of the United States?”

Responses:
  • Donald Trump
  • Donald Trump
  • Donald Trump
  • Joe Biden
  • Joe Biden is the 46th and current
Jurassic-X can assist in text generation on up-to-date events by combining a powerful language model with access to Wikidata

Performing math operations

A six-year-old child learns math from rules, not only by memorizing examples. In contrast, language models are designed to learn from examples, and consequently are able to solve very basic math like 1-, 2-, and possibly 3-digit addition, but struggle with anything more complex. With increased training time, better data and larger models, the performance will improve, but will not reach the robustness of an HP calculator from the 1970s. Jurassic-X takes a different approach and calls upon a calculator whenever a math problem is identified by the router. The problem can be phrased in natural language and is converted by the language model to the format required by the calculator (numbers and math operations). The computation is performed and the answer is converted back into free language.
Importantly (see example below) the process is made transparent to the user by revealing the computation performed, thus increasing the trust in the system. In contrast, language models provide answers which might seem reasonable, but are wrong, making them impractical to use.

Question: “The company had 655400 shares which they divided equally among 94 employees. How many did each employee get?”

Responses:
  • 94 employees.
  • Each employee got 7000 stocks
  • (No answer provided)
  • X= 655400/94
Jurassic-X can answer non-trivial math operations which are phrased in natural language, made possible by the combination of a language model and a calculator
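The calculator hand-off can be sketched in two steps: turn the question into an expression, then evaluate that expression exactly. In the sketch below, `to_expression` is a crude regex stand-in for the language model (it only understands "divided" questions with two numbers), and `evaluate` is a small arithmetic evaluator built on Python's `ast` module. Both names are assumptions for this example and say nothing about how Jurassic-X is implemented.

```python
import ast
import operator
import re

# Arithmetic operators the calculator is allowed to apply.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals, rejecting anything else."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


def to_expression(question: str) -> str:
    """Toy stand-in for the language model: handles 'divided' questions only."""
    numbers = re.findall(r"\d+(?:\.\d+)?", question)
    if "divided" in question.lower() and len(numbers) == 2:
        return f"{numbers[0]}/{numbers[1]}"
    raise ValueError("question not understood")


question = (
    "The company had 655400 shares which they divided equally "
    "among 94 employees. How many did each employee get?"
)
expression = to_expression(question)
result = evaluate(expression)

# Showing the expression alongside the result is what makes the step transparent.
print(f"X = {expression} = {result:.2f}")  # X = 655400/94 = 6972.34
```

Walking the syntax tree instead of calling `eval` keeps the calculator limited to arithmetic, which matters when the expression originates from model output.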


Solving simple questions might require multiple steps, for example - “Do more people live in Tel Aviv or in Berlin?” requires answering: i. What is the population of Tel-Aviv? ii. What is the population of Berlin? iii. Which is larger? This is a highly non-trivial process for a language model, and language models fail to answer this question (see example). Moreover, the user can’t know the process leading to the answers, hence is unable to trust them. Jurassic-X can decompose such problems into the basic questions, route each to the relevant expert, and put together an answer in free language. Importantly, Jurassic-X not only provides the correct answer but also displays the steps taken to reach it, increasing the trust in the system.

Question: “Do more people live in Tel Aviv or in Berlin?”

Responses:
  • There are more people living in Tel Aviv than in Berlin.
  • Berlin and Tel Aviv are roughly the same size
  • (First hit is a comparison between Tel Aviv and Berlin)
  • More people live in Berlin than in Tel Aviv

Steps displayed by Jurassic-X:
  [‘Return population of Tel Aviv’; ‘Return population of Berlin’; ‘Return which is bigger between #1 and #2’]
  Step 1: Population of Tel Aviv. Result - 451523.
  Step 2: Population of Berlin. Result - 3664088.
  Step 3: Which is bigger, #1 or #2. Result - Berlin.

Jurassic-X breaks down compositional questions, answers the basic sub-questions, and puts together the answer. Importantly, this process is transparent to the user greatly increasing the trust in the system
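The decomposition above can be mimicked with a few lines of code that answer each sub-question in turn and keep a visible trace. The lookup table simply hard-codes the two population figures from the example; in a real system each lookup would be routed to a knowledge source, and the function name `compare_population` is an assumption for this sketch.

```python
# Population figures taken from the worked example in the text.
POPULATION = {"Tel Aviv": 451523, "Berlin": 3664088}


def compare_population(city_a: str, city_b: str) -> tuple[str, list[str]]:
    """Answer 'which city is larger?' as three explicit, logged steps."""
    steps = []

    pop_a = POPULATION[city_a]
    steps.append(f"Step 1: Population of {city_a}. Result - {pop_a}.")

    pop_b = POPULATION[city_b]
    steps.append(f"Step 2: Population of {city_b}. Result - {pop_b}.")

    larger = city_a if pop_a > pop_b else city_b
    steps.append(f"Step 3: Which is bigger, #1 or #2. Result - {larger}.")

    return larger, steps


larger, steps = compare_population("Tel Aviv", "Berlin")
for step in steps:
    print(step)
print(f"More people live in {larger}.")
```

Returning the trace together with the answer is the point of the exercise: the user can check each intermediate result instead of trusting a single opaque sentence.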

Dynamic information (like weather and currency exchange rates)

Certain types of information change continuously - weather, currency exchange rates, share values and more. Such information will never be captured by language models, yet can easily be handled by Jurassic-X by integrating it with a reliable source of information. We performed a proof-of-concept on two such features - weather and currency exchange rates, and the design enables quick integration with more sources to solve your use-case.
Weather - a loosely phrased question about the weather elicits an answer from all language models, but they always return the same answer regardless of when the question was asked (funny, right?). Jurassic-X, by contrast, provides an answer based on the actual weather prediction.

Question: “I’m going to be in New-York in 3 days. Should I pack my umbrella?”

Responses:
  • Yes, you should pack your umbrella.
  • Yes, you should. The weather forecast is rain.
  • (Links to weather websites)
  • Yes, you should pack your umbrella, because in New York in 3 days there will be broken clouds and the temperature will be -2 degrees.

Currency exchange rates change much faster than weather predictions, yet the Jurassic-X concept - a language model connected to a reliable source of information - easily solves this problem as well.

Question: “How much Moroccan money will I get for 100 bucks?”

Responses:
  • 125 dirhams
  • 100 moroccan dirhams is about 27$.
  • How much is 100 dollars in moroccan money?
  • 100 dirhams = 10.75 dollars
  • 100 USD = 934.003 MAD
Jurassic-X combines a language model with access to APIs with continuously changing information. This is demonstrated for weather forecasts and currency exchange rates, and can easily be extended to other information sources
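The pattern of pairing a language model with a live data source can be sketched by passing the source in as a function. Below, `fixed_rate_source` is a hard-coded stand-in for a real exchange-rate API; its single rate is chosen to reproduce the figure in the example above and is not current market data. Both function names are assumptions for this illustration.

```python
from typing import Callable


def convert(
    amount: float,
    base: str,
    quote: str,
    get_rate: Callable[[str, str], float],
) -> str:
    """Convert an amount using whatever rate source the caller plugs in."""
    rate = get_rate(base, quote)
    return f"{amount:g} {base} = {amount * rate:.3f} {quote}"


def fixed_rate_source(base: str, quote: str) -> float:
    """Stand-in for a live API call; the rate is illustrative only."""
    rates = {("USD", "MAD"): 9.34003}
    return rates[(base, quote)]


answer = convert(100, "USD", "MAD", fixed_rate_source)
print(answer)  # 100 USD = 934.003 MAD
```

Because the rate source is just an argument, swapping the stand-in for a real service, or adding a weather or share-price source alongside it, leaves the rest of the code unchanged.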

Transparency and trust

Transparency is a critical element that is lacking in language models, preventing a much wider adoption of these models. This lack of transparency is demonstrated by the answers to the question - “Was Clinton ever elected as president of the United States?”. The answer, of course, depends on which Clinton you have in mind, something that only Jurassic-X makes clear, thanks to its disambiguation component. More examples of Jurassic-X’s transparency were demonstrated above - displaying the math operation performed to the user, and the answers to the simple sub-questions in the multi-step setting.

Question: “Was Clinton ever elected president of the United States?”

Responses:
  • No, Clinton was never elected as president of the United States.
  • Clinton was elected president in the 1992 presidential elections…
  • Bill Clinton was elected president.
Jurassic-X is designed to be more transparent by displaying which expert answered which part of the question, and by presenting the intermediate steps taken and not just the black-box response

Your Turn

That's it, you get the picture. The use cases above give you a sense for some things you could do with Jurassic-X, but now it's your turn. A MRKL system such as Jurassic-X is as flexible as your imagination. What do you want to accomplish? Contact us for early access
