
Build a dashboard based on freeform sentiment analysis of hotel reviews

Yuval Belfer, Technical Product Marketing
October 12, 2022

Analyze large quantities of reviews in minutes using AI21 Studio

Imagine that you have a platform for hotel reservations, similar to Hotels.com. On your platform, hotel visitors can leave written reviews and an overall star rating for the hotel. These reviews allow hotel owners to gain a general understanding of guest satisfaction through their overall score, but that’s just one part of the story. The quality of the hotel experience is a combination of many factors, including the room quality, the facilities, and the staff. Ideally, the owner would like to get a clear picture of their hotel’s strengths and weaknesses so they can improve on the aspects that are lacking and highlight the aspects that visitors find positive.

To extract these insights, some hotel owners sit down from time to time and read all of the reviews that were written about their hotel. This is a tedious task - one you, as a developer, would probably want to spare your users from. What if you could create a dashboard that highlights the areas guests mention positively or negatively in their reviews? As a result, your user - in this case, the hotel owner - can get a real-time snapshot of their strengths and weaknesses with just a glance at this dashboard.

Not so long ago, you would have needed to work pretty hard to create a solution like that (for example, using several classical methods). But with large language models (LLMs), it’s really easy. You can perform this analysis with high accuracy, even with no prior knowledge of natural language processing (NLP). If that sounds appealing to you, then read on. By the end of this post, you will be able to implement this feature in your platform or product.

This post will walk you through the process of building an NLP-powered dashboard for a hotel, including:

  • Using an external API to gather real-world data.
  • Performing freeform combined topic extraction and sentiment analysis using Jurassic-1, part of the AI21 Studio suite of large language models.
  • Generating the output in a convenient format (JSON) so it’s easier to process.

If you are new to large language models, we recommend first reading this post.

Step 1: Collect reviews using an external API

You can find the Hotels.com API on RapidAPI. The API has several endpoints; we need the reviews endpoint, which returns the reviews for a given hotel, page by page, with approximately 50 reviews per page. To retrieve all of the available reviews, we call the API while iterating through the pages. The following functions do just that:

import requests

def get_hotel_reviews_page(hotel_id, page_number):
    # Fetch a single page (roughly 50 reviews) for the given hotel
    params = {
        "locale": "en_US",
        "hotel_id": hotel_id,
        "page_number": str(page_number)
    }
    headers = {
        "X-RapidAPI-Key": RAPID_API_KEY,
        "X-RapidAPI-Host": "hotels-com-provider.p.rapidapi.com"
    }
    response = requests.get(url=GET_REVIEWS_URL, headers=headers, params=params)
    response.raise_for_status()

    # Keep only the free-text summary of each review in each review group
    reviews = [
        review['summary']
        for page_reviews in response.json()['groupReview']
        for review in page_reviews['reviews']
    ]
    return reviews


def get_hotel_reviews(hotel_id, num_pages):
    # Collect reviews from pages 1..num_pages into a single flat list
    all_reviews = sum(
        [get_hotel_reviews_page(hotel_id, page_number) for page_number in range(1, num_pages + 1)],
        start=[]
    )
    return all_reviews

Note that running this function requires an API key for the reviews endpoint, which you can obtain from RapidAPI. This is not your AI21 Studio API key. There are some edge cases that this function doesn’t cover, such as requesting a page number that doesn’t exist, but we’ll set them aside for the purposes of this blog post.
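
If you do want a guard against a missing page, a more defensive variant might look like the following sketch (the exact failure mode - an HTTP error or an empty page - depends on the API, so adjust accordingly):

def get_hotel_reviews_defensive(hotel_id, max_pages):
    # Stop as soon as a page fails or comes back empty, instead of
    # assuming that all requested pages exist
    all_reviews = []
    for page_number in range(1, max_pages + 1):
        try:
            page = get_hotel_reviews_page(hotel_id, page_number)
        except (requests.HTTPError, KeyError):
            break  # past the last page, or the response shape changed
        if not page:
            break
        all_reviews.extend(page)
    return all_reviews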

Throughout this process, we will use the Empire Hotel in New York as a running example. You can use an API endpoint to get the hotel ID, or you can find it in the hotel’s URL on Hotels.com.
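
To make the setup concrete, here is what the call might look like. The key, endpoint URL, and hotel ID below are placeholders, not real values:

# Placeholders for illustration - use your own RapidAPI key, the reviews
# endpoint URL from the RapidAPI docs, and the real hotel ID from the URL
RAPID_API_KEY = "<your-rapidapi-key>"
GET_REVIEWS_URL = "https://hotels-com-provider.p.rapidapi.com/..."  # reviews endpoint path from the docs

hotel_id = "123456"  # hypothetical ID - take the real one from the hotel's URL
reviews = get_hotel_reviews(hotel_id, num_pages=5)
print(f"Collected {len(reviews)} reviews")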

Step 2: Prepare the reviews data for the language model

Large language models are very powerful and can ingest text of all shapes and sizes, but as the old saying goes, “garbage in, garbage out”. If you feed the model a bad prompt, the results will be far from optimal.

In this case, you should keep the following points in mind:

  • Reviews that are too short could be problematic since they often do not contain enough information to extract any meaningful insights.
  • Currently, our models are limited to the English language only.

In order to remove reviews that are too short or not in English, you can apply some simple filters to all of the reviews:

from langdetect import detect

def filter_reviews(reviews):
    # Keep only reviews that are at least 20 characters long and detected as English
    en_reviews = [review for review in reviews if len(review) >= 20 and detect(review) == "en"]
    return en_reviews

Note: Here, we have removed reviews with no real content by applying a very basic filter. Although this is not mandatory, we recommend pre-processing your reviews further, for example by removing stray characters and extra spaces.
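
A light cleanup pass might look like this sketch (adjust the rules to your data):

import re

def clean_review(review):
    # Collapse runs of whitespace into single spaces
    review = re.sub(r"\s+", " ", review)
    # Drop non-printable characters that sometimes sneak into scraped text
    review = "".join(ch for ch in review if ch.isprintable())
    return review.strip()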

Step 3: Extract the categories and sentiments using AI21 Studio

This step is where you truly harness the power of AI21 Studio’s large language models!

You want the model to extract the topics and sentiments of each free text review into a structured JSON format. You can do this by leveraging a main strength of language models: when provided with text in plain English, the language model can identify patterns and generate text that follows the same pattern. By feeding the model a prompt with a few examples (this is called a few-shot prompt), it can identify the pattern and generate a reasonably good completion.

Obtaining these examples, however, requires you to manually go through several reviews, which we have done below. The resulting few-shot prompt is as follows (the reviews are as written by the platform’s users, with no grammar or spelling corrections):

Review:
Great experience for two teenagers. We would book again. Location good.
Extracted sentiment:
{"Location": "Positive"}
##
Review:
Extremely old cabinets, phone was half broken and full of dust. Bathroom door was broken, bathroom floor was dirty and yellow. Bathroom tiles were falling off. Asked to change my room and the next room was in the same conditions.
The most out of date and least maintained hotel i ever been on.
Extracted sentiment:
{"Cleaning": "Negative", "Hotel Facilities": "Negative", "Room Quality": "Negative"}
##
Review:
Roof top’s view is gorgeous and the lounge area is comfortable. The staff is very courteous and the location is great. The hotel is outdated and the shower need to be clean better. The air condition runs all the time and cannot be control by the temperature control setting.
Extracted sentiment:
{"Cleaning": "Negative", "AC": "Negative", "Room Quality": "Negative", "Service": "Positive", "View": "Positive", "Hotel Facilities": "Positive"}
##

Creating a good prompt is more than simply deciding on the pattern. The goal is to construct a prompt that triggers the model to generate the optimal completion (this is called prompt engineering). To achieve this, you should keep the following in mind:

  • Variety: the examples in the prompt will determine the model’s responses for unseen data, so they must be diverse enough to reflect the real-world distribution. This applies to both the structure of the reviews (such as length) and the content (the topics discussed in every review, the sentiments, etc.). Be sure to include reviews that are mixed in sentiment (like the third example provided above), as these are usually harder to analyze.
  • Amount: how “few” examples should our few-shot prompt include? When it comes to this relatively complex task, it is recommended that you provide at least eight different examples in the prompt (depending on the number of topics and the variety within them). The most effective way to determine this is through testing it out in the playground. Try it yourself! 

Additionally, in this use case we recommend setting the temperature to 0, since accuracy matters more here than creativity: increasing the temperature produces more varied, creative completions, while a temperature of 0 makes the output deterministic and consistent. Curious about temperature? See Step 4 in this post for more detail.

Happy with the prompt and want to start analyzing reviews? You can copy the few-shot prompt from the playground, and use the following function to create the full prompt for every review:

def create_review_prompt(review):
    # FEW_SHOT_EXAMPLES holds the examples shown above; for the pattern to hold,
    # it should end with the "##" separator followed by "Review:" and a newline,
    # so the new review slots in exactly like the examples (an assumption about
    # how you store the constant)
    few_shot_examples = FEW_SHOT_EXAMPLES
    prompt = few_shot_examples + review + "\nExtracted sentiment:\n"
    return prompt

For every review, create the prompt and then call Jurassic-1 to perform the analysis (you can export the call code from the playground, or use the function from here).
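
For reference, a direct REST call might look like the following sketch. It assumes the AI21 Studio v1 completion endpoint and an AI21_API_KEY constant, so double-check the parameters against the code exported from the playground:

import requests

AI21_API_KEY = "<your-ai21-studio-key>"  # not the RapidAPI key from Step 1

def analyze_review(review):
    # Sketch of a Jurassic-1 completion request; verify the fields against
    # the playground export before relying on it
    response = requests.post(
        "https://api.ai21.com/studio/v1/j1-jumbo/complete",
        headers={"Authorization": f"Bearer {AI21_API_KEY}"},
        json={
            "prompt": create_review_prompt(review),
            "numResults": 1,
            "maxTokens": 50,              # the JSON completion is short
            "temperature": 0,             # deterministic: accuracy over creativity
            "stopSequences": ["##"]       # stop at the separator used between examples
        },
    )
    response.raise_for_status()
    return response.json()["completions"][0]["data"]["text"].strip()

sentiments = [analyze_review(review) for review in filter_reviews(reviews)]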

Step 4: Create the dashboard

Once you have the list of topics and sentiments, you can create your dashboard.

First, gather all of the topics together, counting the "Positive" and "Negative" mentions of each topic. Since you already have the completion in JSON format, you can parse it using standard packages. However, as the format may not always be perfect, and you don’t want any failures in your automated process, you can add a simple try/except block: if a completion from the model isn’t valid JSON, you simply drop it. You can use the following function:

import ast
import pandas as pd

def get_topK_categories_and_score(sentiments, k=7):
    # Parse each completion; drop any that isn't a valid literal
    parsed_sentiments = []
    for sentiment in sentiments:
        try:
            parsed_sentiments.append(ast.literal_eval(sentiment))
        except (SyntaxError, ValueError):
            pass

    df = pd.DataFrame(parsed_sentiments)

    # Extract the K categories mentioned in the most reviews
    keys = df.count().sort_values(ascending=False)[:k].index.tolist()

    category_names = ['Negative', 'Positive']
    scores = {}

    # For each category, count the negative and positive mentions
    for key in keys:
        scores[key] = [(df[key] == category_names[0]).sum(), (df[key] == category_names[1]).sum()]

    return scores, category_names

At this stage, all that’s left to do is create the figure. With minor changes to this matplotlib example, you’ll have your dashboard.
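
A minimal sketch of such a figure, adapted from matplotlib’s horizontal stacked bar chart example (the colors and figure size below are arbitrary choices):

import matplotlib.pyplot as plt
import numpy as np

def plot_dashboard(scores, category_names):
    labels = list(scores.keys())
    data = np.array(list(scores.values()))  # one [negative, positive] pair per category
    fig, ax = plt.subplots(figsize=(9, 5))
    left = np.zeros(len(labels))
    # Stack the negative and positive counts side by side for each category
    for i, (name, color) in enumerate(zip(category_names, ["firebrick", "seagreen"])):
        ax.barh(labels, data[:, i], left=left, label=name, color=color)
        left += data[:, i]
    ax.set_xlabel("Number of reviews")
    ax.legend()
    plt.show()

scores, category_names = get_topK_categories_and_score(sentiments)
plot_dashboard(scores, category_names)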

You can see that the hotel is deemed excellent in Location and rather good in Service and Cleaning. However, it should invest more in the WiFi and AC, and perhaps do some renovations or upgrades to the rooms and facilities.

As a last step, and as a sanity check, you’ll probably want to validate your results without manually reading every review. One way to do that is to compare them with the ratings on other hotel platforms, such as Booking.com. If you go to the Booking.com page for this hotel, where visitors are asked to rate hotels across numerous categories, you will find the overall picture is very similar to your own analysis.

Summary

By following the steps laid out in this post, you have built a very useful feature that can be implemented on hotel and accommodation platforms. Thanks to large language models, analyzing pieces of text, such as reviews, has never been easier. A few simple tweaks, such as writing the examples in JSON format, can save a lot of time in post-processing, making the entire process faster and easier. You can find the full notebook in our dev-hub.

Are you interested in building your own feature? With a custom model, you’ll always get the highest quality results. You can find out more about that here.

Discover more

What is a MRKL system?

In August 2021 we released Jurassic-1, a 178B-parameter autoregressive language model. We’re thankful for the reception it got – over 10,000 developers signed up, and hundreds of commercial applications are in various stages of development. Mega models such as Jurassic-1, GPT-3 and others are indeed amazing, and open up exciting opportunities. But these models are also inherently limited. They can’t access your company database, don’t have access to current information (for example, latest COVID numbers or dollar-euro exchange rate), can’t reason (for example, their arithmetic capabilities don’t come close to that of an HP calculator from the 1970s), and are prohibitively expensive to update.
A MRKL system such as Jurassic-X enjoys all the advantages of mega language models, with none of these disadvantages. Here’s how it works.

A composite multi-expert problem: the list of “Green energy companies” is routed to the Wiki API, the “last month” dates are extracted from the calendar, and the “share prices” come from the database. The “largest increase” is computed by the calculator, and finally the answer is formatted by the language model.

There are of course many details and challenges in making all this work - training the discrete experts, smoothing the interface between them and the neural network, routing among the different modules, and more. To get a deeper sense for MRKL systems, how they fit in the technology landscape, and some of the technical challenges in implementing them, see our MRKL paper. For a deeper technical look at how to handle one of the implementation challenges, namely avoiding model explosion, see our paper on leveraging frozen mega LMs.

A further look at the advantages of Jurassic-X

Even without diving into technical details, it’s easy to get a sense for the advantages of Jurassic-X. Here are some of the capabilities it offers, and how these can be used for practical applications.

Reading and updating your database in free language

Language models are closed boxes which you can use, but not change. However, in many practical cases you would want to use the power of a language model to analyze information you possess - the supplies in your store, your company’s payroll, the grades in your school and more. Jurassic-X can connect to your databases so that you can ‘talk’ to your data and explore what you need - “Find the cheapest shampoo that has a rosy smell”, “Which computing stock increased the most in the last week?” and more. Furthermore, our system also enables joining several databases, and can update your database using free language (see figure below).

Jurassic-X enables you to plug in YOUR company's database (inventories, salary sheets, etc.) and extract information using free language

AI-assisted text generation on current affairs

Language models can generate text, yet they cannot be used to create text about current affairs, because their vast knowledge (historic dates, world leaders and more) represents the world as it was when they were trained. This is clearly (and somewhat embarrassingly) demonstrated by the fact that three of the world’s leading language models (including our own Jurassic-1) still claim Donald Trump is the US president more than a year after Joe Biden was sworn into office.
Jurassic-X solves this problem by simply plugging into resources such as Wikidata, providing it with continuous access to up-to-date knowledge. This opens up a new avenue for AI-assisted text generation on current affairs.

Who is the president of the United States?

  T0: Donald Trump
  GPT-3: Donald Trump
  Jurassic-1: Donald Trump
  Google: Joe Biden
  Jurassic-X: Joe Biden is the 46th and current president

Jurassic-X can assist in text generation on up-to-date events by combining a powerful language model with access to Wikidata.

Performing math operations

A 6-year-old child learns math from rules, not only by memorizing examples. In contrast, language models are designed to learn from examples, and consequently can solve very basic math like 1-, 2-, and possibly 3-digit addition, but struggle with anything more complex. With more training time, better data and larger models the performance will improve, but it will not reach the robustness of an HP calculator from the 1970s. Jurassic-X takes a different approach and calls upon a calculator whenever the router identifies a math problem. The problem can be phrased in natural language and is converted by the language model to the format required by the calculator (numbers and math operations). The computation is performed and the answer is converted back into free language.
Importantly (see the example below), the process is made transparent to the user by revealing the computation performed, thus increasing trust in the system. In contrast, language models provide answers which might seem reasonable but are wrong, making them impractical to use.

The company had 655400 shares which they divided equally among 94 employees. How many did each employee get?

  T0: 94 employees.
  GPT-3: Each employee got 7000 stocks
  Jurassic-1: 1.5
  Google: (No answer provided)
  Jurassic-X: 6972.3 (X = 655400/94)

Jurassic-X can answer non-trivial math operations phrased in natural language, made possible by the combination of a language model and a calculator.
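
To make the route-compute-explain flow concrete, here is a toy sketch of the idea. It is purely illustrative: in a real MRKL system the router and the expression extraction are learned components (the language model itself), not regular expressions.

import re

def route_and_solve(question):
    # 1. Router (toy): detect a division word problem and pull out its numbers
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", question)]
    if "divided equally" in question and len(numbers) == 2:
        total, parts = numbers
        # 2. Expert: exact arithmetic, where a language model would only guess
        result = round(total / parts, 1)
        # 3. Transparency: surface the computation so the user can verify it
        return f"{result} (X = {int(total)}/{int(parts)})"
    return "No expert matched - fall back to the language model."

print(route_and_solve(
    "The company had 655400 shares which they divided equally among "
    "94 employees. How many did each employee get?"
))  # prints: 6972.3 (X = 655400/94)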

Compositionality

Solving simple questions might require multiple steps. For example, “Do more people live in Tel Aviv or in Berlin?” requires answering: i. What is the population of Tel Aviv? ii. What is the population of Berlin? iii. Which is larger? This is a highly non-trivial process for a language model, and language models indeed fail to answer this question (see the example below). Moreover, the user can’t know the process leading to the answers, and hence cannot trust them. Jurassic-X can decompose such problems into the basic questions, route each to the relevant expert, and put together an answer in free language. Importantly, Jurassic-X not only provides the correct answer but also displays the steps taken to reach it, increasing trust in the system.

Do more people live in Tel Aviv or in Berlin?

  T0: Berlin
  GPT-3: There are more people living in Tel Aviv than in Berlin.
  Jurassic-1: Berlin and Tel Aviv are roughly the same size
  Google: (First hit is a comparison between Tel Aviv and Berlin)
  Jurassic-X: More people live in Berlin than in Tel-Aviv

  Decomposition: [‘Return population of Tel Aviv’; ‘Return population of Berlin’; ‘Return which is bigger between #1 and #2’]
  Step 1: Population of Tel Aviv. Result - 451523.
  Step 2: Population of Berlin. Result - 3664088.
  Step 3: Which is bigger, #1 or #2. Result - Berlin.

Jurassic-X breaks down compositional questions, answers the basic sub-questions, and puts together the answer. Importantly, this process is transparent to the user, greatly increasing trust in the system.

Dynamic information (like weather and currency exchange rates)

Certain types of information change continuously - weather, currency exchange rates, share values and more. Such information will never be captured by language models, yet it can easily be handled by Jurassic-X by integrating it with a reliable source of information. We performed a proof-of-concept on two such features - weather and currency exchange rates - and the design enables quick integration with more sources to solve your use case.
Weather: a loosely phrased question about the weather elicits an answer from all language models, but they always return the same answer regardless of when the question is asked (funny, right?), while Jurassic-X provides an answer based on the actual weather forecast.

I’m going to be in New-York in 3 days. Should I pack my umbrella?

  T0: No
  GPT-3: Yes, you should pack your umbrella.
  Jurassic-1: Yes, you should. The weather forecast is rain.
  Google: (Links to weather websites)
  Jurassic-X: Yes, you should pack your umbrella, because in New York in 3 days there will be broken clouds and the temperature will be -2 degrees.

Currency exchange rates change much faster than weather predictions, yet the Jurassic-X concept - a language model connected to a reliable source of information - easily solves this problem as well.

How much Moroccan money will I get for 100 bucks?

  T0: 125 dirhams
  GPT-3: 100 moroccan dirhams is about 27$.
  Jurassic-1: How much is 100 dollars in moroccan money?
  Google: 100 dirhams = 10.75 dollars
  Jurassic-X: 100 USD = 934.003 MAD

Jurassic-X combines a language model with access to APIs with continuously changing information. This is demonstrated here for weather forecasts and currency exchange rates, and can easily be extended to other information sources.

Transparency and trust

Transparency is a critical element that is lacking in language models, and this lack prevents much wider adoption. It is demonstrated by the answers to the question “Was Clinton ever elected as president of the United States?”. The answer, of course, depends on which Clinton you have in mind - something only Jurassic-X makes clear, thanks to its disambiguation component. More examples of Jurassic-X’s transparency were demonstrated above: displaying the math operation performed to the user, and the answers to the simple sub-questions in the multi-step setting.

Was Clinton ever elected president of the United States?

  T0: Yes
  GPT-3: No, Clinton was never elected as president of the United States.
  Jurassic-1: No
  Google: Clinton was elected president in the 1992 presidential elections…
  Jurassic-X: Bill Clinton was elected president.

Jurassic-X is designed to be more transparent, displaying which expert answered which part of the question and presenting the intermediate steps taken, not just a black-box response.

Your Turn

That's it - you get the picture. The use cases above give you a sense of some things you could do with Jurassic-X, but now it's your turn. A MRKL system such as Jurassic-X is as flexible as your imagination. What do you want to accomplish? Contact us for early access.
