Redesigning the 5-star Rating system to improve accuracy

This case study was published in UX Collective Mag in 2022.

While designing a new product, I needed a scoring system. Users would rate a series of parameters, and the first thing that came to mind was the familiar 5-star rating system. In recent years, however, this system has shown its limits: it is neither well optimized nor efficient.

To break the design rules, you must first master them

Audemars Piguet

To find an alternative, we first need to understand the 5-star rating model, identify its weaknesses, and see the user-experience problems it creates. Those problems are what led me to set it aside.

Stars do not convey the accurate meaning of how good or bad something is (from the point of view of different users)

Cons of the 5-star rating system:

  • The main issue is that stars do not accurately convey how good or bad something is, because different users read them differently. In my initial research, I found that a 3-star rating means “appropriate” to some users and “weak” to others! This means the data we collect from the rating system will be neither accurate nor reliable.
  • Another issue: is it worse to give something 1 star or no stars at all? Should no stars count as zero? If we want to calculate the average user rating for a product, do we include it as zero, treat it as 3 (the midpoint), or leave it out entirely?
  • Next issue: when looking for highly rated movies on IMDb (which uses a scale of 1 to 10), good comedies usually score around 6 (equal to 3 stars in a 5-star system), whilst good horror movies usually score around 8! This suggests that users subconsciously give comedies fewer points (they attach a sense of ridicule to the movie and devalue it in their minds), so very few comedies reach a score of 8. Yet comedies sell well, even better than horror movies.
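The “no star” question above can be made concrete with a small sketch. The function name, the sample votes, and the three policy names are illustrative assumptions, not taken from any real product; the point is only that the same votes produce different averages depending on how unrated entries are treated.

```python
from typing import Optional, Sequence


def average_rating(ratings: Sequence[Optional[int]], missing: str = "ignore") -> float:
    """Average 1-5 star ratings; None marks a user who left no stars.

    missing="ignore"   drops unrated entries,
    missing="zero"     counts them as 0,
    missing="midpoint" counts them as 3.
    """
    if missing == "ignore":
        values = [r for r in ratings if r is not None]
    elif missing == "zero":
        values = [0 if r is None else r for r in ratings]
    elif missing == "midpoint":
        values = [3 if r is None else r for r in ratings]
    else:
        raise ValueError(f"unknown policy: {missing}")
    if not values:
        raise ValueError("no ratings to average")
    return sum(values) / len(values)


# Hypothetical votes: three users rated, two left the stars empty.
sample = [5, 4, None, None, 1]
for policy in ("ignore", "zero", "midpoint"):
    print(policy, round(average_rating(sample, policy), 2))
```

With these sample votes the three policies give 3.33, 2.0, and 3.2, so the choice of policy alone moves the product by more than a full star.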

Netflix’s experience in scoring system design

Research shows that when users give a movie a high score, it does not necessarily mean that they like that movie! Designers at Netflix found that users rate documentaries higher, but when they return to the site, they look for the kinds of movies they rated lower! As a result, Netflix's algorithms could not use ratings to offer users the movies they really like. To improve the rating system, Netflix and YouTube replaced the 5-star system with Like and Dislike buttons, which produced more accurate ratings.

Netflix has also announced that the score shown at the top of each movie is not necessarily the average rating given by users. Instead, it is the number Netflix predicts you would give the movie, based on its data-mining algorithms, your interests, and your taste. (So if you have always rated comedies highly, you will see higher scores on comedies!)

Research on Amazon reviews has shown that most of them fall into two categories: either the reviewers fell in love with the product and praised it, or they hated it and said so. Only a few moderate reviews criticize the product fairly.

Cognitive science can explain this better, but the bottom line is that people tend to react only in extreme cases, not in ordinary ones.

Alternative rating systems

Given all this and the ambiguity of the 5-star system, we decided to find a more accurate rating system. Here are the advantages and disadvantages of each alternative:

The first alternative is to keep the same 5 rating levels and replace the stars with emoticons. Option 1 now clearly means “very bad”. With five hearts or stars, by contrast, a single heart could not be read as “very bad”, because a heart or a star carries no negative meaning at all.

The second alternative: Thumbs Up/Down

Thumbs up and down, more commonly known as likes and dislikes, are easier for users to work with, because:

1.  Like / dislike makes it easy for the user to judge, because there are only two options, and they are the two extremes: either the top of the scale or the bottom! (The 5-star system, by comparison, asks users to place their opinion on a wider range.)

2.  A dislike clearly indicates that we have missed something. According to Netflix, the two-button system had a huge advantage: it reveals what the user really dislikes, so that content never has to be shown to them again. (The 5-star system had no explicit dislike. Even if someone gives something 1 star, you cannot tell whether the user actually dislikes it.)

3.  Research has shown that thumbs encourage users to vote and increase engagement with the service compared to the 5-star system. If you want more engagement, you have to make voting easier, as Twitter and Instagram do with a single state (like) instead of two.

Requesting accuracy from users = less engagement

Sometimes we do not need that much detail, and when we ask users to be more precise, they simply leave the application.

Don Norman (who worked as a user experience architect at Apple in the 1990s) discusses this in his book, The Design of Everyday Things:
It is 55° Fahrenheit here, and I’m used to the Celsius system. So what is it in Celsius? This is the exact formula for converting Fahrenheit to Celsius:

C = (F-32) × 5 ÷ 9

This formula is hard to memorize and hard to compute in your head, and I will never remember it! But there is an approximate formula that you can easily memorize and use:

C = (F-30) ÷ 2

The first equation gives 12.8 and the second, less accurate one gives 12.5. For me (Norman), the approximate answer is good enough, because I only wanted to know whether it was cold enough to wear a sweater!
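As a quick check of the arithmetic above, here is a minimal sketch that evaluates both formulas side by side. The function names and the extra sample temperatures are my own choices for illustration.

```python
def fahrenheit_to_celsius_exact(f: float) -> float:
    """Exact conversion: C = (F - 32) * 5 / 9."""
    return (f - 32) * 5 / 9


def fahrenheit_to_celsius_approx(f: float) -> float:
    """Easy-to-remember approximation: C = (F - 30) / 2."""
    return (f - 30) / 2


# 55 is the value from the text; the other temperatures are arbitrary samples.
for f in (32, 50, 55, 70, 100):
    exact = fahrenheit_to_celsius_exact(f)
    approx = fahrenheit_to_celsius_approx(f)
    print(f"{f}°F  exact {exact:.1f}°C  approx {approx:.1f}°C  error {approx - exact:+.1f}")
```

The two formulas agree exactly at 50°F (10°C), and the error grows by 1/18 of a degree Celsius for every degree Fahrenheit away from that point, which is why the shortcut works well for everyday weather and less well at extremes.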

Result:

The kind of result we need determines how accurately we should get the numbers from users.

Final Design Solution

This is the final rating component I designed for the product. It shows how strongly users agree or disagree with something.

Sometimes we do not need the “two thumbs down” button (which means totally disagree), and it can be removed. But we do need to know whether users “like” or “love” something, so I included a “two thumbs up” button and a “one thumb down” button, as well as an option for users who are undecided about the topic.

1. Totally disagree

2. Disagree

3. Neutral

4. Agree

5. Totally agree
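One way the five levels above could be represented in code is sketched below. The `Vote` enum and the `vote_shares` helper are hypothetical names for illustration, not the product's actual implementation. The helper reports the share of votes at each level rather than a single average, since a mean can hide polarized responses.

```python
from collections import Counter
from enum import IntEnum
from typing import Dict, Sequence


class Vote(IntEnum):
    """The five levels of the rating component (names are assumptions)."""
    TOTALLY_DISAGREE = 1
    DISAGREE = 2
    NEUTRAL = 3
    AGREE = 4
    TOTALLY_AGREE = 5


def vote_shares(votes: Sequence[Vote]) -> Dict[str, float]:
    """Return the fraction of votes at each level; empty input gives {}."""
    total = len(votes)
    if total == 0:
        return {}
    counts = Counter(votes)
    return {level.name: counts[level] / total for level in Vote}


# Hypothetical votes on a single topic.
votes = [Vote.AGREE, Vote.TOTALLY_AGREE, Vote.NEUTRAL, Vote.AGREE]
for name, share in vote_shares(votes).items():
    print(f"{name}: {share:.0%}")
```

If the “two thumbs down” level is dropped for a given product, the same structure still works with that member removed from the enum.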