Rating Product & Sorting Reviews in Amazon

Nov 29, 2023

In this blog, we’ll be focusing on two primary goals:

Calculating an average rating that gives more weight to recent reviews and comparing it with the plain average rating.
Sorting reviews with several scoring methods and comparing the results.

Variables:
#reviewerID: User ID
#asin: Product ID
#reviewerName: User Name
#helpful: Helpful rating degree
#reviewText: Review
#overall: Product rating
#summary: Review summary
#unixReviewTime: Review time
#reviewTime: Raw review time
#day_diff: Number of days since the review
#helpful_yes: Number of times the review was found helpful
#total_vote: Total votes given to the review

import matplotlib.pyplot as plt
import pandas as pd
import math
import scipy.stats as st

pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', 10)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.float_format', lambda x: '%.5f' % x)
df = pd.read_csv("amazon_review.csv")
df.head()

      reviewerID        asin  reviewerName helpful                                         reviewText  overall                                 summary  unixReviewTime  reviewTime  day_diff  helpful_yes  total_vote
0  A3SBTW3WS4IQSN  B007WTAJTO           NaN  [0, 0]                                         No issues.  4.00000                              Four Stars      1406073600  2014-07-23       138            0           0
1  A18K1ODH1I2MVB  B007WTAJTO          0mie  [0, 0]  Purchased this for my device, it worked as adv...  5.00000                           MOAR SPACE!!!      1382659200  2013-10-25       409            0           0
2  A2FII3I2MBMUIA  B007WTAJTO           1K3  [0, 0]  it works as expected. I should have sprung for...  4.00000               nothing to really say....      1356220800  2012-12-23       715            0           0
3   A3H99DFEG68SR  B007WTAJTO           1m2  [0, 0]  This think has worked out great.Had a diff. br...  5.00000  Great buy at this price!!!  *** UPDATE      1384992000  2013-11-21       382            0           0
4  A375ZM4U047O79  B007WTAJTO  2&amp;1/2Men  [0, 0]  Bought it with Retail Packaging, arrived legit...  5.00000                        best deal around      1373673600  2013-07-13       513            0           0

Rating Products

First, we calculate the plain average rating of the product.

df["overall"].mean()
Out[21]: 4.587589013224822

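To give recent reviews more weight, we define the Time-Based Weighted Average function. It splits the reviews into four groups at the quartiles of day_diff and weights the mean rating of each group by w1 to w4 (percentages, from the most recent group to the oldest):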

def time_based_weighted_average(dataframe, w1=50, w2=25, w3=15, w4=10):
    # Quartile cut points of review age in days; a smaller day_diff means a more recent review.
    days, rating = dataframe["day_diff"], dataframe["overall"]
    q1, q2, q3 = days.quantile([0.25, 0.50, 0.75])
    return rating[days <= q1].mean() * w1 / 100 + \
           rating[(days > q1) & (days <= q2)].mean() * w2 / 100 + \
           rating[(days > q2) & (days <= q3)].mean() * w3 / 100 + \
           rating[days > q3].mean() * w4 / 100

With the default weights, recent reviews count more, and the rating becomes 4.637306192407316, compared with a plain average of 4.587589013224822. The Time-Based Weighted Average is higher, which suggests that recent reviews are slightly more favorable than older ones.




time_based_weighted_average(df, w1=50, w2=25, w3=15, w4=10)
Out[48]: 4.637306192407316
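
To see where a higher weighted value comes from, it can help to look at the mean rating of each quartile group on its own. The sketch below does this on a small made-up DataFrame (the values are not taken from amazon_review.csv) with the same quartile boundaries as time_based_weighted_average. The helper name quartile_group_means is hypothetical.

```python
import pandas as pd

def quartile_group_means(dataframe):
    """Mean rating of each day_diff quartile group, most recent group first."""
    days, rating = dataframe["day_diff"], dataframe["overall"]
    q1, q2, q3 = days.quantile([0.25, 0.50, 0.75])
    return [
        rating[days <= q1].mean(),
        rating[(days > q1) & (days <= q2)].mean(),
        rating[(days > q2) & (days <= q3)].mean(),
        rating[days > q3].mean(),
    ]

# Made-up toy data: older reviews (larger day_diff) have lower ratings.
toy = pd.DataFrame({
    "day_diff": [10, 20, 30, 40, 50, 60, 70, 80],
    "overall": [5, 5, 5, 4, 4, 4, 3, 3],
})

group_means = quartile_group_means(toy)
weights = [50, 25, 15, 10]
weighted = sum(m * w / 100 for m, w in zip(group_means, weights))

print(group_means)            # mean rating per quartile group
print(toy["overall"].mean())  # plain average
print(weighted)               # time-based weighted average
```

In this toy case the group means are 5.0, 4.5, 4.0 and 3.0, so the weighted average is 4.525 against a plain average of 4.125. The weighted value is higher because the most recent group has the best ratings and carries the largest weight.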

We can change the weights. For example:

time_based_weighted_average(df, w1=60, w2=30, w3=8, w4=2)
Out[7]: 4.662975899944154
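
The weights are percentages, so the result is a proper weighted average of the group means only when they sum to 100. A minimal sketch of a guard for that, assuming a hypothetical helper named check_weights that is not part of the original code:

```python
import math

def check_weights(w1, w2, w3, w4, total=100):
    """Return the weights unchanged, or raise ValueError if they do not sum to `total`."""
    weights = (w1, w2, w3, w4)
    if not math.isclose(sum(weights), total):
        raise ValueError(f"weights sum to {sum(weights)}, expected {total}")
    return weights

check_weights(50, 25, 15, 10)  # default weights: passes
check_weights(60, 30, 8, 2)    # alternative weights used above: passes
```

Calling it before time_based_weighted_average would catch a typo such as w4=5, which would otherwise silently inflate the rating.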

Sorting Reviews

Our goal is to choose the 20 reviews that will be displayed on the product detail page.

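There is no ‘helpful_no’ variable in the dataset, so we create it by subtracting helpful_yes from total_vote. In the scoring functions, "up" refers to helpful votes (helpful_yes) and "down" to unhelpful votes (helpful_no). We then create a new df with only the columns we'll use: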

df["helpful_no"] = df["total_vote"] - df["helpful_yes"]

df = df[["reviewerName", "overall", "summary", "helpful_yes", "helpful_no", "total_vote", "reviewTime"]]
df.head()

   reviewerName  overall                                 summary  helpful_yes  helpful_no  total_vote  reviewTime
0           NaN  4.00000                              Four Stars            0           0           0  2014-07-23
1          0mie  5.00000                           MOAR SPACE!!!            0           0           0  2013-10-25
2           1K3  4.00000               nothing to really say....            0           0           0  2012-12-23
3           1m2  5.00000  Great buy at this price!!!  *** UPDATE            0           0           0  2013-11-21
4  2&amp;1/2Men  5.00000                        best deal around            0           0           0  2013-07-13

Next, we calculate the score_pos_neg_diff, score_average_rating, and wilson_lower_bound scores and add them to the dataset.

Up-Down Diff Score = (up ratings) − (down ratings)

def score_up_down_diff(up,down):
    return up - down

df["score_pos_neg_diff"] = df.apply(lambda x: score_up_down_diff(x["helpful_yes"],x["helpful_no"]),axis=1)

df.sort_values("score_pos_neg_diff", ascending=False).head(10)

                   reviewerName  overall                                            summary  helpful_yes  helpful_no  total_vote  reviewTime  score_pos_neg_diff
2031         Hyoun Kim "Faluzure"  5.00000  UPDATED - Great w/ Galaxy S4 & Galaxy Tab 4 10...         1952          68        2020  2013-01-05                1884
4212                  SkincareCEO  1.00000  1 Star reviews - Micro SDXC card unmounts itse...         1568         126        1694  2013-05-08                1442
3449            NLee the Engineer  5.00000  Top of the class among all (budget-priced) mic...         1428          77        1505  2012-09-26                1351
317       Amazon Customer "Kelly"  1.00000                                Warning, read this!          422          73         495  2012-02-09                 349
3981   R. Sutton, Jr. "RWSynergy"  5.00000  Resolving confusion between "Mobile Ultra" and...          112          27         139  2012-10-22                  85
4596  Tom Henriksen "Doggy Diner"  1.00000     Designed incompatibility/Don't support SanDisk           82          27         109  2012-09-22                  55
1835                  goconfigure  5.00000                                           I own it           60           8          68  2014-02-28                  52
4672                      Twister  5.00000  Super high capacity!!!  Excellent price (on Am...           45           4          49  2014-07-03                  41
4306                Stellar Eller  5.00000                                      Awesome Card!           51          14          65  2012-09-06                  37
315    Amazon Customer "johncrea"  5.00000  Samsung Galaxy Tab2 works with this card if re...           38          10          48  2012-08-13                  28

The output above shows the top 10 reviews according to the up-down difference.

The “up-down” metric is not preferred for sorting reviews because it relies only on raw vote counts and ignores the share of helpful votes. A review that has collected many votes can outrank a review with a higher proportion of helpful votes, so a large number of likes does not necessarily mean a review is more helpful or insightful. Note also that some negative reviews contain valuable information: two of the top four reviews above are 1-star reviews.
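
As a small illustration of this weakness, the sketch below takes the vote counts of two reviews from the output above and prints both the up-down difference and the share of helpful votes. The dictionary name reviews is only for illustration.

```python
def score_up_down_diff(up, down):
    return up - down

# (helpful_yes, helpful_no) taken from the top-10 table above.
reviews = {
    "SkincareCEO": (1568, 126),
    "NLee the Engineer": (1428, 77),
}

for name, (up, down) in reviews.items():
    diff = score_up_down_diff(up, down)
    share = round(up / (up + down), 4)
    print(name, diff, share)
```

SkincareCEO gets the larger difference (1442 against 1351) and therefore ranks higher, even though its share of helpful votes is lower (0.9256 against 0.9488).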

Average rating

def score_average_rating(up,down):
    if up + down == 0:
        return 0
    return up / (up+down)

df["score_average_rating"] = df.apply(lambda x: score_average_rating(x["helpful_yes"], x["helpful_no"]), axis=1)

This function (score_average_rating(up, down)) returns the share of helpful votes among all votes, up / (up + down), and returns 0 when a review has no votes.
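
The same ratio can also be computed on whole columns instead of row by row with apply. This is only a sketch on a made-up three-row DataFrame named votes, not the author's code. It relies on pandas returning NaN for 0 / 0, which fillna(0) then turns into the 0 that the function returns for reviews without votes.

```python
import pandas as pd

# Made-up vote counts: no votes, a single helpful vote, and 45 of 49 helpful.
votes = pd.DataFrame({"helpful_yes": [0, 1, 45], "helpful_no": [0, 0, 4]})

total = votes["helpful_yes"] + votes["helpful_no"]
# 0 / 0 becomes NaN; fillna(0) mirrors the "return 0" branch of the function.
votes["score_average_rating"] = (votes["helpful_yes"] / total).fillna(0)

print(votes)
```

On a few thousand rows the difference in speed is small, so this is mainly a matter of style.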

df.sort_values("score_average_rating", ascending=False).head(10)

             reviewerName  overall                                            summary  helpful_yes  helpful_no  total_vote  reviewTime  score_pos_neg_diff  score_average_rating
4277                 S. Q.  5.00000                                          Perfect!!            1           0           1  2012-12-19                   1               1.00000
2881            Lou Thomas  5.00000                         Nexus One Loves This Card!            1           0           1  2012-01-10                   1               1.00000
1073            C. Sanchez  5.00000                            Tons of space for phone            1           0           1  2013-08-13                   1               1.00000
445     Apache "Elizabeth"  4.00000                                Amazon Great Prices            1           0           1  2013-12-18                   1               1.00000
3923       Rock Your Roots  5.00000                                  What more to say?            1           0           1  2013-12-30                   1               1.00000
435         Anthony L cate  5.00000                             Love the extra storage            1           0           1  2012-07-24                   1               1.00000
2901                  luis  5.00000                          Awesome and fast  card :)            1           0           1  2013-05-13                   1               1.00000
2204         jbwam "jbwam"  2.00000  Sandisk will replace failures due to bad batch...            1           0           1  2013-06-14                   1               1.00000
2206               JCBiker  5.00000                                         Great card            1           0           1  2013-10-31                   1               1.00000
3408  Neng Vang "Neng2012"  5.00000                                 working no problem            1           0           1  2013-07-25                   1               1.00000

The output above shows the top 10 reviews according to the share of helpful votes.

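Like the Up-Down Diff Score, the average rating is not preferred. It ignores how many votes a review has received: every review in the output above has a single helpful vote and still gets a perfect score of 1.0.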
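
A quick check with numbers from the outputs above makes the problem concrete. The sketch compares a review with a single helpful vote to the most-voted review in the dataset.

```python
def score_average_rating(up, down):
    if up + down == 0:
        return 0
    return up / (up + down)

print(score_average_rating(1, 0))                # S. Q.: 1 vote in total
print(round(score_average_rating(1952, 68), 5))  # Hyoun Kim "Faluzure": 2020 votes in total
```

The single-vote review scores 1.0, while the review with 1952 helpful votes out of 2020 scores 0.96634 and would be ranked below it.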

Wilson Lower Bound Score

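We’re defining our WLB (Wilson Lower Bound) function. It returns the lower bound of the confidence interval for the share of helpful votes, so reviews with few votes are scored conservatively.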

def wilson_lower_bound(up,down,confidence=0.95):
    n = up + down
    if n == 0:
        return 0
    z = st.norm.ppf(1- (1 - confidence) / 2)
    phat = 1.0 * up / n
    return (phat + z * z / (2 * n) - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n )) / (1 + z * z / n)
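
Written as a formula, the value returned by the function is the lower limit of the Wilson score interval. Here n is the total number of votes, p-hat is the share of helpful votes, and z is the standard normal quantile for the chosen confidence level (about 1.96 for 0.95):

```latex
\[
\mathrm{WLB} =
\frac{\hat{p} + \dfrac{z^{2}}{2n}
      - z \sqrt{\dfrac{\hat{p}\,(1 - \hat{p}) + \dfrac{z^{2}}{4n}}{n}}}
     {1 + \dfrac{z^{2}}{n}},
\qquad
\hat{p} = \frac{\mathrm{up}}{n},
\quad
n = \mathrm{up} + \mathrm{down}
\]
```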

We’re applying the function to our dataset.

df["wilson_lower_bound"] = df.apply(lambda x: wilson_lower_bound(x["helpful_yes"], x["helpful_no"]), axis=1)

We’re sorting in descending order and taking the first 20 results.

df.sort_values("wilson_lower_bound", ascending=False).head(20)

                             reviewerName  overall                                            summary  helpful_yes  helpful_no  total_vote  reviewTime  score_pos_neg_diff  wilson_lower_bound
2031                  Hyoun Kim "Faluzure"  5.00000  UPDATED - Great w/ Galaxy S4 & Galaxy Tab 4 10...         1952          68        2020  2013-01-05                1884             0.95754
3449                     NLee the Engineer  5.00000  Top of the class among all (budget-priced) mic...         1428          77        1505  2012-09-26                1351             0.93652
4212                           SkincareCEO  1.00000  1 Star reviews - Micro SDXC card unmounts itse...         1568         126        1694  2013-05-08                1442             0.91214
317                Amazon Customer "Kelly"  1.00000                                Warning, read this!          422          73         495  2012-02-09                 349             0.81858
4672                               Twister  5.00000  Super high capacity!!!  Excellent price (on Am...           45           4          49  2014-07-03                  41             0.80811
1835                           goconfigure  5.00000                                           I own it           60           8          68  2014-02-28                  52             0.78465
3981            R. Sutton, Jr. "RWSynergy"  5.00000  Resolving confusion between "Mobile Ultra" and...          112          27         139  2012-10-22                  85             0.73214
3807                            R. Heisler  3.00000   Good buy for the money but wait, I had an issue!           22           3          25  2013-02-27                  19             0.70044
4306                         Stellar Eller  5.00000                                      Awesome Card!           51          14          65  2012-09-06                  37             0.67033
4596           Tom Henriksen "Doggy Diner"  1.00000     Designed incompatibility/Don't support SanDisk           82          27         109  2012-09-22                  55             0.66359
315             Amazon Customer "johncrea"  5.00000  Samsung Galaxy Tab2 works with this card if re...           38          10          48  2012-08-13                  28             0.65741
1465                              D. Stein  4.00000                                           Finally.            7           0           7  2014-04-14                   7             0.64567
1609                                Eskimo  5.00000                  Bet you wish you had one of these            7           0           7  2014-03-26                   7             0.64567
4302                             Stayeraug  5.00000                        Perfect with GoPro Black 3+           14           2          16  2014-03-21                  12             0.63977
4072                           sb21 "sb21"  5.00000               Used for my Samsung Galaxy Tab 2 7.0            6           0           6  2012-11-09                   6             0.60967
1072                        Crysis Complex  5.00000               Works wonders for the Galaxy Note 2!            5           0           5  2012-05-10                   5             0.56552
2583                               J. Wong  5.00000                  Works Great with a GoPro 3 Black!            5           0           5  2013-08-06                   5             0.56552
121                                 A. Lee  5.00000                     ready for use on the Galaxy S3            5           0           5  2012-05-09                   5             0.56552
1142  Daniel Pham(Danpham_X @ yahoo.  com)  5.00000                          Great large capacity card            5           0           5  2014-02-04                   5             0.56552
1753                             G. Becker  5.00000                    Use Nothing Other Than the Best            5           0           5  2012-10-22                   5             0.56552

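We have identified the comments to highlight according to the WLB method. Above, you can see the first 20 of them. Reviews with many votes and a high share of helpful votes come first, and reviews with a single helpful vote no longer lead the list.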
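
To see how the score reacts to the number of votes, the sketch below evaluates the same formula for three vote counts. To keep it free of third-party dependencies it swaps st.norm.ppf for statistics.NormalDist().inv_cdf from the standard library. That swap is an assumption of this sketch, not the original code.

```python
import math
from statistics import NormalDist

def wilson_lower_bound(up, down, confidence=0.95):
    n = up + down
    if n == 0:
        return 0
    # Standard-library stand-in for st.norm.ppf (an assumption of this sketch).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = up / n
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

# (up, down): one helpful vote, five helpful votes, and the most-voted review.
for up, down in [(1, 0), (5, 0), (1952, 68)]:
    print(up, down, round(wilson_lower_bound(up, down), 5))
```

A single helpful vote gives 0.20655 and five helpful votes give 0.56552, while 1952 helpful votes out of 2020 give 0.95754. The last two values match the output above. With the same 100% share of helpful votes, more votes mean a higher lower bound.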

Big thanks to Vahit Keskin and Miuul

Contact me on LinkedIn :) yaseminderyadilli


Written by Yasemin Derya Dilli
