Hello!

In this post we will continue to explore the data about smartphones sold in India. What else could be interesting about this data set?

EDA

The data set we will use was created during the last analysis, where I used the janitor package to clean and organise the columns. You can learn more about that in this post here!

To start, let’s take a glimpse at the data set:

glimpse(smartphones_sold)

Rows: 1,513
Columns: 11
$ product_name        <chr> "XOLO T1000 (Black, 4 GB)", "GIONEE Pion…
$ product_url         <chr> "https://www.flipkart.com/xolo-t1000-bla…
$ brand               <chr> "XOLO", "GIONEE", "KARBONN", "KARBONN", …
$ sale_price          <dbl> 14153, 6500, 13298, 14990, 6499, 13604, …
$ mrp                 <dbl> 14153, 6500, 13298, 14990, 7499, 13604, …
$ discount_percentage <dbl> 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, …
$ number_of_ratings   <dbl> 333, 437, 28, 28, 61, 104, 105, 55, 450,…
$ number_of_reviews   <dbl> 130, 78, 7, 7, 8, 35, 24, 8, 146, 17, 74…
$ upc                 <chr> "MOBDMKDAKQGCYZ6D", "MOBDRKHTA3UXHAVD", …
$ star_rating         <dbl> 3.8, 3.6, 3.3, 3.3, 3.1, 4.3, 3.4, 3.4, …
$ ram                 <chr> "1 GB", "512 MB", "1 GB", "1 GB", "512 M…

Some questions to understand the data set

Which smartphone has the most ratings of all? And which has the fewest?
What is the highest “rating” among the devices?
Which company dominated the Indian smartphone market, in terms of influence, in that year?

Answering the questions

To answer these questions we can use dplyr’s tools, so remember to call library(tidyverse) to load it.

As our data set is kind of untidy, we will use the janitor package again to tidy things up:

library(janitor)
smartphones_sold <- clean_names(smartphones_sold)
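As a rough illustration of what clean_names() does, here is a minimal sketch on a tiny made-up data frame. The `raw` object and its column names are invented for this example and are not the real columns of the original file.

```r
library(janitor)

# Toy data frame with "untidy" column names (invented for illustration)
raw <- data.frame(
  "Product Name" = c("Phone A", "Phone B"),
  "Sale Price" = c(14153, 6500),
  "Number Of Ratings" = c(333, 437),
  check.names = FALSE,       # keep the spaces in the names
  stringsAsFactors = FALSE
)

# clean_names() converts the column names to snake_case
tidy <- clean_names(raw)

names(raw)
names(tidy)
```

The data themselves are untouched. Only the column names change, which makes them easier to type in dplyr pipelines.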

Now that everything is set, we can start to analyse some plots to answer our questions.

But first, let’s tackle our first question by using simple commands in R:

# Largest and smallest number of ratings in the data set
max(smartphones_sold$number_of_ratings)
min(smartphones_sold$number_of_ratings)

# Which was the most rated one?
smartphones_sold %>%
  filter(number_of_ratings == max(number_of_ratings))

[1] 1340123

[1] 0

# A tibble: 3 × 11
  product_name     product_url brand sale_price   mrp discount_percen…
  <chr>            <chr>       <chr>      <dbl> <dbl>            <dbl>
1 Redmi Note 4 (G… https://ww… Redmi      11400 11400                0
2 Redmi Note 4 (D… https://ww… Redmi      10490 11490                8
3 Redmi Note 4 (L… https://ww… Redmi      10490 10490                0
# … with 5 more variables: number_of_ratings <dbl>,
#   number_of_reviews <dbl>, upc <chr>, star_rating <dbl>, ram <chr>
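The same max()/min() pattern could also be applied to the second question, the highest star rating. The post does not show that result, so the following is only a sketch on a small made-up data frame. The `toy` object and all of its values are invented.

```r
library(dplyr)

# Toy data: values are invented for illustration only
toy <- data.frame(
  product_name = c("Phone A", "Phone B", "Phone C", "Phone D"),
  star_rating = c(3.8, 4.5, 4.5, 3.1),
  number_of_ratings = c(333, 28, 104, 0),
  stringsAsFactors = FALSE
)

# Highest star rating in the data
top_star <- max(toy$star_rating, na.rm = TRUE)

# All devices that share that rating (ties are kept)
best_rated <- toy %>%
  filter(star_rating == top_star)

# Devices with the fewest ratings, mirroring the min() call
least_rated <- toy %>%
  filter(number_of_ratings == min(number_of_ratings, na.rm = TRUE))

best_rated
least_rated
```

On the real data, it would probably make sense to read the star rating together with number_of_ratings, since a high rating based on very few ratings says little.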

What is so interesting about the Redmi Note 4 in India?

It seems the smartphone exceeded expectations and became quite common there. Interesting, huh? What can we learn from the design or strategies related to this product? Would it be possible to use these same techniques again?

Probably yes. It would be interesting to understand the business strategy around the device and learn something from it.

For now, let’s continue our analysis to understand who Redmi’s main competitors are, in terms of influence, in the Indian smartphone market.

In the next plot we keep only the most rated smartphone of each brand, so that we can visualise the discrepancy between the most rated ones.

library(ggthemes)

red_out_smartphones_sold <- smartphones_sold %>%
  group_by(brand) %>%
  arrange(desc(number_of_ratings)) %>%
  slice(1)
 
# One point per brand, ordered and sized by its number of ratings
ggplot(data = red_out_smartphones_sold,
       aes(x = fct_reorder(brand, number_of_ratings),
           y = number_of_ratings,
           size = number_of_ratings,
           color = factor(ram))) +
  geom_point() +
  theme_classic() +
  theme(legend.position = "bottom",
        panel.grid = element_blank(),
        axis.text = element_blank()) +
  # Small pink dot marking the centre of each point
  geom_point(colour = "pink", size = 1)
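The group_by() + arrange() + slice(1) step used to build red_out_smartphones_sold keeps the most rated phone of each brand. As an alternative sketch, dplyr's slice_max() expresses the same idea in one call. The `toy` data frame below is invented for illustration.

```r
library(dplyr)

# Toy data: two brands with two phones each, one brand with a single phone
toy <- data.frame(
  brand = c("A", "A", "B", "B", "C"),
  product_name = c("A1", "A2", "B1", "B2", "C1"),
  number_of_ratings = c(10, 50, 7, 3, 20),
  stringsAsFactors = FALSE
)

top_per_brand <- toy %>%
  group_by(brand) %>%
  # Keep the row with the largest number_of_ratings in each brand;
  # with_ties = FALSE guarantees exactly one row per brand
  slice_max(number_of_ratings, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(desc(number_of_ratings))

top_per_brand
```

Calling ungroup() at the end is a design choice: later steps such as head() or arrange() then work on the whole table rather than on a grouped one.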

And of course, if we would like to know the top 10, we can simply use the function head() combined with arrange(desc()):


red_out_smartphones_sold %>%
  select(product_name, brand, number_of_ratings) %>%
  arrange(desc(number_of_ratings)) %>%
  head(10)

# A tibble: 10 × 3
# Groups:   brand [10]
   product_name                              brand    number_of_ratin…
   <chr>                                     <chr>               <dbl>
 1 Redmi Note 4 (Gold, 64 GB)                Redmi             1340123
 2 realme C2 (Diamond Blue, 16 GB)           realme             901941
 3 Honor 9 Lite (Midnight Black, 32 GB)      Honor              475790
 4 Mi A1 (Black, 64 GB)                      Mi                 471046
 5 Moto C Plus (Pearl White, 16 GB)          Motorola           365212
 6 POCO M2 (Brick Red, 128 GB)               POCO               348171
 7 Lenovo K8 Plus (Fine Gold, 32 GB)         Lenovo             307215
 8 ASUS Zenfone Max Pro M1 (Blue, 64 GB)     ASUS               262956
 9 SAMSUNG Galaxy F41 (Fusion Green, 128 GB) SAMSUNG            249339
10 ViVO Z1Pro (Sonic Black, 128 GB)          ViVO               158909

Now we know which were the 10 most influential companies in the Indian market in 2017.

An interesting way of showing this information is an infographic such as the waffle plot, a.k.a. square pie chart.

Waffle Plot

To start, let’s install the waffle package:

install.packages("waffle")
library(waffle)

The waffle() function needs a named vector of values, so we create one first. To draw this infographic we can use the following commands:


library("waffle")

most_evaluated_10 <- red_out_smartphones_sold %>%
  select(product_name, brand, number_of_ratings) %>%
  arrange(desc(number_of_ratings)) %>%
  head(10)

# Creating a named vector with the values
vec_most_10 <- c(Redmi = 1340123, realme = 901941, Honor = 475790, Mi = 471046,
                 Motorola = 365212, POCO = 348171, Lenovo = 307215, ASUS = 262956,
                 SAMSUNG = 249339, ViVO = 158909)

# Normalising the vector to percentages of the total (sum(vec_most_10) is 4880702)
norm_vec_most_10 <- vec_most_10 / sum(vec_most_10) * 100

#Drawing the plot
waffle(norm_vec_most_10, rows = 4,
       colors = c("firebrick1", "deepskyblue", "darkorchid", "gray1", "chartreuse2",
                  "gold", "firebrick", "goldenrod2", "hotpink4", "deeppink3"),
       legend_pos = "bottom",
       xlab = "Total Ratings for smartphones in India - 2017")
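Instead of typing the values by hand, the named vector could also be built directly from a table of brands and counts. The following is a minimal sketch in base R, assuming a data frame with `brand` and `number_of_ratings` columns. The `ratings` object and its numbers are invented for illustration.

```r
# Toy table of rating counts per brand (numbers invented for illustration)
ratings <- data.frame(
  brand = c("A", "B", "C"),
  number_of_ratings = c(600, 300, 100),
  stringsAsFactors = FALSE
)

# Named vector: values are the counts, names are the brands
counts <- setNames(ratings$number_of_ratings, ratings$brand)

# Share of each brand as a percentage of the total
shares <- counts / sum(counts) * 100

# Whole numbers, so that each unit can map to one square of the waffle
squares <- round(shares)

squares
# The result could then be passed on, e.g. waffle(squares, rows = 4)
```

Note that, in general, rounded percentages do not have to add up to exactly 100, so the number of squares can be off by one or two on real data.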

Infographics are really useful for showing information quickly and clearly. Here, for example, we can see how influential Redmi was in comparison to other companies in the Indian market.

Hope you liked it, see you in the next one!


Follow me on Twitter: @gimbgomes