Hello!

In this post I explain how to perform a hypothesis test using R, and we will also see what a p-value is.

ToothGrowth Data set

The built-in R data set ToothGrowth (remember to load tidyverse) contains a total of 60 observations (rows) of 3 variables (columns):

len (Odontoblast length)
supp (Supplement applied)
dose
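A quick way to confirm this structure is to inspect the data set directly. The sketch below uses only base R. Note that in the built-in data `dose` is stored as a number, while the tibble output later in this post shows it as a factor, so I assume a conversion like the one below is applied at some point. The name `tooth` is mine, used only for illustration.

```r
# ToothGrowth ships with base R (package "datasets"), so no download is needed.
data("ToothGrowth")

str(ToothGrowth)          # structure: number of rows, column names and types
table(ToothGrowth$dose)   # number of animals per dose level

# Assumption: treat dose as a categorical variable for grouping and plotting.
# `tooth` is a hypothetical name for the converted copy.
tooth <- transform(ToothGrowth, dose = factor(dose))
levels(tooth$dose)
```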

This data comes from a research study, which I explained in more detail in this post.

In general, it shows the effect of vitamin C on the odontoblasts (the cells responsible for tooth growth) of guinea pigs.

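library(tidyverse)
# dose is numeric in the built-in data; factor() gives one line per dose
ggplot(ToothGrowth) +
  geom_freqpoly(aes(x = len, color = factor(dose)), binwidth = 1) +
  theme_classic()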


It seems there is a clear influence of the vitamin dose given to the guinea pigs on the size of their odontoblasts, but we need to be skeptical and gather some evidence for this claim before taking the hypothesis seriously.

First we look at the means

To start understanding the relationship between vitamin dose and tooth growth, we can compare the means of the groups using summarise() from the dplyr package.

ToothGrowth %>%
  group_by(dose) %>%
  summarise(mean = mean(len))

# A tibble: 3 × 2
  dose   mean
  <fct> <dbl>
1 0.5    10.6
2 1      19.7
3 2      26.1

It seems there is a difference between the means for each dose of vitamin C. Let’s also look at the differences between the means:

observable_diff1 <- 19.735 - 10.605
observable_diff2 <- 26.100  - 19.735
observable_diff3 <- 26.100  - 10.605
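Instead of typing the means by hand, the same differences can be computed directly from the data. This is only a sketch in base R; the names `dose_means`, `diff_1_vs_05`, `diff_2_vs_1` and `diff_2_vs_05` are hypothetical and are not used elsewhere in this post.

```r
# Mean of len for each dose, returned as a named numeric vector.
dose_means <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
dose_means

# Differences between the group means (names are illustrative only).
diff_1_vs_05 <- dose_means[["1"]] - dose_means[["0.5"]]
diff_2_vs_1  <- dose_means[["2"]] - dose_means[["1"]]
diff_2_vs_05 <- dose_means[["2"]] - dose_means[["0.5"]]

c(diff_1_vs_05, diff_2_vs_1, diff_2_vs_05)
```

Computing the values this way avoids copy-and-paste mistakes and keeps full numeric precision.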

So far we have already answered some questions, such as:

Do the guinea pigs that received a higher dose of vitamin C tend to have larger teeth?

# Another way to get the values for each dose

dose_05 <- ToothGrowth %>%
              filter(dose == 0.5)

dose_1 <- ToothGrowth %>%
              filter(dose == 1)
              
dose_2 <- ToothGrowth %>%
              filter(dose == 2)

We can check whether this is TRUE or FALSE with the following code:

# observable_diff1 holds the difference between the 1 and 0.5 doses

# all.equal() compares with a small tolerance, which is safer than == for decimals
isTRUE(all.equal(mean(dose_1$len) - mean(dose_05$len), observable_diff1))

[1] TRUE

Which are the largest odontoblasts from each group? And the smallest?

ToothGrowth %>%
  group_by(dose) %>%
  summarise(min = min(len), max = max(len))

# A tibble: 3 × 3
  dose    min   max
  <fct> <dbl> <dbl>
1 0.5     4.2  21.5
2 1      13.6  27.3
3 2      18.5  33.9

So that settles it: the dose of vitamin C really seems to have a direct effect on tooth growth. Well, not quite. If we repeated the experiment with different groups of guinea pigs, we would probably obtain different values from the ones we obtained here.

For this reason these variables are known as random variables: they can take many different values.

That means that every time we repeat the experiment, we will get different means for each group (dose), which is why the variable len is considered random.

Let’s get a better understanding of random variables.

Random Variables

Let’s understand how the means change when we collect different samples from our data set:

control <- ToothGrowth$len %>% sample(12)

We can do this repeatedly to build a random distribution from our data. We do it mainly because we cannot repeat the real experiment, so we model what the population would look like as if the vitamin C doses had not been applied.

Notice how the mean changes with each sample. Run the sampling command several times in your R session using sample().

mean_control <- ToothGrowth$len %>% sample(12) %>% mean()

print(mean_control)

mean_control <- ToothGrowth$len %>% sample(12) %>% mean()

print(mean_control)

[1] 15.40833

[1] 16.31667

Notice how the mean varies from one sample to the next. We can do this many times to learn even more about the distribution of this random variable.

As said before, we have to be skeptical, so we will assume that the observed difference may not mean much and could simply have happened by chance.

To gather evidence and reduce this uncertainty, we will use what is called the null hypothesis.
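To see this variation at a glance, the sampling can be repeated many times with replicate() from base R. This is a minimal sketch: the seed, the 1000 repetitions and the name `sample_means` are arbitrary choices of mine.

```r
set.seed(42)  # arbitrary seed, only so that the run is repeatable

# Draw 12 lengths at random and take their mean, 1000 times over.
sample_means <- replicate(1000, mean(sample(ToothGrowth$len, 12)))

summary(sample_means)   # range and centre of the simulated means
sd(sample_means)        # how much the mean varies from sample to sample
hist(sample_means, main = "Means of 12 random lengths", xlab = "mean len")
```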

Null hypothesis

Let’s analyse again the observed differences from before. Since we need to be skeptical, we have to question ourselves:

How can we claim that the observed differences are caused by the applied doses?
What would happen if we gave all the guinea pigs the same dose?
Would we observe differences as large as these?

At this point we are working with what is called the skeptic’s mindset, or the null hypothesis. The name reminds us that we are acting as skeptics: we give credit to the possibility that our observations happened just by chance.

It is called null because we are asking what the observed difference would look like if the treatment had no effect at all.

Building a control group to test the Null hypothesis

We can check, as many times as we like, how large the differences in tooth size are when the doses are ignored. To do that, we draw two random samples of guinea pigs from the original data set and record the difference between their means. This script is adapted from the book “Data Analysis for the Life Sciences” by Rafael Irizarry and Michael Love:

mean_control <- ToothGrowth$len %>% sample(12) %>% mean()

mean_treatment <- ToothGrowth$len %>% sample(12) %>% mean()

print(mean_treatment - mean_control)

It’s also possible to do this many times using loops in R:

n <- 10000
null <- vector("numeric", n) # first we create a vector with 10000 slots
for (i in 1:n) { # loop to fill the null vector
  mean_control <- ToothGrowth$len %>% sample(12) %>% mean()
  mean_treatment <- ToothGrowth$len %>% sample(12) %>% mean()
  null[i] <- mean_treatment - mean_control # difference between two random groups
}

print(null)
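The same simulation can be written without an explicit loop. The sketch below uses replicate() from base R and a fixed seed so that the result is repeatable; the helper `null_diff()` and the name `null_sketch` are hypothetical and exist only for this example.

```r
set.seed(1)   # arbitrary; fixes the random draws
n <- 10000

# Hypothetical helper: difference between the means of two random
# groups of `size` values drawn from x, ignoring the dose.
null_diff <- function(x, size = 12) {
  mean(sample(x, size)) - mean(sample(x, size))
}

null_sketch <- replicate(n, null_diff(ToothGrowth$len))

head(null_sketch)    # first few simulated differences
mean(null_sketch)    # should be close to 0, since both groups come from the same data
```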

The values in null are what we call a null distribution.

What proportion of the values in null are equal to or higher than the observed differences?

mean(null >= observable_diff1)

mean(null >= observable_diff2)

mean(null >= observable_diff3)

[1] 2e-04

[1] 0.0109

[1] 0

Interpreting results

Only a small fraction of our 10,000 simulations under the null hypothesis are equal to or higher than the observed differences. As skeptics, what do we conclude? That when there is no dose effect, a difference as large as the observed one shows up in at most about 1.09% of the simulations (the case of observable_diff2). For observable_diff1 it is 0.02%, and for observable_diff3 it did not occur at all. This proportion is known as the p-value.
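To read the proportions printed above as percentages or as counts, a small piece of arithmetic helps. The vector `props` below is a hypothetical name that simply holds the three printed values.

```r
# The three proportions printed above, copied by hand.
props <- c(diff1 = 2e-04, diff2 = 0.0109, diff3 = 0)

props * 100     # as percentages of the simulations
props * 10000   # as counts out of the 10,000 simulated differences
```

A proportion of 0 only means that none of the 10,000 simulated differences reached the observed one. It is an estimate limited by the number of simulations, not proof that such a difference is impossible.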

P-value

We performed a simulation to build the null data set. Let’s now look at the null distribution:

hist(null, freq=TRUE)

We can also see where our observed values fall on it:

hist(null, freq=TRUE)

abline(v=observable_diff1, col="red", lwd=2)

abline(v=observable_diff2, col="blue", lwd=2)

abline(v=observable_diff3, col="yellow", lwd=2)

Values higher than the ones we observed are relatively rare. This gives us evidence to support our hypothesis that the dose of vitamin C given to guinea pigs affects the size of their odontoblasts.

The largest difference, between the 0.5 and 2 doses, didn’t even appear in our null distribution plot. This suggests that the observed difference is very unlikely to have happened purely by chance, and that the null hypothesis can be rejected.

But even so we should always be careful!

Creating a control group using a filter

To create a control group with only one dose, we can filter the data set:

ToothGrowth_dot5 <- ToothGrowth %>%
  filter(dose == 0.5)

Now we can create our data sets to build the null hypothesis:

control_ToothGrowth_dot5 <- ToothGrowth_dot5$len %>% sample(12) %>% mean()

treatment_ToothGrowth_dot5 <- ToothGrowth_dot5$len %>% sample(12) %>% mean()

print(treatment_ToothGrowth_dot5 - control_ToothGrowth_dot5)

[1] 0.9333333

Here we repeat the difference between means 10,000 times to create a null distribution:

n <- 10000
null_dose <- vector("numeric", n)
for (i in 1:n) {
  control_ToothGrowth_dot5 <- ToothGrowth_dot5$len %>% sample(12) %>% mean()
  treatment_ToothGrowth_dot5 <- ToothGrowth_dot5$len %>% sample(12) %>% mean()
  null_dose[i] <- treatment_ToothGrowth_dot5 - control_ToothGrowth_dot5
}

print(null_dose)

hist(null_dose, freq=TRUE)
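As with the first null distribution, we can ask how often this single-dose null reaches the observed difference between the 1 and 0.5 doses. The following is a self-contained sketch in base R; the names `dot5`, `null_dose_sketch`, `obs` and `p_hat` are hypothetical, and the seed is arbitrary.

```r
set.seed(2)  # arbitrary seed for repeatability

# Lengths of the animals that received the 0.5 dose.
dot5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]

# Null distribution: differences between two random groups of 12
# taken from the same dose.
null_dose_sketch <- replicate(10000, mean(sample(dot5, 12)) - mean(sample(dot5, 12)))

# Observed difference between the means of the 1 and 0.5 doses.
obs <- mean(ToothGrowth$len[ToothGrowth$dose == 1]) - mean(dot5)

# Proportion of simulated differences at least as large as the observed one.
p_hat <- mean(null_dose_sketch >= obs)
p_hat
```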

Another very interesting way to compare our observed difference with the null hypothesis is by using the infer package.

Using the infer package

The infer package helps us test hypotheses quickly. Install the package and load it:

install.packages("infer")

library(infer)

To calculate the observed statistic in our data set:

F_hat <- ToothGrowth %>% 
  mutate(dose = factor(dose)) %>% # the F statistic needs a categorical explanatory variable
  specify(len ~ dose) %>%
  calculate(stat = "F")

Generating a null distribution:

null_dist <- ToothGrowth %>% 
   mutate(dose = factor(dose)) %>% # same conversion as for the observed statistic
   specify(len ~ dose) %>%
   hypothesize(null = "independence") %>%
   generate(reps = 1000, type = "permute") %>%
   calculate(stat = "F")
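To get a feeling for what a permutation null does, here is a rough base R sketch of the idea: shuffle the dose labels and recompute the statistic each time. Note that this is a simplification of mine, not what infer runs internally. It uses the difference in means between two doses instead of the F statistic, and the names `two`, `obs_diff` and `perm_diffs` are hypothetical.

```r
set.seed(3)  # arbitrary seed for repeatability

# Keep only two doses so that a difference in means is a natural statistic.
two <- subset(ToothGrowth, dose %in% c(0.5, 1))

# Observed difference in mean length between the 1 and 0.5 doses.
obs_diff <- mean(two$len[two$dose == 1]) - mean(two$len[two$dose == 0.5])

# Shuffle the dose labels 1000 times; under the null hypothesis the
# labels carry no information, so any assignment is equally likely.
perm_diffs <- replicate(1000, {
  shuffled <- sample(two$dose)
  mean(two$len[shuffled == 1]) - mean(two$len[shuffled == 0.5])
})

# Proportion of shuffles with a difference at least as large as observed.
mean(perm_diffs >= obs_diff)
```

Unlike drawing two independent samples, shuffling the labels uses every animal exactly once in each repetition, so the two groups never overlap.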

Visualising the observed statistic against the calculated null distribution:

visualize(null_dist) +
  shade_p_value(obs_stat = F_hat, direction = "greater")
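Besides shading the plot, infer can also return the p-value as a number with get_p_value(). The sketch below is self-contained and assumes that dplyr and infer are installed; the name `tooth`, the seed and the factor conversion of dose are my own choices for the example.

```r
library(dplyr)
library(infer)

set.seed(4)  # arbitrary seed so the permutations are repeatable

# Assumption: dose is converted to a factor, as the F statistic
# compares groups defined by a categorical variable.
tooth <- ToothGrowth %>% mutate(dose = factor(dose))

# Observed F statistic.
F_hat <- tooth %>%
  specify(len ~ dose) %>%
  calculate(stat = "F")

# Null distribution of F under independence between len and dose.
null_dist <- tooth %>%
  specify(len ~ dose) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "F")

# Proportion of permuted F statistics at least as large as the observed one.
p_val <- null_dist %>%
  get_p_value(obs_stat = F_hat, direction = "greater")
p_val
```

If the reported value is 0, infer warns that this is an approximation based on the number of repetitions, which matches the caution we applied to the simulations earlier.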

In the next post we will analyse a data set of smartphones from India. See you!

Hypothesis test and P-value