In this post I intend to use the concept of covariation and show you how to do a scatter plot in R.
In this post we will analyse two continuous variables in the same plot, this is a very interesting way to understand the relationship between variables.
In the last post, we talked about variation and in this one we are going to talk about covariation!
If the variation described the behaviour of a variable, covariation describes the behaviour between the variables.
Covariation is the tendency of values of two or more variables vary together.
A Scatter Plot shows the relationship between two continuous
variables. We can build one in R, using the function
geom_point() from the ggplot2() package.
Since you have to load tidyverse to make it work in your
R:
library(tidyverse)

The code:
library(tidyverse)
library(ggthemes)
iris %>%
ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point() +
labs(
title = "Mesuments in centimeters of flowers from 3 species",
subtitle = "by: Guilherme Bastos Gomes",
caption = "Source: Edgar Anderson's Iris Data set",
x = "Sepal Length",
y = "Sepal Width"
) +
theme_classic()
Check how this plot gives us information about the two variables
together, again with the categorical variable
(Species):
Therefore we can start to understand how these numbers behaves inside their groups, from this point the next step would be to model, but let’s continue working with Scatter Plots for now, though they are really powerful, mainly when we are dealing with pattern.
Patterns in data provides us evidence regarding variable relationship, facing a pattern we should think like it was written in the “R For Data Science” book:
In this kind of data there are groups defined by a categorical
variable: Species, then we are able to do the following to
proceed with the analysis of the relationship between length and width
of the Petals:

iris %>%
ggplot(aes(x = Petal.Length, y = Petal.Width, color = Species)) +
geom_point() +
labs(
title = "Mesuments in centimeters of flowers from 3 species",
subtitle = "by: Guilherme Bastos Gomes",
caption = "Source: Edgar Anderson's Iris Data set",
x = "Petal Length",
y = "Petal Width"
) +
theme_classic() +
facet_grid(~Species)
Here we are using the function facet_grid() to divide
our graph into the groups defined by the variable
Species.

Here we are analyzing the distribution between the relationship of the variables from each group. Could we somehow model these relationships? We will in the future!
For now, I needed to install the package hexbin to do
the next plot:
install.packages("hexbin")
Then:
library(hexbin)
iris %>%
ggplot(aes(x = Petal.Length, y = Petal.Width)) +
geom_hex() +
labs(
title = "The effect of vitamin C in tooth growth of Guinea pigs",
subtitle = "by: Guilherme Bastos Gomes",
caption = "Source: ToothGrowth data set",
x = "Supplement",
y = "Length"
) +
facet_grid(~Species)
Observing several Scatter Plots of your data can be a very powerful tool to understand and explain relationships.
Function pairs() helps us plotting several relations at
the same time:

To add groups in this kind of plot, we should first define a vector with the desired colours:
colours <- c("#0000FF","#00FF00","#FF0000")
pairs(iris[,1:4], pch = 19, cex = 0.5,
col = colours[iris$Species],
lower.panel=NULL)

From this moment on we can start talking about correlation, but this one concept I am going to leave for a future post!
Thank you!