You can use the geometric object geom_boxplot() from the ggplot2 library to draw a boxplot in R. A boxplot helps you visualize the distribution of the data by quartile and detect the presence of outliers.
We will use the airquality dataset to introduce boxplots in R with ggplot2. This dataset records daily air quality measurements in New York from May to September 1973. The dataset contains 153 observations. We will use the following variables: Ozone, Wind, Month, and Day.
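Before you create your first boxplot in R, you need to prepare the data as follows:
Step 1: Import the data
Step 2: Drop the variables that are not needed (Solar.R and Temp)
Step 3: Convert Month to an ordered factor with the month names as labels
Step 4: Create a new categorical variable, day_cat, that divides each month into three parts: Begin, Middle, and End
Step 5: Remove the missing observations
All these steps are done with dplyr and the pipe operator %>%.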
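library(dplyr)
library(ggplot2)
# Step 1: start from the built-in airquality dataset
data_air <- airquality %>%
  # Step 2: drop the variables that are not needed
  select(-c(Solar.R, Temp)) %>%
  # Step 3: convert Month (coded 5 to 9) to an ordered factor
  mutate(Month = factor(Month, ordered = TRUE,
                        labels = c("May", "June", "July", "August", "September")),
         # Step 4: split each month into three parts
         day_cat = factor(ifelse(Day < 10, "Begin", ifelse(Day < 20, "Middle", "End"))))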
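As an alternative to the nested ifelse() in Step 4, base R's cut() can bin Day into the same three groups while keeping the levels in calendar order. The following is a minimal sketch and not part of the original pipeline; the breakpoints 9 and 19 are chosen to mirror the Day < 10 and Day < 20 conditions, and the names day_nested and day_cut are illustrative only.

```r
# Day of month from the built-in airquality dataset (values 1 to 31)
day <- airquality$Day

# Approach used in Step 4: nested ifelse(); factor() sorts the levels alphabetically
day_nested <- factor(ifelse(day < 10, "Begin", ifelse(day < 20, "Middle", "End")))

# Alternative: cut() with right-closed intervals (0,9], (9,19], (19,31]
day_cut <- cut(day, breaks = c(0, 9, 19, 31), labels = c("Begin", "Middle", "End"))

# Both approaches assign every day to the same group
all(as.character(day_nested) == as.character(day_cut))

# Only the order of the levels differs
levels(day_nested)
levels(day_cut)
```

With cut(), a legend built from this variable would list the groups as Begin, Middle, End rather than in alphabetical order.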
A good practice is to check the structure of the data with the function glimpse().
glimpse(data_air)
Output:
## Observations: 153
## Variables: 5
## $ Ozone   <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, ...
## $ Wind    <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6...
## $ Month   <ord> May, May, May, May, May, May, May, May, May, May, May,...
## $ Day     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ day_cat <fctr> Begin, Begin, Begin, Begin, Begin, Begin, Begin, Begi...
The Ozone column contains missing values (NA). Remove them before plotting.
# Step 5
data_air_nona <- data_air %>% na.omit()
Let's plot a basic boxplot showing the distribution of ozone by month.
# Store the graph
box_plot <- ggplot(data_air_nona, aes(x = Month, y = Ozone))
# Add the geometric object box plot
box_plot +
  geom_boxplot()
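To make concrete what each box encodes, the sketch below computes the quartiles and the outlier fences by hand for the May ozone readings. It uses base R only and assumes the default rule of geom_boxplot(), in which points lying more than 1.5 times the interquartile range beyond the box are drawn as outliers. The variable names (ozone_may, lower_fence, upper_fence) are illustrative.

```r
# Ozone readings for May from the built-in airquality dataset, NAs dropped
ozone_raw <- airquality$Ozone[airquality$Month == 5]
ozone_may <- ozone_raw[!is.na(ozone_raw)]

# Quartiles: bottom of the box, median line, top of the box
q <- quantile(ozone_may, probs = c(0.25, 0.5, 0.75))
iqr <- unname(q[3] - q[1])

# Fences: points beyond 1.5 * IQR from the box are treated as outliers
lower_fence <- unname(q[1]) - 1.5 * iqr
upper_fence <- unname(q[3]) + 1.5 * iqr

# Observations that fall outside the fences
outliers <- ozone_may[ozone_may < lower_fence | ozone_may > upper_fence]

q
c(lower = lower_fence, upper = upper_fence)
outliers
```

The whiskers of the box then extend to the most extreme observations that still lie inside the two fences.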
You can flip the axes to draw the boxes horizontally.
box_plot +
  geom_boxplot() +
  coord_flip()
You can change the color, shape and size of the outliers.
box_plot +
  geom_boxplot(outlier.colour = "red",
               outlier.shape = 2,
               outlier.size = 3) +
  theme_classic()
You can add a summary statistic, such as the mean, to the boxplot.
box_plot +
  geom_boxplot() +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(fun = mean,
               geom = "point",
               size = 3,
               color = "steelblue") +
  theme_classic()
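The point added by stat_summary() is the mean of each group, while the line inside each box is the median. As a rough check, the two statistics can be computed per month with base R. This sketch works on the built-in airquality data directly and drops the rows with a missing Ozone value, which is assumed to match the effect of Step 5; the names ozone_ok, month_means, and month_medians are illustrative.

```r
# Keep only the rows where Ozone was actually measured
ozone_ok <- airquality[!is.na(airquality$Ozone), ]

# Mean ozone per month (months are coded 5 = May to 9 = September)
month_means <- tapply(ozone_ok$Ozone, ozone_ok$Month, mean)

# Median ozone per month, the line drawn inside each box
month_medians <- tapply(ozone_ok$Ozone, ozone_ok$Month, median)

# Side-by-side comparison, rounded for readability
round(cbind(mean = month_means, median = month_medians), 1)
```

A mean that sits well above the median suggests a right-skewed distribution for that month.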
In the next boxplot, you add a dot plot layer. Each dot represents an observation.
box_plot +
  geom_boxplot() +
  geom_dotplot(binaxis = 'y',
               dotsize = 1,
               stackdir = 'center') +
  theme_classic()
You can give each group its own color by mapping Month to the color aesthetic.
ggplot(data_air_nona, aes(x = Month, y = Ozone, color = Month)) +
  geom_boxplot() +
  theme_classic()
It is also possible to add multiple groups. You can visualize how air quality differs according to the part of the month in which it was measured.
ggplot(data_air_nona, aes(Month, Ozone)) +
  geom_boxplot(aes(fill = day_cat)) +
  theme_classic()
Another way to show the individual observations is with jittered points. This is a convenient way to display points on top of a boxplot when the x variable is categorical.
Jittering adds a small random horizontal offset to each point, which keeps points with the same x value from overlapping.
box_plot +
  geom_boxplot() +
  geom_jitter(shape = 15,
              color = "steelblue",
              position = position_jitter(width = 0.21)) +
  theme_classic()
Compare the jittered graph with the following one, which uses geom_point() and leaves the points of each month on a single vertical line.
box_plot +
  geom_boxplot() +
  geom_point(shape = 5,
             color = "steelblue") +
  theme_classic()
An interesting feature of geom_boxplot() is the notched boxplot. The notch narrows the box around the median. The main purpose of a notched box plot is to compare the medians of different groups. There is strong evidence that two groups have different medians when their notches do not overlap. A notch is computed as follows:
$$\text{median} \pm 1.58 \times \frac{IQR}{\sqrt{n}}$$
where IQR is the interquartile range and n is the number of observations.
box_plot +
  geom_boxplot(notch = TRUE) +
  theme_classic()
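The notch limits can also be computed by hand to see which months overlap. The ggplot2 documentation describes the notch as extending 1.58 * IQR / sqrt(n) on either side of the median, and the sketch below applies that rule to each month with base R. The helper notch_limits() is a hypothetical name introduced for this illustration, not a ggplot2 function.

```r
# Keep only the rows where Ozone was actually measured
ozone_ok <- airquality[!is.na(airquality$Ozone), ]

# Hypothetical helper: notch limits for one group of observations
notch_limits <- function(x) {
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width,
    median = median(x),
    upper = median(x) + half_width)
}

# One row per month (5 = May to 9 = September), columns lower / median / upper
notches <- t(sapply(split(ozone_ok$Ozone, ozone_ok$Month), notch_limits))
notches
```

Two months whose lower-to-upper intervals do not overlap in this table are the ones whose notches do not overlap in the graph.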
The table below summarizes the different types of boxplot covered in this tutorial:
| Objective | Code |
|---|---|
| Basic box plot | ggplot(df, aes(x = x1, y = y)) + geom_boxplot() |
| Flip the axes | ggplot(df, aes(x = x1, y = y)) + geom_boxplot() + coord_flip() |
| Notched box plot | ggplot(df, aes(x = x1, y = y)) + geom_boxplot(notch = TRUE) |
| Box plot with jittered dots | ggplot(df, aes(x = x1, y = y)) + geom_boxplot() + geom_jitter(position = position_jitter(0.21)) |