boxplot() in R: How to Make Box Plots (with Examples)

You can use the geometric object geom_boxplot() from the ggplot2 library to draw a box plot in R. A box plot helps you visualize the distribution of the data by quartile and detect the presence of outliers.

We will use the airquality dataset to introduce box plots in R with ggplot2. This dataset records daily air quality measurements in New York from May to September 1973. It contains 153 observations. We will use the following variables: Ozone, Wind, Month and Day.

Create Box Plot

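Before you create your first box plot in R, you need to prepare the data as follows:

Step 1: Import the data
Step 2: Drop the variables you do not need (Solar.R and Temp)
Step 3: Convert Month to an ordered factor
Step 4: Create a new categorical variable, day_cat, that splits each month into three parts: begin, middle and end
Step 5: Remove the missing observations

Steps 1 to 4 are done in a single pipeline with dplyr and the pipe operator %>%. Step 5 follows once you have checked the structure of the data.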

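library(dplyr)
library(ggplot2)
# Step 1: import the data
data_air <- airquality %>%

    # Step 2: drop the variables that are not needed
    select(-c(Solar.R, Temp)) %>%

    # Step 3: convert Month (5 to 9) to an ordered factor with month names
    mutate(Month = factor(Month, ordered = TRUE, labels = c("May", "June", "July", "August", "September")),

        # Step 4: split each month into three parts based on the day
        day_cat = factor(ifelse(Day < 10, "Begin", ifelse(Day < 20, "Middle", "End"))))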

A good practice is to check the structure of the data with the function glimpse().

glimpse(data_air)

Output:

## Observations: 153
## Variables: 5
## $ Ozone   <int> 41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16, 11, 14, ...
## $ Wind    <dbl> 7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6...
## $ Month   <ord> May, May, May, May, May, May, May, May, May, May, May,...
## $ Day     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
## $ day_cat <fctr> Begin, Begin, Begin, Begin, Begin, Begin, Begin, Begi...

The Ozone column contains missing values (NA). Remove the incomplete rows before plotting.

# Step 5: remove the rows with missing values
data_air_nona <- data_air %>% na.omit()
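
To see what na.omit() removes here, the following base R sketch counts the missing values per column and the number of rows dropped. It rebuilds a reduced copy of the data without dplyr so that it runs on its own; the names na_per_column and rows_dropped are illustrative and not part of the tutorial code.

```r
# Keep the same four original columns as data_air (Solar.R and Temp dropped).
data_air <- airquality[, c("Ozone", "Wind", "Month", "Day")]

# Number of missing values in each column.
na_per_column <- colSums(is.na(data_air))
print(na_per_column)

# na.omit() drops every row that has at least one NA.
data_air_nona <- na.omit(data_air)
rows_dropped <- nrow(data_air) - nrow(data_air_nona)
print(rows_dropped)
```

Because only Ozone has missing values among these columns, the number of dropped rows should equal the NA count for Ozone.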

Basic box plot

Let's draw a basic box plot of the distribution of ozone by month.

# Store the graph
box_plot <- ggplot(data_air_nona, aes(x = Month, y = Ozone))
# Add the geometric object box plot
box_plot +
    geom_boxplot()

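Code Explanation

box_plot: stores the base graph for reuse in the following examples. It sets the data (data_air_nona) and maps Month to the x-axis and Ozone to the y-axis.
geom_boxplot(): adds the box plot layer, one box per month.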
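
As a rough check on what each box represents, the sketch below computes the quartiles per month and counts the points that fall more than 1.5 times the interquartile range beyond the box, which is the rule geom_boxplot() uses by default to flag outliers. It uses base R only and works on the numeric Month column (5 to 9); the names ozone_data and box_stats are illustrative.

```r
# Ozone and Month only, without missing values.
ozone_data <- na.omit(airquality[, c("Ozone", "Month")])

# For each month: quartiles and number of points beyond the 1.5 * IQR fences.
box_stats <- do.call(rbind, lapply(split(ozone_data$Ozone, ozone_data$Month), function(x) {
    q <- quantile(x, c(0.25, 0.5, 0.75))
    iqr <- q[[3]] - q[[1]]
    lower_fence <- q[[1]] - 1.5 * iqr
    upper_fence <- q[[3]] + 1.5 * iqr
    data.frame(q1 = q[[1]], median = q[[2]], q3 = q[[3]],
        n_outliers = sum(x < lower_fence | x > upper_fence))
}))
print(box_stats)
```

The q1 and q3 columns correspond to the bottom and top of each box, and median to the line inside it.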

Change side of the graph

You can flip the axes so that the boxes are drawn horizontally.

box_plot +
    geom_boxplot() +
    coord_flip()

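Code Explanation

coord_flip(): swaps the x-axis and the y-axis, so Month is shown on the vertical axis and Ozone on the horizontal axis.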
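
An alternative to coord_flip() is to map the categorical variable to the y-axis directly. This sketch assumes ggplot2 3.3.0 or later, where geom_boxplot() infers the orientation from the aesthetics. It rebuilds the data with base R so that it runs on its own; ozone_data and horizontal_plot are illustrative names.

```r
library(ggplot2)

# Rebuild a minimal version of the tutorial data.
ozone_data <- na.omit(airquality[, c("Ozone", "Month")])
ozone_data$Month <- factor(ozone_data$Month, levels = 5:9,
    labels = c("May", "June", "July", "August", "September"),
    ordered = TRUE)

# Month on the y-axis: the boxes are drawn horizontally without coord_flip().
horizontal_plot <- ggplot(ozone_data, aes(x = Ozone, y = Month)) +
    geom_boxplot()
print(horizontal_plot)
```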

Change color of outlier

You can change the color, shape and size of the outliers.

box_plot +
    geom_boxplot(outlier.colour = "red",
        outlier.shape = 2,
        outlier.size = 3) +
    theme_classic()

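Code Explanation

outlier.colour = "red": draws the outliers in red.
outlier.shape = 2: draws the outliers as triangles.
outlier.size = 3: sets the size of the outlier symbols.
theme_classic(): applies a plain theme with axis lines and no grid.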

Add a summary statistic

You can add a summary statistic, such as the mean, to the box plot.

box_plot +
    geom_boxplot() +
    # fun replaces the fun.y argument, which is deprecated since ggplot2 3.3.0
    stat_summary(fun = mean,
        geom = "point",
        size = 3,
        color = "steelblue") +
    theme_classic()

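Code Explanation

stat_summary(): computes a summary statistic for each group and adds it to the graph.
fun = mean: uses the mean of Ozone within each month.
geom = "point": draws the statistic as a point.
size = 3 and color = "steelblue": set the size and the color of the point.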
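
To check the values that stat_summary() plots, you can compute the monthly means yourself and compare them with the medians drawn inside the boxes. The following is a minimal base R sketch; by_month and monthly_stats are illustrative names.

```r
# Rebuild a minimal version of the tutorial data.
ozone_data <- na.omit(airquality[, c("Ozone", "Month")])
ozone_data$Month <- factor(ozone_data$Month, levels = 5:9,
    labels = c("May", "June", "July", "August", "September"),
    ordered = TRUE)

# Mean and median of Ozone within each month.
by_month <- split(ozone_data$Ozone, ozone_data$Month)
monthly_stats <- data.frame(
    mean = sapply(by_month, mean),
    median = sapply(by_month, median))
print(round(monthly_stats, 1))
```

A mean that lies well above the median in a given month suggests that a few high ozone readings pull the distribution to the right.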

Box Plot with Dots

In the next box plot, you add a dot plot layer. Each dot represents an observation.

box_plot +
    geom_boxplot() +
    geom_dotplot(binaxis = 'y',
        dotsize = 1,
        stackdir = 'center') +
    theme_classic()

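Code Explanation

geom_dotplot(): adds one dot per observation.
binaxis = 'y': bins the observations along the y-axis (Ozone).
dotsize = 1: sets the size of the dots relative to the bin width.
stackdir = 'center': stacks the dots symmetrically around the center of each box.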

Control Aesthetic of the Box Plot

Change the color of the box

You can give each group its own color.

ggplot(data_air_nona, aes(x = Month, y = Ozone, color = Month)) +
    geom_boxplot() +
    theme_classic()

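Code Explanation

color = Month: mapped inside aes(), it draws the outline of each box in a different color per month and adds a legend.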

Box plot with multiple groups

It is also possible to add multiple groups. You can visualize the difference in air quality according to the part of the month in which the measurement was taken.

ggplot(data_air_nona, aes(Month, Ozone)) +
    geom_boxplot(aes(fill = day_cat)) +
    theme_classic()

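Code Explanation

aes(fill = day_cat): mapped inside geom_boxplot(), it draws one box per value of day_cat within each month and fills it with a different color.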
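
Splitting each month into three parts leaves few observations per box, so it can help to count them before reading too much into the graph. This base R sketch rebuilds day_cat with the same thresholds as Step 4; group_data and group_counts are illustrative names.

```r
# Rows with a valid Ozone value, plus the day-of-month category from Step 4.
group_data <- na.omit(airquality[, c("Ozone", "Month", "Day")])
group_data$day_cat <- factor(
    ifelse(group_data$Day < 10, "Begin",
        ifelse(group_data$Day < 20, "Middle", "End")),
    levels = c("Begin", "Middle", "End"))

# Number of observations behind each box (rows: month 5 to 9).
group_counts <- table(group_data$Month, group_data$day_cat)
print(group_counts)
```

Boxes built from only a handful of observations are sensitive to single values and should be compared with caution.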

Box Plot with Jittered Dots

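Another way to show the observations is with jittered points. It is a convenient way to display individual points on top of a box plot when the x variable is categorical.

Jittering adds a small amount of random noise to the position of each point, which prevents points in the same category from overlapping.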

box_plot +
    geom_boxplot() +
    geom_jitter(shape = 15,
        color = "steelblue",
        position = position_jitter(width = 0.21)) +
    theme_classic()

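Code Explanation

geom_jitter(): adds the observations as points with a small random displacement.
shape = 15: draws the points as filled squares.
color = "steelblue": sets the color of the points.
position = position_jitter(width = 0.21): sets the amount of horizontal jitter.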

Compare the jittered graph with the same graph drawn with geom_point(), where all the points of a month fall on a single vertical line and overlap:

box_plot +
    geom_boxplot() +
    geom_point(shape = 5,
        color = "steelblue") +
    theme_classic()

Notched Box Plot

An interesting feature of geom_boxplot() is the notched box plot. The notch narrows the box around the median. The main purpose of a notched box plot is to compare the medians of different groups: when the notches of two boxes do not overlap, there is strong evidence that the two groups have different medians. A notch is computed as follows:

$$\text{median} \pm 1.58 \times \frac{IQR}{\sqrt{n}}$$

where IQR is the interquartile range and n is the number of observations.

box_plot +
    geom_boxplot(notch = TRUE) +
    theme_classic()

Code Explanation

notch = TRUE: draws a notch on each side of the box around the median.
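
The notch limits can be worked out by hand from the formula above. This is a base R sketch that uses the 1.58 constant given in the ggplot2 documentation; by_month and notches are illustrative names.

```r
# Rebuild a minimal version of the tutorial data.
ozone_data <- na.omit(airquality[, c("Ozone", "Month")])
ozone_data$Month <- factor(ozone_data$Month, levels = 5:9,
    labels = c("May", "June", "July", "August", "September"),
    ordered = TRUE)

# For each month: median +/- 1.58 * IQR / sqrt(n).
by_month <- split(ozone_data$Ozone, ozone_data$Month)
notches <- do.call(rbind, lapply(by_month, function(x) {
    med <- median(x)
    half_width <- 1.58 * IQR(x) / sqrt(length(x))
    data.frame(n = length(x), median = med,
        notch_lower = med - half_width,
        notch_upper = med + half_width)
}))
print(round(notches, 1))
```

Two months whose intervals between notch_lower and notch_upper do not overlap are the ones whose notches do not overlap in the graph. Months with few observations get wide notches, because n appears in the denominator.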

Summary

The table below summarizes the different types of box plot covered in this tutorial:

| Objective | Code |
|---|---|
| Basic box plot | ggplot(df, aes(x = x1, y = y)) + geom_boxplot() |
| Flip the side | ggplot(df, aes(x = x1, y = y)) + geom_boxplot() + coord_flip() |
| Notched box plot | ggplot(df, aes(x = x1, y = y)) + geom_boxplot(notch = TRUE) |
| Box plot with jittered dots | ggplot(df, aes(x = x1, y = y)) + geom_boxplot() + geom_jitter(position = position_jitter(0.21)) |
