Bar Chart & Histogram in R (with Example)

A bar chart is a good way to display a categorical variable on the x-axis. The y-axis can show one of two things:

  1. A count of the number of observations in each group.
  2. A summary statistic (min, max, average, and so on) of another variable for each group.

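You will use the mtcars dataset, which ships with R. This tutorial relies on three of its variables: cyl (number of cylinders), am (transmission: 0 = automatic, 1 = manual) and mpg (miles per gallon).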

How to create a Bar Chart

To create graphs in R, you can use the ggplot2 library, which produces publication-ready graphs. The basic syntax of this library is:

ggplot(data, mapping = aes()) +
geometric object 

Arguments:
data: Dataset used to plot the graph
mapping: Aesthetic mappings created with aes(), such as the variables on the x- and y-axes
geometric object: The type of plot you want to show. The most common objects are:

- Point: `geom_point()`
- Bar: `geom_bar()`
- Line: `geom_line()`
- Histogram: `geom_histogram()` 

In this tutorial, you are interested in the geometric object geom_bar(), which creates the bar chart.
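
To see how the syntax template fits together, here is a minimal sketch that keeps the same data and mapping and swaps only the geometric object. The choice of wt and mpg, and the object names base, scatter and line, are assumptions made for illustration.

```r
library(ggplot2)

# The data and mapping are declared once and reused.
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Only the geometric object changes between the two plots.
scatter <- base + geom_point()   # one point per car
line    <- base + geom_line()    # the same cars connected in order of wt

print(scatter)
print(line)
```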

Bar chart: count

Your first graph shows the number of cars for each cylinder count with geom_bar(). The code below is the most basic syntax.

library(ggplot2)
# Most basic bar chart
ggplot(mtcars, aes(x = factor(cyl))) +
    geom_bar()

Note: make sure you convert the variable into a factor; otherwise R treats cyl as numeric and ggplot2 places the bars on a continuous x-axis rather than on discrete categories.
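
The difference can be checked with a short sketch that builds the same chart twice, once with the raw numeric column and once with the factor. The names p_numeric and p_factor are hypothetical and used only for this comparison.

```r
library(ggplot2)

# cyl is stored as a number, so ggplot2 gives it a continuous x-axis.
class(mtcars$cyl)

p_numeric <- ggplot(mtcars, aes(x = cyl)) +
    geom_bar()

# Wrapping cyl in factor() yields three discrete categories: 4, 6 and 8.
levels(factor(mtcars$cyl))

p_factor <- ggplot(mtcars, aes(x = factor(cyl))) +
    geom_bar()

print(p_numeric)
print(p_factor)
```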

Customize the graph

Four arguments can be passed to customize the graph:

- `stat`: Control the statistical transformation. By default, `count` plots the number of rows per group on the y-axis. To plot values already stored in the data, pass `stat = "identity"`
- `alpha`: Control the transparency of the color, from 0 (fully transparent) to 1 (opaque)
- `fill`: Change the color of the bar
- `width`: Control the width of the bar

Change the color of the bars

You can change the color of the bars. Note that a single fill value gives every bar the same color.

# Change the color of the bars
ggplot(mtcars, aes(x = factor(cyl))) +
    geom_bar(fill = "coral") +
    theme_classic()

You can use this code:

grDevices::colors()		

to see all the colors available in R. There are around 650 colors.
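
Because the full list is long, a short sketch like the following can help narrow it down. It counts the available names and then filters them with grep(); the search term "coral" is just the fill used earlier in this tutorial.

```r
# Count the named colors known to R.
length(grDevices::colors())

# List only the names that contain "coral".
grep("coral", grDevices::colors(), value = TRUE)
```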

Change the intensity

You can increase or decrease the intensity of the bars' color with the alpha argument, which ranges from 0 (fully transparent) to 1 (opaque).

# Change intensity
ggplot(mtcars,
        aes(factor(cyl))) +
    geom_bar(fill = "coral",
        alpha = 0.5) +
    theme_classic()

Color by groups

You can give each group its own color. For instance, the cyl variable has three levels, so you can plot the bar chart with three colors.

# Color by group
ggplot(mtcars, aes(factor(cyl),
        fill = factor(cyl))) +
    geom_bar()
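
The chart above relies on ggplot2's default palette. If specific colors are wanted, one possible approach is scale_fill_manual(), sketched below. The three color names and the legend title are arbitrary choices for illustration, not part of the original example.

```r
library(ggplot2)

p_manual <- ggplot(mtcars, aes(factor(cyl), fill = factor(cyl))) +
    geom_bar() +
    # Map each factor level to a named color; the names must match the levels.
    scale_fill_manual(
        values = c("4" = "coral", "6" = "steelblue", "8" = "seagreen"),
        name = "Cylinders"
    )

print(p_manual)
```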

Add a group in the bars

You can further split each bar by the levels of a second factor. For instance, you can count the number of automatic and manual transmissions for each cylinder type.

Proceed as follows:

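library(dplyr)
# Step 1: start from mtcars and pipe it into mutate()
data <- mtcars %>%
    # Step 2: convert am (0 = automatic, 1 = manual) and cyl to factors
    mutate(am = factor(am, labels = c("auto", "man")),
        cyl = factor(cyl))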

Now that the dataset is ready, you can plot the graph:

# Step 3

ggplot(data, aes(x = cyl, fill = am)) +
    geom_bar() +
    theme_classic()

The mapping will fill the bar with two colors, one for each level. It is effortless to change the group by choosing other factor variables in the dataset.
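
To check what the stacked bars represent, the counts behind each colored segment can be tabulated directly. This is a minimal sketch using dplyr's count(); the name segment_counts is hypothetical, and the data preparation is repeated so the block runs on its own.

```r
library(dplyr)

# Rebuild the prepared data so this block is self-contained.
data <- mtcars %>%
    mutate(am = factor(am, labels = c("auto", "man")),
        cyl = factor(cyl))

# One row per (cyl, am) pair; n is the height of each colored segment.
segment_counts <- count(data, cyl, am)
print(segment_counts)
```

The n values for each cyl level add up to the total height of that bar.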

Bar chart in percentage

You can display the split of each bar as proportions instead of raw counts. With position = "fill", every bar is scaled to the same height of 1.

# Bar chart in percentage

ggplot(data, aes(x = cyl, fill = am)) +
    geom_bar(position = "fill") +
    theme_classic()
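
The proportions drawn by position = "fill" can be reproduced by hand as a sanity check. The sketch below uses base R on the raw mtcars columns, where am is still coded 0 (automatic) and 1 (manual); the names counts and shares are hypothetical.

```r
# Cross-tabulate cylinders against transmission type.
counts <- table(cyl = mtcars$cyl, am = mtcars$am)

# Row-wise proportions: each cylinder group sums to 1, like each bar in the chart.
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```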

Side by side bars

It is easy to plot the bar chart with the group variable side by side.

# Bar chart side by side
ggplot(data, aes(x = cyl, fill = am)) +
    geom_bar(position = position_dodge()) +
    theme_classic()

Histogram

In the second part of the tutorial, the y-axis shows a value computed for each group instead of a count. Although this section is titled Histogram, the charts in it are bar charts of a summary statistic drawn with geom_bar().

Your objective is to create a graph with the average miles per gallon for each cylinder count. To draw an informative graph, you will follow these steps:
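
For comparison, a histogram in the strict sense bins one continuous variable and counts the observations in each bin. A minimal sketch with geom_histogram() follows; the bin width of 5 and the styling are arbitrary choices for illustration.

```r
library(ggplot2)

# A histogram needs only an x aesthetic; the counts per bin are computed for you.
p_hist <- ggplot(mtcars, aes(x = mpg)) +
    geom_histogram(binwidth = 5, fill = "coral", color = "white") +
    theme_classic()

print(p_hist)
```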

Step 1) Create a new variable

You create a data frame named data_histogram, which holds the average miles per gallon for each number of cylinders. You call the new variable mean_mpg and round the mean to two decimals.

# Step 1

data_histogram <- mtcars %>%
    mutate(cyl = factor(cyl)) %>%
    group_by(cyl) %>%
    summarize(mean_mpg = round(mean(mpg), 2))
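
Before plotting, it can be worth inspecting the summary table. The sketch below rebuilds it and cross-checks the group means with base R's tapply(), which is used here only as an independent check.

```r
library(dplyr)

data_histogram <- mtcars %>%
    mutate(cyl = factor(cyl)) %>%
    group_by(cyl) %>%
    summarize(mean_mpg = round(mean(mpg), 2))

# Expect three rows, one per cylinder count.
print(data_histogram)

# Cross-check with base R: the unrounded group means.
tapply(mtcars$mpg, mtcars$cyl, mean)
```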

Step 2) Create a basic histogram

You can now plot the chart. It is not yet polished enough to deliver to a client, but it gives an intuition about the trend.

ggplot(data_histogram, aes(x = cyl, y = mean_mpg)) +
    geom_bar(stat = "identity")
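
As an alternative spelling, ggplot2 also provides geom_col(), which plots the y values as they are without a stat argument. The sketch below builds both versions so they can be compared; the summary table is recomputed with base R so the block runs on its own, and the names p_identity and p_col are hypothetical.

```r
library(ggplot2)

# Recompute the summary table so this block is self-contained.
data_histogram <- data.frame(
    cyl = factor(c(4, 6, 8)),
    mean_mpg = round(as.numeric(tapply(mtcars$mpg, mtcars$cyl, mean)), 2)
)

p_identity <- ggplot(data_histogram, aes(x = cyl, y = mean_mpg)) +
    geom_bar(stat = "identity")

# geom_col() draws the same bars without the stat argument.
p_col <- ggplot(data_histogram, aes(x = cyl, y = mean_mpg)) +
    geom_col()

print(p_col)
```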

Step 3) Change the orientation

You change the orientation of the graph from vertical to horizontal.

ggplot(data_histogram, aes(x = cyl, y = mean_mpg)) +
    geom_bar(stat = "identity") +
    coord_flip()

Step 4) Change the color

You can differentiate the colors of the bars according to the factor level of the x-axis variable.

ggplot(data_histogram, aes(x = cyl, y = mean_mpg, fill = cyl)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    theme_classic()

Step 5) Change the size

To make the graph look better, you reduce the width of the bars.

graph <- ggplot(data_histogram, aes(x = cyl, y = mean_mpg, fill = cyl)) +
    geom_bar(stat = "identity",
        width = 0.5) +
    coord_flip() +
    theme_classic()

Step 6) Add labels to the graph

The last step adds the value of the variable mean_mpg as a label on each bar.

graph +
    geom_text(aes(label = mean_mpg),
        hjust = 1.5,
        color = "white",
        size = 3) +
    theme_classic()
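
As an optional finishing touch, the six steps can be combined into one expression with readable axis titles. This is only a sketch: the axis titles, the decision to hide the legend, and the name final_graph are assumptions, and the summary table is recomputed with base R so the block runs on its own.

```r
library(ggplot2)

# Recompute the summary table so this block is self-contained.
data_histogram <- data.frame(
    cyl = factor(c(4, 6, 8)),
    mean_mpg = round(as.numeric(tapply(mtcars$mpg, mtcars$cyl, mean)), 2)
)

final_graph <- ggplot(data_histogram, aes(x = cyl, y = mean_mpg, fill = cyl)) +
    geom_bar(stat = "identity", width = 0.5) +
    coord_flip() +
    geom_text(aes(label = mean_mpg), hjust = 1.5, color = "white", size = 3) +
    # Axis titles follow the aesthetics, so x still refers to cyl after the flip.
    labs(x = "Number of cylinders", y = "Average miles per gallon") +
    theme_classic() +
    # The axis already names each group, so the legend is dropped.
    theme(legend.position = "none")

print(final_graph)
```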

Summary

A bar chart is useful when the x-axis is a categorical variable. The y-axis can be either a count or a summary statistic. The table below summarizes how to control a bar chart with ggplot2:

| Objective | Code |
| --- | --- |
| Count | `ggplot(df, aes(x = factor(x1))) + geom_bar()` |
| Count with different color of fill | `ggplot(df, aes(x = factor(x1), fill = factor(x1))) + geom_bar()` |
| Count with groups, stacked | `ggplot(df, aes(x = factor(x1), fill = factor(x2))) + geom_bar()` |
| Count with groups, side by side | `ggplot(df, aes(x = factor(x1), fill = factor(x2))) + geom_bar(position = position_dodge())` |
| Count with groups, stacked in % | `ggplot(df, aes(x = factor(x1), fill = factor(x2))) + geom_bar(position = "fill")` |
| Values | `ggplot(df, aes(x = factor(x1), y = x2)) + geom_bar(stat = "identity")` |

 
