A bar chart is a great way to display a categorical variable on the x-axis. The y-axis can show one of two things: the count of observations in each category, or a value such as a summary statistic computed for each category.
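You will use the mtcars dataset, which ships with R. This tutorial relies on the following variables:
- `cyl`: number of cylinders in the car (4, 6 or 8), stored as a number
- `am`: type of transmission (0 for automatic, 1 for manual), stored as a number
- `mpg`: miles per gallon, stored as a number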
To create a graph in R, you can use the ggplot2 library, which creates publication-ready graphs. The basic syntax of this library is:
ggplot(data, mapping = aes()) +
geometric object

arguments:
- data: dataset used to plot the graph
- mapping: Control the x and y-axis
- geometric object: The type of plot you want to show. The most common objects are:
  - Point: `geom_point()`
  - Bar: `geom_bar()`
  - Line: `geom_line()`
  - Histogram: `geom_histogram()`
In this tutorial, you are interested in the geometric object geom_bar(), which creates the bar chart.
Your first graph shows how many cars have each number of cylinders, using geom_bar(). The code below is the most basic syntax.
library(ggplot2)
# Most basic bar chart
ggplot(mtcars, aes(x = factor(cyl))) +
geom_bar()
Note: make sure you convert the variable into a factor. Otherwise R treats it as numeric and draws the x-axis as a continuous scale instead of one position per category.
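To see the difference, the sketch below builds the same chart twice, once with `cyl` left as a number and once wrapped in `factor()`. The object names `p_numeric` and `p_factor` are illustrative only. `layer_data()` is the ggplot2 helper that returns the data computed for a layer, used here to read the bar heights without looking at the plot.

```r
library(ggplot2)

# cyl is stored as a number in mtcars
class(mtcars$cyl)

# Numeric x: the axis is continuous, so it also spans the gaps between 4, 6 and 8
p_numeric <- ggplot(mtcars, aes(x = cyl)) +
  geom_bar()

# Factor x: the axis is discrete, with one position per level
p_factor <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# The factor has exactly three levels
levels(factor(mtcars$cyl))

# Heights of the bars computed by geom_bar()
layer_data(p_factor)$count
```

Printing `p_numeric` and `p_factor` one after the other makes the change in the x-axis visible.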
Four arguments can be passed to customize the graph:
- `stat`: Control the statistical transformation. By default, `count`, which plots the number of observations on the y-axis. To plot values that are already in the data, pass `stat = "identity"`
- `alpha`: Control the transparency of the color, from 0 (transparent) to 1 (opaque)
- `fill`: Change the color of the bars
- `width`: Control the width of the bars
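As a rough illustration of the `stat` argument, the sketch below draws the same bars two ways: once letting geom_bar() count the rows, and once from a table of counts computed beforehand and passed with `stat = "identity"`. The names `cyl_counts`, `p_count` and `p_identity` are made up for this example.

```r
library(ggplot2)

# Default stat: geom_bar() counts the rows in each category
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "coral", alpha = 0.8)

# Pre-computed values: table() yields a data frame with columns cyl and Freq
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))

# With stat = "identity", the y aesthetic is plotted as-is
p_identity <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) +
  geom_bar(stat = "identity", fill = "coral", alpha = 0.8)

# Both charts have the same bar heights
layer_data(p_count)$count
layer_data(p_identity)$y
```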
You can change the color of the bars. Note that all the bars share the same color.
# Change the color of the bars
ggplot(mtcars, aes(x = factor(cyl))) +
geom_bar(fill = "coral") +
theme_classic()
You can use this code:
grDevices::colors()
to see all the colors available in R. There are around 650 colors.
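A small sketch of how you might explore that list; `all_colors` is just an illustrative variable name.

```r
# Character vector with every built-in color name
all_colors <- grDevices::colors()

# Number of available names
length(all_colors)

# Check that a name is valid before passing it to fill
"coral" %in% all_colors

# List the names that contain "coral"
grep("coral", all_colors, value = TRUE)
```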
You can increase or decrease the intensity of the bars' color with the alpha argument, which ranges from 0 (fully transparent) to 1 (fully opaque).
# Change intensity
ggplot(mtcars,
aes(factor(cyl))) +
geom_bar(fill = "coral",
alpha = 0.5) +
theme_classic()
You can also give each group its own color. For instance, the cyl variable has three levels, so you can plot the bar chart with three colors.
# Color by group
ggplot(mtcars, aes(factor(cyl),
fill = factor(cyl))) +
geom_bar()
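You can further split each bar based on another factor. For instance, you can count the number of automatic and manual transmissions for each cylinder type.
You will proceed as follows:
- Step 1: Create the data frame from the mtcars dataset
- Step 2: Label the am variable with auto for automatic transmission and man for manual transmission, and convert am and cyl to factors so that you do not need factor() inside ggplot()
- Step 3: Plot the bar chart to count the number of transmissions by cylinder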
library(dplyr)
# Step 1
data <- mtcars %>%
  # Step 2
  mutate(am = factor(am, labels = c("auto", "man")),
         cyl = factor(cyl))
Now that the dataset is ready, you can plot the graph:
# Step 3
ggplot(data, aes(x = cyl, fill = am)) +
geom_bar() +
theme_classic()
The mapping will fill the bar with two colors, one for each level. It is effortless to change the group by choosing other factor variables in the dataset.
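To check what each colored segment represents, you can tabulate the two factors directly. This is only a sketch: it repeats the recoding in base R with `transform()` so that it runs without dplyr, and `cars_df` and `counts` are illustrative names.

```r
# Same recoding as in Step 2, written in base R so the block runs on its own
cars_df <- transform(
  mtcars,
  am = factor(am, labels = c("auto", "man")),
  cyl = factor(cyl)
)

# Rows are cylinder types, columns are transmission types
counts <- table(cyl = cars_df$cyl, am = cars_df$am)
counts

# Total height of each stacked bar
rowSums(counts)
```

Each cell of `counts` is the height of one segment in the stacked chart.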
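You can display the bars as a share of each group instead of the raw count. With position = "fill", every bar is stretched to the same height and the y-axis runs from 0 to 1.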
# Bar chart in percentage
ggplot(data, aes(x = cyl, fill = am)) +
geom_bar(position = "fill") +
theme_classic()
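The shares behind this chart can be computed with `prop.table()`, and the axis can be labelled in percent rather than as a 0 to 1 proportion. The sketch below assumes the scales package (installed alongside ggplot2) is recent enough to provide `label_percent()`; `cars_df`, `shares` and `p_percent` are illustrative names.

```r
library(ggplot2)

# Recode in base R so the block runs on its own
cars_df <- transform(
  mtcars,
  am = factor(am, labels = c("auto", "man")),
  cyl = factor(cyl)
)

# Share of each transmission type within a cylinder group (each row sums to 1)
shares <- prop.table(table(cyl = cars_df$cyl, am = cars_df$am), margin = 1)
round(shares, 2)

# Same filled chart, with the y-axis labelled in percent
p_percent <- ggplot(cars_df, aes(x = cyl, fill = am)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::label_percent()) +
  theme_classic()
```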
It is easy to plot the bar chart with the group variable side by side.
# Bar chart side by side
ggplot(data, aes(x = cyl, fill = am)) +
geom_bar(position = position_dodge()) +
theme_classic()
In the second part of the bar chart tutorial, you plot a value computed for each group on the y-axis instead of a count.
Your objective is to create a graph with the average miles per gallon for each type of cylinder. To draw an informative graph, you will follow these steps:
Step 1) Create a new variable
You create a data frame named data_histogram, which contains the average miles per gallon by the number of cylinders in the car. You call this new variable mean_mpg, and you round the mean to two decimals.
# Step 1
data_histogram <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = round(mean(mpg), 2))
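One way to sanity-check the summary is to compute the same group means in base R. This sketch uses `tapply()`; `mean_by_cyl` is an illustrative name.

```r
# Mean of mpg within each value of cyl, rounded to two decimals
mean_by_cyl <- round(tapply(mtcars$mpg, mtcars$cyl, mean), 2)
mean_by_cyl
```

The three values should match the mean_mpg column of data_histogram.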
Step 2) Create a basic bar chart
You can now plot the bar chart. It is not yet polished enough to deliver to a client, but it gives an intuition about the trend.
ggplot(data_histogram, aes(x = cyl, y = mean_mpg)) +
geom_bar(stat = "identity")
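As an aside, ggplot2 also provides geom_col(), which plots the y values as they are and can stand in for geom_bar(stat = "identity"). The sketch below rebuilds the summary table in base R so that it runs on its own; `means`, `p_bar` and `p_col` are illustrative names.

```r
library(ggplot2)

# Rebuild the summary table without dplyr
means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
data_histogram <- data.frame(
  cyl = factor(means$cyl),
  mean_mpg = round(means$mpg, 2)
)

# The two layers below are alternative spellings of the same chart
p_bar <- ggplot(data_histogram, aes(x = cyl, y = mean_mpg)) +
  geom_bar(stat = "identity")
p_col <- ggplot(data_histogram, aes(x = cyl, y = mean_mpg)) +
  geom_col()

# Compare the bar heights computed for each plot
all.equal(layer_data(p_bar)$y, layer_data(p_col)$y)
```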
Step 3) Change the orientation
You change the orientation of the graph from vertical to horizontal.
ggplot(data_histogram, aes(x = cyl, y = mean_mpg)) +
geom_bar(stat = "identity") +
coord_flip()
Step 4) Change the color
You can differentiate the colors of the bars according to the factor level of the x-axis variable.
ggplot(data_histogram, aes(x = cyl, y = mean_mpg, fill = cyl)) +
geom_bar(stat = "identity") +
coord_flip() +
theme_classic()
Step 5) Change the size
To make the graph look prettier, you reduce the width of the bars.
graph <- ggplot(data_histogram, aes(x = cyl, y = mean_mpg, fill = cyl)) +
geom_bar(stat = "identity",
width = 0.5) +
coord_flip() +
theme_classic()
Step 6) Add labels to the graph
The last step consists of adding the value of the variable mean_mpg as a label on each bar.
graph +
geom_text(aes(label = mean_mpg),
hjust = 1.5,
color = "white",
size = 3) +
theme_classic()
A bar chart is useful when the x-axis is a categorical variable. The y-axis can be either a count or a summary statistic. The table below summarizes how to control a bar chart with ggplot2:
| Objective | Code |
|---|---|
| Count | ggplot(df, aes(x = factor(x1))) + geom_bar() |
| Count with different color of fill | ggplot(df, aes(x = factor(x1), fill = factor(x1))) + geom_bar() |
| Count with groups, stacked | ggplot(df, aes(x = factor(x1), fill = factor(x2))) + geom_bar() |
| Count with groups, side by side | ggplot(df, aes(x = factor(x1), fill = factor(x2))) + geom_bar(position = position_dodge()) |
| Count with groups, stacked in % | ggplot(df, aes(x = factor(x1), fill = factor(x2))) + geom_bar(position = "fill") |
| Values | ggplot(df, aes(x = factor(x1), y = x2)) + geom_bar(stat = "identity") |