A factor in R is a variable used to categorize and store data that takes a limited number of distinct values. It stores the data as a vector of integer codes, each mapped to a label called a level. A factor is also known as a categorical variable, and its levels can be built from both string and integer values. Factors are mostly used in statistical modeling and exploratory data analysis with R.
In a dataset, we can distinguish two types of variables: categorical and continuous.
Categorical variables in R are stored as factors. The code below converts a character variable into a factor variable. Many machine learning algorithms cannot work directly with character strings, so categorical text is usually converted to a factor, which R represents internally as integer codes.
Syntax
factor(x = character(), levels, labels = levels, ordered = is.ordered(x))
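Arguments:
x: A vector of data, usually character or integer values with a small number of distinct values.
levels: An optional vector of the values that x can take. By default, the sorted unique values of x.
labels: An optional vector of labels for the levels. By default, the levels themselves.
ordered: A logical flag that determines whether the levels are treated as ordered.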
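To make the levels and labels arguments concrete, here is a minimal sketch. The 0/1 codes and the variable names codes and gender are hypothetical and chosen only for illustration.

```r
# Hypothetical survey codes (illustrative, not taken from the tutorial data)
codes <- c(1, 0, 0, 1, 1)

# levels lists the values to expect; labels renames them in the same order
gender <- factor(codes, levels = c(0, 1), labels = c("Female", "Male"))

# Print the factor and its levels
gender
levels(gender)
```

Here 0 is displayed as Female and 1 as Male, because labels are matched to levels by position.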
Example:
Let's create a factor from a character vector.
# Create gender vector
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
class(gender_vector)
# Convert gender_vector to a factor
factor_gender_vector <- factor(gender_vector)
class(factor_gender_vector)
Output:
## [1] "character"
## [1] "factor"
It is important to convert strings into factor variables in R before performing machine learning tasks.
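The integer storage behind a factor can be inspected directly. The following is a small sketch that reuses the gender example and relies only on the base R functions levels(), as.integer() and nlevels().

```r
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)

# Levels are the distinct values, sorted alphabetically by default
levels(factor_gender_vector)      # "Female" "Male"

# Each element is stored as the integer position of its level
as.integer(factor_gender_vector)  # 2 1 1 2 2

# Number of distinct levels
nlevels(factor_gender_vector)     # 2
```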
A categorical variable in R is either a nominal categorical variable or an ordinal categorical variable.
A nominal categorical variable has several values, but their order does not matter. For instance, male or female. Nominal categorical variables in R do not have an ordering.
# Create a color vector
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
# Convert the vector to factor
factor_color <- factor(color_vector)
factor_color
Output:
## [1] blue red green white black yellow
## Levels: black blue green red white yellow
The levels of factor_color are simply listed alphabetically, so we can't infer any meaningful order from them.
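One way to confirm that a nominal factor carries no order is to try comparing its elements. This is a sketch using base R only; suppressWarnings() is used here just to keep the output quiet.

```r
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
factor_color <- factor(color_vector)

# A plain factor is not ordered
is.ordered(factor_color)   # FALSE

# Equality tests work on nominal factors
factor_color == 'red'

# Relational comparisons are not meaningful for nominal factors:
# R returns NA and raises a warning
suppressWarnings(factor_color[1] < factor_color[2])   # NA
```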
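Ordinal categorical variables do have a natural ordering. We specify the order by listing the levels from lowest to highest in the levels argument and setting ordered = TRUE (R also accepts the abbreviation order = TRUE through partial argument matching). With ordered = FALSE, the factor is treated as nominal, and the levels carry no ranking.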
Example:
We can use summary() to count how many times each level occurs in a factor.
# Create Ordinal categorical vector
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
# Convert `day_vector` to an ordered factor
# (`ordered` is the full argument name; `order` also works via partial matching)
factor_day <- factor(day_vector, ordered = TRUE, levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
# Print the new variable
factor_day
Output:
## [1] evening morning afternoon midday midnight evening
## Levels: morning < midday < afternoon < evening < midnight
Example:
# Append the line to above code
# Count the number of occurrences of each level
summary(factor_day)
Output:
##   morning    midday afternoon   evening  midnight
##         1         1         1         2         1
R ordered the levels from 'morning' to 'midnight', as specified in the levels argument.
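Once a factor is ordered, relational comparisons follow the level order rather than the alphabet. A brief sketch, rebuilding factor_day so that the block runs on its own:

```r
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
factor_day <- factor(day_vector, ordered = TRUE,
                     levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))

# Comparisons use the level order: 'evening' ranks above 'morning'
factor_day[1] > factor_day[2]   # TRUE

# Keep only the elements that come before 'afternoon'
factor_day[factor_day < 'afternoon']

# Lowest and highest observed levels
min(factor_day)
max(factor_day)
```

The same comparisons on an unordered factor would return NA, which is one practical reason to declare the order explicitly.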
Continuous variables are the default in R: numbers are stored as numeric or integer. We can see this in the dataset below. mtcars is a built-in dataset that gathers information on different types of cars. We can access it by its name, mtcars, and check the class of the variable mpg (miles per gallon). The result is "numeric", indicating a continuous variable.
dataset <- mtcars
class(dataset$mpg)
Output:
## [1] "numeric"
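A numeric column is not always continuous in meaning. As an illustrative sketch, the cyl column of mtcars (number of cylinders) is stored as numeric but takes only a few distinct values, so converting it to a factor is one reasonable choice. The name cyl_factor is introduced here only for the example.

```r
dataset <- mtcars

# mpg is continuous: numeric, with many distinct values
is.numeric(dataset$mpg)      # TRUE

# cyl is stored as numeric but takes only three values (4, 6, 8)
cyl_factor <- factor(dataset$cyl)
levels(cyl_factor)           # "4" "6" "8"

# Count the cars in each cylinder group
summary(cyl_factor)
```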