The Ultimate R Cheat Sheet for Data Science Enthusiasts

By Rohit Sharma

Updated on Feb 11, 2025 | 18 min read

R powers analysis across industries such as healthcare, finance, and marketing, supporting tasks like predictive modeling, risk analysis, and customer segmentation. It gives you quick access to essential functions for vector operations, string handling, statistical modeling, and machine learning. Mastering these functions helps you transform raw data into actionable insights.

In this blog, we will cover the basics of vectors, strings, and data transformation, providing hands-on examples to help you get started.

Essential Data Transformation Functions in R Cheat Sheet

Data transformation is a key component of any data analysis process. Without it, raw data can’t be effectively analyzed or used for decision-making. R provides powerful functions to handle large datasets, clean data, and prepare it for analysis.

For example, when dealing with inconsistent customer data, R’s dplyr and tidyr packages can clean, reshape, and organize the data into an analysis-ready format. These tools streamline the data wrangling process, minimizing human error and enhancing workflow efficiency. 

dplyr helps clean large datasets by providing intuitive functions like mutate() for adding or modifying columns, filter() for subsetting data, and arrange() for sorting data.

tidyr, on the other hand, reshapes data into a tidy format in which each variable forms its own column. Functions like spread() and gather() (superseded in current tidyr releases by pivot_wider() and pivot_longer()) help reduce the risk of misaligned or missing values.

R offers a variety of functions to streamline data manipulation:

  • Vector Operations: Functions like sum(), mean(), and length() enable fast computations on datasets.
  • String Handling: R has robust string manipulation functions such as gsub(), substr(), and strsplit() to clean and structure text data efficiently.
  • Data Frames: Data frames are widely used in R for tabular data. You can subset, modify, and merge data frames using dplyr functions like select() and mutate(), making it easier to analyze structured data.
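
As a rough illustration of the three categories above, here is a minimal base R sketch. The sample values, variable names, and column names are made up for the example.

```r
# Vector operations: summarise a numeric vector
scores <- c(85, 92, 78)
total <- sum(scores)
average <- mean(scores)
count <- length(scores)

# String handling: clean and split text
raw <- "  2025-02-11 "
trimmed <- gsub(" ", "", raw)          # remove every space
year <- substr(trimmed, 1, 4)          # characters 1 to 4
parts <- strsplit(trimmed, "-")[[1]]   # split on the hyphen

# Data frames: keep selected rows and columns (base R indexing)
people <- data.frame(Name = c("John", "Jane", "Sam"), Score = scores)
high <- people[people$Score > 80, c("Name", "Score")]
print(high)
```

The data frame step uses base R indexing so the sketch runs without extra packages; dplyr's filter() and select() express the same idea.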

Understanding the basics of R will lay the foundation for mastering data transformation techniques.

Fundamental Concepts in R

Before diving into specific functions, it's important to understand some core concepts. An R programming cheat sheet can be a helpful reference as you familiarize yourself with these foundational ideas. These concepts set the stage for efficient R programming and help streamline your work with data.

  • Accessing Help in R

You can access documentation for any function using the help() function or the ? operator. To get details on packages, use library(help = package_name). For quick references, explore R's official online documentation.

Additionally, R users often rely on external resources for troubleshooting and learning, such as RDocumentation.org for package-specific information, or Stack Overflow for community-driven support and practical coding solutions. These platforms provide valuable insights and answers to common R-related questions.

Example:

# Accessing help for the mean function using help()
help(mean)

# Or using the ? operator
?mean

Output:

This will display the documentation for the mean function in R.

  • Using Packages in R

R packages enhance R’s functionality, offering more efficient solutions than base R for tasks like data wrangling or visualization. For example, dplyr simplifies data manipulation with concise, readable code.

To install a package, use install.packages("packageName"), and to load it, use library(packageName). Popular repositories include CRAN, Bioconductor (for bioinformatics), and GitHub, offering a vast selection of packages to streamline your analysis.

Example:

# Step 1: Install the dplyr package (this step is only needed once)
install.packages("dplyr")

# Step 2: Load the dplyr package into the R session
library(dplyr)

# Step 3: Example usage of a function from the dplyr package
# Creating a sample data frame
data <- data.frame(
  Name = c("John", "Jane", "Sam", "Sue", "Alex"),
  Age = c(25, 30, 22, 28, 35),
  Score = c(85, 92, 78, 88, 91)
)

# Step 4: Use the filter function from dplyr to filter data
# Example: Filter individuals with Age greater than 25
filtered_data <- filter(data, Age > 25)

# Step 5: Display the filtered data
print(filtered_data)

Output:

When you run this script, it installs dplyr, loads it, filters the rows, and prints the result:

  Name Age Score
1 Jane  30    92
2  Sue  28    88
3 Alex  35    91
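
Because install.packages() downloads the package every time it runs, scripts often guard the call. The sketch below shows one possible pattern; ensure_package is a hypothetical helper name, not part of R or dplyr.

```r
# Hypothetical helper: install a package only when it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an installation
ensure_package("stats")
```

In your own script you would call ensure_package("dplyr") in place of the separate install and load steps.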

  • The Working Directory

The working directory is where R searches for files and saves results. Use getwd() to check it and setwd() to change it. Proper directory management keeps your project files organized.

After setting the working directory, use functions like read.csv() and write.csv() to read and write files. This ensures efficient file handling in your R projects.

Example:

# Print the current working directory
current_dir <- getwd()
cat("Current Working Directory:", current_dir, "\n")

# Set a new working directory
# Replace this path with the path of the folder you want to set as the working directory
new_dir <- "C:/Users/YourName/Documents"  
setwd(new_dir)

# Verify the working directory has been changed
cat("New Working Directory:", getwd(), "\n")

# Create a new text file in the new working directory
file_name <- "example_file.txt"
file_path <- file.path(new_dir, file_name)

# Write a message to the file
writeLines("Hello, this is a test file.", file_path)
cat("File has been created at:", file_path, "\n")

# Read the contents of the file to verify it's been written
file_contents <- readLines(file_path)
cat("Contents of the file:", file_contents, "\n")

Output:

Current Working Directory: C:/Users/YourName/CurrentDirectory 
New Working Directory: C:/Users/YourName/Documents 
File has been created at: C:/Users/YourName/Documents/example_file.txt 
Contents of the file: Hello, this is a test file.
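
The read.csv() and write.csv() functions mentioned above follow the same pattern. Here is a minimal sketch that writes to a temporary folder so it does not touch your real files; the file name scores.csv and the sample data are assumptions for illustration.

```r
# Write a small data frame to a temporary folder and read it back
out_dir <- tempdir()
csv_path <- file.path(out_dir, "scores.csv")

scores <- data.frame(Name = c("John", "Jane"), Score = c(85, 92))
write.csv(scores, csv_path, row.names = FALSE)  # skip the row-number column

loaded <- read.csv(csv_path)
print(loaded)
```

Setting row.names = FALSE keeps write.csv() from adding an extra unnamed column that would reappear when the file is read back.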

  • Operators in R

R offers various operators for different tasks: the assignment operator (<-), arithmetic operators (+, -, *, /, %%, ^), logical operators (&, |, !), and comparison operators (==, !=, >, <, >=, <=).

These operators are essential for tasks like performing calculations, filtering data frames with logical conditions, and comparing values for decision-making in your analysis.

Example:

# Assignment Operator
x <- 5   # Assigning 5 to x
y <- 10  # Assigning 10 to y

# Arithmetic Operators
z <- x + y    # Addition
w <- x - y    # Subtraction
v <- x * y    # Multiplication
u <- y / x    # Division
t <- x %% y   # Modulus (remainder)
s <- x^2      # Exponentiation (x squared)

# Comparison Operators
is_equal <- x == y     # Check if x is equal to y
is_greater <- x > y    # Check if x is greater than y
is_less <- x < y       # Check if x is less than y

# Logical Operators
and_condition <- (x > 0 & y > 0)  # Logical AND
or_condition <- (x > 0 | y < 0)   # Logical OR
not_condition <- !(x == y)        # Logical NOT

# Print the results
cat("Arithmetic results:\n")
cat("x + y =", z, "\n")
cat("x - y =", w, "\n")
cat("x * y =", v, "\n")
cat("y / x =", u, "\n")
cat("x %% y =", t, "\n")
cat("x^2 =", s, "\n\n")

cat("Comparison results:\n")
cat("Is x equal to y? ", is_equal, "\n")
cat("Is x greater than y? ", is_greater, "\n")
cat("Is x less than y? ", is_less, "\n\n")

cat("Logical results:\n")
cat("x > 0 AND y > 0? ", and_condition, "\n")
cat("x > 0 OR y < 0? ", or_condition, "\n")
cat("NOT (x == y)? ", not_condition, "\n")

Output:

Arithmetic results:
x + y = 15 
x - y = -5 
x * y = 50 
y / x = 2 
x %% y = 5 
x^2 = 25 

Comparison results:
Is x equal to y?  FALSE 
Is x greater than y?  FALSE 
Is x less than y?  TRUE 

Logical results:
x > 0 AND y > 0?  TRUE 
x > 0 OR y < 0?  TRUE 
NOT (x == y)?  TRUE
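
To connect these operators to the data frame filtering mentioned earlier, here is a short sketch. The people data frame is made-up sample data.

```r
# Sample data (made up for illustration)
people <- data.frame(
  Name = c("John", "Jane", "Sam", "Sue"),
  Age = c(25, 30, 22, 28),
  Score = c(85, 92, 78, 88)
)

# & compares element by element, producing one TRUE/FALSE per row
keep <- people$Age > 24 & people$Score >= 88
print(keep)

# Use the logical vector to keep only the matching rows
selected <- people[keep, ]
print(selected)

# Integer division complements the modulus operator
quotient <- 17 %/% 5    # whole part of 17 / 5
remainder <- 17 %% 5    # what is left over
```

Here keep is TRUE only for Jane and Sue, so selected contains those two rows.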

Understanding these basics will help you feel comfortable navigating the R environment. Now that you’ve got the essentials, let’s move on to working with vectors.

Working with Vectors in R

Vectors are the foundation of R's data structure system, providing a simple and efficient way to store multiple elements of the same type. They are important for a wide range of operations and are the building blocks for more complex data structures. Below are some common operations and functions for working with vectors.

  • Creating Vectors

You can create vectors in R using the c() function, which stands for "combine." This function allows you to combine individual elements into a vector, such as numbers, characters, or logical values, forming a one-dimensional array.

Example:

# Creating a vector with numbers from 1 to 5
numbers <- c(1, 2, 3, 4, 5)

# Print the created vector
print("The vector 'numbers' is:")
print(numbers)

# Adding 10 to each element of the vector
numbers_plus_ten <- numbers + 10
print("The vector 'numbers' after adding 10 to each element is:")
print(numbers_plus_ten)

# Calculating the sum of all elements in the vector
sum_numbers <- sum(numbers)
print("The sum of elements in the vector 'numbers' is:")
print(sum_numbers)

# Finding the length of the vector
length_numbers <- length(numbers)
print("The length of the vector 'numbers' is:")
print(length_numbers)

# Accessing specific elements of the vector
third_element <- numbers[3]
print("The third element in the vector 'numbers' is:")
print(third_element)

Output:

The vector 'numbers' is:
[1] 1 2 3 4 5
The vector 'numbers' after adding 10 to each element is:
[1] 11 12 13 14 15
The sum of elements in the vector 'numbers' is:
[1] 15
The length of the vector 'numbers' is:
[1] 5
The third element in the vector 'numbers' is:
[1] 3
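
The same c() function builds character and logical vectors. The sketch below, with made-up values, also shows what happens when types are mixed.

```r
# Character and logical vectors are built the same way as numeric ones
fruits <- c("apple", "banana", "cherry")
flags <- c(TRUE, FALSE, TRUE)

# class() reports the type shared by every element
print(class(fruits))
print(class(flags))

# Mixing types forces R to coerce everything to one common type
mixed <- c(1, "two", TRUE)
print(class(mixed))   # all three elements are now character strings

# seq() and the colon operator are shortcuts for regular sequences
evens <- seq(2, 10, by = 2)
one_to_five <- 1:5
```

Coercion is worth remembering when a numeric column unexpectedly turns into text after a single stray string is added.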
  • Vector Functions

Vector functions perform various operations on vectors. Common functions include length() for counting elements, sum() for adding elements, and mean() for computing the average. These operations are essential for manipulating and analyzing data in vectorized formats.

Example:

# Define the vector
numbers <- c(1, 2, 3, 4, 5)

# Compute the number of elements, the sum, and the mean
vector_length <- length(numbers)
vector_sum <- sum(numbers)
vector_mean <- mean(numbers)

# Print the results
cat("Length:", vector_length, "\n")
cat("Sum:", vector_sum, "\n")
cat("Mean:", vector_mean, "\n")

Output:

Length: 5
Sum: 15
Mean: 3
  • Selecting Vector Elements

To select specific elements from a vector, use indexing with square brackets. Unlike most programming languages, R starts indexing at 1, and a range such as 2:4 includes both endpoints. A negative index does not count from the end; it drops the element at that position.

Example:

# Define a vector of numbers
numbers <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

# Select the 4th element (R indexing starts at 1)
cat("The 4th element:", numbers[4], "\n")

# Select a range of elements (positions 2 to 4, both ends inclusive)
cat("Elements 2 to 4:", numbers[2:4], "\n")

# A negative index drops that position instead of counting from the end
cat("Without the 1st element:", numbers[-1], "\n")

Output:

The 4th element: 3
Elements 2 to 4: 1 2 3
Without the 1st element: 1 2 3 4 5 6 7 8 9
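
In R, square brackets also accept a logical condition, which is often the most convenient way to select elements. A minimal sketch with made-up values:

```r
numbers <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

# Logical indexing keeps the positions where the condition is TRUE
evens <- numbers[numbers %% 2 == 0]

# which() reports the matching positions instead of the values
big_positions <- which(numbers > 6)

# Assigning through an index modifies an element in place
numbers[1] <- 100

print(evens)
print(big_positions)
print(numbers)
```

Note that big_positions holds positions (8, 9, 10), not the values 7, 8, and 9 stored there.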
  • Mathematical Operations

R allows vectorized operations, enabling efficient calculations across entire vectors. For instance, adding 2 to every element in a vector is straightforward. Vectorized operations eliminate the need for explicit loops, making code faster and more concise.

Example:

# Create a vector of numbers
numbers <- c(1, 2, 3, 4, 5)

# Add 2 to every element in the vector using vectorized operation
result <- numbers + 2

# Print the result
print(result)

Output:

[1] 3 4 5 6 7
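
Vectorized operations also work between two vectors of the same length. The sketch below uses made-up prices and quantities.

```r
prices <- c(10, 20, 30)
quantities <- c(2, 1, 4)

# Element-wise multiplication pairs up positions: 10*2, 20*1, 30*4
line_totals <- prices * quantities

# Comparisons are vectorized too, returning one logical per element
is_large <- line_totals > 25

# A single value is recycled across the whole vector
discounted <- prices * 0.9

print(line_totals)
print(is_large)
print(discounted)
```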

Now that you know how to work with vectors, let’s move on to handling strings, which are another common data type in R.

Handling Strings in R

String manipulation is a frequent task in data processing, and R offers several functions for finding, subsetting, and modifying strings. Base R provides functions like grep(), sub(), and gsub(), but for more user-friendly string operations, the stringr package is highly recommended.

Functions like str_detect(), str_replace(), and str_sub() from stringr are often faster and offer a more consistent syntax, making them useful for complex string manipulations.

  • Finding Matches

The grep() function searches a vector for elements that match a specified pattern. By default it returns the positions of the matching elements; pass value = TRUE to get the elements themselves. This makes it useful for filtering or extracting relevant data from larger datasets.

Example:

# Create a vector of strings
text <- c("apple", "banana", "cherry")

# Use grep() to find elements that match the pattern "an"
matches <- grep("an", text)

# Print the results
print(matches)

Output:

[1] 2
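
Only "banana" contains the pattern "an", so grep() reports a single position. The sketch below compares the default result with two common variations, using the same vector.

```r
text <- c("apple", "banana", "cherry")

# Default: positions of the matching elements
idx <- grep("an", text)

# value = TRUE returns the matching strings themselves
hits <- grep("an", text, value = TRUE)

# grepl() returns one TRUE/FALSE per element, handy for filtering
flags <- grepl("an", text)

print(idx)
print(hits)
print(flags)
```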
  • Subsetting Strings

You can extract parts of strings using the substr() function, which takes the start and stop positions of the substring you want to extract, providing a flexible way to manipulate string data efficiently.

Example:

# Define the string
string <- "banana"

# Extract the substring from position 1 to position 3
substring_result <- substr(string, 1, 3)

# Print the result
print(substring_result)

Output:

[1] "ban"
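
Two related base R idioms are sketched below: slicing from the end of a string with nchar(), and overwriting characters by placing substr() on the left side of an assignment.

```r
word <- "banana"

# nchar() gives the string length, which helps when slicing from the end
n <- nchar(word)
last_three <- substr(word, n - 2, n)

# substr() on the left of an assignment overwrites characters in place
substr(word, 1, 1) <- "B"

print(last_three)
print(word)
```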
  • Mutating Strings

The gsub() function in R is used to replace all instances of a specified pattern in a string with a new value. It allows for powerful string manipulation by applying regular expressions to search and modify text.

Example:

# Example program for mutating strings with gsub()

# Define the string
original_string <- "I love banana"

# Use gsub() to replace "banana" with "orange"
mutated_string <- gsub("banana", "orange", original_string)

# Print the original and mutated strings
cat("Original String: ", original_string, "\n")
cat("Mutated String: ", mutated_string, "\n")

Output:

Original String:  I love banana 
Mutated String:  I love orange
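
To show how gsub() differs from sub(), and how a regular expression fits in, here is a brief sketch with made-up strings.

```r
sentence <- "banana bread and banana cake"

# sub() replaces only the first match
first_only <- sub("banana", "orange", sentence)

# gsub() replaces every match
all_matches <- gsub("banana", "orange", sentence)

# Patterns are regular expressions: [0-9]+ matches a run of digits
masked <- gsub("[0-9]+", "#", "Order 123 shipped on day 45")

print(first_only)
print(all_matches)
print(masked)
```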
  • Joining and Splitting Strings

The paste() function combines multiple strings into one, inserting a separator between them (a single space by default). strsplit() does the opposite, breaking a string into a list of substrings based on a delimiter, which is useful for data parsing and manipulation.

Example:

# Joining strings using paste()
joined_string <- paste("Hello", "World", sep = " ")
print(joined_string)

# Splitting strings using strsplit()
splitted_strings <- strsplit("apple,orange,banana", ",")
print(splitted_strings)

Output:

[1] "Hello World"
[[1]]
[1] "apple"  "orange" "banana"
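
A few related variations are sketched below: the collapse argument, paste0(), and pulling the character vector out of the list that strsplit() returns.

```r
fruits <- c("apple", "orange", "banana")

# collapse joins the elements of one vector into a single string
csv_line <- paste(fruits, collapse = ",")

# paste0() is paste() with no separator
file_names <- paste0(fruits, ".csv")

# strsplit() returns a list; [[1]] extracts the character vector
round_trip <- strsplit(csv_line, ",")[[1]]

print(csv_line)
print(file_names)
print(round_trip)
```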

These string-handling functions are critical for text data processing in R. Let's take a closer look at working with data frames, which are central to R data manipulation.

Working with Data Frames

Data frames are the go-to structure for handling tabular data. R provides a rich set of functions to manipulate and transform data within data frames. For a quick reference, you can consult an R programming cheat sheet to streamline your work with data frames.

  • Creating Data Frames
    Creating a data frame in R is simple using the data.frame() function. This function combines vectors of equal length into a table-like structure, allowing you to store and manipulate data. It’s commonly used for data analysis tasks.
    Example:
# Create a data frame using the data.frame() function
df <- data.frame(Name = c("John", "Anna", "Peter"),
                 Age = c(23, 25, 30))

# Print the data frame
print(df)

Output:

   Name Age
1  John  23
2  Anna  25
3 Peter  30
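
After creating a data frame, it is common to check its shape and column types. A minimal sketch using the same sample data:

```r
df <- data.frame(Name = c("John", "Anna", "Peter"),
                 Age = c(23, 25, 30))

dims <- dim(df)          # number of rows, then number of columns
col_names <- names(df)   # column names as a character vector

str(df)                  # compact overview of each column's type
print(summary(df$Age))   # minimum, quartiles, mean, and maximum of Age
```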
  • Accessing Data Frame Columns

To access columns in a data frame, use the $ operator followed by the column name. For example, df$column_name will retrieve the data in that specific column, making it easy to reference and manipulate data directly.

Example:

# Creating a data frame
data <- data.frame(
  Name = c('Alice', 'Bob', 'Charlie', 'David'),
  Age = c(25, 30, 35, 40),
  City = c('New York', 'Los Angeles', 'Chicago', 'Houston')
)

# Accessing columns using $ operator
name_column <- data$Name  # Accessing the Name column
age_column <- data$Age    # Accessing the Age column
city_column <- data$City  # Accessing the City column

# Display the results
cat("Name Column:\n")
print(name_column)
cat("\nAge Column:\n")
print(age_column)
cat("\nCity Column:\n")
print(city_column)

Output:

Name Column:
[1] "Alice"   "Bob"     "Charlie" "David"  

Age Column:
[1] 25 30 35 40

City Column:
[1] "New York"    "Los Angeles" "Chicago"     "Houston"
  • Subsetting Data Frames

Subsetting data frames allows you to extract specific rows or columns using indexing techniques. You can use single or double square brackets to access parts of the dataframe, filtering data based on conditions or selecting desired columns efficiently.

Example:

# Creating a sample data frame
data <- data.frame(
  Name = c('Alice', 'Bob', 'Charlie', 'David', 'Eve'),
  Age = c(25, 30, 35, 40, 45),
  City = c('New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix')
)

# Subsetting examples
first_row <- data[1, ]
second_column <- data[, 2]

# Print results
cat("First Row:\n")
print(first_row)
cat("\nSecond Column:\n")
print(second_column)

Output:

First Row:
   Name Age     City
1 Alice  25 New York

Second Column:
[1] 25 30 35 40 45
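
Filtering on a condition, mentioned above, uses a logical expression in the row position. The sketch below reuses the same sample data and also contrasts single and double brackets.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  Age = c(25, 30, 35, 40, 45),
  City = c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix")
)

# Rows where Age exceeds 30, all columns
over_30 <- data[data$Age > 30, ]

# The same rows, but only the Name and City columns
names_cities <- data[data$Age > 30, c("Name", "City")]

# Single brackets keep a data frame; double brackets return the column itself
age_df <- data["Age"]
age_vec <- data[["Age"]]

print(over_30)
print(names_cities)
```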
  • Mutating Data Frames

Mutating data frames involves adding new columns, removing existing ones, or modifying current data. Common operations include applying functions, creating new variables based on conditions, or transforming existing values to meet specific requirements in data analysis.

Example:

# Creating a sample DataFrame
data <- data.frame(
  Name = c('John', 'Alice', 'Bob'),
  Age = c(23, 25, 22)
)

# Displaying the original DataFrame
cat("Original DataFrame:\n")
print(data)

# Adding a new column 'Gender'
data$Gender <- c('M', 'F', 'M')

# Modifying the 'Age' column (e.g., adding 1 year to each person's age)
data$Age <- data$Age + 1

# Displaying the mutated DataFrame
cat("\nMutated DataFrame:\n")
print(data)

Output:

Original DataFrame:
   Name Age
1  John  23
2 Alice  25
3   Bob  22

Mutated DataFrame:
   Name Age Gender
1  John  24      M
2 Alice  26      F
3   Bob  23      M
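
The description above also mentions removing columns and creating variables from conditions. A brief sketch follows; the Group labels and the Temp column are made up for illustration.

```r
data <- data.frame(Name = c("John", "Alice", "Bob"),
                   Age = c(23, 25, 22))

# Create a column from a condition with ifelse()
data$Group <- ifelse(data$Age >= 23, "Senior", "Junior")

# Add a helper column, then remove it by assigning NULL
data$Temp <- data$Age * 2
data$Temp <- NULL

print(data)
```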

With a solid understanding of data frames, let's explore how to load and import data into R to make the most of your R data manipulation skills.

Loading and Importing Data into R

Working with external data is crucial for analysis. R provides various functions to load data from different sources, such as CSV files, Excel, and databases.

The readRDS() function is specifically used to load R-specific objects, including those with metadata, which are saved in .rds format. Unlike read.csv(), which is used for tabular data, readRDS() preserves R objects' structure and attributes.

The following table summarizes key functions used to import data into R.

| Function | What It Does | Example Code |
| --- | --- | --- |
| read.csv() | Loads data from a CSV file. | data <- read.csv("file.csv") |
| read.table() | Loads data from a general text file. | data <- read.table("file.txt", header=TRUE) |
| readRDS() | Reads an R object saved as an RDS file. | data <- readRDS("data.rds") |
| read_excel() | Reads Excel files; requires loading the readxl package first. | library(readxl); data <- read_excel("file.xlsx") |
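
To illustrate the difference between the CSV and RDS routes, here is a small round-trip sketch. It writes to temporary files created with tempfile(), and the events data frame is made up for the illustration.

```r
# Hypothetical data frame with a Date column
events <- data.frame(
  Id = 1:3,
  When = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01"))
)

# Temporary file paths so the sketch does not touch your working directory
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# CSV round trip: column types are re-guessed from plain text on import
write.csv(events, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# RDS round trip: the R object is restored with its classes and attributes
saveRDS(events, rds_path)
from_rds <- readRDS(rds_path)

class(from_csv$When)  # no longer "Date"
class(from_rds$When)  # still "Date"
```

After the CSV round trip the When column would need to be converted back with as.Date(), whereas the RDS copy can be used as it is.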

Now that we've covered the key functions for importing data, let’s explore techniques for generating random data and transforming it in data tables.

Tips for Generating Random Data and Transforming Data in Data Tables

Generating random data is a common task in R, useful for testing algorithms or simulating datasets. R provides several functions to create random values from different distributions. Once the data is generated, transforming it is equally important for analysis. 

Here, you will learn various ways to generate random data and efficiently convert it within data tables.

Generating Random Data in R

R offers functions for generating random data from various statistical distributions, including sample(), rnorm(), and runif(). Below, we explain each function and show how to draw random samples from a vector, from a normal distribution, and from a uniform distribution.

  • Generating Random Numbers with sample()
    The sample() function is used to randomly sample elements from a given vector. Example: 
sample(1:10, 5)  # Randomly selects 5 numbers from 1 to 10
  • Generating Random Numbers from a Normal Distribution with rnorm()
    The rnorm() function generates random numbers that follow a normal distribution. You specify the number of values, mean, and standard deviation. 

Example:

rnorm(5, mean = 0, sd = 1)  # Generates 5 random numbers from a standard normal distribution
  • Generating Random Numbers from a Uniform Distribution with runif()
    The runif() function generates random numbers from a uniform distribution between specified minimum and maximum values.

Example:

runif(5, min = 0, max = 1)  # Generates 5 random numbers between 0 and 1
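
Putting the three functions together, the sketch below builds a small synthetic dataset. The column names, the mean of 170, and the standard deviation of 10 are arbitrary values chosen for illustration.

```r
n <- 5

synthetic <- data.frame(
  Id = sample(1:100, n),                   # n distinct IDs drawn without replacement
  Height = rnorm(n, mean = 170, sd = 10),  # normally distributed values
  Score = runif(n, min = 0, max = 1)       # uniformly distributed values between 0 and 1
)

print(synthetic)
```

Because the values are random, the printed table will differ on every run unless a seed is set first.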

Generating random data can be especially useful for simulations or when creating synthetic datasets. Next, we will look at how to ensure reproducibility in random sampling.

Random Sampling and Reproducibility

Random sampling is often used in data analysis for selecting subsets of data. To ensure reproducibility of your results, it is essential to control the random number generation. This is where the set.seed() function comes in.

  • Using set.seed() for Reproducibility
    The set.seed() function ensures that you get the same random numbers every time you run the code. It takes an integer value as an argument.
    Example:
set.seed(42)  
sample(1:10, 5)  # Always returns the same set of numbers with seed 42
  • Random Sampling with sample()

You can use sample() to draw values either without replacement (the default) or with replacement by setting replace = TRUE.

Example:

sample(1:10, 5, replace = TRUE)  # Randomly selects 5 numbers with replacement
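
As a quick check of the points above, the following sketch resets the seed between two draws and compares the results. The seed value 42 is simply carried over from the earlier example.

```r
# Draw a sample, then reset the seed and draw again
set.seed(42)
first_draw <- sample(1:10, 5)

set.seed(42)
second_draw <- sample(1:10, 5)

identical(first_draw, second_draw)  # TRUE: same seed, same draw

# Without replacement (the default), no value can appear twice
anyDuplicated(first_draw) == 0      # TRUE

# With replacement, the sample may be larger than the population
length(sample(1:3, 10, replace = TRUE))  # 10
```

The exact values drawn for a given seed can differ between R versions, since R 3.6.0 changed the default sampling algorithm, so reproducibility is only guaranteed within the same setup.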

After learning random sampling, it's important to learn how to transform data efficiently. Let's now explore how to transform data using data.table and dplyr packages.

Transforming Data in Data Tables

Once data is generated or imported into R, transforming it is essential for analysis. The data.table and dplyr packages provide powerful tools to manipulate data tables.

  • Transforming Data with data.table
    The data.table package allows fast data manipulation. You can create a data table with the data.table() function and add or modify columns in place with the := operator.

Example:

library(data.table)
dt <- data.table(A = 1:5, B = letters[1:5])
dt[, C := A * 2]  # Adds a new column 'C' which is twice the value of 'A'
  • Advanced Transformation with data.table

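Inside the brackets of a data.table, .() is an alias for list() that lets you compute several summaries at once, and the by argument groups the rows before the summaries are calculated. Further operations can be chained by appending another pair of square brackets to the result.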

Example:

dt[, .(Sum = sum(A), Mean = mean(A)), by = B]  # Groups by 'B' and calculates sum and mean for 'A'
  • Using dplyr for Data Transformation
    The dplyr package is another popular tool for data manipulation. Functions like mutate(), filter(), and select() are essential for transforming data.

Example:

library(dplyr)
df <- data.frame(A = 1:5, B = letters[1:5])
df %>% mutate(C = A * 2)  # Returns a copy of df with a new column 'C'; assign the result to keep it
  • Chaining Operations with dplyr

The pipe operator %>% is widely used to chain multiple operations together in dplyr. It passes the result of one step as the first argument of the next, so a complex data manipulation pipeline reads top to bottom as a sequence of steps instead of as nested function calls.

Example:

df %>% filter(A > 2) %>% select(A)  # Filters rows where A > 2 and selects column A
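
To compare the two packages side by side, here is a hedged sketch that runs the same grouped summary in each. The scores data and its column names are hypothetical, and both packages are assumed to be installed.

```r
library(data.table)
library(dplyr)

# Hypothetical sample data: six scores split across two groups
scores <- data.frame(
  Group = c("a", "a", "a", "b", "b", "b"),
  Score = c(10, 20, 30, 5, 15, 25)
)

# data.table: summarise per group, then chain a second [ ] call to sort
dt <- as.data.table(scores)
dt_summary <- dt[, .(Total = sum(Score), Average = mean(Score)), by = Group][order(-Total)]

# dplyr: the same steps expressed as a pipeline
dplyr_summary <- scores %>%
  group_by(Group) %>%
  summarise(Total = sum(Score), Average = mean(Score)) %>%
  arrange(desc(Total))

print(dt_summary)
print(dplyr_summary)
```

Both versions give a total of 60 and an average of 20 for group a, and a total of 45 and an average of 15 for group b. Which style to use is largely a matter of preference and of what the rest of a project already uses.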

With your data skills on track, let’s dive into how upGrad can fast-track your journey to becoming a data science pro!

How Can upGrad Support Your Growth in Learning Data Science?

upGrad offers a comprehensive suite of Data Science courses tailored to meet the needs of both beginners and advanced learners. It helps you bridge the gap between learning and applying data science techniques in real-world scenarios. The courses are designed by industry experts and supported by hands-on projects to sharpen your skills.

Do you need help deciding which courses can help you excel in R programming? Contact upGrad for personalized counselling and valuable insights. For more details, you can also visit your nearest upGrad offline center. 
