A Complete Guide - R Language Data Frames and Tibbles

Last Updated: 03 Jul, 2025   

R Language Data Frames and Tibbles: Explained in Detail with Important Info

Overview

Data Frames: The Traditional Tabular Structure

Construction
  • Function: data.frame()
  • Columns: Can have different data types (numeric, character, factor, etc.)
  • Rows: Each row is a distinct record or observation
  • Example:
    df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), stringsAsFactors = FALSE)
    
Key Features
  • Subsetting: You can access subsets of data frames using indexing or logical conditions.
  • Combining: Use rbind() to add rows and cbind() to add columns.
  • Factors: Before R 4.0.0, data.frame() converted string columns to factors unless stringsAsFactors = FALSE. Since R 4.0.0 the default is FALSE, so strings stay as character vectors.
  • Printing: Displays in a default tabular format.
Common Manipulations
  • Viewing Structure: str(df)
  • Summarizing Data: summary(df)
  • Filtering Rows: Logical statements within square brackets [ ]
    df[df$Age > 25, ]
    
  • Selecting Columns: Use $ or []
    df$Name
    df[, "Name"]
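
The subsetting and combining features listed above can be sketched in base R as follows. This is a minimal illustration only: the extra row and the City column are made-up values, not part of the original example.

```r
# Small data frame matching the construction example
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), stringsAsFactors = FALSE)

# rbind() appends rows; the new rows must use the same column names
more_rows <- data.frame(Name = "Charlie", Age = 35, stringsAsFactors = FALSE)  # illustrative data
df_rows <- rbind(df, more_rows)

# cbind() appends columns; the new column needs one value per row
df_full <- cbind(df_rows, City = c("Paris", "Oslo", "Lima"))  # illustrative data

# Logical filtering keeps the rows where the condition is TRUE
older <- df_full[df_full$Age > 25, ]
print(older)
```

Note that rbind() matches columns by name, so both inputs should share the same column names.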
    

Tibbles: A Modern Enhancement

Introduction

Tibbles are a modern version of data frames introduced by the tibble package in R. They offer an enhanced user experience by providing more intuitive behavior and better defaults.

Construction
  • Function: tibble() from the tibble package
  • Columns: Similar to data frames, supports different data types
  • Rows: Represent observations. Indexing is similar to data frames, but [ always returns a tibble and $ does not partially match column names.
  • Example:
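
A minimal construction sketch, assuming the tibble package is installed. The second tibble (tbl2) is an added illustration of a tibble() feature and is not taken from the original text.

```r
library(tibble)

# Build a tibble column by column, mirroring the data.frame() example
tbl <- tibble(Name = c("Alice", "Bob"), Age = c(25, 30))

# Unlike data.frame(), later columns may refer to earlier ones
tbl2 <- tibble(x = 1:3, y = x * 2)

print(tbl)
```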


Step-by-Step Guide: How to Implement R Language Data Frames and Tibbles

Introduction

R is a powerful language for statistical computing and data analysis. The two most common data structures for storing datasets in R are Data Frames and Tibbles. While they have similarities, tibbles (from the tibble package, which is part of the tidyverse) offer some enhancements that make them more user-friendly for data manipulation.

Example 1: Creating Data Frames

Step 1: Create Vectors

First, we'll create some vectors which will be used as columns in our Data Frame.

# Create vectors
name <- c("Alice", "Bob", "Charlie")
age <- c(25, 30, 45)
weight <- c(55.75, 60.50, 85.75)

Step 2: Combine Vectors into a Data Frame

Now we can combine these vectors using data.frame() function.

# Combine vectors into a dataframe
people_df <- data.frame(name = name, age = age, weight = weight)

# Print the dataframe
print(people_df)

Output:

     name age weight
1   Alice  25  55.75
2     Bob  30  60.50
3 Charlie  45  85.75

Step 3: Add Row Names to the Data Frame (Optional)

You can also set row names if needed:

# Set row names
row.names(people_df) <- c("Person_1", "Person_2", "Person_3")

# Print dataframe with row names
print(people_df)

Output:

            name age weight
Person_1   Alice  25  55.75
Person_2     Bob  30  60.50
Person_3 Charlie  45  85.75

Example 2: Creating Tibbles

Tibbles are an enhanced version of Data Frames provided by the tibble package, which is part of the tidyverse. They can be created using the tibble() function.

Step 1: Install and Load Tidyverse Package

We first need to install and load the tidyverse package which includes the tibble package and other useful packages.

# Install the tidyverse package (only needed once)
install.packages("tidyverse")

# Load the tidyverse package
library(tidyverse)

Step 2: Create Vectors

Similar to Example 1, we'll create some vectors.

# Create vectors
name <- c("Alice", "Bob", "Charlie")
age <- c(25, 30, 45)
weight <- c(55.75, 60.50, 85.75)

Step 3: Combine Vectors into a Tibble

We can use the tibble() function now to create a Tibble from our vectors.

# Combine vectors into a tibble
people_tbl <- tibble(name = name, age = age, weight = weight)

# Print the tibble
print(people_tbl)

Output:

# A tibble: 3 × 3
  name      age weight
  <chr>   <dbl>  <dbl>
1 Alice      25   55.8
2 Bob        30   60.5
3 Charlie    45   85.8

Note how Tibbles display additional helpful information by default, such as the dimensions and the type of each column. By default they also print only the first 10 rows and only the columns that fit on the screen.
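
The truncated printing described above can be seen with a larger tibble. This is a small sketch, assuming the tibble package is installed. The 100-row tibble named big is invented for the demonstration.

```r
library(tibble)

# A hypothetical 100-row tibble, large enough to trigger truncated printing
big <- tibble(id = 1:100, value = id^2)

# Default print: only the first rows are shown, with a note about the rest
print(big)

# The n argument of print() overrides how many rows are shown
print(big, n = 15)
```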

Example 3: Manipulating Data Frames

Step 1: Adding a New Column to a Data Frame

Suppose you want to add a new column indicating whether each person's weight is below the average weight of all people in the dataset.

# Calculate the average weight
average_weight <- mean(weight)

# Add a new column - is weight below average?
people_df$bellow_avg_wt <- ifelse(weight < average_weight, "Yes", "No")

# Print the updated dataframe
print(people_df)

Output:

            name age weight bellow_avg_wt
Person_1   Alice  25  55.75           Yes
Person_2     Bob  30  60.50           Yes
Person_3 Charlie  45  85.75            No

The average weight is about 67.33, so both Alice (55.75) and Bob (60.50) are below it.

Step 2: Filtering Rows in a Data Frame

We can filter the rows based on certain conditions using the subset() function for Data Frames.

# Filter rows where age is greater than 30
older_than_30 <- subset(people_df, age > 30)

# Print the filtered dataframe
print(older_than_30)

Output:

            name age weight bellow_avg_wt
Person_3 Charlie  45  85.75            No

Alternatively, the dplyr package from the tidyverse makes filtering much easier with filter() function.

# Using dplyr to filter
older_than_30_dplyr <- people_df %>% filter(age > 30)

# Print the filtered dataframe
print(older_than_30_dplyr)

Output:

     name age weight bellow_avg_wt
1 Charlie  45  85.75            No

Example 4: Manipulating Tibbles

Step 1: Adding a New Column to a Tibble

Adding columns to Tibbles works similarly to Data Frames but is often easier to read, especially when chaining operations. Here’s how you would do it using dplyr's mutate() function.

# Load dplyr library
library(dplyr)

# Calculate the average weight
average_weight <- mean(weight)

# Add a new column
people_tbl <- people_tbl %>% mutate(bellow_avg_wt = if_else(weight < average_weight, "Yes", "No"))

# Print the updated tibble
print(people_tbl)

Output:

# A tibble: 3 × 4
  name      age weight bellow_avg_wt
  <chr>   <dbl>  <dbl> <chr>
1 Alice      25   55.8 Yes
2 Bob        30   60.5 Yes
3 Charlie    45   85.8 No

Step 2: Filtering Rows in a Tibble

Filtering rows in a Tibble using dplyr is straightforward and easy to read.

# Filter rows where age is greater than 30
older_than_30_tbl <- people_tbl %>% filter(age > 30)

# Print the filtered tibble
print(older_than_30_tbl)

Output:

# A tibble: 1 × 4
  name      age weight bellow_avg_wt
  <chr>   <dbl>  <dbl> <chr>
1 Charlie    45   85.8 No

Example 5: Inspecting Data Frames and Tibbles

You can inspect your data using functions like str(), summary(), and head().

Step 1: Inspect the Structure of a Data Frame

The str() function helps you understand the structure of the data.

# Inspect structure of data frame
str(people_df)

Output:

'data.frame':	3 obs. of  4 variables:
 $ name         : chr  "Alice" "Bob" "Charlie"
 $ age          : num  25 30 45
 $ weight       : num  55.8 60.5 85.8
 $ bellow_avg_wt: chr  "Yes" "Yes" "No"

Step 2: Get a Summary of a Data Frame

The summary() function provides basic statistics about each variable in your data frame.

# Get summary statistics about data frame
summary(people_df)

Output:

     name                age            weight      bellow_avg_wt
 Length:3           Min.   :25.00   Min.   :55.75   Length:3
 Class :character   1st Qu.:27.50   1st Qu.:58.12   Class :character
 Mode  :character   Median :30.00   Median :60.50   Mode  :character
                    Mean   :33.33   Mean   :67.33
                    3rd Qu.:37.50   3rd Qu.:73.12
                    Max.   :45.00   Max.   :85.75

Step 3: Display the First Few Rows of a Data Frame

The head() function displays the first few rows and is useful during exploratory analysis.

# Display first few rows of data frame
head(people_df)

Output:

            name age weight bellow_avg_wt
Person_1   Alice  25  55.75           Yes
Person_2     Bob  30  60.50           Yes
Person_3 Charlie  45  85.75            No

Step 1: Inspect the Structure of a Tibble

Again, str() is handy for understanding the structure.

# Inspect structure of tibble
str(people_tbl)

Output:

Classes 'tbl_df', 'tbl' and 'data.frame':	3 obs. of  4 variables:
 $ name         : chr  "Alice" "Bob" "Charlie"
 $ age          : num  25 30 45
 $ weight       : num  55.8 60.5 85.8
 $ bellow_avg_wt: chr  "Yes" "Yes" "No"

Step 2: Get a Summary of a Tibble

Just like Data Frames, you can get a summary of a Tibble.

# Get summary statistics about tibble
summary(people_tbl)

Output:

     name                age            weight      bellow_avg_wt
 Length:3           Min.   :25.00   Min.   :55.75   Length:3
 Class :character   1st Qu.:27.50   1st Qu.:58.12   Class :character
 Mode  :character   Median :30.00   Median :60.50   Mode  :character
                    Mean   :33.33   Mean   :67.33
                    3rd Qu.:37.50   3rd Qu.:73.12
                    Max.   :45.00   Max.   :85.75

Step 3: Display the First Few Rows of a Tibble

Again, head() displays the first few rows.
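
A short sketch of head() on a tibble, assuming the tibble package is installed. The tibble is rebuilt here so the snippet runs on its own, and the variable first_two is introduced only for this illustration.

```r
library(tibble)

# Recreate the tibble from Example 2 so this snippet is self-contained
people_tbl <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 45),
  weight = c(55.75, 60.50, 85.75)
)

# head() returns the first n rows (6 by default) and keeps the tibble class
first_two <- head(people_tbl, n = 2)
print(first_two)
```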


Top 10 Interview Questions & Answers on R Language Data Frames and Tibbles

1. What are Data Frames in R?

Answer: Data frames in R are used to store data tables. They are essentially lists of vectors of equal length. Each vector represents a column which may be of a different mode (numeric, character, etc.), and each row represents an observation or record. Data frames are particularly useful for data analysis and statistical modeling.

Example:

df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35), salary = c(50000, 60000, 70000))

2. What are Tibbles in R?

Answer: Tibbles are a modern take on data frames, provided by the tibble package within the tidyverse. They print in a more user-friendly format, never change variable names, and never convert input types (for example, strings are never turned into factors). Tibbles also behave more predictably during data manipulation.

Example:

library(tibble)
tibble_df <- tibble(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35), salary = c(50000, 60000, 70000))

3. What is the difference between a data frame and a tibble?

Answer: While both store tabular data, the primary differences are:

  • Printing: Tibbles print a limited number of rows and columns and show the data types of each column.
  • Column Names: Tibbles do not adjust column names if they contain special characters or spaces.
  • Recycling: Tibbles only recycle vectors of length 1. Supplying a shorter vector of any other length raises an error, which prevents silent recycling mistakes.
  • Subsetting: With [, tibbles always return a tibble, even for a single column, whereas data frames may return a vector.
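
The differences listed above can be checked with a small side-by-side sketch, assuming the tibble package is installed. The object names and sample values are chosen only for illustration.

```r
library(tibble)

df  <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
tbl <- tibble(name = c("Alice", "Bob"), age = c(25, 30))

# Subsetting one column with [ : the data frame drops to a vector, the tibble stays a tibble
df_col  <- df[, "age"]
tbl_col <- tbl[, "age"]
class(df_col)   # "numeric"
class(tbl_col)  # "tbl_df" "tbl" "data.frame"

# Column names: data.frame() rewrites non-syntactic names by default, tibble() keeps them
names(data.frame(`my col` = 1))  # "my.col"
names(tibble(`my col` = 1))      # "my col"

# Recycling: data.frame() recycles a length-2 vector to fill 4 rows, tibble() refuses
df_recycled <- data.frame(x = 1:4, y = 1:2)
tbl_result <- tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) "error")
```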

4. How do you create a data frame in R?

Answer: You can create a data frame using the data.frame() function by passing vectors of equal length.

Example:

df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35), salary = c(50000, 60000, 70000))

5. How do you create a tibble in R?

Answer: Create a tibble using the tibble() function, from the tibble package.

Example:

library(tibble)
tibble_df <- tibble(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35), salary = c(50000, 60000, 70000))

6. How do you select specific columns from a data frame or tibble?

Answer: Use the dollar sign ($) or double square brackets [[ ]] to extract a single column, and single square brackets [ ] for more complex selections.

Example for Data Frame:

# Select just the age column from a data frame
age_col <- df$age

Example for Tibble:

# Select just the age column from a tibble
age_col <- tibble_df[["age"]]
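
For the more complex selections mentioned in the answer, a base R sketch might look like the following. The variable names ages, name_salary, and high_paid_names are introduced only for this illustration.

```r
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35), salary = c(50000, 60000, 70000))

# [[ ]] (like $) extracts one column as a plain vector
ages <- df[["age"]]

# [ ] with a character vector keeps several columns and returns a data frame
name_salary <- df[, c("name", "salary")]

# Rows and columns can be selected in one step
high_paid_names <- df[df$salary > 55000, "name"]
```

The same [[ ]] and multi-column [ ] calls also work on a tibble, with the difference that [ ] on a tibble always returns a tibble.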

7. How do you add a new column to a data frame or a tibble?

Answer: You can add a new column by simply assigning a value to a new column name.

Example:

# Adding a new column to a data frame
df$bonus <- df$salary * 0.1

# Adding a new column to a tibble (mutate() and %>% come from dplyr)
library(dplyr)
tibble_df <- tibble_df %>% mutate(bonus = salary * 0.1)

8. How can you filter rows in a data frame or tibble?

Answer: Use the subset() function for data frames or the filter() function from dplyr for tibbles.

Example for Data Frame:

# Filter rows where age is greater than 28
subset_df <- subset(df, age > 28)

Example for Tibble:

library(dplyr)
# Filter rows where age is greater than 28
filtered_tibble <- tibble_df %>% filter(age > 28)

9. How do you combine two data frames or tibbles by rows?

Answer: Use rbind() to combine data frames by rows, or bind_rows() from dplyr for tibbles.

Example:

# Combine two data frames by rows
new_df <- rbind(df1, df2)

# Combine two tibbles by rows
library(dplyr)
new_tibble <- bind_rows(tibble_df1, tibble_df2)

10. How do you combine two data frames or tibbles by columns?

Answer: Use cbind() to combine by columns.

Example:
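
A minimal base R sketch of cbind(). The second data frame (extra) and its salary values are hypothetical and exist only to have something to combine.

```r
df <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35))
extra <- data.frame(salary = c(50000, 60000, 70000))  # hypothetical second table

# cbind() places the columns side by side; both inputs need the same number of rows
combined <- cbind(df, extra)
print(combined)
```

For tibbles, dplyr offers bind_cols() as the counterpart to bind_rows().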
