Reading Data into R with readr
Foreword
- Output options: the ‘tango’ syntax and the ‘readable’ theme.
- Snippets and results.
Importing data with readr
Reading a .csv file
Your first task will be to master the use of the read_csv() function. There are many arguments available, but the only required argument is file, a path to a CSV file on your computer (or the web).
One big advantage that read_csv() has over read.csv() is that it never converts strings into factors. (Base R's read.csv() did so by default until R 4.0.0, when the stringsAsFactors default changed to FALSE.)
read_csv() recognizes 8 different data types (integer, logical, etc.) and leaves anything else as character. That means you don't have to set stringsAsFactors = FALSE every time you import a CSV file with character strings!
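A minimal sketch of this behaviour, assuming readr is installed. The two-row chick-weight data and the temporary file are hypothetical, written on the fly so the example runs without chickwts.csv; stringsAsFactors = TRUE is passed to read.csv() explicitly to reproduce the old (pre-4.0.0) default for comparison.

```r
library(readr)

# Hypothetical two-column data written to a temporary file so the sketch runs anywhere
path <- tempfile(fileext = ".csv")
writeLines(c("weight,feed", "179,horsebean", "160,linseed"), path)

# cols() with no arguments keeps readr's type guessing but silences the column message
df <- read_csv(path, col_types = cols())

class(df$weight) # a numeric type, guessed from the values
class(df$feed)   # "character", not "factor"

# Base R for comparison, forcing the pre-4.0.0 behaviour explicitly
base_df <- read.csv(path, stringsAsFactors = TRUE)
class(base_df$feed) # "factor"
```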
#install.packages('readr')
library(readr)
getwd()
setwd("D:/.../Rprojects/Data Wrangling")
Import a comma-delimited .csv file.
# Import chickwts.csv: cwts
cwts <- read_csv('chickwts.csv')
# View the head of cwts
head(cwts)
Reading a tab-separated (.tsv or .txt) file
Skipping columns with col_skip().
Code only:
- Setting the column type.
cols(
  weight = col_integer(),
  feed = col_character()
)
- Setting the column names.
col_names = c('name', 'state', 'phone')
- Specifying which strings are read as missing values (NA).
na = c('NA', 'null')
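The three arguments above can be combined in a single call. This sketch writes a hypothetical headerless, tab-separated file to a temporary path (names and values are made up) and reads it with read_tsv(), naming the columns, fixing their types and treating "null" as missing.

```r
library(readr)

# Hypothetical headerless, tab-separated data; "null" marks a missing value
path <- tempfile(fileext = ".tsv")
writeLines(c("Alice\tNY\t5551234", "Bob\tnull\t5559876"), path)

people <- read_tsv(
  path,
  col_names = c("name", "state", "phone"),  # the file has no header row
  col_types = cols(
    name  = col_character(),
    state = col_character(),
    phone = col_character()                 # keep phone numbers as text
  ),
  na = c("NA", "null")                      # both strings become NA
)
people
```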
In practice.
# Import data
salaries <- read_tsv('Salaries.txt', col_names = FALSE, col_types = cols(
  X2 = col_skip(),
  X3 = col_skip(),
  X4 = col_skip()
))
# View first six rows of salaries
head(salaries)
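Because col_names = FALSE makes readr name the columns X1, X2, and so on, col_skip() is addressed through those default names. The sketch below uses hypothetical rows in a temporary file (not the real Salaries.txt) and contrasts col_skip() with readr's cols_only(), which lists the columns to keep instead of the ones to drop.

```r
library(readr)

# Hypothetical six-column, headerless rows
path <- tempfile(fileext = ".tsv")
writeLines(c("Prof\tB\t19\t18\tMale\t139750",
             "AsstProf\tB\t4\t3\tMale\t79750"), path)

# Drop columns 2-4; unlisted columns are still guessed
skipped <- read_tsv(path, col_names = FALSE,
                    col_types = cols(X2 = col_skip(), X3 = col_skip(), X4 = col_skip()))
names(skipped) # "X1" "X5" "X6"

# Keep only the listed columns; everything else is dropped
kept <- read_tsv(path, col_names = FALSE,
                 col_types = cols_only(X1 = col_character(), X6 = col_double()))
names(kept) # "X1" "X6"
```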
Reading a European .csv
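In most of Europe, commas (rather than periods) are used as decimal points, so CSV files there usually separate fields with semicolons instead. read_csv2() follows this convention: it uses ';' as the delimiter and ',' as the decimal mark.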
# Import data with read_csv2(): trees
trees <- read_csv2('trees.csv')
# View dimensions and head of trees
dim(trees)
head(trees)
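To see what read_csv2() expects, here is a sketch on a hypothetical three-column file written to a temporary path. It also shows read_delim() with an explicit locale(decimal_mark = ","), which should give the same numbers and is handy when a file mixes conventions.

```r
library(readr)

# Hypothetical European-style file: ';' separates fields, ',' is the decimal mark
path <- tempfile(fileext = ".csv")
writeLines(c("Girth;Height;Volume", "8,3;70;10,3", "8,6;65;10,3"), path)

trees_eu <- read_csv2(path, col_types = cols())
trees_eu$Girth # 8.3 8.6

# The same parse spelled out with read_delim() and a locale
trees_eu2 <- read_delim(path, delim = ";",
                        locale = locale(decimal_mark = ","),
                        col_types = cols())
trees_eu2$Volume # 10.3 10.3
```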
Read a fixed-width file
Fixed-width files contain columns of data that are separated by whitespace and all line up on one side.
Code only:
# Import names.txt: names
names <- read_table('names.txt', col_names = c('name', 'state', 'phone'), na = c('NA', 'null'))
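A self-contained sketch on a hypothetical, aligned two-line file. read_table() splits on whitespace, while readr's read_fwf() with fwf_widths() cuts each line at fixed character positions; the widths 7, 6 and 8 are chosen only to match this made-up file, and every column is read as character to keep the comparison simple.

```r
library(readr)

# Hypothetical aligned file: each field starts at the same position on every line
path <- tempfile(fileext = ".txt")
writeLines(c("Alice  NY    555-1234",
             "Bob    null  555-9876"), path)

# Whitespace-separated columns; "null" is treated as missing
names_tbl <- read_table(path, col_names = c("name", "state", "phone"),
                        na = c("NA", "null"),
                        col_types = cols(.default = col_character()))

# Explicit character widths (7 + 6 + 8 = 21); surrounding spaces are trimmed
names_fwf <- read_fwf(path, fwf_widths(c(7, 6, 8), c("name", "state", "phone")),
                      col_types = cols(.default = col_character()))
```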
Reading a text file
Import ordinary text files.
# Import as a character vector, one element per line: tweets
tweets <- read_lines('tweets.txt')
tweets
# returns a length 1 vector of the entire file, with line breaks represented as \n
# Import as a length 1 vector: tweets_all
tweets_all <- read_file('tweets.txt')
tweets_all
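The difference between the two functions can be checked on a small hypothetical file. Splitting the single string from read_file() on line breaks recovers the vector from read_lines(); the pattern "\r?\n" is an assumption made so that Windows line endings are handled too.

```r
library(readr)

# Hypothetical two-line text file
path <- tempfile(fileext = ".txt")
writeLines(c("first tweet", "second tweet"), path)

lines <- read_lines(path)  # one element per line
length(lines)              # 2

whole <- read_file(path)   # one string containing the line breaks
length(whole)              # 1

# Splitting on newlines gives back the individual lines
strsplit(whole, "\r?\n")[[1]]
```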
Writing .csv and .tsv files
Code only:
# Save cwts as chickwts.csv
write_csv(cwts, "chickwts.csv")
# Append cwts2 (same columns as cwts) to chickwts.csv; no header row is written when appending
write_csv(cwts2, "chickwts.csv", append = TRUE)
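A round-trip sketch with two hypothetical data frames and a temporary file. Because write_csv() writes column names only when append = FALSE (its col_names default is !append), the appended rows follow the existing header and the file reads back as one table.

```r
library(readr)

# Two hypothetical data frames with the same columns
part1 <- data.frame(weight = c(179, 160), feed = c("horsebean", "linseed"))
part2 <- data.frame(weight = 136, feed = "soybean")

path <- tempfile(fileext = ".csv")
write_csv(part1, path)                 # header row plus 2 data rows
write_csv(part2, path, append = TRUE)  # 1 more data row, no header

combined <- read_csv(path, col_types = cols())
nrow(combined) # 3
```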
Writing .rds files
If the R object you’re working with has metadata associated with it, saving to a CSV will cause that information to be lost.
write_rds() exports an entire R object (metadata and all) to an .rds file, and read_rds() reads it back.
Code only:
# Save trees as trees.rds
write_rds(trees, 'trees.rds')
# Import trees.rds: trees2
trees2 <- read_rds('trees.rds')
# Check whether trees and trees2 are the same
identical(trees, trees2)
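A sketch of what "metadata" means in practice, using a hypothetical factor column with a non-alphabetical level order. After a CSV round trip the column comes back as plain character; after an .rds round trip the object should be identical to the original.

```r
library(readr)

# Hypothetical factor whose level order is not alphabetical
df <- data.frame(feed = factor(c("linseed", "soybean"),
                               levels = c("soybean", "linseed")))

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write_csv(df, csv_path)
write_rds(df, rds_path)

from_csv <- read_csv(csv_path, col_types = cols())
from_rds <- read_rds(rds_path)

class(from_csv$feed)    # "character": factor levels and their order are lost
levels(from_rds$feed)   # "soybean" "linseed": the level order survives
identical(df, from_rds) # TRUE
```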
Parsing Data with readr
Coercing columns to different data types
readr functions are quite good at guessing the correct data type for each column in a dataset. Of course, they aren’t perfect, so sometimes you will need to change the type of a column after importing.
Code only:
# Convert all columns to double (type_convert() only re-parses character columns)
trees2 <- type_convert(trees, col_types = cols(Girth = 'd', Height = 'd', Volume = 'd'))
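A sketch of type_convert() on a hypothetical data frame whose numbers arrived as text. With the same one-letter shortcut as above ('d' for double), both columns are re-parsed as doubles.

```r
library(readr)

# Hypothetical data where every column was read as character
raw <- data.frame(Girth = c("8.3", "8.6"), Height = c("70", "65"),
                  stringsAsFactors = FALSE)

# Re-parse the character columns with explicit types
converted <- type_convert(raw, col_types = cols(Girth = "d", Height = "d"))
sapply(converted, class) # both "numeric"
```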
Coercing character columns into factors
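One difference from base R is that the readr import functions don't automatically convert strings into factors, as read.csv() did by default before R 4.0.0. To store a character column as a factor, parse it with parse_factor() and supply the allowed levels.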
Code only:
# Parse the title column
salaries$title <- parse_factor(salaries$title, levels = c('Prof', 'AsstProf', 'AssocProf'))
# Parse the gender column
salaries$gender <- parse_factor(salaries$gender, levels = c('Male', 'Female'))
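A sketch of parse_factor() on a short hypothetical title vector. Supplying levels keeps levels that never occur in the data, and a value outside the level set becomes NA with a parsing-problem warning (suppressed here).

```r
library(readr)

title <- c("Prof", "AsstProf", "Prof")
title_f <- parse_factor(title, levels = c("Prof", "AsstProf", "AssocProf"))
levels(title_f) # all three levels, although "AssocProf" never occurs
table(title_f)

# "Lecturer" is not in the level set, so it becomes NA
bad <- suppressWarnings(parse_factor(c("Prof", "Lecturer"),
                                     levels = c("Prof", "AsstProf", "AssocProf")))
bad
```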
Creating Date objects
The readr import functions automatically recognize dates in the standard ISO 8601 format (YYYY-MM-DD) and parse such columns accordingly. For dates in other formats, import the column as character and convert it with parse_date(), passing a format string.
Code only:
# Change type of date column
weather$date <- parse_date(weather$date, format = '%m/%d/%Y')
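A quick check of parse_date() on hypothetical US-style strings. The format codes follow the usual %m/%d/%Y conventions, and ISO 8601 strings need no format argument.

```r
library(readr)

us_dates <- c("03/15/2017", "12/01/2016")
parsed <- parse_date(us_dates, format = "%m/%d/%Y")
parsed        # "2017-03-15" "2016-12-01"
class(parsed) # "Date"

# ISO 8601 input is recognised without a format
iso <- parse_date("2017-03-15")
```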
Parsing number formats
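The readr importing functions can sometimes run into trouble parsing a column as numbers when it contains non-numeric symbols (such as currency or percent signs) in addition to numerals. parse_number() handles this by dropping any non-numeric characters before and after the first number.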
Code only:
# Parse amount column as a number
debt$amount <- parse_number(debt$amount)
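parse_number() on a few hypothetical messy strings. For European-style numbers, a locale with decimal_mark = "," also switches the grouping mark to ".".

```r
library(readr)

amounts <- c("$1,234", "75%", "USD 10.50")
parse_number(amounts) # 1234, 75, 10.5

# European formatting: '.' groups thousands, ',' marks decimals
parse_number("1.234,5", locale = locale(decimal_mark = ",")) # 1234.5
```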
Viewing metadata before importing
In some cases, it may be easier to get an idea of how readr plans to parse a dataset before you actually import it. When you see the planned column specification, you might decide to change the type of one or more columns, for example.
Use spec_csv() for .csv files, spec_tsv() for .tsv files, and spec_delim() for other delimited files such as .txt (among others).
# Specifications of chickwts
spec_csv('chickwts.csv')
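A sketch of the inspect-then-adjust workflow on two hypothetical chickwts-style rows in a temporary file. spec_csv() returns a col_spec object whose cols element is a named list of collectors, so one collector can be replaced before the spec is passed to read_csv() as col_types.

```r
library(readr)

# Hypothetical copy of two chickwts-style rows
path <- tempfile(fileext = ".csv")
writeLines(c("weight,feed", "179,horsebean", "160,linseed"), path)

# Look at the planned column types without importing the data
planned <- spec_csv(path)
planned

# Change one planned type, then import with the adjusted spec
planned$cols$weight <- col_integer()
chick <- read_csv(path, col_types = planned)
class(chick$weight) # "integer"
```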