Reading Data into R with readr
Foreword
- Output options: the ‘tango’ syntax and the ‘readable’ theme.
- Snippets and results.
Importing data with readr
Reading a .csv file
Your first task will be to master the use of the read_csv() function. There are many arguments available, but the only required argument is file, a path to a CSV file on your computer (or the web).
One big advantage that read_csv() has over read.csv() is that it doesn’t convert strings into factors by default (read.csv() did so by default before R 4.0.0). read_csv() recognizes 8 different data types (integer, logical, etc.) and leaves anything else as characters. That means you don’t have to set stringsAsFactors = FALSE every time you import a CSV file with character strings!
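As a minimal sketch of this behavior, the snippet below writes a small, made-up CSV to a temporary file (the file and its contents are assumptions for illustration, not part of the course data) and checks the column classes after import.

```r
library(readr)

# Hypothetical sample data written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("weight,feed", "179,horsebean", "160,linseed"), path)

# Only the file argument is supplied; column types are guessed
chicks <- read_csv(path)

class(chicks$weight)  # a numeric type guessed from the data
class(chicks$feed)    # character, not factor
```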
#install.packages('readr')
library(readr)
getwd()
setwd("D:/.../Rprojects/Data Wrangling")
Import a .csv file (comma-delimited only).
# Import chickwts.csv: cwts
cwts <- read_csv('chickwts.csv')
# View the head of cwts
head(cwts)
Reading a (.txt) .tsv file
Skipping columns with col_skip().
Code only:
- Setting the column type.
cols(
weight = col_integer(),
feed = col_character()
)
- Setting the column names.
col_names = c('name', 'state', 'phone')
- Setting the strings to treat as missing values (NA).
na = c('NA', 'null')
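The three arguments above can be combined in a single call. The following is a small sketch under the assumption of a header-less, made-up CSV written to a temporary file; the column names weight and feed are borrowed from the fragment above.

```r
library(readr)

# Hypothetical header-less data; 'null' and 'NA' stand for missing values
path <- tempfile(fileext = ".csv")
writeLines(c("179,horsebean", "null,linseed", "160,NA"), path)

chicks <- read_csv(
  path,
  col_names = c("weight", "feed"),          # supply the column names
  col_types = cols(
    weight = col_integer(),                 # force integer parsing
    feed = col_character()
  ),
  na = c("NA", "null")                      # strings read as NA
)

chicks
```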
In practice.
# Import data
salaries <- read_tsv('Salaries.txt', col_names = FALSE, col_types = cols(
X2 = col_skip(),
X3 = col_skip(),
X4 = col_skip()
))
# View first six rows of salaries
head(salaries)
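For a version that runs without Salaries.txt, here is a minimal sketch on a made-up tab-separated file (its contents are assumptions for illustration). With col_names = FALSE, readr names the columns X1, X2, and so on, which is why the skipped columns are referred to by those names.

```r
library(readr)

# Hypothetical tab-separated data without a header row
path <- tempfile(fileext = ".tsv")
writeLines(c("Prof\tB\t19\t139750", "AsstProf\tA\t4\t79750"), path)

# Skip the second and third columns; the others are guessed as usual
kept <- read_tsv(
  path,
  col_names = FALSE,
  col_types = cols(X2 = col_skip(), X3 = col_skip())
)

names(kept)
```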
Reading a European .csv
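In most of Europe, commas (rather than periods) are used as decimal points, so .csv files from those locales typically separate fields with semicolons instead. read_csv2() reads this format.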
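To illustrate, here is a small sketch using a made-up semicolon-delimited file with comma decimals (the values are assumptions, not the actual trees.csv). read_csv2() should parse the numeric columns as doubles.

```r
library(readr)

# Hypothetical European-style CSV: ';' separates fields, ',' marks decimals
path <- tempfile(fileext = ".csv")
writeLines(c("Girth;Height;Volume", "8,3;70;10,3", "8,6;65;10,3"), path)

trees_eu <- read_csv2(path)

trees_eu$Girth
```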
# Import data with read_csv2(): trees
trees <- read_csv2('trees.csv')
# View dimensions and head of trees
dim(trees)
head(trees)
Reading a fixed-width file
Fixed-width files contain columns of data that are separated by whitespace and all line up on one side.
Code only:
# Import names.txt: names
names <- read_table('names.txt', col_names = c('name', 'state', 'phone'), na = c('NA', 'null'))
Reading a text file
Import ordinary text files.
# read_lines() returns a vector of character strings.
# Import as a character vector, one item per line: tweets
tweets <- read_lines('tweets.txt')
tweets
# read_file() returns a length 1 vector of the entire file, with line breaks represented as \n
# Import as a length 1 vector: tweets_all
tweets_all <- read_file('tweets.txt')
tweets_all
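The difference between the two functions is easiest to see by comparing lengths. This sketch assumes a made-up three-line text file in place of tweets.txt.

```r
library(readr)

# Hypothetical text file with three lines
path <- tempfile(fileext = ".txt")
writeLines(c("first tweet", "second tweet", "third tweet"), path)

by_line <- read_lines(path)  # one element per line
whole <- read_file(path)     # a single string containing the line breaks

length(by_line)
length(whole)
```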
Writing .csv and .tsv files
Code only:
# Save cwts as chickwts.csv
write_csv(cwts, "chickwts.csv")
# Append cwts2 to chickwts.csv
write_csv(cwts2, "chickwts.csv", append = TRUE)
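A self-contained sketch of writing and then appending, using two small made-up data frames and a temporary file (all assumptions for illustration). When append = TRUE, write_csv() leaves out the header row by default, so the appended rows line up under the existing header.

```r
library(readr)

path <- tempfile(fileext = ".csv")

# Hypothetical data frames with matching columns
first <- data.frame(weight = c(179, 160), feed = c("horsebean", "linseed"))
more <- data.frame(weight = c(227, 141), feed = c("soybean", "linseed"))

write_csv(first, path)                # writes the header and two rows
write_csv(more, path, append = TRUE)  # adds two rows, no second header

combined <- read_csv(
  path,
  col_types = cols(weight = col_double(), feed = col_character())
)
nrow(combined)
```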
Writing .rds files
If the R object you’re working with has metadata associated with it, saving to a CSV will cause that information to be lost.
write_rds() exports an entire R object (metadata and all), and read_rds() reads it back.
Code only:
# Save trees as trees.rds
write_rds(trees, 'trees.rds')
# Import trees.rds: trees2
trees2 <- read_rds('trees.rds')
# Check whether trees and trees2 are the same
identical(trees, trees2)
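As a rough illustration of what "metadata" can mean here, the sketch below round-trips a made-up data frame with a factor column through both formats. The data and file paths are assumptions; the point is that the factor levels survive the .rds file but come back as plain character strings from the .csv.

```r
library(readr)

# Hypothetical data frame whose factor levels are its metadata
sizes <- data.frame(
  size = factor(c("small", "large"), levels = c("small", "large"))
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

write_rds(sizes, rds_path)
write_csv(sizes, csv_path)

from_rds <- read_rds(rds_path)
from_csv <- read_csv(csv_path)

identical(sizes, from_rds)  # the object comes back unchanged
class(from_csv$size)        # the factor information is gone
```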
Parsing Data with readr
Coercing columns to different data types
readr functions are quite good at guessing the correct data type for each column in a dataset. Of course, they aren’t perfect, so sometimes you will need to change the type of a column after importing.
Code only:
# Convert all columns to double
trees2 <- type_convert(trees, col_types = cols(Girth = 'd', Height = 'd', Volume = 'd'))
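Note that type_convert() re-parses character columns only. The sketch below assumes a made-up data frame whose numbers were imported as text, and converts them with the same compact 'd' (double) codes.

```r
library(readr)

# Hypothetical data frame where numeric values arrived as character strings
raw <- data.frame(
  Girth = c("8.3", "8.6"),
  Height = c("70", "65"),
  stringsAsFactors = FALSE
)

converted <- type_convert(raw, col_types = cols(Girth = "d", Height = "d"))

sapply(converted, class)
```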
Coercing character columns into factors
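One advantage of the readr import functions is that they don’t automatically convert strings into factors like read.csv does. When you do want a factor, you can convert the column explicitly with parse_factor().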
Code only:
# Parse the title column
salaries$title <- parse_factor(salaries$title, levels = c('Prof', 'AsstProf', 'AssocProf'))
# Parse the gender column
salaries$gender <- parse_factor(salaries$gender, levels = c('Male', 'Female'))
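A standalone sketch of parse_factor() on a made-up character vector (the values are assumptions standing in for the salaries data). The levels argument fixes both the allowed values and their order.

```r
library(readr)

# Hypothetical job titles as plain character strings
titles <- c("Prof", "AsstProf", "Prof", "AssocProf")

title_fct <- parse_factor(
  titles,
  levels = c("Prof", "AsstProf", "AssocProf")
)

levels(title_fct)
```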
Creating Date objects
The readr import functions can automatically recognize dates in standard ISO 8601 format (YYYY-MM-DD) and parse columns accordingly. If you want to import a dataset with dates in other formats, you can use parse_date().
Code only:
# Change type of date column
weather$date <- parse_date(weather$date, format = '%m/%d/%Y')
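The format string can be tried on a plain character vector first. This sketch uses made-up US-style dates (month/day/year), which is an assumption about what the weather data looks like.

```r
library(readr)

# Hypothetical dates written as month/day/year
raw_dates <- c("12/25/2016", "01/02/2017")

dates <- parse_date(raw_dates, format = "%m/%d/%Y")

class(dates)
format(dates)  # printed back in ISO 8601 form
```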
Parsing number formats
The readr importing functions can sometimes run into trouble parsing a column as numbers when it contains non-numeric symbols in addition to numerals.
Code only:
# Parse amount column as a number
debt$amount <- parse_number(debt$amount)
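As a quick sketch, parse_number() drops non-numeric characters around the first number it finds and, with the default locale, ignores commas used as grouping marks. The input strings below are made up for illustration.

```r
library(readr)

# Hypothetical amounts with a currency symbol, grouping mark, and percent sign
raw_amounts <- c("$1,234.50", "45%")

amounts <- parse_number(raw_amounts)

amounts
```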
Viewing metadata before importing
In some cases, it may be easier to get an idea of how readr plans to parse a dataset before you actually import it. When you see the planned column specification, you might decide to change the type of one or more columns, for example.
- spec_csv() for .csv files.
- spec_tsv() for .tsv files.
- spec_delim() for other delimited files, such as .txt files.
# Specifications of chickwts
spec_csv('chickwts.csv')
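One way to use the result, sketched here on a made-up temporary CSV rather than chickwts.csv, is to store the specification, inspect it, and then pass it to read_csv() through col_types.

```r
library(readr)

# Hypothetical CSV written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("weight,feed", "179,horsebean", "160,linseed"), path)

# Inspect the guessed column specification without importing the data
spec <- spec_csv(path)
spec

# Reuse the specification for the actual import
chicks <- read_csv(path, col_types = spec)
```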