Plot Snippets for Exploratory (and some Explanatory) Analyses
Foreword
- Output options: the ‘tango’ syntax and the ‘readable’ theme.
- Code snippets and results.
- Some data might necessitate more specialized packages.
- For explaining data, presenting results, reporting and publishing, we can generate prettier graphics with ggvis or ggplot2, and interactive packages such as shiny.
Plotting Packages
Graphics:
- maps for grids and mapping.
- diagram for flow charts.
- plotrix for ternary, polar plots.
- gplots.
- pixmap, png, rtiff, ReadImages, EBImage, RImageJ.
- leaflet.
Grid:
- vcd for mosaic, ternary plots.
- grImport for vectors.
- ggplot2 and extensions.
- lattice and latticeExtra.
- gridBase.
Devices:
- JavaGD.
- Cairo.
- tikzDevice.
Interactive:
- rgl.
- ggvis.
- iplots.
- rggobi.
Others:
- ash for density plots.
- cluster for dendrograms.
- copula for multivariate analyses.
- corrplot for correlations.
- compositions for geometries, ternary plots.
- extracat for missing values.
- soiltexture for ternary plots and more.
- KernSmooth for histograms-density plots.
- openair for polar, circular plots.
- sm for density plots.
- car for scatter plots.
- vioplot for boxplots.
- vcd for mosaic plots and multivariate analyses.
- hexbin for scatter plots.
- scatterplot3d for 3D scatter plots.
- shiny for interactive plots.
- ggvis.
Data Type & Dataset
Data Types
- continuous vs categorical (or discrete).
- continuous: float, x-y-z, 3D, map coordinates, triangular, lat-long, polar, degree-distance, angle-vector.
- categorical: integer, binary, dichotomic, dummy, factor, ordinal (ordered).
Continuous variable characteristics:
- asymmetry.
- outliers.
- multimodality.
- gaps, missing values.
- heaping, redundancy.
- rounding, integer.
- impossibilities, anomalies.
- errors.
- …
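Several of the characteristics above can be checked numerically before plotting. The following is a minimal sketch on mtcars$mpg using base R only; the choice of variable and the mean-versus-median comparison as a rough hint of asymmetry are illustrative assumptions, not a formal test.

```r
# Quick numeric checks for a continuous variable (illustrative sketch)
x <- mtcars$mpg

n_missing   <- sum(is.na(x))          # gaps, missing values
n_unique    <- length(unique(x))      # heaping: far fewer unique values than observations
outliers    <- boxplot.stats(x)$out   # values beyond the boxplot whiskers
skew_hint   <- mean(x) - median(x)    # > 0 hints at a longer right tail
all_integer <- all(x == round(x))     # rounding: TRUE if only whole numbers occur

list(missing = n_missing, unique = n_unique, outliers = outliers,
     skew_hint = skew_hint, all_integer = all_integer)
```

The plots in the following sections (histogram, density, boxplot) show the same characteristics visually.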
Categorical variable characteristics:
- unexpected pattern of results.
- uneven distribution.
- extra categories.
- unbalanced experiments.
- large numbers of categories.
- NA, errors, missing values…
- nominal: no fixed order.
- ordinal: fixed order (scale of 1 to 5).
- discrete: counts, integers.
- dependencies, correlation, associations.
- causal relationships, outliers, groups, clusters, gaps, barriers, conditional relationship.
- …
Univariate main plots:
- histogram.
- density.
- qqmath chart.
- box & whiskers chart.
- bar chart.
- dot.
Bivariate main plots:
- xy chart.
- qq chart.
Trivariate main plots:
- cloud.
- wireframe.
- contour.
- level.
Multivariate main plots:
- sploms.
- parallel charts (coordinate).
Specialized plots:
- frequencies, crosstabs: bar charts, mosaic plots, association plots.
- correlations: sploms, pairs, correlograms.
- t-tests, non-parametric tests of group differences: box plot, density plot.
- regression: scatter plot.
- ANOVA: box plots, line plots.
Functions
Create a new variable
iris2 <- within(iris, area <- Petal.Width*Petal.Length)
head(iris2, 3)
area <- with(iris, Petal.Width * Petal.Length)
head(area, 3)
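The two calls above differ in what they return: within() gives back a modified copy of the data frame, whereas with() gives back only the value of the expression. A short sketch to make the difference visible:

```r
# within(): returns a copy of iris with the new column added
iris2 <- within(iris, area <- Petal.Width * Petal.Length)
ncol(iris)    # number of columns in the original
ncol(iris2)   # one more column: area

# with(): evaluates the expression inside iris and returns a plain vector
area <- with(iris, Petal.Width * Petal.Length)
is.numeric(area)

# both approaches compute the same values
identical(iris2$area, area)
```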
Dataset
For most examples, we use the mtcars dataset.
Prepare the dataset.
attach(mtcars)
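attach() puts the columns of mtcars on the search path, so that mpg or hp can be used directly. A global variable with the same name masks the attached column, so the sketch below shows two alternatives that do not rely on attach(). They are offered as an option, not as the approach used in the rest of this page.

```r
# Alternative 1: with() evaluates the call inside the data frame
with(mtcars, plot(hp, mpg, xlab = 'horsepower', ylab = 'miles per gallon'))

# Alternative 2: formula interface with a data argument
plot(mpg ~ hp, data = mtcars, xlab = 'horsepower', ylab = 'miles per gallon')

# If attach() was used, detach() removes the data frame from the search path
# detach(mtcars)
```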
Get data attached to a package (an example).
data(gvhd10, package = 'latticeExtra')
The Basic Package
Basic Plots, Options & Parameters
Standardize the parameters (an example)
# color and tick mark text orientation
par(col = 'black', las = 1)
Grid and layout
One plot.
plot(hp, mpg, xlab = 'horsepower', ylab = 'miles per gallon')
A grid of plots.
par(mfrow = c(2, 1))
plot(mpg, hp, ylab = 'horsepower', xlab = 'miles per gallon')
boxplot(mpg ~ cyl, xlab = 'miles per gallon', ylab = 'number of cylinders', horizontal = TRUE)
par(mfrow = c(1, 2))
plot(mpg, hp, ylab = 'horsepower', xlab = 'miles per gallon')
boxplot(mpg ~ cyl, xlab = 'miles per gallon', ylab = 'number of cylinders', horizontal = TRUE)
par(mfrow = c(1, 1))
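The examples on this page reset the grid by hand with par(mfrow = c(1, 1)). Since par() returns the previous values of the parameters it changes, a common alternative is to store them and restore them afterwards. A minimal sketch:

```r
# par() invisibly returns the old values of the parameters being set
op <- par(mfrow = c(1, 2), las = 1)

plot(mtcars$mpg, mtcars$hp, xlab = 'miles per gallon', ylab = 'horsepower')
boxplot(mpg ~ cyl, data = mtcars, xlab = 'miles per gallon',
        ylab = 'number of cylinders', horizontal = TRUE)

# restore the previous settings in one call
par(op)
```

This restores whatever was set before, rather than assuming the previous state was a single-plot layout.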
Other grids.
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
plot(mpg, xlab = 'observations', ylab = 'miles per gallon')
plot(hp, mpg, xlab = 'horsepower', ylab = 'miles per gallon')
boxplot(mpg ~ cyl, ylab = 'miles per gallon', xlab = 'number of cylinders')
# view
matrix(c(1,2,1,3), 2, 2, byrow = TRUE)
layout(matrix(c(1,2,1,3), 2, 2, byrow = TRUE))
hist(wt)
hist(mpg)
hist(disp)
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE), widths = c(3,1), heights = c(1,2))
hist(wt)
hist(mpg)
hist(disp)
nf <- layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE), widths = lcm(12), heights = lcm(6))
layout.show(nf)
plot(mpg, xlab = 'observations', ylab = 'miles per gallon')
plot(hp, mpg, xlab = 'horsepower', ylab = 'miles per gallon')
boxplot(mpg ~ cyl, ylab = 'miles per gallon', xlab = 'number of cylinders')
Gridview with additional packages.
library(vcd)
# A, B, C are placeholders for existing vcd (grid-based) plot objects
mplot(A, B, C)
See the lattice and latticeExtra packages for built-in facet/gridview. ggplot2 as well.
Plot and add ablines
plot(hp, mpg, xlab = 'horsepower', ylab = 'miles per gallon')
# abline(h = yvalues, v = xvalues)
abline(lm(mpg ~ hp))
# main = 'Title' or...
title('Title')
plot(hp, mpg, xlab = 'horsepower', ylab = 'miles per gallon')
abline(h = c(20, 25))
abline(v = c(50, 150))
abline(v = seq(200, 300, 50), lty = 2, col = 'blue')
Add a legend
boxplot(mpg ~ cyl, main = 'Title',
        yaxt = 'n', xlab = 'miles per gallon', horizontal = TRUE, col = terrain.colors(3))
legend('topright', inset = 0.05, title = 'number of cylinders', c('4','6','8'), fill = terrain.colors(3), horiz = TRUE)
Save
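# open a file device, draw the plot, then close the device to write the file
pdf('mygraph.pdf')
plot(hp, mpg, main = 'Title', xlab = 'horsepower', ylab = 'miles per gallon')
dev.off()
# other file devices work the same way:
# png('mygraph.png')
# jpeg('mygraph.jpg')
# bmp('mygraph.bmp')
# postscript('mygraph.ps')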
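Base graphics are written to a file by opening a device, drawing, and closing the device with dev.off(). A minimal sketch, assuming a temporary file as the target (the file name and the page size are only illustrations):

```r
# target file; tempfile() keeps the sketch from cluttering the working directory
out <- tempfile(fileext = '.pdf')

pdf(out, width = 7, height = 5)                         # 1. open the device (size in inches)
plot(mtcars$hp, mtcars$mpg, main = 'Title',
     xlab = 'horsepower', ylab = 'miles per gallon')    # 2. draw
dev.off()                                               # 3. close the device; the file is complete

file.exists(out)
```

png(), jpeg() and bmp() follow the same open, draw, close sequence; their default size unit is pixels rather than inches.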
View in a new window
Typing the function will open a new window to render the plot.
- windows() for Windows.
- X11() for Linux.
- quartz() for OS X.
# open a new window
windows()
plot(hp, mpg, main = 'Title', xlab = 'horsepower', ylab = 'miles per gallon')
Enrich the plot, add text
plot(hp, mpg,
main = 'Title', col.main = 'blue',
sub = 'figure 1', col.sub = 'blue',
xlab = 'horsepower',
ylab = 'miles per gallon',
col.lab = 'red', cex.lab = 0.9,
xlim = c(50, 350),
ylim = c(0, 40))
text(100, 10, 'text 1') # x and y coordinate
mtext('text 2', 4, line = 0.5) # side = 1 (bottom), 2 (left), 3 (top), 4 (right); line (margin line)
With locator(), use the mouse: pass 1 for one click, 2 for two clicks, and so on. The function returns the coordinates to enter in the code. For example (after two clicks):
> locator(2)
$x
[1] 212.5308 293.7854
$y
[1] 33.34040 31.87281
plot(hp, mpg,
main = 'Title',
xlab = 'horsepower',
ylab = 'miles per gallon')
text(hp, mpg, row.names(mtcars), cex = 0.7, pos = 4, col = 'red')
Enrich the plot, add symbols
plot(hp, mpg,
main = 'Title',
xlab = 'horsepower',
ylab = 'miles per gallon')
symbols(250, 20, squares = 1, add = TRUE, inches = 0.1, fg = 'red')
symbols(250, 25, circles = 1, add = TRUE, inches = 0.1, fg = 'red')
#rectangles
#stars
#thermometers
#boxplots
Combine plots; change pch = & col =
par(mfrow = c(2,2))
# 1
plot(hp, mpg,
main = 'P1',
xlab = 'horsepower',
ylab = 'miles per gallon',
pch = 1,
col = 'black')
# 2
plot(hp, mpg,
main = 'P2',
xlab = 'horsepower',
ylab = 'miles per gallon',
pch = 3,
col = 'blue',
cex = 0.5)
# 3
plot(hp, mpg,
main = 'P3',
xlab = 'horsepower',
ylab = 'miles per gallon',
pch = 5,
col = 'red',
cex = 2)
# 4
plot(hp, mpg,
main = 'P4',
xlab = 'horsepower',
ylab = 'miles per gallon',
pch = 7,
col = 'green')
# reverse
par(mfrow = c(1,1))
Change col =
Change pch =
Change lty =
par(fig = c(0,0.8,0,0.8))
plot(mtcars$wt, mtcars$mpg, xlab = 'Car Weight', ylab = 'miles Per Gallon')
par(fig = c(0,0.8,0.55,1), new = TRUE)
boxplot(mtcars$wt, horizontal = TRUE, axes = FALSE)
par(fig = c(0.65,1,0,0.8), new = TRUE)
boxplot(mtcars$mpg, axes = FALSE)
mtext('Enhanced Scatterplot', side = 3, outer = TRUE, line = -3)
# reverse
par(mfrow = c(1,1))
Change type =; without dots
x <- c(1:5); y <- x
par(pch = 22, col = 'red') # plotting symbol and color
par(mfrow = c(2,4)) # all plots on one page
opts = c('p','l','o','b','c','s','S','h')
for (i in 1:length(opts)) {
heading = paste('type =',opts[i])
plot(x, y, type = 'n', main = heading)
lines(x, y, type = opts[i])
}
# reverse
par(mfrow = c(1,1), col = 'black')
Change type =; with dots
x <- c(1:5); y <- x
par(pch = 22, col = 'blue') # plotting symbol and color
par(mfrow = c(2,4)) # all plots on one page
opts = c('p','l','o','b','c','s','S','h')
for (i in 1:length(opts)) {
heading = paste('type =',opts[i])
plot(x, y, main = heading)
lines(x, y, type = opts[i])
}
# reverse
par(mfrow = c(1,1), col = 'black')
Add or modify the axes
plot(hp, mpg,
main = 'Title',
xlab = 'horsepower',
ylab = 'miles per gallon',
xaxt = 'n',
yaxt = 'n')
axis(1, at = c(100, 200, 300), labels = NULL, pos = 15, lty = 'dashed', col = 'green', las = 2, tck = -0.05)
axis(4, at = c(20, 30), labels = c('bt', 'up'), pos = 125, lty = 'dashed', col = 'blue', las = 2, tck = -0.05)
# reverse
par(las = 1)
Add layers to the first plot
plot(mpg,
main = 'Title',
xlab = 'observations',
ylab = 'miles per gallon')
# add lines
lines(mpg[1:10], type = 'l', col = 'green')
Univariate Plots
Plot; continuous
plot(mpg, main = 'Title', xlab = 'observations', ylab = 'miles per gallon')
Plot; categorical
plot(cyl, main = 'Title', xlab = 'observations', ylab = 'cylinders')
QQnorm; continuous
qqnorm(mpg, main = 'Title', xlab = 'theoretical quantiles', ylab = 'miles per gallon')
QQnorm; categorical
qqnorm(cyl, main = 'Title', xlab = 'theoretical quantiles', ylab = 'cylinders')
Stripchart; continuous
stripchart(mpg, main = 'Title', xlab = 'miles per gallon')
Stripchart; categorical
stripchart(cyl, main = 'Title', xlab = 'cylinders')
Barplot (vertical); continuous
barplot(mpg[1:10], main = 'Title', xlab = 'observations', ylab = 'miles per gallon')
Barplot (horizontal); categorical
barplot(cyl[1:10], main = 'Title', horiz = TRUE, xlab = 'cylinders', ylab = 'observations')
Barplots options
Group with table().
counts <- table(cyl)
counts
barplot(counts, main = 'Title', horiz = TRUE, xlab = 'count', names.arg = c('4 Cyl', '6 Cyl', '8 Cyl'))
counts <- table(vs, gear)
counts
barplot(counts, main = 'Title', xlab = 'gearbox', col = c('darkblue', 'red'), legend = rownames(counts))
counts <- table(vs, gear)
counts
barplot(counts, main = 'Title', xlab='gearbox', col = c('darkblue', 'red'), legend = rownames(counts), beside = TRUE)
Group with aggregate().
aggregate(mtcars, by = list(cyl, vs), FUN = mean, na.rm = TRUE)
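aggregate() also has a formula interface, which names the grouping columns in the result and avoids the by = list(...) construction. A sketch, using the data argument so it does not depend on attach():

```r
# mean mpg for each observed combination of cyl and vs
agg <- aggregate(mpg ~ cyl + vs, data = mtcars, FUN = mean)
agg

# several response variables at once, grouped by cyl only
aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
```

The result is a regular data frame, so it can be passed on to barplot() or dotchart() after selecting the column of interest.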
par(las = 2) # make label text perpendicular to axis
par(mar = c(5, 8, 4, 2)) # increase y-axis margin.
counts <- table(mtcars$gear)
barplot(counts, main = 'Car Distribution', horiz = TRUE, names.arg = c('3 Gears', '4 Gears', '5 Gears'), cex.names = 0.8)
# reverse
par(las = 1)
Colors.
library(RColorBrewer)
par(mfrow = c(2, 1))
barplot(iris$Petal.Length)
barplot(table(iris$Species, iris$Sepal.Length), col = brewer.pal(3, 'Set1'))
par(mfrow = c(1, 1))
Pie Chart
Avoid!
Dotchart; continuous
dotchart(mpg, main = 'Title', xlab = 'miles per gallon', ylab = 'observations')
Dotchart; categorical
dotchart(cyl, main = 'Title', xlab = 'cylinders', ylab = 'observations')
Dotchart options
dotchart(mpg,labels = row.names(mtcars), cex = 0.7, main = 'Title', xlab = 'miles per gallon')
# sort by mpg
x <- mtcars[order(mpg),]
# must be factors
x$cyl <- factor(x$cyl)
x$color[x$cyl == 4] <- 'red'
x$color[x$cyl == 6] <- 'blue'
x$color[x$cyl == 8] <- 'darkgreen'
dotchart(x$mpg, labels = row.names(x), cex = 0.7, groups = x$cyl, main = 'Title', xlab = 'miles per gallon', gcolor = 'black', color = x$color)
More with the Hmisc package and panel.dotplot(), and in the lattice package section.
Boxplot; continuous
boxplot(mpg, main = 'Title', xlab = 'miles per gallon', ylab = 'observations')
Stem; continuous
stem(mpg)
Histogram; continuous
hist(mpg, main = 'Title', xlab = 'miles per gallon - bins', ylab = 'count')
Histogram; categorical
hist(cyl, main = 'Title', xlab = 'cylinders - bins', ylab = 'count')
Histogram options
hist(mpg, breaks = 12, col = 'red')
x <- mpg
h <- hist(x, breaks = 10, main = 'Title', xlab = 'miles per gallon')
xfit <- seq(min(x), max(x),length = 40)
yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))
yfit <- yfit*diff(h$mids[1:2])*length(x)
lines(xfit, yfit, col = 'blue', lwd = 2)
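The normal curve above is multiplied by the bin width and the number of observations because the histogram shows counts. An alternative sketch draws the histogram on the density scale (freq = FALSE), so that the normal density can be overlaid without rescaling. The variable name v is only an illustration.

```r
v <- mtcars$mpg

# freq = FALSE puts the y-axis on the density scale (bar areas sum to 1)
h <- hist(v, breaks = 10, freq = FALSE, main = 'Title', xlab = 'miles per gallon')

# overlay the normal density with the sample mean and standard deviation;
# inside curve(), x is the plotting variable, so the data are referred to as v
curve(dnorm(x, mean = mean(v), sd = sd(v)), add = TRUE, col = 'blue', lwd = 2)
```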
Colors.
library(RColorBrewer)
par(mfrow = c(2, 3))
hist(VADeaths, breaks = 10, col = brewer.pal(3, 'Set3'), main = '3, Set3')
hist(VADeaths, breaks = 4, col = brewer.pal(3, 'Set2'), main = '3, Set2')
hist(VADeaths, breaks = 8, col = brewer.pal(3, 'Set1'), main = '3, Set1')
hist(VADeaths, breaks = 2, col = brewer.pal(8, 'Set3'), main = '8, Set3')
hist(VADeaths, breaks = 10, col = brewer.pal(8, 'Greys'), main = '8, Greys')
hist(VADeaths, breaks = 10, col = brewer.pal(8, 'Greens'), main = '8, Greens')
par(mfrow = c(1, 1))
Density Plot; continuous
plot(density(mpg), main = 'Title')
plot(density(mpg), main = 'Title')
polygon(density(mpg), col = 'red', border = 'blue')
d1 <- density(mtcars$mpg)
plot(d1)
rug(mtcars$mpg)
lines(density(mtcars$mpg, d1$bw/2), col = 'green')
lines(density(mtcars$mpg, d1$bw/5), col = 'blue')
Bivariate (Multivariate) Plots
Plot, continuous/continuous
plot(mpg, hp, main = 'Title', xlab = 'miles per gallon', ylab = 'horsepowers')
Plot, continuous/categorical
plot(mpg, cyl, main = 'Title', xlab = 'miles per gallon', ylab = 'cylinders')
Plot options
plot(wt, mpg, main = 'Title', xlab = 'weight', ylab = 'miles per gallon ')
abline(lm(mpg ~ wt), col = 'red') # regression
lines(lowess(wt, mpg), col = 'blue') # lowess line
SmoothScatter; continuous/continuous
smoothScatter(mpg, hp, main = 'Title', xlab = 'miles per gallon', ylab = 'horsepowers')
Sunflowerplot; categorical/categorical
Special symbols at each location: one observation = one dot; more observations = cross, star, etc.
sunflowerplot(gear, cyl, main = 'Title', xlab = 'gearbox', ylab = 'cylinders')
Boxplot
boxplot(mpg ~ cyl, main = 'Title', xlab = 'cylinders', ylab = 'miles per gallon')
Colors.
library(RColorBrewer)
par(mfrow = c(1, 2))
boxplot(iris$Sepal.Length, col = 'red')
boxplot(iris$Sepal.Length ~ iris$Species, col = topo.colors(3))
par(mfrow = c(1, 1))
library(dplyr)
data(Pima.tr2, package = 'MASS')
PimaV <- select(Pima.tr2, glu:age)
boxplot(scale(PimaV), pch = 16, outcol = 'red')
Boxplot options
four <- subset(mpg, cyl == 4)
six <- subset(mpg, cyl == 6)
eight <- subset(mpg, cyl == 8)
boxplot(four, six, eight, main = 'Title', ylab = 'miles per gallon')
axis(1, at = c(1, 2, 3), labels = c('4 Cyl', '6 Cyl', '8 Cyl'))
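The three subsets above can also be produced in one step with split(), and the group labels passed through the names argument of boxplot() rather than a separate axis() call. A sketch:

```r
# split mpg into a list with one element per number of cylinders
groups <- split(mtcars$mpg, mtcars$cyl)
sapply(groups, length)   # group sizes

boxplot(groups, names = c('4 Cyl', '6 Cyl', '8 Cyl'),
        main = 'Title', ylab = 'miles per gallon')
```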
Dotchart
counts <- table(gear, cyl)
counts
dotchart(counts, main = 'Title', xlab = 'count', ylab = 'cylinders/gearbox')
counts <- table(cyl, gear)
counts
dotchart(counts, main = 'Title', xlab = 'count', ylab = 'gearbox/cylinders')
Barplot with its options
Vertical or horizontal. The legend as well can be horizontal or vertical.
counts <- table(gear, cyl)
counts
barplot(counts, main = 'Title', xlab = 'cylinders', ylab = 'count', ylim = c(0, 20), col = terrain.colors(3))
legend('topleft', inset = .04, title = 'gearbox',
c('3','4','5'), fill = terrain.colors(3), horiz = TRUE)
counts <- table(gear, cyl)
counts
barplot(counts, main = 'Title', xlab = 'cylinders', ylab = 'count', ylim = c(0, 25), col = terrain.colors(3), legend = rownames(counts))
counts <- table(gear, cyl)
counts
barplot(counts, main = 'Title', xlab = 'cylinders', ylab = 'count', ylim = c(0, 20), col = terrain.colors(3), legend = rownames(counts), beside = TRUE)
Spineplot
‘Count’ = blocks; categorical (with factors).
cyl2 <- as.factor(cyl) # mandatory for the y
gear2 <- as.factor(gear)
spineplot(gear2, cyl2, main = 'Title', xlab = 'gearbox', ylab = 'cylinders')
Count = blocks; continuous.
spineplot(mpg, cyl2, main = 'Title', xlab = 'miles per gallon', ylab = 'cylinders')
Mosaicplot
Count = blocks.
counts <- table(gear, cyl)
counts
mosaicplot(counts, main = 'Title', xlab = 'gearbox', ylab = 'cylinders')
Multivariate Plots
Pairs
pairs( ~mpg + disp + hp)
Coplot
coplot(mpg ~ hp | wt)
Correlograms
library(corrgram)
corrgram(mtcars, order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie, text.panel = panel.txt, main = 'Car Mileage Data in PC2/PC1 Order')
Plot a dataset with colors
library(RColorBrewer)
plot(iris, col = brewer.pal(3, 'Set1'))
Stars
The star branches are explanatory; be careful with the interpretation! Well-advised for visual and pattern exploration.
mtcars[1:4, c(1, 4, 6)]
stars(mtcars[1:4, c(1, 4, 6)])
Trivariate plots
- image().
- contour().
- filled.contour().
- persp().
- symbols().
Time Series
Add packages: zoo and xts.
Basics
plot(AirPassengers, type = 'l')
Change the type =
y1 <- rnorm(100)
par(mfrow = c(2, 1))
plot(y1, type = 'p', main = 'p vs l')
plot(y1, type = 'l')
plot(y1, type = 'l', main = 'l vs h')
plot(y1, type = 'h')
plot(y1, type = 'l', lty = 3, main = 'l 3 vs o')
plot(y1, type = 'o')
plot(y1, type = 'b', main = 'b vs c')
plot(y1, type = 'c')
plot(y1, type = 's', main = 's vs S')
plot(y1, type = 'S')
# reverse
par(mfrow = c(1, 1))
Add a box
y1 <- rnorm(100)
y2 <- rnorm(100)
par(mfrow = (c(2, 1)))
plot(y1, type = 'l', axes = FALSE, xlab = '', ylab = '', main = '')
box(col = 'gray')
lines(x = c(20, 20, 40, 40), y = c(-7, max(y1), max(y1), -7), lwd = 3, col = 'gray')
plot(y2, type = 'l', axes = FALSE, xlab = '', ylab = '', main = '')
box(col = 'gray')
lines(x = c(20, 20, 40, 40), y = c(7, min(y2), min(y2), 7), lwd = 3, col = 'gray')
# reverse
par(mfrow = c(1,1))
Add lines and text within the plot
y1 <- rnorm(100)
# x is the observation index, from 1 to 100
# xaxt = 'n' remove the x ticks
plot(y1, type = 'l', lwd = 2, lty = 'longdash', main = 'Title', ylab = 'y', xlab = 'time', xaxt = 'n')
abline(h = 0, lty = 'longdash')
abline(v = 20, lty = 'longdash')
abline(v = 50, lty = 'longdash')
abline(v = 95, lty = 'longdash')
text(17, 1.5, srt = 90, adj = 0, labels = 'Tag 1', cex = 0.8)
text(47, 1.5, srt = 90, adj = 0, labels = 'Tag a', cex = 0.8)
text(92, 1.5, srt = 90, adj = 0, labels = 'Tag alpha', cex = 0.8)
A comprehensive example
# new data
head(Orange)
# convert factor to numeric for convenience
Orange$Tree <- as.numeric(Orange$Tree)
ntrees <- max(Orange$Tree)
# get the range for the x and y axis
xrange <- range(Orange$age)
yrange <- range(Orange$circumference)
# set up the plot
plot(xrange, yrange, type = 'n', xlab = 'Age (days)',
ylab = 'Circumference (mm)' )
colors <- rainbow(ntrees)
linetype <- c(1:ntrees)
plotchar <- seq(18, 18 + ntrees, 1)
# add lines
for (i in 1:ntrees) {
tree <- subset(Orange, Tree == i)
lines(tree$age, tree$circumference, type = 'b', lwd = 1.5,
lty = linetype[i], col = colors[i], pch = plotchar[i])
}
# add a title and subtitle
title('Tree Growth', 'example of line plot')
# add a legend
legend(xrange[1], yrange[2], 1:ntrees, cex = 0.8, col = colors,
pch = plotchar, lty = linetype, title = 'Tree')
Regressions and Residual Plots
# first
regr <- lm(mpg ~ hp)
summary(regr)
plot(mpg ~ hp)
abline(regr)
par(mfrow = c(2, 2))
# then
plot(regr)
# reverse
par(mfrow = c(1, 1))
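plot(regr) draws four diagnostic plots. The first of them, residuals against fitted values, can be rebuilt by hand, which helps to see what is being shown. A sketch, refitting the same model with a data argument so that it runs on its own:

```r
regr <- lm(mpg ~ hp, data = mtcars)

# residuals vs fitted values, drawn manually
plot(fitted(regr), resid(regr),
     xlab = 'fitted values', ylab = 'residuals', main = 'Residuals vs fitted')
abline(h = 0, lty = 2)                                  # reference line at zero
lines(lowess(fitted(regr), resid(regr)), col = 'red')   # smooth trend through the residuals
```

A visible curve in the smooth trend suggests that a straight line in hp does not capture the whole relationship.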
The lattice and latticeExtra Packages
library(lattice)
Coloring
# Show the default settings
show.settings()
# Save the default theme
mytheme <- trellis.par.get()
# Switch to black and white
trellis.par.set(canonical.theme(color = FALSE))
show.settings()
Documentation
A note on reordering the levels (factors)
# start
cyl <- mtcars$cyl
cyl <- as.factor(cyl)
cyl
levels(cyl)
# option 1
cyl <- factor(cyl, levels = c('8', '6', '4'))
# or levels = 3:1
# or levels = letters[3:1]
levels(cyl)
cyl <- mtcars$cyl
cyl <- as.factor(cyl)
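# option 2: new.order is an argument of reorder.factor() from the gdata package
# (base R's reorder() has no such argument), so gdata is assumed to be installed
library(gdata)
cyl <- reorder(cyl, new.order = 3:1)
levels(cyl)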
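The same reordering can be done with base R only. A sketch showing rev() on the existing levels and relevel() to move a single level to the front; the names cyl_f, cyl_rev and cyl_ref are only illustrations.

```r
cyl_f <- factor(mtcars$cyl)
levels(cyl_f)                     # "4" "6" "8"

# reverse the existing order of the levels
cyl_rev <- factor(cyl_f, levels = rev(levels(cyl_f)))
levels(cyl_rev)                   # "8" "6" "4"

# move one level to the first position (useful as a reference level)
cyl_ref <- relevel(cyl_f, ref = '6')
levels(cyl_ref)                   # "6" "4" "8"

# the underlying values are unchanged, only the level order differs
all(as.character(cyl_f) == as.character(cyl_rev))
```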
library(lattice)
# normalized x-axis for comparison
barchart(Class ~ Freq | Sex + Age, data = as.data.frame(Titanic), groups = Survived, stack = TRUE, layout = c(4, 1), auto.key = list(title = 'Survived', columns = 2))
# free x-axis
barchart(Class ~ Freq | Sex + Age, data = as.data.frame(Titanic), groups = Survived, stack = TRUE, layout = c(4, 1), auto.key = list(title = 'Survived', columns = 2), scales = list(x = 'free'))
# or
bc.titanic <- barchart(Class ~ Freq | Sex + Age, data = as.data.frame(Titanic), groups = Survived, stack = TRUE, layout = c(4, 1), auto.key = list(title = 'Survived', columns = 2), scales = list(x = 'free'))
bc.titanic
# add bg grid
update(bc.titanic, panel = function(...) {
panel.grid(h = 0, v = -1)
panel.barchart(...)
})
# remove lines
update(bc.titanic, panel = function(...) {
panel.barchart(..., border = 'transparent')
})
# or
update(bc.titanic, border = 'transparent')
Titanic1 <- as.data.frame(as.table(Titanic[, , 'Adult' ,]))
Titanic1
barchart(Class ~ Freq | Sex, Titanic1, groups = Survived, stack = TRUE, auto.key = list(title = 'Survived', columns = 2))
Titanic2 <- reshape(Titanic1, direction = 'wide', v.names = 'Freq', idvar = c('Class', 'Sex'), timevar = 'Survived')
names(Titanic2) <- c('Class', 'Sex', 'Dead', 'Alive')
barchart(Class ~ Dead + Alive | Sex, Titanic2, stack = TRUE, auto.key = list(columns = 2))
Uni-, Bi-, Multivariate Plots
Barchart
Like barplot().
# y ~ x
barchart(mpg ~ hp, main = 'Title', xlab = 'horsepowers', ylab = 'miles per gallon')
# y ~ x
barchart(mpg ~ hp, main = 'Title', xlab = 'horsepowers', ylab = 'miles per gallon', horizontal = FALSE)
barchart(VADeaths, groups = FALSE, layout = c(1, 4), aspect = 0.7, reference =FALSE, main = 'Title', xlab = 'rate per 100')
data(postdoc, package = 'latticeExtra')
barchart(prop.table(postdoc, margin = 1), xlab = 'Proportion', auto.key = list(adj = 1))
Change layout = c(x, y, page)
barchart(mpg ~ hp | factor(cyl), main = 'Title', xlab = 'horsepowers', ylab = 'cylinders - miles per gallon', layout = c(1,3))
barchart(mpg ~ hp | factor(cyl), main = 'Title', xlab = 'cylinders - horsepowers', ylab = 'miles per gallon', layout = c(3,1))
Change aspect =
1 for square.
barchart(mpg ~ hp | factor(cyl), main = 'Title', xlab = 'horsepowers', ylab = 'miles per gallon', layout = c(3,1), aspect = 1)
Colors
barchart(mpg ~ hp, groups = cyl, auto.key = list(space = 'right'), main = 'Title', xlab = 'horsepowers', ylab = 'miles per gallon')
- shingle(); control the ranges.
- equal.count(); grid.
Dotplot
Like dotchart().
dotplot(mpg, main = 'Title', xlab = 'miles per gallon')
dotplot(factor(cyl) ~ mpg, main = 'Title', xlab = 'miles per gallon', ylab = 'cylinders')
dotplot(factor(cyl) ~ mpg | factor(gear), main = 'Title', xlab = 'gearbox - miles per gallon', ylab = 'cylinders', layout = c(3,1))
dotplot(factor(cyl) ~ mpg | factor(gear), main = 'Title', xlab = 'miles per gallon', ylab = 'gearbox - cylinders', layout = c(1,3), aspect = 0.3)
dotplot(factor(cyl) ~ mpg | factor(gear), main = 'Title', xlab = 'miles per gallon', ylab = 'gearbox - cylinders', layout = c(1,3), aspect = 0.3, origin = 0)
dotplot(factor(cyl) ~ mpg | factor(gear), main = 'Title', xlab = 'miles per gallon', ylab = 'gearbox - cylinders', layout = c(1,3), aspect = 0.3, origin = 0, type = c('p', 'h'))
Set auto.key.
# maybe we'll want this later
old.pars <- trellis.par.get()
#trellis.par.set(superpose.symbol = list(pch = c(1,3), col = 12:14))
trellis.par.set(superpose.symbol = list(pch = c(1,3), col = 1))
# Optionally put things back how they were
#trellis.par.set(old.pars)
Use auto.key.
dotplot(factor(cyl) ~ mpg | factor(gear), main = 'Title', xlab = 'miles per gallon', ylab = 'gearbox - cylinders', layout = c(1,3), groups = vs, auto.key = list(space = 'right'))
trellis.par.set(old.pars)
trellis.par.set(superpose.symbol = list(pch = c(1,3), col = 1))
dotplot(variety ~ yield | site, barley, layout = c(1, 6), aspect = c(0.7), groups = year, auto.key = list(space = 'right'))
trellis.par.set(old.pars)
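trellis.par.set() changes the settings for every later plot until they are restored. As an alternative, lattice's high-level functions accept a par.settings argument that applies the settings to a single plot only. A sketch with the barley data:

```r
library(lattice)

before <- trellis.par.get('superpose.symbol')

# settings passed through par.settings are attached to this plot only
p <- dotplot(variety ~ yield | site, data = barley, groups = year,
             layout = c(1, 6), aspect = 0.7,
             par.settings = list(superpose.symbol = list(pch = c(1, 3), col = 1)),
             auto.key = list(space = 'right'))
print(p)

# compare the global settings before and after drawing
after <- trellis.par.get('superpose.symbol')
isTRUE(all.equal(before, after))
```

With this approach there is no need to save old.pars and restore it after each plot.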
Vertical.
dotplot(mpg ~ factor(cyl) | factor(gear), main = 'Title', xlab = 'cylinders', ylab = 'gearbox - miles per gallon', layout = c(1,3), aspect = 0.3)
library(readr)
density <- read_csv('density.csv')
density$Density <- as.numeric(density$Density)
dotplot(reorder(MetropolitanArea, Density) ~ Density, density, type = c('p', 'h'), main = 'Title', xlab = 'Population Density (pop / sq.mi)')
dotplot(reorder(MetropolitanArea, Density) ~ Density | Region, density, type = c('p', 'h'), strip = FALSE, strip.left = TRUE, layout = c(1, 3), scales = list(y = list(relation = 'free')), main = 'Title', xlab = 'Population Density (pop / sq.mi)')
Stripplot
Like stripchart().
stripplot(mpg, main = 'Title', xlab = 'miles per gallon')
stripplot(factor(cyl) ~ mpg, main = 'Title', xlab = 'miles per gallon', ylab = 'cylinders')
stripplot(factor(cyl) ~ mpg | factor(gear), main = 'Title', xlab = 'gearbox - miles per gallon', ylab = 'cylinders', layout = c(1,3))
stripplot(factor(cyl) ~ mpg | factor(gear), main = 'Title', xlab = 'gearbox - miles per gallon', ylab = 'cylinders', layout = c(1,3), groups = vs, auto.key = list(space = 'right'))
stripplot(mpg ~ factor(cyl) | factor(gear), main = 'Title', xlab = 'cylinders', ylab = 'gearbox - miles per gallon', layout = c(1,3))
Histogram
Like hist().
histogram(mpg, main = 'Title', xlab = 'miles per gallon')
histogram(~mpg | factor(cyl), layout = c(1, 3), main = 'Title', xlab = 'miles per gallon', ylab = 'percent of total')
Densityplot
Like plot.density().
densityplot(mpg, main = 'Title', xlab = 'miles per gallon', ylab = 'density')
densityplot(~mpg | factor(cyl), layout = c(1, 3), main = 'Title', xlab = 'miles per gallon', ylab = 'density')
ECDFplot
library(latticeExtra)
ecdfplot(mpg, main = 'Title', xlab = 'miles per gallon', ylab = '')
BWplot
Like boxplot().
bwplot(mpg, main = 'Title', xlab = 'miles per gallon', ylab = '')
bwplot(factor(cyl) ~ mpg, main = 'Title', xlab = 'miles per gallon', ylab = 'cylinders')
bwplot(factor(cyl) ~ mpg | factor(gear), main = 'Title', xlab = 'miles per gallon', ylab = 'gearbox - cylinders', layout = c(1,3))
bwplot(mpg ~ factor(cyl) | factor(gear), main = 'Title', xlab = 'gearbox - cylinders', ylab = 'miles per gallon', layout = c(3,1))
QQmath
Like qqnorm().
qqmath(mpg, main = 'Title', ylab = 'miles per gallon')
XYplot
Like plot().
xyplot(mpg ~ disp | factor(cyl), main = 'Title', xlab = 'displacement', ylab = 'cylinders - miles per gallon', layout = c(1,3))
xyplot(mpg ~ disp | factor(cyl), main = 'Title', xlab = 'cylinders - displacement', ylab = 'miles per gallon', layout = c(3,1))
XYplot options
xyplot(mpg ~ disp | factor(cyl), main = 'Title', xlab = 'cylinders - displacement', ylab = 'miles per gallon', layout = c(3,1), aspect = 1)
xyplot(mpg ~ disp | factor(cyl), main = 'Title', xlab = 'cylinders - displacement', ylab = 'miles per gallon', layout = c(3,1), aspect = 1, scales = list(y = list(at = seq(10, 30, 10))))
meanmpg <- mean(mpg)
# custom panel: points, a dashed line at the overall mean, and a label
xyplot(mpg ~ disp | factor(cyl), main = 'Title', xlab = 'cylinders - displacement', ylab = 'miles per gallon', layout = c(3,1), aspect = 1, panel = function(...) {
  panel.xyplot(...)
  panel.abline(h = meanmpg, lty = 'dashed')
  panel.text(450, meanmpg + 1, 'avg', adj = c(1, 0), cex = 0.7)
})
# custom panel: regression line drawn under the points
xyplot(mpg ~ disp | factor(cyl), main = 'Title', xlab = 'cylinders - displacement', ylab = 'miles per gallon', layout = c(3,1), aspect = 1, panel = function(x, y, ...) {
  panel.lmline(x, y)
  panel.xyplot(x, y, ...)
})
- panel.points().
- panel.lines().
- panel.segments().
- panel.arrows().
- panel.rect().
- panel.polygon().
- panel.text().
- panel.abline().
- panel.lmline().
- panel.xyplot().
- panel.curve().
- panel.rug().
- panel.grid().
- panel.bwplot().
- panel.histogram().
- panel.loess().
- panel.violin().
- panel.smoothScatter().
- …
- par.settings.
- …
library(lattice)
data(SeatacWeather, package = 'latticeExtra')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'l', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'p', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'l', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'o', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'r', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'g', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 's', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'S', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'h', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'a', lty = 1, col = 'black')
xyplot(min.temp + max.temp + precip ~ day | month, ylab = 'Temperature and Rainfall', data = SeatacWeather, layout = c(3,1), type = 'smooth', lty = 1, col = 'black')
xyplot(mpg ~ hp, main = 'Title', xlab = 'horsepowers', ylab = 'miles per gallon')
xyplot(mpg ~ hp, main = 'Title', xlab = 'horsepowers', ylab = 'miles per gallon', type = 'o')
xyplot(mpg ~ hp, main = 'Title', xlab = 'horsepowers', ylab = 'miles per gallon', type = 'o', pch = 16, lty = 'dashed')
xyplot(mpg ~ hp, main = 'Title', xlab = 'horsepowers', ylab = 'miles per gallon')
data(USAge.df, package = 'latticeExtra')
xyplot(Population ~ Age | factor(Year), USAge.df, groups = Sex, type = c('l', 'g'), auto.key = list(points = FALSE, lines = TRUE, columns = 2), aspect = 'xy', ylab = 'Population (millions)', subset = Year %in% seq(1905, 1975, by = 10))
xyplot(Population ~ Year | factor(Age), USAge.df, groups = Sex, type = 'l', strip = FALSE, strip.left = TRUE, layout = c(1, 3), ylab = 'Population (millions)', auto.key = list(lines = TRUE, points = FALSE, columns = 2), subset = Age %in% c(0, 10, 20))
data(USCancerRates, package = 'latticeExtra')
xyplot(rate.male ~ rate.female | state, USCancerRates, aspect = 'iso', pch = '.', cex = 2, index.cond = function(x, y) { median(y - x, na.rm = TRUE) }, scales = list(log = 2, at = c(75, 150, 300, 600)), panel = function(...) {
panel.grid(h = -1, v = -1)
panel.abline(0, 1)
panel.xyplot(...)
},
xlab = 'a',
ylab = 'b')
data(biocAccess, package = 'latticeExtra')
baxy <- xyplot(log10(counts) ~ hour | month + weekday, biocAccess, type = c('p', 'a'), as.table = TRUE, pch = '.', cex = 2, col.line = 'black')
baxy
library(latticeExtra)
useOuterStrips(baxy)
xyplot(sunspot.year, aspect = 'xy', strip = FALSE, strip.left = TRUE, cut = list(number = 4, overlap = 0.05))
data(biocAccess, package = 'latticeExtra')
ssd <- stl(ts(biocAccess$counts[1:(24 * 30 *2)], frequency = 24), 'periodic')
xyplot(ssd, main = 'Title', xlab = 'Time (Days)')
Splom
splom(mtcars[c(1, 3, 6)], groups = cyl, data = mtcars, panel = panel.superpose, key = list(title = 'Three Cylinder Options', columns = 3, points = list(text = list(c('4 Cylinder', '6 Cylinder', '8 Cylinder')))))
trellis.par.set(superpose.symbol = list(pch = c(1,3, 22), col = 1, alpha = 0.5))
splom(~data.frame(mpg, disp, hp, drat, wt, qsec), data = mtcars, groups = cyl, pscales = 0, varnames = c('miles\nper\ngallon', 'displacement\n(cu.in)', 'horsepower', 'rear\naxle\nratio', 'weight', '1/4\nmile\ntime'), auto.key = list(columns = 3, title = 'Title'))
trellis.par.set(old.pars)
splom(USArrests)
splom(~USArrests[c(3,1,2,4)] | state.region, pscales = 0, type = c('g', 'p', 'smooth'))
Parallel plot
For multivariate continuous data.
parallelplot(~iris[1:4])
parallelplot(~iris[1:4], horizontal.axis = FALSE)
parallelplot(~iris[1:4], scales = list(x = list(rot = 90)))
parallelplot(~iris[1:4] | Species, iris)
parallelplot(~iris[1:4], iris, groups = Species,
horizontal.axis = FALSE, scales = list(x = list(rot = 90)))
Trivariate plots
Like image(), contour(), filled.contour(), persp(), symbols().
- levelplot().
- contourplot().
- cloud().
- wireframe().
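The lattice functions named here can be tried on the built-in volcano elevation matrix. The following is a minimal sketch, not part of the original snippets: the data frame vol, its columns x, y, z, and the object names p_level, p_contour, p_wire and p_cloud are illustrative assumptions.

```r
library(lattice)

# Reshape the built-in volcano matrix into long format:
# one row per grid cell with its row index, column index and elevation.
vol <- data.frame(
  x = as.vector(row(volcano)),
  y = as.vector(col(volcano)),
  z = as.vector(volcano)
)

# Each call returns a trellis object; printing it draws the plot.
p_level   <- levelplot(z ~ x * y, data = vol, xlab = 'row', ylab = 'column')
p_contour <- contourplot(z ~ x * y, data = vol, cuts = 10)
p_wire    <- wireframe(z ~ x * y, data = vol, shade = TRUE)
p_cloud   <- cloud(z ~ x * y, data = vol, pch = '.')

print(p_level)
print(p_contour)
print(p_wire)
print(p_cloud)
```

levelplot() and contourplot() show the third variable as colour or contour lines on a flat grid, while wireframe() and cloud() draw it in perspective, as a surface and as points respectively.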
Additional Packages¶
The sm Package (density)¶
library(sm)
Density plot
# create value labels
cyl.f <- factor(cyl, levels = c(4, 6, 8), labels = c('4 cyl', '6 cyl', '8 cyl'))
# plot densities
sm.density.compare(mpg, cyl, xlab = 'miles per gallon')
title(main = 'Title')
# add a legend at fixed coordinates (use locator(1) instead to place it with a mouse click)
# one fill colour per level: 2, 3, 4
colfill <- 1 + seq_along(levels(cyl.f))
legend(25, 0.19, levels(cyl.f), fill = colfill)
The car Package (scatter)¶
library(car)
Scatter plot
scatterplot(mpg ~ wt | cyl, data = mtcars, xlab = 'weight', ylab = 'miles per gallon', labels = row.names(mtcars))
Splom
scatterplotMatrix(~mpg + disp + drat + wt | cyl, data = mtcars, main = 'Title')
spm() is an abbreviation for scatterplotMatrix().
spm(~mpg + disp + drat + wt | cyl, data = mtcars, main = 'Title')
The vioplot Package (boxplot)¶
library(vioplot)
Violin boxplot
x1 <- mtcars$mpg[mtcars$cyl == 4]
x2 <- mtcars$mpg[mtcars$cyl == 6]
x3 <- mtcars$mpg[mtcars$cyl == 8]
vioplot(x1, x2, x3, names = c('4 cyl', '6 cyl', '8 cyl'), col = 'green')
title('Title')
The vcd Package (count, correlation, mosaic)¶
library(vcd)
The package provides a variety of methods for visualizing multivariate categorical data.
Count
counts <- table(gear, cyl)
counts
mosaic(counts, shade = TRUE, legend = TRUE)
Correlation
counts <- table(gear, cyl)
counts
assoc(counts, shade = TRUE)
Mosaic
ucb <- data.frame(UCBAdmissions)
ucb <- within(ucb, Accept <- factor(Admit, levels = c('Rejected', 'Admitted')))
library(vcd); library(grid)
doubledecker(xtabs(Freq ~ Dept + Gender + Accept, data = ucb), gp = gpar(fill = c('grey90', 'steelblue')))
data(Fertility, package = 'AER')
doubledecker(morekids ~ age, data = Fertility, gp = gpar(fill = c('grey90', 'green')), spacing = spacing_equal(0))
doubledecker(morekids ~ gender1 + gender2, data = Fertility, gp = gpar(fill = c('grey90', 'green')))
doubledecker(morekids ~ age + gender1 + gender2, data = Fertility, gp = gpar(fill = c('grey90', 'green')), spacing = spacing_dimequal(c(0.1, 0, 0, 0)))
The hexbin Package (scatter)¶
library(hexbin)
Scatter plot
# new data
data(NHANES)
# compare
plot(Serum.Iron ~ Transferin, NHANES, main = 'Title', xlab = 'Transferrin', ylab = 'Iron')
# with
hexbinplot(Serum.Iron ~ Transferin, NHANES, main = 'Title', xlab = 'Transferrin', ylab = 'Iron')
hexbinplot(mpg ~ hp, data = mtcars, main = 'Title', xlab = 'horsepower', ylab = 'miles per gallon')
x <- rnorm(1000)
y <- rnorm(1000)
bin <- hexbin(x, y, xbins = 50)
plot(bin, main = 'Title')
x <- rnorm(1000)
y <- rnorm(1000)
plot(x, y, main = 'Title', col = rgb(0, 100, 0, 50, maxColorValue = 255), pch = 16)
data(Diamonds, package = 'Stat2Data')
a = hexbin(Diamonds$PricePerCt, Diamonds$Carat, xbins = 40)
library(RColorBrewer)
plot(a)
Colors.
rf <- colorRampPalette(rev(brewer.pal(12, 'Set3')))
hexbinplot(Diamonds$PricePerCt ~ Diamonds$Carat, colramp = rf)
Mix lattice and hexbin
data(gvhd10, package = 'latticeExtra')
xyplot(asinh(SSC.H) ~ asinh(FL2.H), gvhd10, aspect = 1, panel = panel.hexbinplot, .aspect.ratio = 1, trans = sqrt)
xyplot(asinh(SSC.H) ~ asinh(FL2.H) | Days, gvhd10, aspect = 1, panel = panel.hexbinplot, .aspect.ratio = 1, trans = sqrt)
The car Package (scatter)¶
library(car)
Scatter plot
scatterplotMatrix(~mpg + disp + drat + wt | cyl, data = mtcars,
main = 'Three Cylinder Options')
The scatterplot3d Package¶
library(scatterplot3d)
Scatter plot
scatterplot3d(wt, disp, mpg, main = 'Title')
scatterplot3d(wt, disp, mpg, pch = 16, highlight.3d = TRUE, type = 'h', main = 'Title')
s3d <- scatterplot3d(wt, disp, mpg, pch = 16, highlight.3d = TRUE, type = 'h', main = 'Title')
fit <- lm(mpg ~ wt + disp)
s3d$plane3d(fit)
The rgl Package (interactive)¶
library(rgl)
Interactive plot
The plot opens in a new window.
plot3d(wt, disp, mpg, col = 'red', size = 3)
The cluster Package (dendrogram)¶
library(cluster)
Dendrogram
Use the iris dataset.
subset <- sample(1:150, 20)
cS <- as.character(Sp <- iris$Species[subset])
cS
cS[Sp == 'setosa'] <- 'S'
cS[Sp == 'versicolor'] <- 'V'
cS[Sp == 'virginica'] <- 'g'
ai <- agnes(iris[subset, 1:4])
plot(ai, label = cS)
The extracat Package (splom)¶
library(extracat)
Splom
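For missing values. visna() draws a binary matrix of missing-value patterns, with reordering and filtering of rows and columns. Columns are variables and rows are patterns of missingness. The bars below the columns show the proportion of NA in each variable; the bars on the right show the relative frequency of each pattern.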
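The quantities behind such a missing-value plot can also be computed by hand. The following is a minimal base R sketch that uses the built-in airquality dataset as a stand-in (it does not appear in the original); the names na_mat, na_by_var, pattern and pattern_freq are illustrative.

```r
# airquality ships with R and has missing values in Ozone and Solar.R.
na_mat <- is.na(airquality)

# Proportion of NA in each variable.
na_by_var <- colMeans(na_mat)

# One string per row encoding which variables are missing (1 = NA, 0 = observed).
pattern <- apply(na_mat, 1, function(r) paste(as.integer(r), collapse = ''))

# Relative frequency of each missingness pattern, most common first.
pattern_freq <- sort(table(pattern) / nrow(airquality), decreasing = TRUE)

round(na_by_var, 3)
pattern_freq
```

A table like pattern_freq is a quick check on whether missing values are concentrated in a few variables or spread across many different row patterns.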
# example 1
data(CHAIN, package = 'mi')
visna(CHAIN, sort = 'b')
summary(CHAIN)
# example 2
data(oly12, package = 'VGAMdata')
oly12d <- oly12[, names(oly12) != 'DOB']
oly12a <- oly12
names(oly12a) <- abbreviate(names(oly12), 3)
visna(oly12a, sort = 'b')
# example 3
data(freetrade, package = 'Amelia')
freetrade <- within(freetrade, land1 <- reorder(country, tariff, function(x) sum(is.na(x))))
fluctile(xtabs(is.na(tariff) ~ land1 + year, data = freetrade))
# example 4
data(Pima.tr2, package = 'MASS')
visna(Pima.tr2, sort = 'b')
The ash Package (density)¶
library(ash)
Density plot
plot(ash1(bin1(mtcars$mpg, nbin = 50)), type = 'l')
The KernSmooth Package (density)¶
library(KernSmooth)
Density plot
with(mtcars, {
hist(mpg, freq = FALSE, main = '', col = 'bisque2', ylab = '')
lines(density(mpg), lwd = 2)
ks1 <- bkde(mpg, bandwidth = dpik(mpg))
lines(ks1, col = 'red', lty = 5, lwd = 2)})
The corrplot Package (correlation)¶
library(corrplot)
Splom
# create a correlation matrix from all mtcars variables
correlations <- cor(mtcars)
corrplot(correlations)