Admittedly, we still need to cover basic descriptive statistics and data fundamentals
All of this is coming in 2 weeks as we return to statistics and econometric theory
But let's start with the fun stuff right away, even if you don't fully know the reasons: data visualization
mpg from the ggplot2 library
library(ggplot2)
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
Base R is very powerful and intuitive to plot, but not very sexy
Basic syntax for most types of plots:
plot_type(my_df$variable)
Alternatively, avoid the $ by just typing the variable names (in a formula) and then, in another argument to the plotting function, specify data = my_df
plot_type(variable1 ~ variable2, data = my_df)
mpg data, plotting a histogram of hwy
hist(mpg$hwy)

mpg data, plotting a boxplot of hwy
boxplot(mpg$hwy)

mpg data, plotting a boxplot of hwy by class
boxplot(mpg$hwy ~ mpg$class)
# second method
boxplot(hwy ~ class, data = mpg)
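~ is part of R's “formula notation”: the dependent variable goes on the left of ~ and the explanatory variables on the right, separated by +'s
y ~ x + z means "y is explained by x and z"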
mpg data, plotting a scatterplot of hwy against displ
plot(mpg$hwy ~ mpg$displ)
# second method
plot(hwy ~ displ, data = mpg)
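As a small sketch of the formula notation used above: the snippet below uses R's built-in mtcars data and the variables hp, wt, and cyl. These are assumptions chosen only so that the example runs without loading any package; they are not part of the mpg examples.

```r
# Formula notation with the built-in mtcars data (no packages needed)
f <- hp ~ wt + cyl           # "hp is explained by wt and cyl"
vars <- all.vars(f)          # the variable names used in the formula
print(vars)

# The same formula style works across base R plotting functions
boxplot(hp ~ cyl, data = mtcars)   # boxplot of hp by number of cylinders
plot(hp ~ wt, data = mtcars)       # scatterplot of hp against wt
```

The left-hand side is always the variable being plotted or explained; the data argument tells R where to look up the names.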

"The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures."
Largely (but not only) created by Hadley Wickham
We will look at this much more extensively next week!
This "flavor" of R will make your coding life so much easier!

ggplot2 is perhaps the most popular package in R and a core element of the tidyverse
gg stands for a grammar of graphics
Very powerful and beautiful graphics, very customizable and reproducible, but requires a bit of a learning curve
All those "cool graphics" you've seen in the New York Times, fivethirtyeight, the Economist, Vox, etc. use the grammar of graphics


Hadley Wickham
Chief Scientist, R Studio
"The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive."
This is a true grammar
We don’t talk about specific chart types
Instead we talk about specific chart components

Any graphic can be built from the same components:
data: the data to be drawn from
aesthetics (aes): mappings from data to some visual marking
geometric objects (geom): objects on the plot
scale: defines the range of values
coordinates (coord): organize location
labels: describe the scale and markings
facet: groups into subplots
theme: styles the plot elements
Not every plot needs every component, but all plots must have the first 3!

Produces plot output in viewer
Does not save plot
Export menu in viewer
Adding layers requires whole code for new plot
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point() +
  geom_smooth()
Saves your plot as an R object
Does not show in viewer
Can add layers by calling the original plot name
# make and save plot
p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()
p # view plot
# add a layer
p + geom_smooth() # shows new plot
p <- p + geom_smooth() # saves and overwrites p
p2 <- p + geom_smooth() # saves as different object
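A minimal sketch of what "saving the plot as an R object" gives you, assuming ggplot2 is installed. The names n_layers_p, n_layers_p2, and out_file are made up for illustration, and writing to a temporary PDF with ggsave() is just one possible way to get the stored plot onto disk.

```r
library(ggplot2)

# Build the plot once and store it as an object
p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# Adding a layer returns a new object; p itself is unchanged
p2 <- p + geom_smooth()
n_layers_p  <- length(p$layers)
n_layers_p2 <- length(p2$layers)

# Write the stored plot to a file (a temporary PDF, purely for illustration)
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p2, width = 6, height = 4)
```

Because p is untouched until you reassign it, you can try out several layers on the same base plot without retyping the whole thing.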

ggplot(data = mpg)
The data layer is the source of our data. As part of the tidyverse, ggplot2 requires data to be "tidy" [1]:
Each variable forms a column
Each observation forms a row
Each observational unit forms a table
[1] Data "tidiness" is the core element of all tidyverse packages. Much more on all of this next class.
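To make the three tidy rules concrete, here is a rough sketch that assumes the tidyr package is installed. The data frame wide and its values are hypothetical, invented only to show an untidy shape; pivot_longer() is one way to reshape it, not necessarily the approach covered next class.

```r
library(tidyr)

# Hypothetical "wide" data: one column per year, so year is not a variable
wide <- data.frame(
  model    = c("a4", "civic", "mustang"),
  hwy_1999 = c(29, 33, 26),
  hwy_2008 = c(31, 36, 26)
)

# Reshape so that each row is one model-year observation
long <- pivot_longer(
  wide,
  cols         = c(hwy_1999, hwy_2008),
  names_to     = "year",
  names_prefix = "hwy_",
  values_to    = "hwy"
)
print(long)
```

In the long version, year and hwy are each a single column, which is the shape ggplot2 needs in order to map them to aesthetics.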
Add a layer with + at the end of a line (never at the beginning!)
Style recommendation: start a new line after each + to improve legibility!
We will build a plot layer-by-layer
+ aes()
Aesthetics map data to visual elements or parameters

displ → x
hwy → y
class → shape, size, color, etc.
aes(x = displ, y = hwy, color = class)
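A short sketch, assuming ggplot2 is installed, of what aes() does on its own: it only records which variable feeds which aesthetic. The object names m, mapped, and p are illustrative.

```r
library(ggplot2)

# aes() only records which variable feeds which aesthetic
m <- aes(x = displ, y = hwy, color = class)
print(m)

# Mapped aesthetics by name; ggplot2 stores color under the spelling "colour"
mapped <- names(m)
print(mapped)

# Add the mapping and a geom to the data to actually draw something
p <- ggplot(data = mpg) +
  m +
  geom_point()
p
```

Nothing is looked up in the data until the plot is built, which is why the variable names inside aes() are written without quotes or $.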
+ geom_*()
Geometric objects displayed on the plot

Which geoms you should use depends on what you want to show:

| Type | geom |
|---|---|
| Point | geom_point() |
| Line | geom_line(), geom_path() |
| Bar | geom_bar(), geom_col() |
| Histogram | geom_histogram() |
| Regression | geom_smooth() |
| Boxplot | geom_boxplot() |
| Text | geom_text() |
| Density | geom_density() |
##  [1] "geom_abline"     "geom_area"       "geom_bar"        "geom_bin2d"     
##  [5] "geom_blank"      "geom_boxplot"    "geom_col"        "geom_contour"   
##  [9] "geom_count"      "geom_crossbar"   "geom_curve"      "geom_density"   
## [13] "geom_density_2d" "geom_density2d"  "geom_dotplot"    "geom_errorbar"  
## [17] "geom_errorbarh"  "geom_freqpoly"   "geom_hex"        "geom_histogram" 
## [21] "geom_hline"      "geom_jitter"     "geom_label"      "geom_line"      
## [25] "geom_linerange"  "geom_map"        "geom_path"       "geom_point"     
## [29] "geom_pointrange" "geom_polygon"    "geom_qq"         "geom_qq_line"   
## [33] "geom_quantile"   "geom_raster"     "geom_rect"       "geom_ribbon"    
## [37] "geom_rug"        "geom_segment"    "geom_sf"         "geom_sf_label"  
## [41] "geom_sf_text"    "geom_smooth"     "geom_spoke"      "geom_step"      
## [45] "geom_text"       "geom_tile"       "geom_violin"     "geom_vline"
See http://ggplot2.tidyverse.org/reference for many more options
Or just start typing geom_ in R Studio!
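If you want the list of geoms from the console rather than from autocomplete, one possible way (an assumption on my part, not necessarily how the listing above was produced) is to search the attached ggplot2 package for names starting with geom_. The exact set returned depends on your installed ggplot2 version.

```r
library(ggplot2)

# List every exported object in ggplot2 whose name starts with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
head(geoms)
length(geoms)
```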

ggplot(data = mpg)

ggplot(data = mpg) +
  aes(x = displ, y = hwy)

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class))

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth()

+ geom_*()
geom_*(aes, data, stat, position)
data: geoms can have their own data
aes: geoms can have their own aesthetics
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth()
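The example above shows a geom with its own aesthetics. As a sketch of a geom with its own data, assuming ggplot2 is installed, the snippet below highlights a subset of cars in a second point layer. The subset two_seaters and the colors are illustrative choices.

```r
library(ggplot2)

# Illustrative subset to highlight: the two-seater cars
two_seaters <- subset(mpg, class == "2seater")

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(color = "gray60") +                           # uses the global data
  geom_point(data = two_seaters, color = "red", size = 3)  # uses its own data
p
```

The second layer still inherits the x and y mappings from the base plot; only the data it draws from is different.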

stat: some geoms statistically transform data
geom_histogram() uses stat_bin() to group observations into bins
position: some adjust location of objects
dodge, stack, jitter
ggplot(data = mpg) +
  aes(x = class, y = hwy) +
  geom_boxplot()

ggplot(data = mpg) +
  aes(x = class) +
  geom_bar()

ggplot(data = mpg) +
  aes(x = class, fill = drv) +
  geom_bar()

ggplot(data = mpg) +
  aes(x = class, fill = drv) +
  geom_bar(position = "dodge")
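To see the stat and position arguments at work, here is a rough sketch assuming ggplot2 is installed. The choice of 20 bins and the object names are illustrative; layer_data() is used only to peek at the data after the statistical transformation.

```r
library(ggplot2)

# stat: geom_histogram() bins hwy before drawing bars
p_hist <- ggplot(data = mpg) +
  aes(x = hwy) +
  geom_histogram(bins = 20)
binned <- layer_data(p_hist)   # the data after stat_bin() has run
total  <- sum(binned$count)    # every car lands in exactly one bin

# position: jitter nudges overlapping points apart at random
p_jit <- ggplot(data = mpg) +
  aes(x = cyl, y = hwy) +
  geom_point(position = "jitter")
p_jit
```

The bar heights come from the computed count column, not from any column in mpg, which is what "statistically transform data" means here.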

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth()
p # show plot

+ facet_wrap()
+ facet_grid()
p + facet_wrap(~year)

p + facet_grid(cyl~year)
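A small sketch contrasting the two facet functions, assuming ggplot2 is installed. facet_wrap() wraps the panels for one variable into rows and columns, while facet_grid() crosses a row variable with a column variable. The variables class, drv, and year and the ncol value are illustrative choices; the vars() form is an alternative to the formula form used above.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# One panel per class, wrapped into 4 columns
p_wrap <- base + facet_wrap(~class, ncol = 4)

# Rows by drv, columns by year (same idea as drv ~ year)
p_grid <- base + facet_grid(rows = vars(drv), cols = vars(year))

p_wrap
p_grid
```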

+ labs()
p + facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")

+ scale_*_*()
scale + _ + <aes> + _ + <type> + (), i.e. scale_<aes>_<type>()
<aes>: parameter you want to adjust
<type>: type of parameter
I want to change my discrete x-axis: scale_x_discrete()
Other examples: scale_y_continuous(), scale_x_log10(), scale_fill_discrete(), scale_color_manual()
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d()
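As a sketch of the scale_<aes>_<type>() naming pattern, assuming ggplot2 is installed, the snippet below adjusts one scale per aesthetic. The break points, limits, and color names are arbitrary illustrative values, and drv is used for color only because it has just three levels.

```r
library(ggplot2)

drv_colors <- c("4" = "darkorange", "f" = "steelblue", "r" = "forestgreen")

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = drv)) +
  scale_x_continuous(breaks = 2:7) +                    # x aesthetic, continuous
  scale_y_continuous(limits = c(10, 45),
                     breaks = seq(10, 45, by = 5)) +    # y aesthetic, continuous
  scale_color_manual(values = drv_colors)               # color aesthetic, manual
p

# Colors actually drawn, after the manual scale has been applied
used_colors <- unique(layer_data(p)$colour)
```

Each function name tells you which aesthetic it touches and how: the middle part is the aesthetic, the last part is the kind of scale.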

+ theme_*()
Theme changes appearance of plot decorations (things not mapped to data)
Some themes that come with ggplot2:
+ theme_bw()
+ theme_dark()
+ theme_gray()
+ theme_minimal()
+ theme_light()
+ theme_classic()
Many parameters we could change
Global options: line, rect, text, title
axis: x-, y-, or other axis title, ticks, lines
legend: plot legends for fill or color
panel: actual plot area
plot: whole image
strip: facet labels
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_bw()

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_minimal()

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(text = element_text(family = "Fira Sans"))

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(text = element_text(family = "Fira Sans"),
        legend.position = "bottom")
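A minimal sketch of tweaking individual theme parameters with theme(), assuming ggplot2 is installed. The particular elements changed here (bold axis titles, no minor grid lines, larger facet labels) are arbitrary illustrations of the axis, panel, and strip groups listed earlier.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  facet_wrap(~year) +
  theme_minimal() +                                    # complete theme first
  theme(
    legend.position  = "bottom",                       # legend: where it sits
    axis.title       = element_text(face = "bold"),    # axis: title text
    panel.grid.minor = element_blank(),                # panel: drop minor grid
    strip.text       = element_text(size = 12)         # strip: facet labels
  )
p
```

Order matters: a complete theme such as theme_minimal() replaces every setting, so put your own theme() tweaks after it rather than before.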

+ theme_*()
ggthemes package adds some other nice themes
# install if you don't have it
# install.packages("ggthemes")
library("ggthemes") # load package

library("ggthemes")
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_economist() +
  theme(text = element_text(family = "Fira Sans"),
        legend.position = "bottom")

library("ggthemes")
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_fivethirtyeight() +
  theme(text = element_text(family = "Fira Sans"),
        legend.position = "bottom")

aes() can go in base (data) layer and/or in individual geom() layers
geoms will inherit global aes from data layer unless overridden
# ALL GEOMS will map data to colors
ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth()

# ONLY points will map data to colors
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()
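One way to check the inheritance rule, sketched under the assumption that ggplot2 is installed: count how many groups the smoothing layer ends up with in each version. Using method = "lm" with se = FALSE is a simplification chosen here so that the fit is well defined for small classes; it is not what the plots above use.

```r
library(ggplot2)

# Global color mapping: every geom inherits it, so the smooth is split by class
p_all <- ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Local color mapping: only the points use it, so there is a single smooth
p_pts <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", se = FALSE)

# Layer 2 is the smooth; count the distinct groups it drew
groups_all <- length(unique(layer_data(p_all, 2)$group))
groups_pts <- length(unique(layer_data(p_pts, 2)$group))
```

With the global mapping you get one fitted line per class; with the local mapping you get one line for all cars.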

aesthetics such as size and color can be mapped from data or set to a single value
Map inside of aes(), set outside of aes()
# Point colors are mapped from class data
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()

# Point colors are all set to red, and the smooth line is set to blue
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(), color = "red") +
  geom_smooth(aes(), color = "blue")
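A common slip with the map-versus-set rule is putting a fixed color inside aes(). The sketch below, assuming ggplot2 is installed, compares the two; the object names are illustrative.

```r
library(ggplot2)

# Set: color is outside aes(), so every point is drawn in blue
p_set <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Mapped by mistake: "blue" is treated as a data value with one category,
# so ggplot2 picks a color from its default palette and adds a legend
p_mapped <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "blue"))

set_colors    <- unique(layer_data(p_set)$colour)
mapped_colors <- unique(layer_data(p_mapped)$colour)
```

If a plot shows an unexpected color plus a legend entry named after the color you asked for, check whether the value ended up inside aes().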

# I did some (hidden) data work before this!
ggplot(data = county_full,
       mapping = aes(x = long, y = lat, fill = pop_dens, group = group)) +
  geom_polygon(color = "gray90", size = 0.05) +
  coord_equal() +
  scale_fill_brewer(palette = "Blues",
                    labels = c("0-10", "10-50", "50-100", "100-500",
                               "500-1,000", "1,000-5,000", ">5,000")) +
  labs(fill = "Population per\nsquare mile") +
  theme_map() +
  guides(fill = guide_legend(nrow = 1)) +
  theme(legend.position = "bottom")

library("gapminder")
library("gganimate")
ggplot(gapminder) +
  aes(x = gdpPercap, y = lifeExp, size = pop, color = country) +
  geom_point() +
  guides(color = "none", size = "none") + # hide both legends
  scale_x_log10(breaks = c(10^3, 10^4, 10^5),
                labels = c("$1k", "$10k", "$100k")) +
  scale_color_manual(values = gapminder::country_colors) +
  scale_size(range = c(0.5, 12)) +
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       caption = "Source: Hans Rosling's gapminder.org") +
  theme_minimal(14, base_family = "Fira Sans") +
  theme(strip.text = element_text(size = 16, face = "bold"),
        panel.border = element_rect(fill = NA, color = "grey40"),
        panel.grid.minor = element_blank()) +
  transition_states(year, 1, 0) +
  ggtitle("Income and Life Expectancy - {closest_state}")

We will return to various graphics as we cover descriptive statistics and regression
I hope to cover some basic principles of good graphic design for figures and plots
Remember:

"Shoot me"

Less is More:


New York Times: "How Stable Are Democracies? ‘Warning Signs Are Flashing Red’", Nov 29, 2016
On ggplot2
ggplot2's website reference section
On data visualization