1.3 — Data Visualization with ggplot2 — R Practice
Getting Set Up
Before we begin, start a new file with New File \(\rightarrow\) R Script. As you work through this sheet in the console in R, also add (copy/paste) your commands that work into this new file. At the end, save it, and run it to execute all of your commands at once.
“Our Plot” from Class
Download the file and run it in R Studio on your computer (or open the file in our R Studio cloud project and run it there) to see our plot from class.
Exploring the Data
We will look at GDP per Capita and Life Expectancy using some data from the gapminder project. There is a handy package called gapminder that uses a small snippet of this data for exploratory analysis. Install and load the package gapminder. Type ?gapminder and hit enter to see a description of the data.
Let’s get a quick look at gapminder to see what we’re dealing with.
- Get the structure of the gapminder data.
- What variables are there?
- Look at the head of the dataset to get an idea of what the data looks like.
- Get summary statistics of all variables.
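If you get stuck on these first steps, here is one way they might look. This is a minimal sketch, not an answer key; it assumes the gapminder package installs from CRAN under that name, and the install line is commented out so it only runs if you need it.

```r
# install.packages("gapminder")  # run once if the package is not installed yet
library(gapminder)

# Structure: variable names, types, and the first few values of each
str(gapminder)

# Just the variable names
names(gapminder)

# First six rows of the data
head(gapminder)

# Summary statistics for every variable
summary(gapminder)
```

Typing ?gapminder in the console opens the help page describing where the data come from.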
Simple Plots in Base R
Let’s make sure you can do some basic plots before we get into the gg. Use base R’s hist() function to plot a histogram of gdpPercap.
Use base R’s boxplot() function to plot a boxplot of gdpPercap.
Now make it a boxplot by continent. Hint: use formula notation with ~.
Now make a scatterplot of gdpPercap on the \(x\)-axis and lifeExp on the \(y\)-axis.
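The base R plots above could be sketched roughly as follows. The choice of gdpPercap for the histogram and the single boxplot is an assumption on my part; any quantitative variable works the same way. Note that the life expectancy variable in gapminder is spelled lifeExp, with a lowercase l.

```r
library(gapminder)

# Histogram of one quantitative variable (gdpPercap assumed here)
hist(gapminder$gdpPercap)

# Boxplot of the same variable
boxplot(gapminder$gdpPercap)

# Boxplot by group, using formula notation: outcome ~ grouping variable
boxplot(gdpPercap ~ continent, data = gapminder)

# Scatterplot: the first argument goes on the x-axis, the second on the y-axis
plot(gapminder$gdpPercap, gapminder$lifeExp)
```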
Load the package ggplot2 (you should have installed it previously; if not, install it first with install.packages("ggplot2")).
Let’s first make a bar graph to see how many countries are in each continent. The only aesthetic you need is to map continent to x. Bar graphs are great for representing categories, but not quantitative data.
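A bar graph along these lines might look like the sketch below; the object name p_bar is just a placeholder of mine. One caveat: geom_bar() counts rows, and gapminder has one row per country per year, so the bar heights are country-year observations rather than distinct countries.

```r
library(gapminder)
library(ggplot2)

# Map continent to x; geom_bar() counts the rows in each category
p_bar <- ggplot(data = gapminder, aes(x = continent)) +
  geom_bar()

p_bar
```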
For quantitative data, we want a histogram to visualize the distribution of a variable. Make a histogram of gdpPercap. Your only aesthetic here is to map gdpPercap to x.
Now let’s try adding some color, specifically, add an aesthetic that maps continent to fill. In general, color refers to the outside borders of a geom (except points), while fill is the interior of an object.
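To see the difference between color and fill in practice, here is a rough sketch of a histogram with the interior of the bars filled by continent. Mapping continent (rather than some other variable) is my assumption, and bins = 30 simply makes the default explicit.

```r
library(gapminder)
library(ggplot2)

# Plain histogram: the only aesthetic is x
p_hist <- ggplot(data = gapminder, aes(x = gdpPercap)) +
  geom_histogram(bins = 30)

# Same histogram, with the bar interiors filled by continent
p_hist_fill <- ggplot(data = gapminder, aes(x = gdpPercap, fill = continent)) +
  geom_histogram(bins = 30)

p_hist_fill
```

Swapping fill = continent for color = continent would outline the bars in each continent's color and leave the interiors gray.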
Instead of a histogram, change the geom to make it a density graph. To avoid overplotting, add alpha=0.4 to the geom argument (alpha changes the transparency of a fill).
Redo your plot from 11 for lifeExp instead of gdpPercap.
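A density version could be sketched like this, assuming the fill is still mapped to continent. Because alpha is set outside aes(), it applies one fixed transparency to every curve instead of being mapped from the data.

```r
library(gapminder)
library(ggplot2)

# Overlapping density curves, one per continent, made semi-transparent
p_dens <- ggplot(data = gapminder, aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.4)

p_dens
```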
Now let’s try a scatterplot for lifeExp (as y) on gdpPercap (as x). You’ll need both for aesthetics. The geom here is geom_point().
Add some color by mapping continent to color in your aesthetics.
Now let’s try adding a regression line with geom_smooth(). Add this layer on top of your geom_point() layer.
Did you notice that you got multiple regression lines (colored by continent)? That’s because we set a global aesthetic of mapping continent to color. If we want just one regression line, we need to instead move the color = continent inside the aes() of geom_point. This will only map continent to color for points, not for anything else.
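The global versus layer-specific distinction might be easier to see side by side. In this sketch I pass method = "lm" to geom_smooth() so the lines are straight regression lines; that is my choice, since the default geom_smooth() draws a flexible smoother instead. The object names are placeholders.

```r
library(gapminder)
library(ggplot2)

# Global aesthetic: color is inherited by every layer,
# so geom_smooth() fits one line per continent
p_global <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  geom_smooth(method = "lm")

# Layer-specific aesthetic: only the points are colored,
# so geom_smooth() sees one group and fits a single line
p_local <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent)) +
  geom_smooth(method = "lm")

p_global
p_local
```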
Now add an aesthetic to your points to map pop to size.
Change the color of the regression line to "black". Try first by putting this inside an aes() in your geom_smooth, and try a second time by just putting it inside geom_smooth without an aes(). What’s the difference, and why?
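Once you have tried both versions yourself, a sketch like the one below can help check your reasoning. Inside aes(), ggplot2 treats "black" as a data value to be mapped through the color scale, so the line gets whatever color the scale assigns and a legend entry labeled black. Outside aes(), it is a fixed setting and the line is drawn in black. As before, method = "lm" and the object names are my own choices.

```r
library(gapminder)
library(ggplot2)

base_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent))

# Mapped: "black" becomes a category on the color scale, not a color
p_mapped <- base_plot +
  geom_smooth(aes(color = "black"), method = "lm")

# Set: the line is literally drawn in black, with no legend entry
p_set <- base_plot +
  geom_smooth(color = "black", method = "lm")

p_mapped
p_set
```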
Another way to separate out continents is with faceting. Add +facet_wrap(~continent) to create subplots by continent.
Remove the facet layer. The scale is quite annoying for the x-axis: a lot of points are clustered at the low end. Let’s try changing the scale by adding a layer: +scale_x_log10()
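Here is a rough sketch of both ideas. For the rescaling step I assume a base-10 log scale via scale_x_log10(), a common choice when values bunch up at the low end; other transformations are possible.

```r
library(gapminder)
library(ggplot2)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent))

# One subplot per continent
p_facet <- p + facet_wrap(~continent)

# No facets; the x-axis is put on a log10 scale instead
p_log <- p + scale_x_log10()

p_facet
p_log
```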
Now let’s fix the labels by adding +labs(). Inside labs, make proper axes titles for x and y, and a title for the plot. If you want to change the name of the legends (continent color), add one for color.
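Labels could be added along these lines. The label text is only an example, and the log scale is carried over as an assumption from the previous step. Each argument to labs() is named after the aesthetic whose title it replaces.

```r
library(gapminder)
library(ggplot2)

p_lab <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent)) +
  scale_x_log10() +
  labs(
    x = "GDP per Capita",
    y = "Life Expectancy",
    title = "Income and Life Expectancy",
    color = "Continent"  # renames the legend for the color aesthetic
  )

p_lab
```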
Now let’s try subsetting by looking only at the Americas. Take the gapminder dataframe and subset it to only look at the Americas (continent=="Americas"). Assign this to a new dataframe object (call it something like america). Now, use this as your data, and redo the graph from question 17. (You might want to take a look at your new dataframe to make sure it worked first!)
Try this again for the whole world, but just for observations in the year 2002.
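Both subsets can be made in a few ways; the sketch below uses base R's subset(). The name gap_2002 is a placeholder of mine, and the plot is a simplified version of the earlier scatterplot rather than an exact copy of any one question.

```r
library(gapminder)
library(ggplot2)

# Keep only rows where continent is "Americas" (note the double ==)
america <- subset(gapminder, continent == "Americas")

# Keep only rows from the year 2002, for all continents
gap_2002 <- subset(gapminder, year == 2002)

# Always worth a quick look to confirm the subset worked
head(america)
table(gap_2002$year)

# Use the new dataframe as the data argument
p_2002 <- ggplot(gap_2002, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent)) +
  scale_x_log10()

p_2002
```

An equivalent base R form is gapminder[gapminder$continent == "Americas", ], with the trailing comma meaning keep all columns.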