# Problem Set 1

Due by 11:59 PM Sunday September 6, 2020

ANSWERS:

## Instructions

There are several ways you can complete and turn in this homework assignment:

1. Type up any applicable answers (saving any plots as images and including them) in a (e.g. Word) document and save it as a PDF and turn in a (commented!) .R file of commands for each relevant question.

2. [Not relevant for this homework, but in the future] If you wish to write out answers by hand, you may either print the pdf above or write your answers (all I need is your work and answers) on your own paper and then please scan/photograph & convert them to a single PDF, if they are easily readable, but this is not preferred. See my guide to making a PDF

3. Download the `.Rmd` file, do the homework in markdown, and email to me a single `knit`ted `html` or `pdf` file. Be sure that it shows all of your code (i.e. all chunks have `echo = TRUE` options), otherwise I will also ask for the markdown file.

To minimize confusion, I suggest creating a new `R Project` (e.g. `hw1`) and storing any data and plots in that folder on your computer. See my example workflow.

You may work together (and I highly encourage that) but you must turn in your own answers. I grade homeworks 70% for completion, and for the remaining 30%, pick one question to grade for accuracy - so it is best that you try every problem, even if you are unsure how to complete it accurately.

## The Popularity of Baby Names

Install and load the package `babynames`. Get help for `?babynames` to see what the data includes.

### Question 1

#### Part A

What are the top 5 boys names for 2017, and what percent of overall names is each?

#### Part B

What are the top 5 girls names, and what percent of overall names is each?

### Question 2

Make two barplots, of these top 5 names, one for each sex. Map `aes`thetics `x` to `name` and `y` to `prop`Or `percent`, if you made that variable, as I did.

and use `geom_col` (since you are declaring a specific `y`, otherwise you could just use `geom_bar()` and just an `x`.)

### Question 3

Find your name.If your name isn’t in there 😟, pick a random name.

`count` by `sex` how many babies since 1880 were named your name.Hint: if you do this, you’ll get the number of rows (years) there are in the data. You want to add the number of babies in each row (`n`), so inside `count`, add `wt=n` to weight the count by `n`.

Also add a variable for the percent of each sex.

### Question 4

Make a line graph of the number of babies with your name over time, `color`ed by `sex`.

### Question 5

#### Part A

Make a table of the most common name for boys by year between 1980-2017.Hint: once you’ve got all the right conditions, you’ll get a table with a lot of data. You only want to `slice` the `1`st row for each table.

#### Part B

Now do the same for girls.

### Question 6

Now let’s graph the evolution of the most common names since 1880.

#### Part A

First, find out what are the top 10 overall most popular names for boys and for girls. You may want to create two vectors, each with these top 5 names.

#### Part B

Now make two `line`graphs of these 5 names over time, one for boys, and one for girls.

### Question 7

Bonus (a challenge!): What are the 10 most common “gender-neutral” names?This is hard to define. For our purposes, let’s define this as names where between 48 and 52% of the babies with the name are Male.

## Political and Economic Freedom Around the World

For the remaining questions, we’ll look at the relationship between Economic Freedom and Political Freedom in countries around the world today. Our data for economic freedom comes from the Fraser Institute, and our data for political freedom comes from Freedom House.

### Question 8

Download these two datasets that I’ve cleaned up a bit:If you want, try downloading them from the websites yourself!

Load them with `df<-read_csv("name_of_the_file.csv")` and save one as `econfreedom` and the other as `polfreedom`. Look at each `tibble` you’ve created.

### Question 9

The `polfreedom` dataset is still a bit messy. Let’s overwrite it (or assign to something like `polfreedom2`) and select `Country.Territory` and `Total` (total freedom score) and rename `Country.Territory` to `Country`.

### Question 10

Now we can try to merge these two datasets into one. Since they both have `Country` as a variable, we can merge these tibbles using `left_join(econfreedom, polfreedom, by="Country")`Note, if you saved as something else in question 9., use that instead of `polfreedom`!

and save this as a new tibble (something like `freedom`).

### Question 11

Now make a scatterplot of Political Freedom (`total`)Feel free to `rename` these!

as `y` on Economic Freedom (`ef`) as `x` and `color` by `continent`.

### Question 12

Let’s do this again, but highlight some key countries. Pick three countries, and make a new tibble from `freedom` that is only the observations of those countries. Additionally, install and load a packaged called `ggrepel`This automatically adjusts labels so they don’t cover points on a plot!

Next, redo your plot from question 11, but now add a layer: `geom_label_repel` and set its `data` to your three-country tibble, use same `aes`thetics as your overall plot, but be sure to add `label = ISO`, to use the ISO country code to label.You might also want to set a low `alpha` level to make sure the labels don’t obscure other points!

### Question 13

Make another plot similar to 12, except this time use GDP per Capita (`gdp`) as `y`. Feel free to try to put a regression line with `geom_smooth()`!If you do, be sure to set its data to the full `freedom`, not just your three countries!