2024-09-23
The code in this lecture assumes these three libraries are loaded:
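A minimal setup sketch. It assumes the three libraries are dplyr (for summarize and pull), ggplot2, and dslabs, which provides the murders data set used throughout. These three names are inferred from the functions used below rather than stated here.

```r
# Assumed setup: package choice inferred from the functions used in these notes
library(dplyr)    # summarize(), pull()
library(ggplot2)  # ggplot(), geom_*(), scale_*(), labs()
library(dslabs)   # murders data set

# Quick check that the data are available
head(murders)
```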
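In today's lecture we recreate a plot of total gun murders versus population for each US state in 2010. It is drawn on log scales, with state abbreviations as labels, points colored by region, and a reference line for the national murder rate.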
gg stands for grammar of graphics.
Analogy: we learn verbs and nouns to construct sentences.
The first step in learning ggplot2 is breaking a graph apart into components.
Let’s break down the plot we want to recreate while introducing some ggplot2 terminology.
ggplot objects
A graph starts as a ggplot object, created with the ggplot function. To see the plot we can print it:
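A sketch of creating and printing a ggplot object, assuming the murders data from dslabs. With no layers added yet, printing shows only a blank canvas.

```r
library(ggplot2)
library(dslabs)  # assumed source of the murders data set

# Create a ggplot object linked to the murders data; nothing is drawn yet
p <- ggplot(data = murders)
class(p)

# Printing renders the (still blank) plot; typing p at the console does the same
print(p)
```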
We create graphs by adding layers.
Layers define geometries, compute summary statistics, define what scales to use, or even change styles.
To add layers, we use the symbol +.
In general, a line of code will look like this:
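Schematically, the pattern is DATA |> ggplot() + LAYER_1 + LAYER_2 + ... + LAYER_N. Below is a concrete instance of that template sketched with the murders data. The particular layers chosen are only an illustration.

```r
library(ggplot2)
library(dslabs)

# Template (not runnable): DATA |> ggplot() + LAYER_1 + LAYER_2 + ... + LAYER_N
# Concrete version: the data are piped into ggplot(), then components are added with +
plt <- murders |>
  ggplot() +
  geom_point(aes(population/10^6, total)) +  # a geometry layer
  scale_x_log10()                            # a scale
plt
```

ggplot2 stores geometries in the plot's list of layers and scales separately, even though both are added with +.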
So if we want to make a scatterplot, what geometry do we use?
Let’s look at the cheat sheet: https://rstudio.github.io/cheatsheets/data-visualization.pdf
To make a scatterplot we use geom_point.
The help file tells us this is how we use it:
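The help page (?geom_point) documents the function's arguments. As a sketch, args() prints that argument list in the console. The first two arguments are mapping and data, so both can be passed to the geometry directly.

```r
library(ggplot2)
library(dslabs)

# Print the argument list documented on the help page
args(geom_point)

# mapping and data supplied to the geometry itself, by name
plt <- ggplot() +
  geom_point(mapping = aes(x = population/10^6, y = total), data = murders)
plt
```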
If we define p as earlier, we can add a layer like this:
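A sketch, assuming p is the blank object p <- ggplot(data = murders). Here x and y are given by name inside aes.

```r
library(ggplot2)
library(dslabs)

p <- ggplot(data = murders)  # assumed definition of p
# Add a point layer; x and y are named explicitly here
plt <- p + geom_point(aes(x = population/10^6, y = total))
plt
```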
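The argument names x = and y = can be dropped, because x and y are the first two arguments of aes.
To label each point with its state abbreviation, we add a geom_text layer and map label to the abb column inside aes: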
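A sketch of the two-layer plot, assuming again p <- ggplot(data = murders). The x and y values are passed by position, and label is mapped to abb.

```r
library(ggplot2)
library(dslabs)

p <- ggplot(data = murders)
# x and y passed by position; label mapped to the abb column inside aes()
plt <- p +
  geom_point(aes(population/10^6, total)) +
  geom_text(aes(population/10^6, total, label = abb))
plt
```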
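As an example of the unique behavior of aes, note that geom_text(aes(population/10^6, total, label = abb)) is fine. By contrast, geom_text(aes(population/10^6, total), label = abb) will give you an error, since abb is not found when it is outside of the aes function.
The layer geom_text does not know where to find abb: it's a column name and not a global variable.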
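A sketch that runs both calls. The failing one is wrapped in tryCatch so the error message can be inspected instead of stopping the script. It assumes no global object named abb exists in the session.

```r
library(ggplot2)
library(dslabs)

p <- ggplot(data = murders)

# Works: inside aes(), abb is looked up among the columns of murders
ok <- p + geom_text(aes(population/10^6, total, label = abb))

# Fails: outside aes(), abb is evaluated as an ordinary R object
msg <- tryCatch({
  p + geom_text(aes(population/10^6, total), label = abb)
  "no error"
}, error = function(e) conditionMessage(e))
msg
```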
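In geom_point(..., size = 3), size can be an aesthetic mapping, but here it is not, so all points get bigger.
Likewise, nudge_x in geom_text shifts every label slightly to the right; nudge_x is not an aesthetic mapping.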
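A sketch with both fixed settings, assuming p <- ggplot(data = murders). The nudge of 1.5 is in the x axis units, which are millions of people. The value is an illustrative choice.

```r
library(ggplot2)
library(dslabs)

p <- ggplot(data = murders)
plt <- p +
  # size = 3 outside aes(): one fixed setting applied to every point
  geom_point(aes(population/10^6, total), size = 3) +
  # nudge_x shifts every label 1.5 x-axis units to the right
  geom_text(aes(population/10^6, total, label = abb), nudge_x = 1.5)
plt
```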
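To avoid repeating the same mapping in every layer, we can define a global aes in the ggplot function; all layers then inherit it by default: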
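A sketch of a global mapping. Neither geometry below needs its own aes call.

```r
library(ggplot2)
library(dslabs)

# Global mapping: every layer added to p inherits x, y, and label
p <- murders |> ggplot(aes(population/10^6, total, label = abb))
plt <- p + geom_point(size = 3) + geom_text(nudge_x = 1.5)
plt
```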
We can override the global aes by defining one in the geometry functions:
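A sketch of a local override. The coordinates and text are illustrative values. Because the layer still uses the murders data, the same text is drawn once per row, so 51 copies land on top of each other.

```r
library(ggplot2)
library(dslabs)

p <- murders |> ggplot(aes(population/10^6, total, label = abb))
# The local aes() in geom_text replaces the global x, y, and label
plt <- p + geom_point(size = 3) +
  geom_text(aes(x = 10, y = 800, label = "Hello there!"))
plt
```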
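We use the labs function to set the axis labels, the title, and the legend title. Mapping col = region inside aes colors the points by region, and a legend is added automatically: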
murders |> ggplot(aes(population/10^6, total, label = abb)) +
  geom_text(nudge_x = 0.05) +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Population in millions (log scale)",
       y = "Total number of murders (log scale)",
       title = "US Gun Murders in 2010",
       color = "Region") +
  geom_point(aes(col = region), size = 3)
We want to add a line whose intercept is the US murder rate.
Let's compute that rate:
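A sketch of that computation. It is the same summarize/pull pipeline that appears with the ggrepel code later in these notes. With population in millions on the x axis, a state at exactly the national rate r (murders per million) has total = r × population/10^6. On log10 scales this becomes log10(total) = log10(r) + log10(population/10^6). That is a line with slope 1 and intercept log10(r), which is why geom_abline gets intercept = log10(r) and keeps its default slope of 1.

```r
library(dplyr)
library(dslabs)

# National rate: total murders per million people
r <- murders |>
  summarize(rate = sum(total) / sum(population) * 10^6) |>
  pull(rate)
r
```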
murders |> ggplot(aes(population/10^6, total, label = abb)) +
  geom_text(nudge_x = 0.05) +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Population in millions (log scale)",
       y = "Total number of murders (log scale)",
       title = "US Gun Murders in 2010",
       color = "Region") +
  geom_point(aes(col = region), size = 3) +
  geom_abline(intercept = log10(r), lty = 2, color = "darkgrey")
We can also save the plot in an object p and add more layers to it later:
p <- murders |> ggplot(aes(population/10^6, total, label = abb)) +
  geom_text(nudge_x = 0.05) +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Population in millions (log scale)",
       y = "Total number of murders (log scale)",
       title = "US Gun Murders in 2010",
       color = "Region") +
  geom_point(aes(col = region), size = 3) +
  geom_abline(intercept = log10(r), lty = 2, color = "darkgrey")
ggthemes provides pre-designed themes.
Here is the FiveThirtyEight theme:
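A sketch of applying the theme. To stay self-contained, it rebuilds a reduced stand-in for p with only points, labels, and log scales. That reduction is an assumption made for brevity.

```r
library(ggplot2)
library(dslabs)
library(ggthemes)

# Reduced stand-in for p (assumption: no labs() or reference line)
p <- murders |> ggplot(aes(population/10^6, total, label = abb)) +
  geom_point(aes(col = region), size = 3) +
  geom_text(nudge_x = 0.05) +
  scale_x_log10() +
  scale_y_log10()

# A theme is added like any other component
plt <- p + theme_fivethirtyeight()
plt
```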
If you want to ruin the plot use the excel theme:
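The same sketch with the Excel theme from ggthemes, again using a reduced stand-in for p for brevity.

```r
library(ggplot2)
library(dslabs)
library(ggthemes)

# Reduced stand-in for p (assumption: no labs() or reference line)
p <- murders |> ggplot(aes(population/10^6, total, label = abb)) +
  geom_point(aes(col = region), size = 3) +
  geom_text(nudge_x = 0.05) +
  scale_x_log10() +
  scale_y_log10()

plt <- p + theme_excel()
plt
```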
ThemePark provides fun themes:
This is a fan favorite:
To keep the state abbreviations from landing on top of each other, we can use the ggrepel package.
We replace the layer geom_text(nudge_x = 0.05) with geom_text_repel(). The version below also switches to theme_economist() from ggthemes:
library(ggthemes)
library(ggrepel)
r <- murders |>
  summarize(rate = sum(total) / sum(population) * 10^6) |>
  pull(rate)
murders |> ggplot(aes(population/10^6, total, label = abb)) +
  geom_abline(intercept = log10(r), lty = 2, color = "darkgrey") +
  geom_point(aes(col = region), size = 3) +
  geom_text_repel() +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Population in millions (log scale)",
       y = "Total number of murders (log scale)",
       title = "US Gun Murders in 2010",
       color = "Region") +
  theme_economist()
We often want to put plots next to each other.
The gridExtra package permits us to do that:
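A sketch with gridExtra, assuming two illustrative plots of the murders data: a histogram of population and a scatterplot of totals. grid.arrange draws the plots in a grid. Its sibling arrangeGrob builds the same layout as an object without drawing it.

```r
library(ggplot2)
library(dslabs)
library(gridExtra)

# Two separate ggplot objects (illustrative choices)
p1 <- murders |> ggplot(aes(population/10^6)) +
  geom_histogram(bins = 15) +
  scale_x_log10()
p2 <- murders |> ggplot(aes(population/10^6, total)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

# Draw them side by side in one row
grid.arrange(p1, p2, ncol = 2)
```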
There are several additional packages for combining ggplot2 plots into visually appealing layouts:
cowplot: A versatile package designed for publication-quality plots, offering seamless integration with ggplot2.
ggpubr: Provides user-friendly functions for combining and annotating ggplot2 plots with minimal effort.
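As one example, cowplot's plot_grid combines plots and adds panel labels. ggpubr's ggarrange plays a similar role. The two plots are the same illustrative assumptions as in a side-by-side layout of population and totals.

```r
library(ggplot2)
library(dslabs)
library(cowplot)

p1 <- murders |> ggplot(aes(population/10^6)) +
  geom_histogram(bins = 15) +
  scale_x_log10()
p2 <- murders |> ggplot(aes(population/10^6, total)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

# Side by side, with panel labels "A" and "B"
combined <- plot_grid(p1, p2, labels = c("A", "B"), ncol = 2)
combined
```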
New packages frequently emerge. Explore beyond these options and stay curious—there might be new tools that suit your needs even better!