This post is a detailed guide on how to produce a streamchart with ggstream to explore the appearances of the most popular characters in Chris Claremont’s X-Men comics. It contains step-by-step explanations together with useful tricks to customize a visualization down to the smallest detail.
This page showcases the work of Cédric Scherer, built for the TidyTuesday initiative. You can find the original code on his GitHub repository here.
Thanks to him for agreeing to share his work here! 🙏🙏 Thanks also to Tomás Capretto, who helped write this blogpost!
As a teaser, here is the plot we’re gonna try building:
Let’s start by loading the packages needed to build the figure. They are all great packages and today’s chart wouldn’t be possible without them. But ggstream is the one that brings streamplots to ggplot2, and it deserves to be highlighted in this little introduction.
# Load packages
library(tidyverse)
library(fuzzyjoin)
library(ggstream)
library(colorspace)
library(ggtext)
library(cowplot)
Next, we set the theme for the plot. This theme is built on top of theme_minimal() and uses the font "Reem Kufi". Don’t know how to make custom fonts work in R? Have a look at this guide especially made for you!
theme_set(theme_minimal(base_family = "Reem Kufi", base_size = 12))
theme_update(
  plot.title = element_text(
    size = 25,
    face = "bold",
    hjust = .5,
    margin = margin(10, 0, 30, 0)
  ),
  plot.caption = element_text(
    size = 9,
    color = "grey40",
    hjust = .5,
    margin = margin(20, 0, 5, 0)
  ),
  axis.text.y = element_blank(),
  axis.title = element_blank(),
  plot.background = element_rect(fill = "grey88", color = NA),
  panel.background = element_rect(fill = NA, color = NA),
  panel.grid = element_blank(),
  panel.spacing.y = unit(0, "lines"),
  strip.text.y = element_blank(),
  legend.position = "bottom",
  legend.text = element_text(size = 9, color = "grey40"),
  legend.box.margin = margin(t = 30),
  legend.background = element_rect(
    color = "grey40",
    size = .3,
    fill = "grey95"
  ),
  legend.key.height = unit(.25, "lines"),
  legend.key.width = unit(2.5, "lines"),
  plot.margin = margin(rep(20, 4))
)
This guide shows how to create a highly customized and beautiful streamchart to visualize the number of appearances of the most popular characters in Chris Claremont’s sixteen-year run on Uncanny X-Men.
The data for this week come from the Claremont Run Project and Malcolm Barrett, who put these datasets into the R package claremontrun. This guide uses the character_visualization dataset released for the TidyTuesday initiative on the week of 2020-06-30. You can find the original announcement and more information about the data here. Thank you all for making this possible!
df_char_vis <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-30/character_visualization.csv')
The following is a data frame that ranks the most popular X-Men characters according to this source. Today’s chart is based on the top 5 most popular characters.
df_best_chars <- tibble(
  rank = 1:10,
  char_popular = c("Wolverine", "Magneto",
                   "Nightcrawler", "Gambit",
                   "Storm", "Colossus",
                   "Phoenix", "Professor X",
                   "Iceman", "Rogue")
)
The "character" column in df_char_vis contains more information than just the name of the characters. In the next chunk, regex_inner_join() from the fuzzyjoin package uses regular expressions to merge df_char_vis and df_best_chars into df_best_stream. This dataset contains the number of appearances per issue by character, costume, and type of appearance.
df_best_stream <- df_char_vis %>%
  regex_inner_join(df_best_chars, by = c(character = "char_popular")) %>%
  group_by(character, char_popular, costume, rank, issue) %>%
  summarize_if(is.numeric, sum, na.rm = TRUE) %>%
  ungroup() %>%
  filter(rank <= 5) %>% # Keep top 5 characters
  filter(issue < 281)
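To see what the regex join does in isolation, here is a minimal sketch on toy data. The values in toy_chars are made up for illustration and only imitate a column that holds more than the bare character name; the names toy_chars, toy_best and toy_joined are hypothetical.

```r
library(tibble)
library(fuzzyjoin)

# Toy stand-in for df_char_vis: the column holds more than just the name
toy_chars <- tibble(
  character = c("Wolverine = Logan", "Storm = Ororo Munroe", "Havok = Alex Summers")
)

# Toy stand-in for df_best_chars: each name is used as a regular expression
toy_best <- tibble(rank = 1:2, char_popular = c("Wolverine", "Storm"))

# A row of toy_chars is kept when `character` matches a `char_popular` pattern
toy_joined <- regex_inner_join(toy_chars, toy_best, by = c(character = "char_popular"))
toy_joined
```

Rows without a matching pattern (here the Havok row) are dropped, which is why the join also works as a filter for the popular characters.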
The following step isn’t strictly necessary, but it’s a cool trick to make the start and end of the stream smoother.
df_smooth <- df_best_stream %>%
  group_by(character, char_popular, costume, rank) %>%
  slice(1:4) %>%
  mutate(
    issue = c(
      min(df_best_stream$issue) - 20,
      min(df_best_stream$issue) - 5,
      max(df_best_stream$issue) + 5,
      max(df_best_stream$issue) + 20
    ),
    speech = c(0, .001, .001, 0),
    thought = c(0, .001, .001, 0),
    narrative = c(0, .001, .001, 0),
    depicted = c(0, .001, .001, 0)
  )
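The padding logic is easier to follow on its own. The sketch below wraps it in a hypothetical helper, pad_issues(), which is not part of the original code; the issue range 97:280 is only an illustrative assumption.

```r
# Hypothetical helper: returns the four artificial issue numbers placed
# before the first and after the last real issue.
pad_issues <- function(issues, inner = 5, outer = 20) {
  c(
    min(issues) - outer,
    min(issues) - inner,
    max(issues) + inner,
    max(issues) + outer
  )
}

pad_issues(97:280)
# 77 92 285 300
```

Each character and costume combination gets four extra rows at these positions, with values of 0 at the outer points and .001 at the inner ones, so every stream tapers off smoothly instead of starting and ending abruptly.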
The data is pivoted into a long format. A new variable, char_costume, contains both the name of the character and the costume (costumed or casual).
## factor levels for type of appearance
levels <- c("depicted", "speech", "thought", "narrative")

## factorized data in long format
df_best_stream_fct <- df_best_stream %>%
  bind_rows(df_smooth) %>%
  mutate(
    costume = if_else(costume == "Costume", "costumed", "casual"),
    char_costume = if_else(
      char_popular == "Storm",
      glue::glue("{char_popular} ({costume})"),
      glue::glue("{char_popular} ({costume}) ")
    ),
    char_costume = fct_reorder(char_costume, rank)
  ) %>%
  pivot_longer(
    cols = speech:depicted,
    names_to = "parameter",
    values_to = "value"
  ) %>%
  mutate(parameter = factor(parameter, levels = levels))
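If the reshaping step is unfamiliar, this small sketch shows the same wide-to-long move on toy data. The numbers and the names toy_wide, toy_long and toy_levels are invented for illustration.

```r
library(tibble)
library(tidyr)

# Toy wide table: one column per type of appearance
toy_wide <- tibble(
  issue   = c(97, 98),
  speech  = c(3, 1),
  thought = c(0, 2)
)

# One row per issue and type of appearance
toy_long <- pivot_longer(
  toy_wide,
  cols = speech:thought,
  names_to = "parameter",
  values_to = "value"
)

# An explicit factor fixes the order in which the types are handled later
toy_levels <- c("thought", "speech")
toy_long$parameter <- factor(toy_long$parameter, levels = toy_levels)
toy_long
```

Setting the factor levels by hand, rather than relying on alphabetical order, is what decides the order of the panels once the data is faceted by parameter.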
And finally, we define the color palette and some data that will be useful when adding annotations to the plot.
# Define the color palette
pal <- c(
  "#FFB400", lighten("#FFB400", .25, space = "HLS"),
  "#C20008", lighten("#C20008", .2, space = "HLS"),
  "#13AFEF", lighten("#13AFEF", .25, space = "HLS"),
  "#8E038E", lighten("#8E038E", .2, space = "HLS"),
  "#595A52", lighten("#595A52", .15, space = "HLS")
)

# These are going to be labels added to each panel
labels <- tibble(
  issue = 78,
  value = c(-21, -19, -14, -11),
  parameter = factor(levels, levels = levels),
  label = c("Depicted", "Speech\nBubbles", "Thought\nBubbles", "Narrative\nStatements")
)
# These are going to be the text annotations
# If you wonder about the '**' or the '<sup>' within the text, let me tell you
# this is just Markdown syntax used by the ggtext library to make custom text
# annotations very easy!
texts <- tibble(
issue = c(295, 80, 245, 127, 196),
value = c(-35, 35, 30, 57, 55),
parameter = c("depicted", "depicted", "thought", "speech", "speech"),
text = c(
'**Gambit** was introduced for the first time in issue #266 called "Gambit: Out of the Frying Pan"— nevertheless, he is the **4<sup>th</sup> most popular X-Men character**!',
'**Wolverine is the most popular X-Men** and has a regular presence in the X-Men comics between 1975 and 1991.',
'**Storm** is by far the most thoughtful of the five most popular X-Men characters, especially in issues #220, #223 and #265. Storm **ranks 5<sup>th</sup>**.',
"**Magneto** was ranked by IGN as the *Greatest Comic Book Villain of All Time*. And even though he only appears from time to time he **ranks 2<sup>nd</sup>**—<br>4 ranks higher than his friend and opponent Professor X!",
'The **3<sup>rd</sup> most popular X-men character Nightcrawler** gets injured during the "Mutant Massacre" and fell into a coma after an attack from Riptide in issue #211.'
),
char_popular = c("Gambit", "Wolverine", "Storm", "Magneto", "Nightcrawler"),
costume = "costumed",
vjust = c(.5, .5, .4, .36, .38)
) %>%
mutate(
parameter = factor(parameter, levels = levels),
char_costume = if_else(
char_popular == "Storm",
glue::glue("{char_popular} ({costume})"),
glue::glue("{char_popular} ({costume}) ")
),
char_costume = factor(char_costume, levels = levels(df_best_stream_fct$char_costume))
)
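The palette above pairs each base color with a lightened version of itself. As a rough sketch of that idea, the snippet below builds the same kind of interleaved vector for two colors. It simplifies the original by using a single lightening amount for every color, and the names base_cols, light_cols and toy_pal are hypothetical.

```r
library(colorspace)

# Two base colors taken from the palette above
base_cols <- c("#FFB400", "#C20008")

# lighten() is vectorized, so both colors are lightened in one call
light_cols <- lighten(base_cols, .25, space = "HLS")

# Interleave: base 1, light 1, base 2, light 2
toy_pal <- as.vector(rbind(base_cols, light_cols))
toy_pal
```

The interleaved order matters because the colors are matched to the levels of char_costume in order, so each character needs its two entries side by side.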
Thanks to ggstream, it’s quite simple to build a streamchart in ggplot2. All we need to use is the geom_stream() function. On top of that, this first version also sets the color scales and uses facet_grid() to obtain one stream per type of appearance.
g <- df_best_stream_fct %>%
  ggplot(
    aes(
      issue, value,
      color = char_costume,
      fill = char_costume
    )
  ) +
  geom_stream(
    geom = "contour",
    color = "white",
    size = 1.25,
    bw = .45 # Controls smoothness
  ) +
  geom_stream(
    geom = "polygon",
    bw = .45,
    size = 0
  ) +
  scale_color_manual(
    expand = c(0, 0),
    values = pal,
    guide = "none"
  ) +
  scale_fill_manual(
    values = pal,
    name = NULL
  ) +
  facet_grid( ## needs facet_grid for space argument
    parameter ~ .,
    scales = "free_y",
    space = "free"
  )

g
Note that geom_stream() is used twice above. The first time, it adds a white contour to each area. As a result, when the second stream is added on top, only the outermost contour line remains, creating a very nice highlighting effect. Nice trick!
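For a stripped-down look at geom_stream() and its bw argument, here is a minimal sketch that is independent of the X-Men data. It assumes the blockbusters example dataset shipped with ggstream (columns year, box_office and genre); the object names p_wiggly and p_smooth are hypothetical.

```r
library(ggplot2)
library(ggstream)

# `blockbusters` is an example dataset from ggstream, used here as a stand-in
# A small bandwidth follows the data closely
p_wiggly <- ggplot(blockbusters, aes(year, box_office, fill = genre)) +
  geom_stream(bw = .45)

# A larger bandwidth gives smoother, more rounded streams
p_smooth <- ggplot(blockbusters, aes(year, box_office, fill = genre)) +
  geom_stream(bw = .9)

p_wiggly
p_smooth
```

Comparing the two plots side by side is a quick way to pick a bandwidth before adding facets, colors and annotations.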
The plot above looks really good, but it’s so minimalistic in its annotations that it misses the opportunity to share important information. The next step is to add labels and text to make this chart more insightful.
g <- g +
  geom_vline(
    data = tibble(x = c(97, seq(125, 250, by = 25), 280)),
    aes(xintercept = x),
    inherit.aes = FALSE,
    color = "grey88",
    size = .5,
    linetype = "dotted"
  ) +
  annotate(
    "rect",
    xmin = -Inf, xmax = 78,
    ymin = -Inf, ymax = Inf,
    fill = "grey88"
  ) +
  annotate(
    "rect",
    xmin = 299, xmax = Inf,
    ymin = -Inf, ymax = Inf,
    fill = "grey88"
  ) +
  # Appearance type label on each panel
  geom_text(
    data = labels,
    aes(issue, value, label = label),
    family = "Reem Kufi",
    inherit.aes = FALSE,
    size = 4.7,
    color = "grey25",
    fontface = "bold",
    lineheight = .85,
    hjust = 0
  ) +
  # Add informative text
  # geom_textbox comes with the great ggtext library.
  geom_textbox(
    data = texts,
    aes(
      issue, value, label = text,
      color = char_costume,
      color = after_scale(darken(color, .12, space = "HLS")),
      vjust = vjust
    ),
    family = "Reem Kufi",
    size = 2.7,
    fill = "grey95",
    maxwidth = unit(7.25, "lines"),
    hjust = .5
  ) +
  # Customize labels of the horizontal axis
  scale_x_continuous(
    limits = c(74, NA),
    breaks = c(94, seq(125, 250, by = 25), 280),
    labels = glue::glue("Issue\n#{c(97, seq(125, 250, by = 25), 280)}"),
    position = "top"
  ) +
  scale_y_continuous(expand = c(.03, .03)) +
  # This clip = "off" is very important. It lets annotations be drawn anywhere
  # in the plot, even when they fall outside the extent of
  # the corresponding panel.
  coord_cartesian(clip = "off")

g
Those annotations are definitely a game changer!
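To try the Markdown-style annotations on their own, here is a minimal sketch with a single text box. The data frame toy_note and its coordinates are made up for illustration; only geom_textbox() and the `**` and `<sup>` markup come from the code above.

```r
library(ggplot2)
library(ggtext)

# One annotation with bold text and a superscript, placed at an arbitrary point
toy_note <- data.frame(
  x = 1,
  y = 1,
  label = "**Storm** ranks 5<sup>th</sup> among the most popular characters."
)

p_note <- ggplot(toy_note, aes(x, y, label = label)) +
  geom_textbox(
    width = unit(10, "lines"), # box width; the text wraps inside it
    fill = "grey95"
  ) +
  coord_cartesian(clip = "off") # let the box extend past the panel if needed

p_note
```

Because the box wraps its text automatically, long annotations only need a width and an anchor point rather than manual line breaks.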
There’s been tremendous progress since the first chart. The last step is to add a very cool title that will make this marvelous chart even more attractive. The function draw_image() from the cowplot library makes it really easy to add an image on top of the plot. Ready to finish this up? Let’s go!
g <- g +
  labs(
    title = "Appearance of the Five Most Popular X-Men Characters in Chris Claremont's Comics",
    caption = "Visualization by Cédric Scherer • Data by Claremont Run Project via Malcolm Barrett • Popularity Scores by ranker.com • Logo by Comicraft"
  )

g <- ggdraw(g) +
  # It works with only the path to the file! :)
  draw_image(
    "img/fromTheWeb/uncannyxmen.png",
    x = .84, y = .955,
    width = .1,
    hjust = .5,
    vjust = .5
  )

g
And finally, if you want to export this streamchart in a high quality format, it’s good to use ggsave() with the agg_png device from the ragg library.
ggsave("img/fromTheWeb/streamchart-xmen.png", g,
width = 16, height = 13, device = ragg::agg_png)
Here we are, with a highly customized plot showcasing the possibilities offered by the tidyverse and other packages like ggstream, ggtext, and many others. Thanks again to Cédric for providing this chart example!