This post showcases the key features of ggsankey and provides a set of diagram examples using the package.
The ggsankey package in R is an extension of the ggplot2 package, designed to create flow visualizations. It offers a set of functions that make it easy to specify flow diagrams in a declarative manner.
✍️ author → David Sjoberg
📘 documentation → github
⭐️ more than 250 stars on github
To get started with ggsankey, you can install its development version directly from GitHub.
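A minimal installation sketch, assuming the remotes package is used as the installer and that the package lives in the davidsjoberg/ggsankey repository on GitHub (devtools::install_github() should work the same way):

```r
# Install remotes first if it is not available yet (assumed installer)
# install.packages("remotes")

# Install the development version of ggsankey from GitHub
remotes::install_github("davidsjoberg/ggsankey")

# Load the package
library(ggsankey)
```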
The ggsankey package extends the grammar of graphics to include the description of flow diagrams, specifically sankey, alluvial and sankey bump diagrams. You start with a dataframe in wide format, transform it using the make_long() function, and then add flow-specific layers to your ggplot.
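To make the wide-to-long step concrete, here is a rough base-R approximation of the shape that make_long() returns: one row per observation and stage, with the columns x, node, next_x and next_node. The helper to_long_sketch() and the toy data are hypothetical and only illustrate the layout; in practice you would call make_long() itself.

```r
# Hypothetical helper: approximates the long layout used by the flow geoms.
# Each input row becomes one output row per stage (column), pointing at the
# next stage; the last stage has NA for next_x and next_node.
to_long_sketch <- function(data, cols) {
  n <- length(cols)
  pieces <- lapply(seq_len(n), function(i) {
    has_next <- i < n
    data.frame(
      row       = seq_len(nrow(data)),
      x         = cols[i],
      node      = as.character(data[[cols[i]]]),
      next_x    = if (has_next) cols[i + 1] else NA_character_,
      next_node = if (has_next) as.character(data[[cols[i + 1]]]) else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)
  # Keep the stages of each original row together, in column order
  out <- out[order(out$row), c("x", "node", "next_x", "next_node")]
  out$x <- factor(out$x, levels = cols)
  out$next_x <- factor(out$next_x, levels = cols)
  rownames(out) <- NULL
  out
}

# Toy wide data: one row per individual, one column per stage
toy <- data.frame(
  education = c("Bachelor", "Master"),
  field     = c("Science", "Arts"),
  job       = c("Research", "Teaching")
)

long <- to_long_sketch(toy, c("education", "field", "job"))
print(long)
```

Each individual contributes one row per stage, so two individuals and three stages give six rows. The x and next_x columns carry the stage names, which is why they map directly onto the x and next_x aesthetics in the plots.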
Here’s a basic example where we show how dimensions are linked using a sankey diagram:
# Load the packages used in this post
library(ggplot2)
library(dplyr)
library(ggsankey)

# Create a simple dataset about education and career paths
df <- data.frame(
  education = c(rep("High School", 40), rep("Bachelor", 35), rep("Master", 25)),
  field = c(rep("Science", 20), rep("Arts", 20), rep("Business", 30), rep("Engineering", 30)),
  job = c(rep("Research", 25), rep("Teaching", 25), rep("Industry", 30), rep("Consulting", 20))
)

# Convert to long format for Sankey diagram
df_long <- df %>%
  make_long(education, field, job)

# Create the diagram
ggplot(df_long,
       aes(x = x,
           next_x = next_x,
           node = node,
           next_node = next_node,
           fill = factor(node))) +
  geom_sankey()
geom_sankey_label() places labels in the center of nodes when given the same aesthetics as geom_sankey(). ggsankey also comes with custom minimalistic themes. Here we use theme_sankey():
ggplot(df_long,
       aes(x = x,
           next_x = next_x,
           node = node,
           next_node = next_node,
           fill = factor(node))) +
  geom_sankey(alpha = 0.8) +
  scale_fill_viridis_d(option = "plasma", name = "Category") +
  theme_sankey(base_size = 14) +
  labs(title = "Education to Career Path Flow",
       x = NULL) +
  theme(legend.position = "bottom",
        legend.title = element_text(hjust = 0.5),
        plot.title = element_text(hjust = 0.5),
        axis.text.y = element_blank(),
        axis.ticks = element_blank()) +
  scale_x_discrete(labels = c("Education", "Field", "Job"))
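For node labels, a short sketch of geom_sankey_label() on a deliberately tiny dataset. The toy data is hypothetical, and the styling values (size, colors) are just one reasonable choice. The label = node aesthetic is what the label layer draws.

```r
library(ggplot2)
library(ggsankey)

# Small illustrative dataset (hypothetical rows, same columns as above)
toy <- data.frame(
  education = c("Bachelor", "Bachelor", "Master"),
  field     = c("Science", "Arts", "Science"),
  job       = c("Research", "Teaching", "Research")
)
toy_long <- make_long(toy, education, field, job)

p <- ggplot(toy_long,
            aes(x = x,
                next_x = next_x,
                node = node,
                next_node = next_node,
                fill = factor(node),
                label = node)) +   # text shown inside each node
  geom_sankey(flow.alpha = 0.6, node.color = "gray30") +
  geom_sankey_label(size = 3, color = "white", fill = "gray40") +
  theme_sankey(base_size = 14) +
  theme(legend.position = "none")

p
```

Because the labels already name each node, the legend is dropped here to avoid repeating the same information.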
Alluvial diagrams are very similar to sankey diagrams but have no spaces between nodes and start at y = 0 instead of being centered around the x-axis.
This diagram shows how individuals progress from their education level through their field of study to their eventual job role. The geom_alluvial() function creates smooth flowing bands between the nodes, with the width of each band representing the number of individuals following each path.
ggplot(df_long,
       aes(x = x,
           next_x = next_x,
           node = node,
           next_node = next_node,
           fill = factor(node),
           label = node)) +
  geom_alluvial(flow.alpha = .6) +
  geom_alluvial_text(size = 3, color = "black", space = 0.5) + # Adjusted space and changed text color
  scale_fill_viridis_d(option = "plasma", drop = FALSE) +
  theme_alluvial(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5)) +
  ggtitle("Education to Career Pathways") +
  scale_x_discrete(labels = c("Education", "Field", "Job"))
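Since band width reflects how many rows share a pair of adjacent nodes, it can help to count those pairs directly before plotting. The following is a base-R sanity check on the dataset defined earlier in this post, not something ggsankey requires:

```r
# Same dataset as in the first example
df <- data.frame(
  education = c(rep("High School", 40), rep("Bachelor", 35), rep("Master", 25)),
  field = c(rep("Science", 20), rep("Arts", 20), rep("Business", 30), rep("Engineering", 30)),
  job = c(rep("Research", 25), rep("Teaching", 25), rep("Industry", 30), rep("Consulting", 20))
)

# Count how many individuals follow each education -> field path
flows <- as.data.frame(
  table(education = df$education, field = df$field),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Drop combinations that never occur and sort from widest to thinnest band
flows <- flows[flows$n > 0, ]
flows <- flows[order(-flows$n), ]
print(flows)
```

With this data, five education-to-field paths occur, and the thinnest one (Bachelor to Engineering) holds 5 of the 100 rows, so it should appear as the narrowest band between the first two columns.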
Sankey bump plots combine characteristics of bump charts and Sankey diagrams, visualizing how rankings and volumes change over time. When a group’s value surpasses another, it shifts upward in the visualization, creating a distinctive “bumping” effect.
This visualization shows how different renewable energy sources have contributed to global electricity generation from 2000 to 2023. The geom_sankey_bump() function creates flowing streams that expand or contract to show changes in generation capacity, with technologies shifting positions when their relative contribution changes over time.
# Read and prepare the data
library(dplyr)
library(tidyr)

renewable_data <- read.csv("") %>%
  # Keep data from 2000 onward
  filter(Year >= 2000) %>%
  # Reshape to one row per year and technology
  pivot_longer(
    cols = c(wind_generation__twh, hydro_generation__twh,
             solar_generation__twh, other_renewables_including_bioenergy_generation__twh),
    names_to = "technology",
    values_to = "generation"
  ) %>%
  # Replace column names with readable labels
  mutate(
    technology = case_when(
      technology == "wind_generation__twh" ~ "Wind",
      technology == "hydro_generation__twh" ~ "Hydro",
      technology == "solar_generation__twh" ~ "Solar",
      technology == "other_renewables_including_bioenergy_generation__twh" ~ "Other Renewables"
    )
  )
# Create the plot
ggplot(renewable_data,
       aes(x = Year,
           node = technology,
           fill = technology,
           value = generation)) +
  geom_sankey_bump(space = 0,
                   type = "alluvial",
                   color = "transparent",
                   smooth = 6) +
  scale_fill_viridis_d(option = "inferno", alpha = .8) +
  scale_x_continuous(breaks = scales::pretty_breaks(), expand = c(0,0)) +
  scale_y_continuous(labels = scales::comma, expand = c(0,0)) +
  theme_sankey_bump(base_size = 16) +
  labs(x = NULL,
       y = "Electricity Generation (TWh)",
       fill = "Technology",
       title = "Global Renewable Energy Generation by Source",
       subtitle = "Period: 2000-2023",
       caption = "Data sources: Ember (2024), Energy Institute - Statistical Review of World Energy (2024)\nProcessed by Our World in Data") +
  theme(legend.position = "bottom",
        legend.title = element_text(hjust = 0.5),
        legend.title.position = "top",
        plot.caption = element_text(hjust = 0, size = 10, face = "italic"))
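The "bumping" effect comes from the groups being reordered by value at each position on the x-axis. A base-R sketch of that ordering, using made-up numbers (the groups A and B and their values are hypothetical, chosen only so that the ranking flips between the two years):

```r
# Hypothetical toy data: B overtakes A between 2000 and 2001
toy <- data.frame(
  year  = c(2000, 2000, 2001, 2001),
  group = c("A", "B", "A", "B"),
  value = c(10, 5, 6, 9)
)

# Rank the groups within each year (1 = largest value)
toy$rank <- ave(toy$value, toy$year, FUN = function(v) rank(-v))
print(toy)
```

A ranks first in 2000 and second in 2001, which is the kind of swap that shows up as one stream crossing over another in a sankey bump plot.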