A custom scatterplot with auto-positioned labels to explore the
palmerpenguins dataset made with
R and the
tidyverse. This blogpost guides you through a highly
customized scatterplot that includes a variety of custom colors,
markers, and fonts. The library
ggrepel is used to
automatically adjust the position of labels in the plots.
This page showcases the work of
Tuo Wang, which introduces
packages to make
plots more beautiful. You can find the original code on Tuo’s blog.
Thanks to him for agreeing to share his work here! Thanks also to Tomás Capretto, who split the original code into this step-by-step guide!
As usual, it is first necessary to load some packages before
building the figure.
ggrepel provides geoms for
ggplot2 to repel overlapping text
labels. Text labels repel away from each other, away from data
points, and away from the edges of the plotting area automatically.
randomNames is used to generate random names that will be the text labels in the
plot. randomNames is only available
for R > 4.0.0.
library(ggrepel)
library(palmerpenguins)
library(randomNames)
library(tidyverse)
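If any of these packages is missing, `library()` stops with an error. A minimal sketch for checking beforehand, using only base R (the names `needed` and `missing_pkgs` are illustrative and not part of the original post):

```r
# Packages loaded in this post
needed <- c("ggrepel", "palmerpenguins", "randomNames", "tidyverse")

# installed.packages() returns a matrix whose row names are package names
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(needed, installed)

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```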
The palmerpenguins data were collected and made available by
Dr. Kristen Gorman and the
Palmer Station, Antarctica LTER, a member of the
Long Term Ecological Research Network. This dataset was popularized by
Allison Horst in her R package palmerpenguins
with the goal of offering an alternative to the iris dataset for data
exploration and visualization.
data("penguins", package = "palmerpenguins")
First of all, observations with missing values are discarded from the dataset.
Then, each observation is assigned a random name. A new variable
highlight is added to the dataset to indicate which names
are highlighted in the plot. These are the names starting with the
letter "C".
## Discard observations with missing values
penguins <- drop_na(penguins)

## Generate random names
# The results of set.seed may depend on the R version.
set.seed(2021 + 03 + 27)
name_vector <- randomNames(nrow(penguins), which.names = "first")

## Create 'highlight' indicator variable
penguins <- penguins %>%
  mutate(
    name = name_vector,
    highlight = case_when(
      str_starts(name, "C") ~ name,
      TRUE ~ ""
    )
  )
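The preprocessing described above can be sketched on a tiny example in base R. The data frame `toy`, its values, and the hand-picked names are hypothetical stand-ins for the penguins data and the random names. `complete.cases()` and `startsWith()` play the roles that the tidyverse helpers play in the post.

```r
# Hypothetical toy data: four birds, one with a missing measurement
toy <- data.frame(bill_length_mm = c(39.1, NA, 46.5, 50.0))

# Keep complete rows only; drop = FALSE keeps the result a data frame
toy <- toy[complete.cases(toy), , drop = FALSE]

# Hand-picked stand-ins for the randomly generated names
toy$name <- c("Carla", "Chen", "Maria")

# Keep the name when it starts with "C", otherwise use an empty label
toy$highlight <- ifelse(startsWith(toy$name, "C"), toy$name, "")

print(toy)
```

Only "Carla" and "Chen" keep a label here. "Maria" gets an empty string, which is what later leaves most points unlabeled in the chart.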
Today’s chart is a scatterplot that shows the association between
the flipper length and the bill length of the penguins in the dataset.
Points are colored according to the species to add an extra layer of
information to the visualization. The first step is to create a
basic colored scatterplot with
ggplot2. Let’s get
started!
# Note `color = species` and `shape = species`.
# This means each species will have BOTH a different color and shape.
plt <- ggplot(penguins, aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(
    aes(color = species, shape = species),
    size = 1.5,
    alpha = 0.8 # It's nice to add some transparency because there may be overlap.
  ) +
  # Use custom colors
  scale_color_manual(
    values = c("#386cb0", "#fdb462", "#7fc97f")
  )
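When `values` is an unnamed vector, `scale_color_manual()` assigns the colors in the order of the factor levels. A possible alternative is to name the vector so that each color is tied to a species explicitly. The sketch below assumes the levels of `species` are `Adelie`, `Chinstrap`, and `Gentoo`; `species_colors` and `plt_named` are illustrative names, not part of the original code.

```r
library(ggplot2)

# Named vector: each color is matched to a species by name
species_colors <- c(
  Adelie = "#386cb0",
  Chinstrap = "#fdb462",
  Gentoo = "#7fc97f"
)

# Use the palmerpenguins copy of the data and drop incomplete rows
penguins_complete <- na.omit(palmerpenguins::penguins)

plt_named <- ggplot(
  penguins_complete,
  aes(x = flipper_length_mm, y = bill_length_mm)
) +
  geom_point(aes(color = species, shape = species), size = 1.5, alpha = 0.8) +
  scale_color_manual(values = species_colors)

# Check that the names line up with the factor levels
levels(penguins_complete$species)
```

Naming the colors keeps the mapping stable even if the level order changes, for example after reordering the factor.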
That’s a pretty good start, but let’s make it better!
What’s truly missing here are labels. It’s very frustrating not knowing which item is hidden under a data point, isn’t it!?
It is pretty challenging to add many labels on a plot since labels
tend to overlap each other, making the figure unreadable. The
ggrepel package is here to help us. It
provides an algorithm that will automatically place the labels for
us. Let’s do it!
plt <- plt +
  geom_text_repel(
    aes(label = highlight),
    family = "Poppins",
    size = 3,
    min.segment.length = 0,
    seed = 42,
    box.padding = 0.5,
    max.overlaps = Inf,
    arrow = arrow(length = unit(0.010, "npc")),
    nudge_x = .15,
    nudge_y = .5,
    color = "grey50"
  )
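The `highlight` column holds an empty string for every penguin that should stay unlabeled. According to the ggrepel documentation, empty-string labels are not drawn, but their points still push the remaining labels away. A minimal sketch on hypothetical data (the data frame `toy` and its values are made up for illustration):

```r
library(ggplot2)
library(ggrepel)

# Five points, only two of which carry a label
toy <- data.frame(
  x = c(1, 1.1, 1.2, 2, 3),
  y = c(1, 1.1, 0.9, 2, 3),
  label = c("Carla", "", "", "Chen", "")
)

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  geom_text_repel(
    aes(label = label),
    min.segment.length = 0, # always draw the connecting segment
    seed = 42,              # make the label placement reproducible
    max.overlaps = Inf      # never drop a label because of overlaps
  )

p
```

Setting `seed` matters because the placement algorithm involves randomness. Without it, labels may land in slightly different spots on each run.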
Isn’t it wonderful how well ggrepel works?
The chart above is pretty close to being publication-ready. What’s needed now is a good title, a legend to make the colors and shapes more insightful, and some axis customization:
plt <- plt +
  # Add axes labels, title, and subtitle
  labs(
    title = "Palmer Penguins Data Visualization",
    subtitle = "Scatter plot of flipper length vs bill length",
    x = "flipper length (mm)",
    y = "bill length (mm)"
  ) +
  theme(
    # The default font when not explicitly specified
    text = element_text(family = "Lobster Two", size = 8, color = "black"),

    # Customize legend text, position, and background.
    legend.text = element_text(size = 9, family = "Roboto"),
    legend.title = element_text(face = "bold", size = 12, family = "Roboto"),
    legend.position = c(1, 0),
    legend.justification = c(1, 0),
    legend.background = element_blank(),
    # This one removes the background behind each key in the legend
    legend.key = element_blank(),

    # Customize title and subtitle font/size/color
    plot.title = element_text(
      family = "Lobster Two",
      size = 20,
      face = "bold",
      color = "#2a475e"
    ),
    plot.subtitle = element_text(
      family = "Lobster Two",
      size = 15,
      face = "bold",
      color = "#1b2838"
    ),
    plot.title.position = "plot",

    # Adjust axis parameters such as size and color.
    axis.text = element_text(size = 10, color = "black"),
    axis.title = element_text(size = 12),
    axis.ticks = element_blank(),
    # Axis lines are now lighter than default
    axis.line = element_line(colour = "grey50"),

    # Only keep y-axis major grid lines, with a grey color and dashed type.
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(color = "#b4aea9", linetype = "dashed"),

    # Use a light color for the background of the plot and the panel.
    panel.background = element_rect(fill = "#fbf9f4", color = "#fbf9f4"),
    plot.background = element_rect(fill = "#fbf9f4", color = "#fbf9f4")
  )
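The plot code refers to the font families "Lobster Two", "Roboto", and "Poppins". If one of them is not installed, the graphics device may fall back to a default font or emit warnings. A sketch for checking availability, assuming the systemfonts package is installed (it is not used in the original post; `wanted`, `installed_families`, and `available` are illustrative names):

```r
# Font families referenced in the plot code
wanted <- c("Lobster Two", "Roboto", "Poppins")

# system_fonts() lists the fonts found on this machine, one row per font file
installed_families <- unique(systemfonts::system_fonts()$family)

# TRUE for each family that is installed, FALSE otherwise
available <- wanted %in% installed_families
names(available) <- wanted
print(available)
```

Any family reported as `FALSE` needs to be installed on the system, or replaced with an available family, for the chart to render as intended.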
What a lovely plot!