A boxplot hides the distribution behind each group. This post shows how to tackle this issue in base R by adding individual observations as jittered dots.
Boxplots can be misleading: the exact distribution of each group is hidden behind the boxes, as explained in data-to-viz.
If the number of observations is not too high, you can add individual observations on top of the boxes, using jittering to avoid overlapping dots.
In base R, this is done manually with a loop that adds the dots one group at a time, computing a random X position for each of them.
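To make the "random X position" idea concrete, here is a minimal sketch of what `jitter()` does to a set of dots that all share the same x coordinate. The seed, the five dots, and the `amount` of 0.2 are arbitrary values chosen for illustration only.

```r
# Hypothetical illustration: five dots that all belong to group 1
set.seed(42)                            # arbitrary seed, only for reproducibility
x <- rep(1, 5)                          # every dot starts at x = 1
x_jittered <- jitter(x, amount = 0.2)   # adds uniform noise in [-0.2, 0.2]
print(x_jittered)                       # five values between 0.8 and 1.2
```

A larger `amount` spreads the dots further apart horizontally, which is why the loop in this post scales it with the size of each group.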
# Create data
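names <- c(rep("A", 80) , rep("B", 50) , rep("C", 70))
value <- c( rnorm(80 , mean=10 , sd=9) , rnorm(50 , mean=2 , sd=15) , rnorm(70 , mean=30 , sd=10) )
# stringsAsFactors=TRUE keeps names a factor (no longer the default since R 4.0)
data <- data.frame(names, value, stringsAsFactors=TRUE)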
# Basic boxplot
boxplot(data$value ~ data$names , col=terrain.colors(4) )
# Add data points
# factor() makes this work whether names is stored as a factor or as character
mylevels <- levels(factor(data$names))
levelProportions <- summary(factor(data$names))/nrow(data)
for(i in seq_along(mylevels)){
  thislevel <- mylevels[i]
  thisvalues <- data[data$names==thislevel, "value"]
  # take the x-axis indices and add a jitter, proportional to the N in each level
  myjitter <- jitter(rep(i, length(thisvalues)), amount=levelProportions[i]/2)
  points(myjitter, thisvalues, pch=20, col=rgb(0,0,0,.9))
}
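If you do not need the jitter width to depend on group size, a shorter alternative is base R's `stripchart()` with `add=TRUE`. This is a sketch of a different technique, not the loop used above: the seed, the `jitter=0.2` width, and the 0.5 transparency are assumptions picked for illustration, and the data is regenerated so the block runs on its own.

```r
# Assumed example data, same shape as above (seed is arbitrary)
set.seed(1)
data <- data.frame(
  names = factor(c(rep("A", 80), rep("B", 50), rep("C", 70))),
  value = c(rnorm(80, mean=10, sd=9), rnorm(50, mean=2, sd=15), rnorm(70, mean=30, sd=10))
)

# Draw the boxes first
boxplot(value ~ names, data=data, col=terrain.colors(4))

# Overlay every observation; method="jitter" spreads the dots horizontally
stripchart(value ~ names, data=data, method="jitter", jitter=0.2,
           vertical=TRUE, add=TRUE, pch=20, col=rgb(0,0,0,.5))
```

Here every group gets the same jitter width, whereas the loop widens the spread for larger groups.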