For this post, we need to load the `gtsummary` library.

The gtsummary package uses the `tbl_summary()` function to generate the summary table and works well with the `%>%` pipe operator.

It automatically **detects the data type** of each column and uses it to
decide which statistics to compute. By default, these are:

- **median, 1st and 3rd quartile** for numeric columns
- **number of observations** and proportion for categorical columns

```
library(gtsummary)
# create dataset (one row per Class/Sex/Age/Survived combination)
data("Titanic")
df <- as.data.frame(Titanic)
# create the table
df %>%
  tbl_summary()
```

| Characteristic | N = 32^{1} |
|---|---|
| Class | |
| 1st | 8 (25%) |
| 2nd | 8 (25%) |
| 3rd | 8 (25%) |
| Crew | 8 (25%) |
| Sex | |
| Male | 16 (50%) |
| Female | 16 (50%) |
| Age | |
| Child | 16 (50%) |
| Adult | 16 (50%) |
| Survived | 16 (50%) |
| Freq | 14 (1, 77) |

^{1} n (%); Median (IQR)
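To make the defaults concrete, here is a minimal base R sketch of the same kind of statistics computed by hand. It is an illustration only, not how gtsummary computes them internally, and the variable names (`freq_quartiles`, `class_counts`, `class_props`) are introduced just for this example.

```r
# Base R only; mirrors the kind of statistics shown in the table above.
data("Titanic")
df <- as.data.frame(Titanic)

# Numeric column: median with 1st and 3rd quartiles
freq_quartiles <- quantile(df$Freq, probs = c(0.25, 0.5, 0.75))

# Categorical column: number of observations and proportion per level
class_counts <- table(df$Class)
class_props <- prop.table(class_counts)

print(freq_quartiles)
print(class_counts)
print(round(100 * class_props))
```

Note that each row of `df` is one combination of Class, Sex, Age and Survived, which is why every level of `Class` is counted 8 times.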

If you want to add p-values to the table, you **have
to** pass `by=variable_name` to the `tbl_summary()` function. This is because a p-value comes from a test that
compares groups, so the table needs a grouping variable.

The variable in the `by` argument is used to
**split the dataset** into sub-samples (2 if it’s
dichotomous, 3 if the variable has 3 distinct labels, etc).
Those sub-samples are then **compared** for each column in the
dataset, and the test used depends on the type of data.

In this case, we add:

- `add_p()` to create a new column for p-values
- `add_overall()` to add a new column with descriptive statistics for the whole sample

```
library(gtsummary)
# create dataset
data("Titanic")
df <- as.data.frame(Titanic)
# create the table, split by survival status
df %>%
  tbl_summary(by = Survived) %>%
  add_overall() %>%
  add_p()
```

| Characteristic | Overall, N = 32^{1} | No, N = 16^{1} | Yes, N = 16^{1} | p-value^{2} |
|---|---|---|---|---|
| Class | | | | >0.9 |
| 1st | 8 (25%) | 4 (25%) | 4 (25%) | |
| 2nd | 8 (25%) | 4 (25%) | 4 (25%) | |
| 3rd | 8 (25%) | 4 (25%) | 4 (25%) | |
| Crew | 8 (25%) | 4 (25%) | 4 (25%) | |
| Sex | | | | >0.9 |
| Male | 16 (50%) | 8 (50%) | 8 (50%) | |
| Female | 16 (50%) | 8 (50%) | 8 (50%) | |
| Age | | | | >0.9 |
| Child | 16 (50%) | 8 (50%) | 8 (50%) | |
| Adult | 16 (50%) | 8 (50%) | 8 (50%) | |
| Freq | 14 (1, 77) | 9 (0, 96) | 14 (10, 75) | 0.6 |

^{1} n (%); Median (IQR)

^{2} Fisher’s exact test; Pearson’s Chi-squared test; Wilcoxon rank sum test
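The idea of comparing sub-samples can be sketched with base R tests. This is a rough illustration under the assumption that a categorical column is tested through its contingency table with the `by` variable and a numeric column with a Wilcoxon rank sum test, as the footnote suggests; which test gtsummary actually picks for each row is not reproduced here, and the object names are made up for the example.

```r
data("Titanic")
df <- as.data.frame(Titanic)

# Categorical column vs. the grouping variable: test on the contingency table
class_by_survived <- table(df$Class, df$Survived)
class_test <- fisher.test(class_by_survived)

# Numeric column vs. a two-level grouping variable: Wilcoxon rank sum test
# (ties in Freq make R fall back to a normal approximation, with a warning)
freq_test <- suppressWarnings(wilcox.test(Freq ~ Survived, data = df))

print(class_by_survived)
print(class_test$p.value)
print(freq_test$p.value)
```

Because the contingency table is perfectly balanced (4 rows per class in each group), the test on `Class` cannot detect any difference, which is consistent with the `>0.9` shown in the table.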

Thanks to the `add_stat()` function, we can create new
columns based on our own functions.

Below, we define a `my_anova` function that returns the
p-value of an **ANOVA** and pass it to the
`add_stat()` function.

```
library(gtsummary)
# create dataset
data("iris")
df <- as.data.frame(iris)
# custom statistic: one-way ANOVA of `variable` across the `by` groups
my_anova <- function(data, variable, by, ...) {
  result <- aov(as.formula(paste(variable, "~", by)), data = data)
  summary(result)[[1]]$'Pr(>F)'[1] # extract the p-value for the group effect
}
# create the table
df %>%
  tbl_summary(by = Species) %>%
  add_overall() %>%
  add_p() %>%
  add_stat(fns = everything() ~ my_anova) %>%
  modify_header(
    list(
      add_stat_1 ~ "**p-value**",
      all_stat_cols() ~ "**{level}**"
    )
  ) %>%
  modify_footnote(
    add_stat_1 ~ "ANOVA"
  )
```

| Characteristic | Overall^{1} | setosa^{1} | versicolor^{1} | virginica^{1} | p-value^{2} | p-value^{3} |
|---|---|---|---|---|---|---|
| Sepal.Length | 5.80 (5.10, 6.40) | 5.00 (4.80, 5.20) | 5.90 (5.60, 6.30) | 6.50 (6.23, 6.90) | <0.001 | 0.000 |
| Sepal.Width | 3.00 (2.80, 3.30) | 3.40 (3.20, 3.68) | 2.80 (2.53, 3.00) | 3.00 (2.80, 3.18) | <0.001 | 0.000 |
| Petal.Length | 4.35 (1.60, 5.10) | 1.50 (1.40, 1.58) | 4.35 (4.00, 4.60) | 5.55 (5.10, 5.88) | <0.001 | 0.000 |
| Petal.Width | 1.30 (0.30, 1.80) | 0.20 (0.20, 0.30) | 1.30 (1.20, 1.50) | 2.00 (1.80, 2.30) | <0.001 | 0.000 |

^{1} Median (IQR)

^{2} Kruskal-Wallis rank sum test

^{3} ANOVA
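Before plugging a custom function into `add_stat()`, it can help to call it directly on one column and check what it returns. The sketch below redefines the same `my_anova` logic so that it runs on its own with base R; `p_sepal` is a name introduced only for this example.

```r
data("iris")

# Same logic as my_anova above: one-way ANOVA of `variable` across `by`
my_anova <- function(data, variable, by, ...) {
  result <- aov(as.formula(paste(variable, "~", by)), data = data)
  summary(result)[[1]][["Pr(>F)"]][1] # p-value for the group effect
}

# Call it the way add_stat() would for a single variable
p_sepal <- my_anova(iris, "Sepal.Length", "Species")

print(p_sepal)
# Base R helper to display very small p-values with a threshold
print(format.pval(p_sepal, eps = 0.001))
```

The function returns a single, very small number, so the `0.000` in the ANOVA column is presumably this value after rounding rather than an exact zero.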

This post explained how to create a summary table using the gtsummary library. For more on this package, see the dedicated section or the table section.
