I found a nice explanation of data validation on RPubs: https://rpubs.com/techanswers88/validate-data-in-r. I have tried to reproduce the steps in an R notebook. All the steps work perfectly until I get to the tables with the results. The example uses flextable to tabulate the errors from validating the example dataset. The flextable shows well when I knit the notebook preview:flextable in Notebook preview. However, when I knit to PDF the same flextable looks terrible: Flextable in PDF. So I went to kable/KableExtra, as I am more familiar with there packages and I style most of my tables with these. Kable/kableExtra usually works very well with both the R Notebooks and PDF. With Kable/KableExtra I get a more compact table which I prefer. However, some of the validation criteria in the table contain the pipe character (e.g. "(DBP == 0 | SBP ==0) & Outcome == 'Survived' )". The pipe character shows as '|' in the table. This specifically happens as soon as I pass the table setup through KableExtra's styling options.
In the MWE below I have left out the graphics of the original example I tried to reproduce and only focus on the troublesome table. R version used: 4.3.2; dplyr 1.1.4; Flextable 0.9.4; kableExtra 1.4.0; validate 1.1.5.
MWE:
---
title: "kableExtra vs flextable"
output:
pdf_document:
toc: true
number_sections: true
html_notebook:
toc: true
number_sections: true
---
```{r}
#Load Libraries
library(dplyr)
library(validate)
library(flextable)
library(kableExtra)
```
```{r}
#Create dataset (from https://rpubs.com/techanswers88/validate-data-in-r)
myData <- data.frame(PatientID = c('P001', 'P002', 'P003' , NA, 'P005', 'P006', 'P007', 'P008',
'P009', 'P010', 'P008', 'P11'),
Age = c(23, 12, 5, 8, 245, NA, 23, 45, 87, 121, 56, 130),
Outcome = c('Died', 'Died', NA, 'Survived', 'Survived', 'Survived', 'Survived',
'Survived', 'Survived', '?', 'Survived', 'Unknown'),
SBP = c(0, 0, 120, 80, 45, 67, 100, 130, 350, 120, 46, 120),
DBP = c(0, 0, 80, 70, 30, 40, 80, 210, NA, 80, 0, 80)
)
#Add an id field in the data.
myData <- myData%>%
dplyr::mutate(id = row_number())
```
```{r}
#Create validation rules
myrules <- validator( "Patient Id Unique" = is_unique(PatientID),
"Patient ID" = is.na(PatientID),
"Age in range" = Age >= 0 & Age <= 120,
"Outcome validity " = Outcome %in% c('Survived', 'Died'),
"SBP in range" = SBP >= 0 & SBP <= 300,
"DBP in range" = in_range (DBP, min = 0 , max = 200),
"Survived with No BP" = (DBP == 0 | SBP ==0) & Outcome == 'Survived' )
```
```{r}
#create validation results with the confront function
output <- confront(myData, myrules, key ="id")
```
```{r}
#Show errors in table
#Convert our output into a dataframe first
dout <- as.data.frame(output)
```
```{r}
#Errors for each record id
dErrors <- dout%>%
dplyr::filter(! value == TRUE)%>%
dplyr::select(id, name, expression)%>%
dplyr::arrange(id)
#Display validation results in flextable
#This part works well in a R Notebook or HTML, but not in PDF.
ft <- flextable(dErrors)%>%
theme_booktabs()%>%
merge_v(j = ~id)%>%
set_header_labels(name = "Error",
id = "Record ID",
expression = "Validation expression")
ft
```
```{r}
#Try to reproduce the flextable with kable/kableExtra
#This produces a more compact table but unfortunately without the pipe character
dout %>%
filter(!value==TRUE) %>%
select(id, name, expression) %>%
arrange(id) %>%
kable(booktabs = TRUE, escape = TRUE) %>%
kable_classic(full_width = FALSE) %>%
column_spec(3, width = "10cm") %>%
collapse_rows(columns = 1, valign = "top")
```
I have found that if I only sent the results through kable but not kableExtra
```{r}
dout %>%
filter(!value==TRUE) %>%
select(id, name, expression) %>%
arrange(id) %>%
kable(booktabs = TRUE, escape = TRUE)
```
then the pipe symbol will show in the table . However as soon as I try to do any styling (e.g. collapse_rows() or kable_styling) then the pipe symbol will be changed to '|'. Passing "format = "latex" to kable does not solve the problem, nor does "escape = TRUE". As said, the pipe character will show in the table in PDF up until the moment I add any styling. I can't find a solution online or in the manual of kableExtra. I have tried updating R and the packages but without result.
I tried to 'fix' flextable to see if I could make that work to get a better looking table, but without success. I have tried to change the width of the flextable with ft <- fit_to_width(ft, max_width = 15, unit = "cm")
. However this does not do anything with the flextable width in the PDF and all text remains smashed together. Some advice online suggest to use 'latex_engine: xelatex' in the YAML header however this does not have any effect either.
I would prefer to use kableExtra as I am more familiar with that package. Any advice on how to show the pipe character when using kableExtra?
This seems to work if you use the format
option of kable
:
dout %>%
filter(!value==TRUE) %>%
select(id, name, expression) %>%
arrange(id) %>%
kable(booktabs = TRUE, format = "html") %>%
kable_classic(full_width = FALSE) %>%
column_spec(3, width = "10cm") %>%
collapse_rows(columns = 1, valign = "top")
If you want to knit either to HTML or to LaTeX, you can do:
fmt <- ifelse(knitr::is_html_output(), "html", "latex")
dout %>%
filter(!value==TRUE) %>%
select(id, name, expression) %>%
arrange(id) %>%
kable(booktabs = TRUE, format = fmt) %>%
kable_classic(full_width = FALSE) %>%
column_spec(3, width = "10cm") %>%
collapse_rows(columns = 1, valign = "top")