I would like to use R-Shiny to develop a web app that accesses an SQL database with sensitive data.
So far I have written the database query as a pure SQL string.
However, this approach is very susceptible to SQL injections.
My intention is therefore to write the database query with dplyr
, as the query is written in R and not in pure SQL.
Unfortunately, I have not yet been able to find any more detailed information or an estimate on this topic online. Therefore I would like to clarify my request in this question.
So far I have only read a statement about the dbplyr::translate_sql()
function that is supposed to protect SQL injections:
https://dbplyr.tidyverse.org/articles/sql-translation.html#vectors
As my data is very confidential, dbplyr / dplyr should provide 100% protection against SQL injections.
I would be very pleased to receive further information and assessments on this topic.
UPDATE
As suggested in the comments, I have added an example for reproducibility:
(adapted from: https://shiny.posit.co/r/articles/build/pool-dplyr/)
library(shiny)
library(DBI)
library(pool)
library(tidyverse)
library(dbplyr)
pool <- dbPool(
drv = RMySQL::MySQL(),
dbname = "shinydemo",
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
username = "guest",
password = "guest")
ui <- fluidPage(
textInput("ID", "Enter your ID:", "5"),
tableOutput("tbl"),
numericInput("nrows", "How many cities to show?", 10),
plotOutput("popPlot"))
server <- function(input, output, session) {
output$tbl <- renderTable({
pool %>% tbl("City") %>%
filter(ID == !!input$ID)
})
output$popPlot <- renderPlot({
df <- pool %>% tbl("City") %>%
head(as.integer(input$nrows)[1]) %>% collect()
pop <- df$Population
names(pop) <- df$Name
barplot(pop)
})
}
shinyApp(ui, server)
The threat of SQL injection depends on how the user-provided text is incorporated into the dynamic SQL query. So if you are trying to protect against injection, you need to assess each of the ways user input is used when querying the database.
In the example you have posted there are two pieces of user input:
input$nrows
which is limited to numeric by the input widgetinput$ID
which is free text, and hence has greater risk of SQL injection.You can use any of the approaches below to investigate each piece of user input. Overall, the approach I would choose is to Sanitise and then Review.
This answer focuses on the case where you are using dbplyr to translate from R to SQL. If other approaches are used to access the database (such as the DBI package) then a different approach will be required.
Consider where input$ID
is used in the example code:
pool %>%
tbl("City") %>%
filter(ID == !!input$ID)
In order to execute this, dbplyr will translate the dplyr commands from R into SQL. We can review this translation using the show_query
command.
pool %>%
tbl("City") %>%
filter(ID == !!input$ID) %>%
show_query()
This will probably produce an SQL query similar to the following:
SELECT *
FROM city
WHERE ID == 'my_id_value'
You do not need the entire Shiny app to test this for SQL injection. You can simply vary this query and see what happens. For example, try a simple alternative query:
attempt_inject = 'my_id_value;SELECT 123'
pool %>%
tbl("City") %>%
filter(ID == !!attempt_inject) %>%
show_query()
I suspect this will produce SQL similar to the following:
SELECT *
FROM city
WHERE ID == 'my_id_value;SELECT 123'
Which would seem an effective prevention for SQL injection. But you will need to test this as thoroughly as required to ensure you are confident.
The dbplyr package is open source and available online. You could review the source code to confirm how inputs are handled during translation.
My suspicion is that unless inputs are of type sql
, then they will be escaped as either text or an SQL object.
There is an sql()
function that converts input to type sql
. One of its uses is to tell dbplyr not to translate the contents of the function but to use them as-is. This means that if your user can submit R objects as inputs then this is a significant area of vulnerability. However, this is unlikely to be the case as your app restricts users to submitting text and numeric input.
R has several advantages over pure SQL, in that you can use R to validate users' input before including it in a dbplyr query.
Consider this an an alternative to your output$tbl
component of the app:
output$tbl <- renderTable({
current_ID = as.character(input$ID)
acceptable_ids = pool %>%
tbl("City") %>%
select(ID) %>%
distinct() %>%
collect() %>%
pull()
req(current_ID %in% acceptable_ids)
pool %>%
tbl("City") %>%
filter(ID == !!current_ID)
})
This approach enforces that the user input is converted to type character and is a value found in the ID column, before passing the user input through to dbplyr.
The exact choice of checks will depend on how user input is used. I often include a check that user input does not contain special characters (such as ;{}[]*
or spaces).
dbplyr includes commands to delimit certain inputs. Investigate and use these commands where applicable.
Notes
req
is a Shiny command that stops execution of a component if the condition is not met. You can use stopifnot
outside a Shiny context.renderTable
statement is not efficient app design. This should be calculated once and reused multiple times.