sqlrdbirpostgres

How to avoid making multiple connections to postgres database when accessing using R


I'm using the following code however it is creating multiple connections when calling the map function and they are not closing. As a result my rds database is getting flooded with connections. Is there any way to change this code to prevent so many connections?

   connect.to.database <- function (dbname, schema = "public", host, port, user, pass) {
      con <- dbConnect(RPostgres::Postgres(),
                       dbname = dbname,
                       user = user,
                       password = pass,
                       host = host,
                       port = port)
      
      
      # this puts the schema in the search path, which means that instead of
      # having to use <schema name>.<table name> you can just write <table name>
      res <- dbSendQuery(con, paste0("SET search_path TO ",
                                     dbQuoteIdentifier(con, schema),
                                     ", public"))
      
      # check for errors
      dbFetch(res)
      dbClearResult(res)
      
      con
    }

    schemas <- dbGetQuery(connect.to.database(dbname, "public", host, port, user, password), paste0("SELECT schema_name FROM information_schema.schemata"))
    
    schema_names <- schemas %>% pull()
    
    schemas_tables <- map(.x = schema_names,~dbGetQuery(connect.to.database(dbname, "public", host, port, user, password), paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ","'",.x,"'")) %>% mutate(schema_name = .x)) %>%
                      bind_rows()

Solution

  • Create a single global connection object and use it within map. (I remove the unnecessary paste0 from your first query.)

    conn <- connect.to.database(dbname, "public", host, port, user, password)
    schema <- dbGetQuery(conn, "SELECT schema_name FROM information_schema.schemata")
    
    schemas_tables <- map(
      .x = schema$schema_name,
      ~ dbGetQuery(conn, paste0("SELECT table_name FROM information_schema.tables WHERE table_schema = ","'",.x,"'")) %>%
        mutate(schema_name = .x)
    ) %>%
      bind_rows()
    

    You may want to consider parameterized queries vice constructing query strings manually. While there are security concerns about malicious SQL injection (e.g., XKCD's Exploits of a Mom aka "Little Bobby Tables"), it is also a concern for malformed strings or Unicode-vs-ANSI mistakes, even if it's a single data analyst running the query. Both DBI (with odbc) and RODBC support parameterized queries, either natively or via add-ons.

    That would change this to:

    schemas_tables <- map(
      .x = schema$schema_name,
      ~ dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema = ?",
                   params = list(.x)) %>%
        mutate(schema_name = .x)
    ) %>%
      bind_rows()
    

    But frankly, I think it might be much easier to use IN instead of =. Again, using parameter-binding.

    schemas_tables <- dbGetQuery(conn, "SELECT table_name FROM information_schema.tables WHERE table_schema IN (?)",
                                 params = list(schema$schema_name))
    

    (No map required.)

    Or I believe you can do it in one query, not two.

    dbGetQuery(conn, "
        select table_name
        from information_schema.tables
        where table_schema in (
          select schema_name from information_schema.schemata
        )")
    

    Remember

    ... to close the connection when you are done.

    dbDisconnect(conn)