
R: Finding code with authors' comments, if any


Is there any reasonably simple, straightforward function along the lines of getAnywhere() that returns the source code of a function together with any comments, such that if I see no comments I can be confident that there are none, whether the code is written in R, C, C++, Fortran, or something else? For example, stats:::plot.acf does not seem to have any comments in it. Can I conclude from this that there are no comments in its source?

I understand that there is a flowchart-like search process: if you know that the source is written in R, then that source, including comments, is available from a specific GitHub repository via some search method appropriate to GitHub. If you have determined that the code is in some other language, it is available via a more elaborate search process that involves finding the correct file and then doing a text search within it, and that process differs for base and contributed packages. I am under the impression that, at least until recently, there was no shortcut to learning and working your way through that implicit flowchart if you wanted to learn whether there is a version of the code that contains comments. Moreover, I believe that versions of the code that do or don't contain comments were nowhere identified as such, except by the comments themselves or by prior knowledge.

However, R is a pretty rapidly evolving ecosystem and I don't think it is entirely unreasonable to hope that simpler tools for determining whether there is a version of the source that includes comments, and finding it if there is, might now exist. Do they?


Solution

  • Whether the source code of an R function is preserved internally (via its srcref attribute) depends on the value of option keep.source when the function is defined. By source code, I mean the code as entered by the user, with comments, possibly inconsistent indentation, possibly inconsistent spacing around operators, etc.

    options(keep.source = FALSE)
    f <- function(x) {
        ## A comment
            x +
      1}
    
    getSrcref(f)
    ## NULL # (invisibly)
    
    deparse(f, control = "all")
    ## [1] "function (x) "
    ## [2] "{"            
    ## [3] "    x + 1"    
    ## [4] "}" 
    
    options(keep.source = TRUE)
    g <- function(x) {
        ## A comment
            x +
      1}
    
    getSrcref(g)
    ## function(x) {
    ##     ## A comment
    ##         x +
    ##   1}
    
    deparse(g, control = "all")
    ## [1] "function(x) {"   
    ## [2] "    ## A comment"
    ## [3] "        x +"     
    ## [4] "  1}"
    

    Whether functions in a contributed package retain their source code depends on the options passed to R CMD INSTALL when the package was installed from sources (by you, or by CRAN when it built the binary). The default is to discard source code, but you can avoid that by installing from sources and setting the --with-keep.source flag:

    install.packages(pkgs, type = "source", INSTALL_opts = "--with-keep.source")
    

    Functions in base packages (base, stats, etc.) won't retain their source code unless you build R itself from sources with environment variable R_KEEP_PKG_SOURCE set to yes; at least, that is what I infer from ?options. To learn about building R, see the corresponding manual.
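
    To answer the "can I be confident there are no comments?" part for a single function, you can first test whether it carries source references at all. Here is a minimal sketch; hasSource is a hypothetical helper name, not something provided by R, and the two test functions are created with parse() so that the example does not depend on the current value of keep.source:

    ```r
    ## Hypothetical helper (not part of R): TRUE if 'func' carries source
    ## references, in which case any comments would be visible.
    hasSource <- function(func) {
        func <- match.fun(func)
        !is.null(utils::getSrcref(func))
    }

    ## Passing keep.source to parse() explicitly makes the result
    ## independent of options("keep.source") in the current session.
    f0 <- eval(parse(text = "function(x) x + 1", keep.source = FALSE))
    f1 <- eval(parse(text = "function(x) x + 1  # a comment", keep.source = TRUE))

    hasSource(f0)  # defined without source references
    hasSource(f1)  # defined with source references
    hasSource(stats:::plot.acf)  # depends on how your copy of R was built
    ```

    If the helper returns FALSE, the absence of comments in the printed code tells you nothing about the original source file.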

    Given a function with source references, you can programmatically extract comments from the source code. A quick and dirty approach is pattern matching:

    zzz <- deparse(g, control = "all")
    grep("#", zzz, value = TRUE)
    ## [1] "    ## A comment"
    

    There can be false positives, though, because the pattern # also matches strings and non-syntactic names containing the hash character, which aren't comments at all.

    grep("#", "\"## Not a comment\"", value = TRUE)
    ## [1] "\"## Not a comment\""
    

    A much more robust way to extract comments is to examine the parse data for tokens of type COMMENT:

    getParseData(parse(text = zzz), includeText = NA)
    ##    line1 col1 line2 col2 id parent          token terminal         text
    ## 23     1    1     4    4 23      0           expr    FALSE             
    ## 1      1    1     1    8  1     23       FUNCTION     TRUE     function
    ## 2      1    9     1    9  2     23            '('     TRUE            (
    ## 3      1   10     1   10  3     23 SYMBOL_FORMALS     TRUE            x
    ## 4      1   11     1   11  4     23            ')'     TRUE            )
    ## 20     1   13     4    4 20     23           expr    FALSE             
    ## 6      1   13     1   13  6     20            '{'     TRUE            {
    ## 8      2    5     2   16  8     20        COMMENT     TRUE ## A comment
    ## 17     3    9     4    3 17     20           expr    FALSE             
    ## 10     3    9     3    9 10     12         SYMBOL     TRUE            x
    ## 12     3    9     3    9 12     17           expr    FALSE             
    ## 11     3   11     3   11 11     17            '+'     TRUE            +
    ## 14     4    3     4    3 14     15      NUM_CONST     TRUE            1
    ## 15     4    3     4    3 15     17           expr    FALSE             
    ## 16     4    4     4    4 16     20            '}'     TRUE            }
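
    The parse data also show why the quoted string from the false-positive example is not mistaken for a comment: the parser labels it STR_CONST, and only the trailing text after the hash is labelled COMMENT. A small sketch, where the input string txt is made up for illustration:

    ```r
    ## One line of R code containing a string with hashes in it,
    ## followed by a genuine comment.
    txt <- "\"## Not a comment\"  # a real comment"

    ## keep.source = TRUE is needed for parse data to be recorded.
    pd <- getParseData(parse(text = txt, keep.source = TRUE), includeText = NA)

    ## Show only the terminal tokens and their text.
    pd[pd$terminal, c("token", "text")]

    ## Keep only the genuine comment.
    pd$text[pd$token == "COMMENT"]
    ```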
    

    Clearly, getParseData returns much more information than you need. Here is a utility that you can use instead, which takes as an argument a function with source references and returns a character vector listing the comments, if any:

    getComments <- function(func) {
        func <- match.fun(func)
        if (is.null(getSrcref(func))) {
            stop("'func' has no source references")
        }
        data <- getParseData(func, includeText = NA) 
        if (is.null(data)) {
            op <- options(keep.source = TRUE, keep.parse.data = TRUE)
            on.exit(options(op))
            expr <- parse(text = deparse(func, control = "all"))
            data <- getParseData(expr, includeText = NA)
        }
        data$text[data$token == "COMMENT"]
    }
    
    getComments(g)
    ## [1] "## A comment"
    
    h <- function(x) {
        ## I will comment
                ##     anywhere
        ######## and with as many hashes
        x + 1 # as I want!
    }
    
    getComments(h)
    ## [1] "## I will comment"                
    ## [2] "##     anywhere"                 
    ## [3] "######## and with as many hashes"
    ## [4] "# as I want!"
    
    ## You will need Rtools on Windows and Command Line Tools on macOS
    ## to install packages containing C/C++/Fortran code from sources.
    ## 'lme4' is one such package ... feel free to choose a different one.
    install.packages("lme4", type = "source", INSTALL_opts = "--with-keep.source")
    getComments(lme4::lmer)
    ##  [1] "## , ...)"                                                                              
    ##  [2] "## see functions in modular.R for the body .."                                          
    ##  [3] "## back-compatibility kluge"                                                            
    ##  [4] "## if (!is.null(list(...)[[\"family\"]])) {"                                            
    ##  [5] "##    warning(\"calling lmer with 'family' is deprecated; please use glmer() instead\")"
    ##  [6] "##    mc[[1]] <- quote(lme4::glmer)"                                                    
    ##  [7] "##    if(missCtrl) mc$control <- glmerControl()"                                        
    ##  [8] "##    return(eval(mc, parent.frame(1L)))"                                               
    ##  [9] "## }"                                                                                   
    ## [10] "## update for  back-compatibility kluge"                                                
    ## [11] "## https://github.com/lme4/lme4/issues/50"                                              
    ## [12] "## parse data and formula"                                                              
    ## [13] "## create deviance function for covariance parameters (theta)"                          
    ## [14] "## optimize deviance function over covariance parameters"                               
    ## [15] "## prepare output"
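
    To see at a glance which functions in an installed package kept their source, you can apply the same srcref test across the package namespace. This is only a sketch: keptSource is a hypothetical helper name, and the counts you get depend entirely on how the package (or R itself) was installed.

    ```r
    ## Hypothetical helper: for each function in the namespace of an
    ## installed package, report whether it has source references.
    keptSource <- function(pkg) {
        ns <- asNamespace(pkg)
        objs <- mget(ls(ns, all.names = TRUE), envir = ns)
        funs <- Filter(is.function, objs)
        vapply(funs, function(f) !is.null(utils::getSrcref(f)), logical(1L))
    }

    res <- keptSource("stats")
    table(res)                # how many functions kept their source
    head(names(res)[!res])    # some functions without source references
    ```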
    

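    AFAIK, there is no convenient mechanism for checking from within R whether compiled code (C, C++, Fortran) called by an R function contained comments before it was compiled. Comments do not survive compilation, so for that you have to consult the source files themselves.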
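
    What you can do from within R is check whether a function hands off to compiled or internal code directly, which tells you whether the R-level source is the whole story. A rough sketch follows; callsCompiledCode is a hypothetical helper name, and it only looks at the body of the function you give it, so calls made indirectly through other R functions go undetected:

    ```r
    ## Hypothetical helper: TRUE if 'func' is a primitive or if its body
    ## directly uses one of R's interfaces to compiled or internal code.
    callsCompiledCode <- function(func) {
        func <- match.fun(func)
        if (is.primitive(func)) {
            return(TRUE)
        }
        interfaces <- c(".Call", ".External", ".External2", ".C", ".Fortran",
                        ".Internal", ".Primitive")
        any(all.names(body(func)) %in% interfaces)
    }

    callsCompiledCode(sum)                 # a primitive
    callsCompiledCode(function(x) x + 1)   # pure R code
    callsCompiledCode(stats:::plot.acf)    # inspects the body of plot.acf only
    ```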

    Relevant documentation is a bit scattered, as always. I have found these help pages useful: ?parse, ?deparse, ?.deparseOpts, ?srcref (and links therein), ?options, and ?getParseData.