Tags: r, web-scraping, quantmod, rvest, quandl

Web scraping of key stats in Yahoo! Finance with R


Is anyone experienced in scraping data from the Yahoo! Finance key statistics page with R? I am familiar with scraping data directly from HTML using read_html(), html_nodes(), and html_text() from the rvest package. However, this web page (MSFT key stats) is more complicated, and I am not sure whether the stats come in through an XHR request, a JS file, or the HTML document itself. My guess is that the data is stored as JSON. If anyone knows a good way to extract and parse the data on this page with R, please let me know. Thanks in advance!
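For reference, here is a minimal sketch of both cases on a dummy page: stats sitting in a plain HTML table (the rvest workflow described above) and stats embedded as JSON in a `<script>` tag (parsed with `jsonlite::fromJSON()`). The `stats` class, the `stats-json` id, and all values are made up for illustration; they are not Yahoo's actual markup.

```r
library(rvest)     # read_html(), html_nodes(), html_text()
library(jsonlite)  # fromJSON()

# Dummy page: the "stats" class, "stats-json" id and all values are made up
page <- read_html('<html><body>
<table class="stats">
<tr><td>Market Cap</td><td>100</td></tr>
<tr><td>Trailing P/E</td><td>25</td></tr>
</table>
<script type="application/json" id="stats-json">{"marketCap": 100, "trailingPE": 25}</script>
</body></html>')

# Case 1: stats are plain HTML, so selecting the label and value cells is enough
labels <- html_text(html_nodes(page, xpath = "//table[@class='stats']//td[1]"))
values <- html_text(html_nodes(page, xpath = "//table[@class='stats']//td[2]"))
stats_html <- setNames(values, labels)

# Case 2: stats only exist as JSON inside a <script> tag
json_txt   <- html_text(html_nodes(page, xpath = "//script[@id='stats-json']"))
stats_json <- fromJSON(json_txt)

print(stats_html)
str(stats_json)
```

If the browser's network tab shows the numbers arriving in an XHR response instead, that response body would go through `fromJSON()` the same way, provided it is JSON.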

Alternatively, if there is a more convenient way to extract these metrics via quantmod or Quandl, please let me know; that would be an excellent solution!


Solution

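  • I gave up on Excel a long time ago; R is definitely the way to go for things like this. Note that the code below uses the XML package to pull a stats table from each ticker's Finviz quote page (finviz.com) rather than from Yahoo! Finance.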

    library(XML)
    
    stocks <- c("AXP","BA","CAT","CSCO")
    
    for (s in stocks) {
          url <- paste0("https://finviz.com/quote.ashx?t=", s)
          webpage <- readLines(url, warn = FALSE)
          html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
          tableNodes <- getNodeSet(html, "//table")
    
          # ASSIGN TO STOCK NAMED DFS
          # (table index 9 depends on Finviz's page layout; re-check it if the site changes)
          assign(s, readHTMLTable(tableNodes[[9]],
                    header = paste0("data", 1:12)))
    
          # ADD COLUMN TO IDENTIFY STOCK
          df <- get(s)
          df['stock'] <- s
          assign(s, df)
    }
    
    # COMBINE ALL STOCK DATA 
    stockdatalist <- mget(stocks)
    stockdata <- do.call(rbind, stockdatalist)
    # MOVE STOCK ID TO FIRST COLUMN (parentheses needed: ':' binds tighter than '-')
    stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))]
    
    # SAVE TO CSV
    write.table(stockdata, "C:/Users/your_path_here/Desktop/MyData.csv", sep=",", 
                row.names=FALSE, col.names=FALSE)
    
    # REMOVE TEMP OBJECTS
    rm(df, stockdatalist)
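If the hard-coded `tableNodes[[9]]` index stops matching the page, one alternative (a suggestion, not part of the original answer) is to select the table by a label it contains and to collect each ticker's result in a list instead of going through `assign()`/`get()`. The sketch below runs offline on dummy HTML: the `P/E`/`EPS` labels and the numbers are placeholders, `read_stats()` is a hypothetical helper, and it produces a long label/value table rather than the twelve `data` columns.

```r
library(XML)

# Dummy HTML standing in for each ticker's quote page; the stats table
# is deliberately not the first <table>, as on a real page
pages <- c(
  AXP = "<table><tr><td>nav</td></tr></table>
         <table><tr><td>P/E</td><td>1</td><td>EPS</td><td>2</td></tr></table>",
  BA  = "<table><tr><td>nav</td></tr></table>
         <table><tr><td>P/E</td><td>3</td><td>EPS</td><td>4</td></tr></table>"
)

# Hypothetical helper: parse one page and return its stats as label/value rows
read_stats <- function(html, s) {
  doc <- htmlParse(html, asText = TRUE)
  # Pick the table by a label it contains instead of a fixed index
  cells <- xpathSApply(doc, "//table[.//td[text()='P/E']]//td", xmlValue)
  # Cells alternate label, value, label, value, ...
  m <- matrix(cells, ncol = 2, byrow = TRUE)
  data.frame(stock = s, label = m[, 1], value = m[, 2],
             stringsAsFactors = FALSE)
}

stockdata <- do.call(rbind, Map(read_stats, pages, names(pages)))
rownames(stockdata) <- NULL
print(stockdata)
```

Because `stock` is created as the first column, no column-reordering step is needed afterwards.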