I am trying to get Google Finance JSON data into a data frame. I tried:
library(jsonlite)
dat1 <- fromJSON("http://www.google.com/finance/info?q=NSE:%20AAPL,MSFT,TSLA,AMZN,IBM")
dat1
However, I get an error:
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) : parse error: trailing garbage
Thank you for any help.
I could not reproduce your error by calling fromJSON on the URL directly,
because of proxy issues on my side, but the following works using httr:
library(jsonlite)
library(httr)
#Set your proxy setting if needed
#set_config(use_proxy(url='hostname',port= port,username="",password=""))
url.name = "http://www.google.com/finance/info?q=NSE:%20AAPL,MSFT,TSLA,AMZN,IBM"
url.get = GET(url.name)
#parsing the content as JSON gives an error similar to the one you encountered
#url.content = content(url.get,type="application/json")
#Error in parseJSON(txt) : parse error: trailing garbage
# " : "0.57" ,"yld" : "2.46" } ,{ "id": "358464" ,"t" : "MSFT"
# (right here) ------^
#read the response body as plain text instead
url.content = content(url.get, as="text")
#remove html tags
clean.text = gsub("<.*?>", "", url.content)
#remove newlines and "//" sequences (the leftover text that breaks the JSON parser)
clean.text = gsub("\\n|\\//","",clean.text)
DF = fromJSON(clean.text)
head(DF[,1:10],5)
#        id    t      e      l  l_fix  l_cur s        ltt                 lt               lt_dts
#1    22144 AAPL NASDAQ  92.51  92.51  92.51 1 4:00PM EDT May 11, 4:00PM EDT 2016-05-11T16:00:02Z
#2   358464 MSFT NASDAQ  51.05  51.05  51.05 1 4:00PM EDT May 11, 4:00PM EDT 2016-05-11T16:00:02Z
#3 12607212 TSLA NASDAQ 208.96 208.96 208.96 1 4:00PM EDT May 11, 4:00PM EDT 2016-05-11T16:00:02Z
#4   660463 AMZN NASDAQ 713.23 713.23 713.23 1 4:00PM EDT May 11, 4:00PM EDT 2016-05-11T16:00:02Z
#5    18241  IBM   NYSE 148.95 148.95 148.95 2 6:59PM EDT May 11, 6:59PM EDT 2016-05-11T18:59:12Z
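
If you want to check the cleanup step without touching the network, here is a minimal sketch on a hard-coded string. The sample text is an assumption: it only imitates the shape implied by the regex above (a "//" marker ahead of the JSON array) and is not a captured response. strip_prefix is a hypothetical helper, not part of jsonlite or httr. Rather than deleting every "//" in the text, it drops whatever precedes the first "[", which leaves any "//" inside the values untouched.

```r
library(jsonlite)

# Assumed sample, shaped like the response: a "//" marker before the array.
raw.text <- paste0(
  "\n// [\n",
  "{ \"id\": \"22144\", \"t\": \"AAPL\", \"e\": \"NASDAQ\", \"l\": \"92.51\" }\n",
  ",{ \"id\": \"358464\", \"t\": \"MSFT\", \"e\": \"NASDAQ\", \"l\": \"51.05\" }\n",
  "]\n"
)

# Hypothetical helper: keep the text from the first "[" onwards.
# If no "[" is found, regexpr returns -1 and the text is returned unchanged.
strip_prefix <- function(txt) {
  start <- regexpr("[", txt, fixed = TRUE)
  substring(txt, start)
}

clean.text <- strip_prefix(raw.text)
DF <- fromJSON(clean.text)

# Every field arrives as a character string, so convert numbers explicitly.
DF$l <- as.numeric(DF$l)
str(DF)
```

Note that the numeric-looking columns come back as character, so convert the ones you need before doing arithmetic on them.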