Sample data look like this (fyi I have hundreds of files like this). The tricky part is the "NO RECORD" in the file. I haven't been trying hours to get it into R without any success
BEGIN DATA
RIM
DATE AF QD QU
09/30/1920 NO RECORD 370.00 NO RECORD
10/01/1920 NO RECORD 391.00 391.00
10/02/1920 NO RECORD 496.00 MISSING
10/03/1920 NO RECORD 660.00 MISSING
10/04/1920 NO RECORD 881.00 MISSING
10/05/1920 NO RECORD 660.00 MISSING
10/06/1920 NO RECORD 515.00 -9999
10/07/1920 NO RECORD 443.00 NO RECORD
10/08/1920 NO RECORD 443.00 MISSING
10/09/1920 NO RECORD 443.00 443.00
10/10/1920 NO RECORD 443.00 MISSING
This is my latest R code
library(zoo)
# function to read data
obsRead <- function(path2file, filename, number_line_skip, header_or_not) {
tmpName <- paste(path2file, filename, sep="")
tmpData <- read.zoo(tmpName,
tz='', stringsAsFactors = FALSE, strip.white = TRUE,
header=header_or_not, skip=number_line_skip,
na.strings = c("NA", "N/A", "MISSING", "NO RECORD", "-9999"), # tell zoo what NA values look like
qName <- c("AF", "QD", "QU")
names(tmpData) <- qName
index(tmpData) <- as.Date(index(tmpData)) # Convert index from POSIXct to Date
str(tmpData)
return(tmpData)
}
dataDir = "path/to/file/"
dataFile <- "sampleData.txt"
nLineSkip <- 3
header_or_not <- FALSE
Q_obs <- obsRead(dataDir, dataFile, nLineSkip, header_or_not)
And the error I got from R
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 2 did not have 6 elements
Any suggestion would be appreciated! Thanks!
Edit
@MichaelChirico posted another solution using data.table
package here
fread(text=x, sep = " ", header=TRUE, fill=length(unlist(strsplit(x[1], " "))), na.strings=strrep("*", 1:6))
Provided NO RECORD
is consistent, this should give you a start:
tmp <- readLines("sample.dat")
# substitute 'NO RECORD' for 'NORECORD' and use read.table()
# to process the collapsed vector
tmp.collapse <- paste(gsub("NO RECORD", "NORECORD",
tmp[4:length(tmp)]), sep="", collapse="\n")
# get the column names from the third row and use them in the data table
read.table(textConnection(tmp.collapse),
header=FALSE, stringsAsFactors=FALSE,
col.names=unlist(strsplit(tmp[3], "\ +")))
## DATE AF QD QU
## 1 09/30/1920 NORECORD 370 NORECORD
## 2 10/01/1920 NORECORD 391 391.00
## 3 10/02/1920 NORECORD 496 MISSING
## 4 10/03/1920 NORECORD 660 MISSING
## 5 10/04/1920 NORECORD 881 MISSING
## 6 10/05/1920 NORECORD 660 MISSING
## 7 10/06/1920 NORECORD 515 -9999
## 8 10/07/1920 NORECORD 443 NORECORD
## 9 10/08/1920 NORECORD 443 MISSING
## 10 10/09/1920 NORECORD 443 443.00
## 11 10/10/1920 NORECORD 443 MISSING