I am trying to produce gridded rainfall data over the UK by using a Thin Plate Spline algorithm and eliminate values that are not over land in R - a process I can only achieve so far manually. The problem is challenging (for me) and even challenging to explain - so I will step through what I have done so far. Any help will be greatly welcome.
First, I load a data table into R that represents rainfall on a single day from a number of point location weather stations, and each row of the data table contains the date, id of the station, the easting and northing of the station, the daily rainfall at that site and the average rainfall for the year. I also load the libraries fields,maptools and gstat.
library(fields)
library(maptools)
library(gstat)
dat <- read.table("1961month1day1.csv", header=T, sep=",", quote = "")
names(dat) <- c("easting", "northing", "dailyrainfall","avaerageyearlyrainfall")
Here is a sample of the data:
dput(head(dat, 20))
structure(list(easting = c(130000L, 145000L, 155000L, 170000L,
180000L, 180000L, 180000L, 180000L, 185000L, 200000L, 200000L,
205000L, 210000L, 220000L, 225000L, 230000L, 230000L, 230000L,
230000L, 235000L), northing = c(660000L, 30000L, 735000L, 40000L,
30000L, 45000L, 60000L, 750000L, 725000L, 50000L, 845000L, 65000L,
770000L, 105000L, 670000L, 100000L, 620000L, 680000L, 95000L,
120000L), dailyrainfall = c(9.4, 4.1, 12.4, 2.8, 1.3, 3.6, 4.8, 26.7, 19.8,
4.6, 1.7, 4.1, 12.7, 1.8, 3, 5.3, 1, 1.5, 1.5, 4.6), averageyearlyrainfall = c(1334.626923,
1123.051923, 2072.030769, 1207.584615, 928, 1089.334615, 880.0884615,
2810.323077, 1933.719231, 1215.642308, 2644.171154, 1235.913462,
2140.111538, 1010.436538, 1778.432692, 1116.934615, 912.2807692,
1579.386538, 1085.498077, 1250.601923)), .Names = c("easting",
"northing", "dailyrainfall", "averageyearlyrainfall"), row.names = c(NA, 20L), class = "data.frame")
I can then fit a thin plate spline to the data so as to give me a gridded surface and plot the surface:
fit <- Tps(cbind(dat$easting,dat$northing),dat$dailyrainfall)
surface(fit)
I can then create a grid of the UK, in 1km steps by using:
xvals <- seq(0, 700000, by=1000)
yvals <- seq(0, 1250000, by=1000)
and then plot the surface onto this grid and write the data into a table:
griddf <- expand.grid(xvals, yvals)
griddf$pred <- predict(fit, x=as.matrix(griddf))
write.table(griddf, file="1Jan1961grid.csv", sep=",", qmethod="double")
Great - so far so good. I now have converted my point data to 1km gridded data over the entire 0 to 700000 (E) and 0 to 1250000 (N) grid. The written data table is a list containing an index, an easting, a northing and the predicted rainfall value.
Now the challenge - I want to eliminate any values from this list that are not over land. I can achieve this manually by loading the data into excel (or Access) and comparing the data to another file that contains the same grid and the average yearly rainfall (the file is called 1kmgridaveragerainfall.csv). Here is a sample of this file:
dput(head(dat1, 20))
structure(list(easting = c(-200000L, -200000L, -200000L, -200000L,
-200000L, -200000L, -200000L, -200000L, -200000L, -200000L, -200000L,
-200000L, -200000L, -200000L, -200000L, -200000L, -200000L, -200000L,
-200000L, -200000L), northing = c(1245000L, 1240000L, 1235000L,
1230000L, 1225000L, 1220000L, 1215000L, 1210000L, 1205000L, 1200000L,
1195000L, 1190000L, 1185000L, 1180000L, 1175000L, 1170000L, 1165000L,
1160000L, 1155000L, 1150000L), averageyearlyrainfall = c(-9999, -9999, -9999,
-9999, -9999, -9999, -9999, -9999, -9999, -9999, -9999, -9999,
-9999, -9999, -9999, -9999, -9999, -9999, -9999, -9999)), .Names = c("easting",
"northing", "averageyearlyrainfall"), row.names = c(NA, 20L), class = "data.frame")
Any grid square that is not over land has an average yearly rainfall of -9999. Hence once matched (i.e. using vlookup or a query in Access) I can filter out values that have this -9999 value and this leaves me with a data table that has the easting and northing and daily rainfall and average yearly rainfall for land values only. I can then load this back into R and plot this using:
quilt.plot(cbind(dat$easting,dat$northing),dat$mm, add.legend=TRUE, nx=654, ny=1209,xlim=c(0,700000),ylim=c(0,1200000))
and I am left with a plot of rainfall over the UK land (and not the sea area).
So, can anybody suggest a way of achieving the same but without all the filtering etc using excel or access i.e. can the same be achieved using R only? Is there a way of loading both data tables into R at the beginning and somehow fitting the TPS of the point data over the average data so that grid squares that are equal to -9999 are not plotted.
I know that the TPS can be weighted using a covariate (Z) - does this help at all? i.e.
fit <- Tps(cbind(dat$easting,dat$northing),dat$dailyrainfall, Z=dat$averageyearlyrainfall)
Also, when I perform surface(fit) of the original TPS, how do I extend the plot to the edges of the plot - I'm sure I've read this somewhere where you put something like interp=TRUE but this doesn't work.
Any help would be most appreciated
Thanks, Tony
If you have already got to the point where you have the two dataframes you should be able to merge them into a new dataframe and filter/subset the result.
set.seed(1234) # for reproducibility
# "The written data table is a list containing an index, an easting,
# a northing and the predicted rainfall value"
# Create a simple data frame containing made-up data
mydf1 <- data.frame(index = 1:10,
easting = c(1, 1, 3, 4, 5, 5, 5, 5, 6, 6),
northing = c(12, 13, 13, 13, 14, 14, 15, 17, 18, 20),
predicted = runif(10, 500, 1000))
# "...comparing the data to another file that contains the same grid
# and the average yearly rainfall"
# Second data frame is similar, but has rainfall instead of predicted
mydf2 <- data.frame(index = 1:10,
easting = c(1, 1, 3, 4, 5, 5, 5, 5, 6, 6),
northing = c(12, 13, 13, 13, 14, 14, 15, 17, 18, 20),
rainfall = c(runif(9, 500, 1000), -9999))
# If data frames are of same size and have mostly common columns,
# merging them probably makes it easy to manipulate the data
mydf.merged <- merge(mydf1, mydf2)
# Finally, filter the merged data frame so that it only contains
# rainfall values that are not the -9999 value that denotes sea
mydf.final <- mydf.merged[mydf.merged$rainfall > -9999, ]
This is the first data frame:
> mydf1
index easting northing predicted
1 1 1 12 556.8517
2 2 1 13 811.1497
3 3 3 13 804.6374
4 4 4 13 811.6897
5 5 5 14 930.4577
6 6 5 14 820.1553
7 7 5 15 504.7479
8 8 5 17 616.2753
9 9 6 18 833.0419
10 10 6 20 757.1256
>
This is the second dataframe:
> mydf2
index easting northing rainfall
1 1 1 12 846.7956
2 2 1 13 772.4874
3 3 3 13 641.3668
4 4 4 13 961.7167
5 5 5 14 646.1579
6 6 5 14 918.6478
7 7 5 15 643.1116
8 8 5 17 633.4104
9 9 6 18 593.3614
10 10 6 20 -9999.0000
>
Merged dataframe:
> mydf.merged
index easting northing predicted rainfall
1 1 1 12 556.8517 846.7956
2 10 6 20 757.1256 -9999.0000
3 2 1 13 811.1497 772.4874
4 3 3 13 804.6374 641.3668
5 4 4 13 811.6897 961.7167
6 5 5 14 930.4577 646.1579
7 6 5 14 820.1553 918.6478
8 7 5 15 504.7479 643.1116
9 8 5 17 616.2753 633.4104
10 9 6 18 833.0419 593.3614
>
Final dataframe with -9999 row removed:
> mydf.final
index easting northing predicted rainfall
1 1 1 12 556.8517 846.7956
3 2 1 13 811.1497 772.4874
4 3 3 13 804.6374 641.3668
5 4 4 13 811.6897 961.7167
6 5 5 14 930.4577 646.1579
7 6 5 14 820.1553 918.6478
8 7 5 15 504.7479 643.1116
9 8 5 17 616.2753 633.4104
10 9 6 18 833.0419 593.3614
>