I am trying to store several columns of data and merge them with other data when needed. However, the merge only works with the data as originally created; it fails after I fwrite() and then fread() the same data.
library(data.table)
library(tidyr)   # for drop_na()

region <- climatebound_rast |> as.data.table(xy = TRUE)
fwrite(region, "region.csv")
region2 <- fread("region.csv", stringsAsFactors = TRUE)
all.equal(region, region2)
[1] TRUE
TEST <- raster_to_table[Variable == "npp" & Model == "DLEM", 1:4]
# 1. Merge raster data with region data
merged_region <- merge(TEST, region, all.x = TRUE) |>
  drop_na(region)
merged_region
Key: <x, y>
x y Model Variable region
<num> <num> <char> <char> <fctr>
1: 87.99277 49.0312 DLEM npp MNG
2: 88.49277 48.5312 DLEM npp MNG
3: 88.49277 49.0312 DLEM npp MNG
4: 88.99277 48.0312 DLEM npp MNG
5: 88.99277 48.5312 DLEM npp MNG
---
5846: 161.49277 68.5312 DLEM npp RUS
5847: 161.99277 68.5312 DLEM npp RUS
5848: 161.99277 69.0312 DLEM npp RUS
5849: 162.49277 68.5312 DLEM npp RUS
5850: 162.49277 69.0312 DLEM npp RUS
merged_region2 <- merge(TEST, region2, all.x = TRUE) |>
  drop_na(region)
merged_region2
Key: <x, y>
Empty data.table (0 rows and 5 cols): x,y,Model,Variable,region
all.equal() is not the same as identical()
What is happening here is that you are joining on a float which cannot be represented exactly and which is altered by the round trip through csv. Here's a simple example:
library(data.table)
dat1 <- data.table(x = 1/3, label = "one third")
dat2 <- data.table(x = 1/3)
It's possible (though inadvisable - see below) to merge these:
merge(dat1, dat2)
# Key: <x>
# x label
# <num> <char>
# 1: 0.3333333 one third
However, let's write dat1 to a csv and then read it back in. Floating-point numbers are stored in binary with finite precision, so fractions which cannot be expressed as a finite sum of powers of 2 (like 1/3) cannot be represented exactly. When written to a csv, values which cannot be expressed precisely in a limited number of decimal places (e.g. 1/3) are rounded, and reading them back introduces tiny errors.
fwrite(dat1, "./dat.csv")
dat1_fread <- fread("./dat.csv")
all.equal(dat1, dat1_fread) # TRUE
identical(dat1, dat1_fread) # FALSE
You can see that dat1 and dat1_fread are all.equal(), since all.equal() is a utility to compare R objects x and y testing 'near equality'. However, they are not identical(). Joins require exact matches, so a join on x returns no rows:
merge(dat1_fread, dat2)
# Empty data.table (0 rows and 2 cols): x,label
You can see the issue if you print the values with high enough precision:
sprintf("%.20f", dat1$x)
# [1] "0.33333333333333331483"
sprintf("%.20f", dat2$x)
# [1] "0.33333333333333331483"
sprintf("%.20f", dat1_fread$x)
# [1] "0.33333333333333298176"
Interestingly, the text value of x stored in dat.csv is 0.333333333333333, which is not the same as the value when it is read back in. This is because when fwrite() saves the csv, it rounds 1/3 to 15 significant digits. When fread() reads the csv, it sees a number which cannot be exactly represented in binary and represents it as 0.33333333333333298176.... This is a fundamental limitation of floating-point arithmetic, not an R issue.
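If you want to see this for yourself, you can inspect the raw text that fwrite() produced; the exact digits below follow from the 15-significant-digit behaviour described above, though the output may differ slightly depending on your data.table version and settings:
readLines("./dat.csv")
# [1] "x,label"                      "0.333333333333333,one third"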
The best plan is not to join on a float. If you have to join on a float, you could round the key first, if you can afford to lose some precision (which in practice you can, since you can't rely on a float's full precision surviving the csv round trip anyway); a short sketch of this is at the end of the answer. If you don't want to round, don't save your file to csv but serialize to a binary format instead. For example, rds:
saveRDS(dat1, "./dat.rds")
dat1_rds <- readRDS("./dat.rds")
merge(dat1_rds, dat2)
# Key: <x>
# x label
# <num> <char>
# 1: 0.3333333 one third
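And if you do go the rounding route instead, here is a minimal sketch continuing from the objects above; the choice of 6 decimal places is an assumption, so pick whatever precision your keys actually need:
# Round the float key on both sides so the tiny csv round-trip
# error no longer prevents exact matches
dat1_fread[, x := round(x, 6)]
dat2[, x := round(x, 6)]

merge(dat1_fread, dat2)  # now returns the "one third" row
For your data, the same idea would mean rounding the x and y coordinates in both TEST and region2 identically before merging.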