Here are my data (it takes some SECONDS to download, please be patient):
library(dplyr)
mydata <- "https://pxdata.stat.fi:443/PxWeb/sq/87e44319-48f8-41b4-bd0d-a6629dc7829c" %>%
paste0(".relational_table") %>% read.table(sep = "\t", header = T)
Now, some of the rows look as they should, e.g.
> head(mydata)
Underlying.cause.of.death..ICD.10..3.character.level. Age Year Sex Information Deaths
1 A00-Y89 Total Total 2022 Total Deaths 63172
2 A00-Y89 Total Total 2022 Males Deaths 31703
3 A00-Y89 Total Total 2022 Females Deaths 31469
4 A00-Y89 Total 0 2022 Total Deaths 91
5 A00-Y89 Total 0 2022 Males Deaths 52
6 A00-Y89 Total 0 2022 Females Deaths 39
However, there are rows that do not look so goood:
> mydata %>% filter(grepl("\t",Underlying.cause.of.death..ICD.10..3.character.level.)) %>% head
Underlying.cause.of.death..ICD.10..3.character.level. Age Year Sex Information Deaths
1 A30 Leprosy (Hansens disease)\tTotal\t2022\tTotal\tDeaths\t0\nA30 Leprosy (Hansens disease) Total 2022 Males Deaths 0
2 A30 Leprosy (Hansens disease)\tTotal\t2022\tFemales\tDeaths\t0\nA30 Leprosy (Hansens disease) 0 2022 Total Deaths 0
3 A30 Leprosy (Hansens disease)\t0\t2022\tMales\tDeaths\t0\nA30 Leprosy (Hansens disease) 0 2022 Females Deaths 0
4 A30 Leprosy (Hansens disease)\t1 - 4\t2022\tTotal\tDeaths\t0\nA30 Leprosy (Hansens disease) 1 - 4 2022 Males Deaths 0
5 A30 Leprosy (Hansens disease)\t1 - 4\t2022\tFemales\tDeaths\t0\nA30 Leprosy (Hansens disease) 5 - 9 2022 Total Deaths 0
6 A30 Leprosy (Hansens disease)\t5 - 9\t2022\tMales\tDeaths\t0\nA30 Leprosy (Hansens disease) 5 - 9 2022 Females Deaths 0
Any ideas, why this is happening? If read.table should use "\t" as a column separator, why on earth it is pasting the original rows as follows, and this happens only for some of the rows?
Is there a better function that could read these these data to a table properly?
(I am using Windows 10, if that could have anything to do with this issue.)
This is an issue with some single (or double) quotes as in Hansen's disease, which makes read.table
believe that the whole row is in the same cell. Adding quote = ""
as an argument to read.table
has the effect of ignoring all quoting characters.
mydata <- read.table(mydata, sep = "\t", header = T, quote = "")
Raw data
library(dplyr)
mydata <- "https://pxdata.stat.fi:443/PxWeb/sq/87e44319-48f8-41b4-bd0d-a6629dc7829c" %>%
paste0(".relational_table")