I want to convert the data frame below (dataset) into a xyz format, getting the longitude (x variable, the first 36 values in each column of the first row), latitude (y variable, the first 30 values in each row of the first column) and the current value of each x-y ordered pair (z variable). Is it possible to do it using r?
The data comprises spatially distributed rainfall values (most values are equal or close to zero). Longitude ranges from -47.8 to -36.2, and latitude ranges from -21.2 to -7.2.
> dataset <- read.csv("rainfall_data.csv", header = FALSE, sep = ",")
> headTail(dataset)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
1 <NA> -47.8 -47.4 -47 -46.6 -46.2 -45.8 -45.4 -45 -44.6 -44.2 -43.8 -43.4 -43 -42.6 -42.2 -41.8 -41.4 -41 -40.6
2 -21.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 -20.8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 -20.4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
34 -8.4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 -8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36 -7.6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37 -7.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31
1 -40.2 -39.8 -39.4 -39 -38.6 -38.2 -37.8 -37.4 -37 -36.6 -36.2
2 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ...
34 0 0 0 0 0 0 0 0 0 0 0
35 0 0 0 0 0 0 0 0 0 0 0
36 0 0 0 0 0 0 0 0 0 0 0
37 0 0 0 0 0 0 0 0 0 0 0
Possible outcome:
x y z
1 -47.8 -21.1 0
2 -47.4 -21.1 0
3 -47.0 -21.1 0
4 -46.6 -21.1 0
... ... ... ...
1077 -37.4 -7.2 0
1078 -37.0 -7.2 0
1079 -36.6 -7.2 0
1080 -36.2 -7.2 0
Data are available here: https://drive.google.com/file/d/1IQqoV1WsrEXQoorZmuE4DEA8wzJkI3GL/view?usp=sharing
With base R, we can create a new dataframe without the first column and first row. Then, we can assign those as the row and column names (excluding the NA
). Then, I convert to matrix
in order to convert to a table
, then into a data frame with three columns.
result <- df[-1, -1]
row.names(result) <- df[-1,1]
colnames(result) <- df[1,-1]
as.data.frame(as.table(as.matrix(result)))
Working from your .csv
, we can use read.delim
to specify the first column as rownames and the first row as the header, and we add check.names = FALSE
, so that we return the correct numbers as the header (if you put TRUE
, then it appends a X.
to the beginning of the name).
df <- read.delim(file = "~/Downloads/rainfall_data.csv", header = TRUE, row.names = 1, sep = ",", check.names=FALSE)
as.data.frame(as.table(as.matrix(df)))
Another option would be to convert the V1
to row names and row 1 to the column names using a combination of tibble::column_to_rownames
and janitor::row_to_names
. Then, follow the same steps as above with base R.
library(tidyverse)
library(janitor)
results <- df %>%
tibble::column_to_rownames("V1") %>%
janitor::row_to_names(1) %>%
as.matrix() %>%
as.table() %>%
as.data.frame()
Output
head(results)
Var1 Var2 Freq
1 -21.2 -47.8 0
2 -20.8 -47.8 0
3 -20.4 -47.8 0
4 -21.2 -47.4 0
5 -20.8 -47.4 0
6 -20.4 -47.4 0
Data
df <- structure(list(V1 = c("<NA>", "-21.2", "-20.8", "-20.4"), V2 = c(-47.8,
0, 0, 0), V3 = c(-47.4, 0, 0, 0), V4 = c(-47L, 0L, 0L, 0L), V5 = c(-46.6,
0, 0, 0), V6 = c(-46.2, 0, 0, 0), V7 = c(-45.8, 0, 0, 0), V8 = c(-45.4,
0, 0, 0), V9 = c(-45L, 0L, 0L, 0L), V10 = c(-44.6, 0, 0, 0),
V11 = c(-44.2, 0, 0, 0), V12 = c(-43.8, 0, 0, 0), V13 = c(-43.4,
0, 0, 0), V14 = c(-43L, 0L, 0L, 0L), V15 = c(-42.6, 0, 0,
0), V16 = c(-42.2, 0, 0, 0), V17 = c(-41.8, 0, 0, 0), V18 = c(-41.4,
0, 0, 0), V19 = c(-41L, 0L, 0L, 0L), V20 = c(-40.6, 0, 0,
0)), class = "data.frame", row.names = c(NA, -4L))