rimportimport-from-excelimport-from-csv

Best way to transform source data?


Working in R. But I think this question is universal.

Wall Street Journal visualized a dataset on disease infection rates in the U.S.: enter image description here

X-axis is year. Y-axis is state.

And shade of red per tile is infection rate intensity for that particular state recorded for that year.

The source dataset being visualized is arranged as follows: enter image description here

Each row in the dataset corresponds to a single infection rate for a single country in a given year. So, each red tile in the visualization corresponds to a row from the dataset.

But what if the dataset looked like this?: enter image description here

Now, each row corresponds to a state. And each state/row has multiple infection rates, one for each year recorded. This might match how data is captured in the real world because for each year or day (in the case of coronavirus) you track the infection rate, you can just add a new column (rather than a 50 new rows).

The problem is while this layout is more human-friendly, it's not very R-friendly. We can easily create the tile visualization based on the source dataset arrangement where data is arranged by infection rate, but not so easily if it's arranged by state.

So, finally, my question is — is there an easy way to transform data from the second layout to the first, in Excel?


Solution

  • You can use the transpose function in the free, open-source OpenRefine tool to prepare your data file prior to loading it into R.