r seq

seq_along gives the number of columns, not the number of rows


I'm trying to write a for loop that creates a new variable from an existing variable in a dataframe, and does so by iterating over each row in turn. I've tried using for (i in seq_along(data)), but this only created the new variable correctly for the first 19 rows, and I realised that seq_along wasn't working as I had expected: instead of creating the sequence based on the number of rows, it had done so based on the number of columns:

seq_along(data) returns

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

while nrow(data) returns

[1] 82

and ncol(data) returns

[1] 19

Additionally, the output for seq(data) is the same as that for seq_along, and length(data) returns [1] 19.

While I've got a workaround that resolves the issue for the for loop (for (i in 1:nrow(data))), I'm curious to know what the reason is for seq_along (and seq and length) not behaving the way I'd expected.


Solution

  • To formalize the comments into a community answer: seq_along(aDataFrame) sequences along the columns of a data frame because a data frame is also a list. The same reasoning explains why length() and seq() report 19 in the question: they count list elements, which are columns. We can demonstrate this by calling typeof() on the Motor Trend Cars data frame, mtcars.

    > typeof(mtcars)
    [1] "list"
    

    Each element of the list holds one column of the data frame. We can use the names() function to extract the element names from the list.

    > names(mtcars)
    [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
    

    Therefore, seq_along(mtcars) produces the vector 1:11, one index for each element (column) of the list.

    > seq_along(mtcars)
     [1]  1  2  3  4  5  6  7  8  9 10 11
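
To iterate over rows instead, the sequence has to be built from nrow(). The following is a minimal sketch using mtcars rather than the asker's data; the kpl column and the conversion factor are illustrative assumptions, not part of the original question. It also shows why seq_len(nrow(x)) is usually a safer choice than 1:nrow(x).

```r
# A data frame is a list of columns, so length() counts columns.
length(mtcars) == ncol(mtcars)        # TRUE: both are 11

# seq_len(nrow(x)) gives one index per row.
row_idx <- seq_len(nrow(mtcars))      # 1, 2, ..., 32

# Hypothetical derived column, filled row by row for illustration only.
df <- mtcars
df$kpl <- NA_real_
for (i in row_idx) {
  df$kpl[i] <- df$mpg[i] * 0.425144   # US miles per gallon -> km per litre
}

# The vectorised form computes the same column without a loop.
kpl_vec <- mtcars$mpg * 0.425144

# With zero rows, 1:nrow(x) counts down (1, 0) and the loop body would
# run twice; seq_len() returns an empty sequence instead.
empty <- mtcars[0, ]
bad_idx  <- 1:nrow(empty)             # c(1L, 0L)
good_idx <- seq_len(nrow(empty))      # integer(0)
```

For a simple transformation like this one, the vectorised form is normally preferable; the loop is shown only because the question asks about row-wise iteration.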