I'm using seqformat in R to analyze sequences of events.
My real dataset is huge, so for practice I'm using this small example to understand the format the function expects:
Location_Id  Event        Start_day  End_day  temp  year
1            Sever snow   6          12       4     2014
1            Medium snow  15         21       6     2016
2            Sever snow   7          8        3     2013
I used this command:
sts.data <- seqformat(df, from="SPELL", to="STS", id="Event", begin="Start_day", end="End_day", status="temp",limit=3)
When I run the command, I get this message:
[!!] max of 'end' column > limit! Sequences truncated at limit= 3 [>]
converting SPELL data into 2 STS sequences (internal format)
The output contains only NA values, as shown below:
a1 a2 a3
Sever snow NA NA NA
Medium snow NA NA NA
I'm not sure whether the end parameter needs to be greater than the begin parameter for all events, or whether that is not the problem at all.
Any thoughts on why the sequences of events are not created successfully?
The limit argument sets the maximum length of the sequences. In your data the first valid information is at day 6 and, therefore, the first three positions (days), which are all that limit=3 keeps, are NAs.
The latest valid information is on day 21. To avoid truncation of the sequences, set limit=21 or larger. Note also that the function may produce unexpected results when ids are not contiguous. Since you are using Event as id, I sort the rows of df by Event to make ids contiguous.
library(TraMineR)
df <- read.table(header=TRUE, text = "
Location_Id Event Start_day End_day temp year
1 Sever.snow 6 12 4 2014
1 Medium.snow 15 21 6 2016
2 Sever.snow 7 8 3 2013
")
## Event used as id: sort to make identical ids contiguous
df <- df[order(df[,"Event"]),]
sts.data <- seqformat(df, from="SPELL", to="STS", id="Event",
                      begin="Start_day", end="End_day", status="temp", limit=21)
sts.data
# a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21
# Medium.snow NA NA NA NA NA NA NA NA NA NA NA NA NA NA 6 6 6 6 6 6 6
# Sever.snow NA NA NA NA NA 4 3 3 4 4 4 4 NA NA NA NA NA NA NA NA NA
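To see why limit=3 yields only NAs while limit=21 does not, here is a minimal base-R sketch of a spell-to-state expansion. It is an illustration only, not TraMineR's implementation: expand_spells is a hypothetical helper, and its rule that a later row overwrites an earlier one on overlapping days is an assumption that merely happens to reproduce the output above.

```r
# Illustration only: a base-R stand-in for the SPELL -> STS expansion.
# This is NOT TraMineR's code; expand_spells is a hypothetical helper.
df <- read.table(header = TRUE, text = "
Location_Id Event Start_day End_day temp year
1 Sever.snow 6 12 4 2014
1 Medium.snow 15 21 6 2016
2 Sever.snow 7 8 3 2013
")

# One row per id, one column per position 1..limit, NA where no spell applies.
expand_spells <- function(d, id, begin, end, status, limit) {
  ids <- sort(unique(as.character(d[[id]])))
  out <- matrix(NA, nrow = length(ids), ncol = limit,
                dimnames = list(ids, paste0("a", seq_len(limit))))
  for (i in seq_len(nrow(d))) {
    # Positions covered by this spell, restricted to 1..limit
    pos <- seq(d[[begin]][i], d[[end]][i])
    pos <- pos[pos >= 1 & pos <= limit]
    # Assumption: on overlapping days, later rows overwrite earlier ones
    out[as.character(d[[id]][i]), pos] <- d[[status]][i]
  }
  out
}

# limit = 3: every spell starts after day 3, so nothing is filled in
short <- expand_spells(df, "Event", "Start_day", "End_day", "temp", limit = 3)

# limit taken from the data, so no spell is cut off
full <- expand_spells(df, "Event", "Start_day", "End_day", "temp",
                      limit = max(df$End_day))
```

Here short contains only NAs, while full has 21 columns. Deriving the limit with max(df$End_day) instead of hard-coding 21 keeps the call valid when the data change.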