I use the bupaR package for process analysis. Suppose my data, stored in a CSV file, looks like this (the file is already sorted by case id and timestamp):
STATUS;timestamp;CASEID
created;16-02-2023 09:46:32;1
revised;13-04-2023 23:58:59;1
accepted;13-04-2023 23:59:59;1
created;16-02-2023 09:46:32;2
accepted;13-04-2023 23:59:59;2
created;14-12-2022 13:17:54;3
revised;02-01-2023 23:59:59;3
accepted;28-02-2023 19:37:01;3
submitted;03-03-2023 23:59:59;3
created;02-01-2023 07:45:43;5
created;24-01-2022 16:05:58;6
accepted;03-02-2022 23:59:59;6
created;24-01-2022 15:52:53;7
accepted;03-02-2022 23:59:59;7
created;15-08-2022 12:54:23;8
rejected;18-08-2022 23:59:59;8
created;21-03-2022 15:32:05;9
accepted;26-04-2022 23:59:59;9
created;21-03-2022 15:42:39;10
The first case (id 1) has the trace "created-revised-accepted": first comes the event created, then revised, and then accepted.
I now use the following code to create a process map:
library(bupaR)
library(processmapR)
library(edeaR)
datafile <- read.csv(file = "pathtofile\\testfile.csv", header = TRUE, sep = ";")
datafile$timestampcolumn <- as.POSIXct(datafile$timestamp, format="%d-%m-%Y %H:%M:%S")
mytest <- simple_eventlog(datafile, case_id = "CASEID", activity_id = "STATUS", timestamp = "timestampcolumn")
process_map(mytest, type = frequency("absolute"))
This gives a process map showing the absolute frequency of each activity and flow.
Now I would like to add the trace for each case to my original file. The trace is, of course, the same for every row of a case. So the output should look like this (with the events in the trace separated by, for example, "-"):
STATUS;timestamp;CASEID;trace
created;16-02-2023 09:46:32;1;created-revised-accepted
revised;13-04-2023 23:58:59;1;created-revised-accepted
accepted;13-04-2023 23:59:59;1;created-revised-accepted
created;16-02-2023 09:46:32;2;created-accepted
accepted;13-04-2023 23:59:59;2;created-accepted
created;14-12-2022 13:17:54;3;created-revised-accepted-submitted
revised;02-01-2023 23:59:59;3;created-revised-accepted-submitted
accepted;28-02-2023 19:37:01;3;created-revised-accepted-submitted
submitted;03-03-2023 23:59:59;3;created-revised-accepted-submitted
created;02-01-2023 07:45:43;5;created
created;24-01-2022 16:05:58;6;created-accepted
accepted;03-02-2022 23:59:59;6;created-accepted
created;24-01-2022 15:52:53;7;created-accepted
accepted;03-02-2022 23:59:59;7;created-accepted
created;15-08-2022 12:54:23;8;created-rejected
rejected;18-08-2022 23:59:59;8;created-rejected
created;21-03-2022 15:32:05;9;created-accepted
accepted;26-04-2022 23:59:59;9;created-accepted
created;21-03-2022 15:42:39;10;created
I tried playing around with filter_activity, trace_list (from the edeaR package) and other commands, but I could not figure it out. I want to use the results from the process_map / bupaR code so that the traces correspond to the output in the graph. Of course I could implement my own algorithm that goes through each case and writes down the statuses, but I do not want to do that: I want to dig into the details to see which case had a specific trace according to the graph, so the result has to be consistent with the bupaR output. This information must already be available somewhere in the bupaR eventlog, otherwise the graph could not be drawn.
How can I achieve this?
I have never worked with any of these packages, but solved the problem like this:
Check the class of mytest:
class(mytest)
# [1] "eventlog" "log" "tbl_df" "tbl" "data.frame"
List the methods available for the eventlog class:
methods(class = "eventlog")
# [1] act_collapse activities activity_frequency
# [4] activity_instance_id activity_presence add_end_activity
# [7] add_start_activity arrange calculate_queuing_times
# [10] case_id case_list cases
# [13] detect_resource_inconsistencies dotted_chart durations
# [16] end_activities events_to_activitylog filter
# [19] filter_activity_instance filter_attributes filter_endpoints_condition
# [22] filter_infrequent_flows filter_lifecycle filter_lifecycle_presence
# [25] filter_precedence_resource filter_time_period filter_trim
# [28] filter_trim_lifecycle first_n fix_resource_inconsistencies
# [31] group_by group_by_activity group_by_activity_instance
# [34] group_by_case group_by_resource group_by_resource_activity
# [37] idle_time last_n lifecycle_id
# [40] lifecycle_labels lifecycles lined_chart
# [43] mapping mutate n_activity_instances
# [46] n_events number_of_repetitions number_of_selfloops
# [49] process_map process_matrix processing_time
# [52] redo_repetitions_referral_matrix redo_selfloops_referral_matrix resource_frequency
# [55] resource_id resource_map resource_matrix
# [58] resources sample_n select
# [61] set_activity_instance_id set_timestamp setdiff
# [64] size_of_repetitions size_of_selfloops slice_activities
# [67] slice_events standardize_lifecycle start_activities
# [70] summarise summary throughput_time
# [73] timestamp timestamps to_activitylog
# [76] trace_explorer trace_length trace_list
# [79] ungroup_eventlog unite
case_list looks promising, so join its trace column back onto the original data by CASEID:
library(bupaR)
library(processmapR)
library(edeaR)
library(dplyr)
d <- readr::read_delim(
"STATUS;timestamp;CASEID
created;16-02-2023 09:46:32;1
revised;13-04-2023 23:58:59;1
accepted;13-04-2023 23:59:59;1
created;16-02-2023 09:46:32;2
accepted;13-04-2023 23:59:59;2
created;14-12-2022 13:17:54;3
revised;02-01-2023 23:59:59;3
accepted;28-02-2023 19:37:01;3
submitted;03-03-2023 23:59:59;3
created;02-01-2023 07:45:43;5
created;24-01-2022 16:05:58;6
accepted;03-02-2022 23:59:59;6
created;24-01-2022 15:52:53;7
accepted;03-02-2022 23:59:59;7
created;15-08-2022 12:54:23;8
rejected;18-08-2022 23:59:59;8
created;21-03-2022 15:32:05;9
accepted;26-04-2022 23:59:59;9
created;21-03-2022 15:42:39;10", delim = ";")
d$timestampcolumn <- as.POSIXct(d$timestamp, format="%d-%m-%Y %H:%M:%S")
mytest <- simple_eventlog(d,
                          case_id = "CASEID",
                          activity_id = "STATUS",
                          timestamp = "timestampcolumn")
process_map(mytest, type = frequency("absolute"))
d %>%
  inner_join(case_list(mytest) %>%
               select(CASEID, trace),
             by = "CASEID")
# # A tibble: 19 × 5
# STATUS timestamp CASEID timestampcolumn trace
# <chr> <chr> <dbl> <dttm> <chr>
# 1 created 16-02-2023 09:46:32 1 2023-02-16 09:46:32 created,revised,accepted
# 2 revised 13-04-2023 23:58:59 1 2023-04-13 23:58:59 created,revised,accepted
# 3 accepted 13-04-2023 23:59:59 1 2023-04-13 23:59:59 created,revised,accepted
# 4 created 16-02-2023 09:46:32 2 2023-02-16 09:46:32 created,accepted
# 5 accepted 13-04-2023 23:59:59 2 2023-04-13 23:59:59 created,accepted
# 6 created 14-12-2022 13:17:54 3 2022-12-14 13:17:54 created,revised,accepted,submitted
# 7 revised 02-01-2023 23:59:59 3 2023-01-02 23:59:59 created,revised,accepted,submitted
# 8 accepted 28-02-2023 19:37:01 3 2023-02-28 19:37:01 created,revised,accepted,submitted
# 9 submitted 03-03-2023 23:59:59 3 2023-03-03 23:59:59 created,revised,accepted,submitted
# 10 created 02-01-2023 07:45:43 5 2023-01-02 07:45:43 created
# 11 created 24-01-2022 16:05:58 6 2022-01-24 16:05:58 created,accepted
# 12 accepted 03-02-2022 23:59:59 6 2022-02-03 23:59:59 created,accepted
# 13 created 24-01-2022 15:52:53 7 2022-01-24 15:52:53 created,accepted
# 14 accepted 03-02-2022 23:59:59 7 2022-02-03 23:59:59 created,accepted
# 15 created 15-08-2022 12:54:23 8 2022-08-15 12:54:23 created,rejected
# 16 rejected 18-08-2022 23:59:59 8 2022-08-18 23:59:59 created,rejected
# 17 created 21-03-2022 15:32:05 9 2022-03-21 15:32:05 created,accepted
# 18 accepted 26-04-2022 23:59:59 9 2022-04-26 23:59:59 created,accepted
# 19 created 21-03-2022 15:42:39 10 2022-03-21 15:42:39 created
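The traces above are separated by "," while the question asks for "-" and a semicolon-separated file. A minimal base R sketch of that last step follows. It uses a small hypothetical stand-in data frame (joined) instead of the real join result, writes to a temporary file (out_file, also hypothetical), and assumes that no activity name itself contains a comma.

```r
# Hypothetical stand-in for the joined result above (a few rows only).
joined <- data.frame(
  STATUS    = c("created", "revised", "accepted", "created"),
  timestamp = c("16-02-2023 09:46:32", "13-04-2023 23:58:59",
                "13-04-2023 23:59:59", "02-01-2023 07:45:43"),
  CASEID    = c(1, 1, 1, 5),
  trace     = c(rep("created,revised,accepted", 3), "created"),
  stringsAsFactors = FALSE
)

# Swap the comma separator for "-".
# Assumption: activity names never contain a comma themselves.
joined$trace <- gsub(",", "-", joined$trace, fixed = TRUE)

# Write the result in the same semicolon-separated layout as the input file.
out_file <- tempfile(fileext = ".csv")  # hypothetical output path
write.table(joined, out_file, sep = ";", quote = FALSE, row.names = FALSE)

# Inspect what was written.
readLines(out_file)
```

With the real data, the same gsub and write.table calls would be applied to the result of the inner_join, after dropping the helper timestampcolumn if it is not wanted in the file.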