[SOLVED] Prepare a csv file for process mining

I hope you are doing well! I was following tutorials for process mining with PM4Py, but I ran into difficulties with my csv file. It has these columns: 'id', 'status', 'mailID', 'date', ... ('status' is the same as 'activity' and contains a fixed set of choices).

My csv file contains a lot of data.

To follow the process mining tutorial, my columns must be named something like 'case:concept:name' ... but I don't know how to do that.

Solution

In your case, I assume 'id' plays the role of the Case ID in standard process mining terminology. Similarly, 'status' corresponds to the activity and 'date' corresponds to the timestamp.

The best option is to first read the file into a pandas dataframe before feeding it into PM4Py.
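As a minimal sketch of that step, the mapping can also be applied by renaming columns by name, which leaves extra columns such as 'mailID' untouched. The sample rows here are made up for illustration; only the column names come from the question. Casting the case ID to a string and sorting by case and time are my own assumptions about what is convenient, not something the question states.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real csv file; only the column
# names 'id', 'status', 'mailID', 'date' come from the question.
sample_csv = io.StringIO(
    "id,status,mailID,date\n"
    "1,created,m1,2021-01-01 10:00:00\n"
    "1,sent,m2,2021-01-01 11:30:00\n"
    "2,created,m3,2021-01-02 09:15:00\n"
)

data = pd.read_csv(sample_csv)

# Rename by name, so column order and extra columns do not matter
data = data.rename(columns={
    'id': 'case:concept:name',
    'status': 'concept:name',
    'date': 'time:timestamp',
})

# Timestamps as real datetimes, activity names as strings
data['time:timestamp'] = pd.to_datetime(data['time:timestamp'])
data['concept:name'] = data['concept:name'].astype(str)

# Assumption: case IDs are treated as strings as well
data['case:concept:name'] = data['case:concept:name'].astype(str)

# Keep the events of each case in chronological order
data = data.sort_values(['case:concept:name', 'time:timestamp']).reset_index(drop=True)

print(list(data.columns))
```

With a real file, pass its path to pd.read_csv instead of the in-memory sample.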

Here is a detailed example. Since you have not listed all the columns in your csv file, let us assume that it currently has only [ 'id', 'status', 'date' ], in that order. The following code can be adapted to any number of columns (by adding a name for each one to the list named cols):

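import pandas as pd
from pm4py.objects.conversion.log import converter as log_converter

path = ''  # Enter path to the csv file
data = pd.read_csv(path)

# New names, in the same order as the csv columns: id, status, date.
# The list must contain exactly one name per column in the file.
cols = ['case:concept:name', 'concept:name', 'time:timestamp']
data.columns = cols

# Timestamps as real datetimes, activity names as strings
data['time:timestamp'] = pd.to_datetime(data['time:timestamp'])
data['concept:name'] = data['concept:name'].astype(str)

# Convert the dataframe into a PM4Py event log object
log = log_converter.apply(data, variant=log_converter.Variants.TO_EVENT_LOG)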

Here we have changed the column names and their datatypes as required by the PM4Py package, and then converted the dataframe into an event log using log_converter. You can now perform your regular process mining tasks on this event log object. For instance, if you wish to create a Directly-Follows Graph from the event log, you can use the following line of code:

from pm4py.algo.discovery.dfg import algorithm as dfg_algorithm

# dfg maps each (activity, next activity) pair to the number of times it occurs
dfg = dfg_algorithm.apply(log)
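To make it clearer what a Directly-Follows Graph counts, here is a small sketch in plain Python that computes the same kind of pair counts by hand. It is not PM4Py's implementation, only an illustration of the idea; the events list and the directly_follows helper are hypothetical names made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical events as (case id, activity, timestamp) tuples
events = [
    ("1", "created", "2021-01-01 10:00:00"),
    ("2", "created", "2021-01-02 09:15:00"),
    ("1", "sent", "2021-01-01 11:30:00"),
    ("2", "sent", "2021-01-02 12:00:00"),
    ("1", "closed", "2021-01-03 08:00:00"),
]


def directly_follows(events):
    """Count how often activity a is immediately followed by b within a case."""
    traces = defaultdict(list)
    # Timestamps in this fixed 'YYYY-MM-DD HH:MM:SS' format sort correctly as strings
    for case_id, activity, _timestamp in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    counts = Counter()
    for activities in traces.values():
        # Pair each activity with the one right after it in the same case
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts


dfg_counts = directly_follows(events)
print(dfg_counts)
```

In this sample, both cases go from 'created' to 'sent', and only case 1 continues to 'closed', so the first pair is counted twice and the second once.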