
Creating a one-step transition matrix to find the probability that someone moves to a particular city


I'm looking for a way to build a transition matrix (in R) containing the probabilities that someone moves from one city to another. This is what my df looks like:

    City_year1         City_year2
   <fct>               <fct>  
 1 Alphen aan den Rijn NA     
 2 Tynaarlo            NA     
 3 Eindhoven           NA     
 4 Emmen               Emmen  
 5 Emmen               Emmen  
 6 Schagen             Schagen
 7 Bergen              NA     
 8 Schagen             Schagen
 9 Schagen             Schagen
10 Amsterdam           Rotterdam      

# .... with 200.000 more rows

How do I easily create a transition matrix with the probabilities that someone moves from, for example, Amsterdam in year 1 to Rotterdam in year 2, based on the data available in this df? Extra info: the number of unique values in year 1 is not necessarily equal to the number of unique values in year 2. I have tried to use Markov functions, but without success.

I hope someone can help me!


Solution

  • table(df) will give you a matrix of counts of transitions, and you can convert those counts to probabilities (proportions) with prop.table:

    prop.table(table(df), margin = 1)
    

    margin = 1 normalizes within each row, so the probabilities in each row (one origin city) sum to 1.

    Using the original data in the question (with the spaces in "Alphen aan den Rijn" replaced by underscores so that read.table can parse it):

    df = read.table(text = 'City_year1         City_year2
    1 Alphen_aan_den_Rijn NA
    2 Tynaarlo            NA
    3 Eindhoven           NA
    4 Emmen               Emmen
    5 Emmen               Emmen
    6 Schagen             Schagen
    7 Bergen              NA
    8 Schagen             Schagen
    9 Schagen             Schagen
    10 Amsterdam           Rotterdam', header = TRUE)
    
    result = prop.table(table(df), margin = 1)
    result
    #                      City_year2
    # City_year1            Emmen Rotterdam Schagen
    #   Alphen_aan_den_Rijn
    #   Amsterdam               0         1       0
    #   Bergen
    #   Eindhoven
    #   Emmen                   1         0       0
    #   Schagen                 0         0       1
    #   Tynaarlo
    
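    # Printing a table shows NaN as blank; unclass() reveals the underlying values.
    # The NaN rows are origins with no recorded year-2 city (0 / 0).
    unclass(result)
    #                      City_year2
    # City_year1            Emmen Rotterdam Schagen
    #   Alphen_aan_den_Rijn   NaN       NaN     NaN
    #   Amsterdam               0         1       0
    #   Bergen                NaN       NaN     NaN
    #   Eindhoven             NaN       NaN     NaN
    #   Emmen                   1         0       0
    #   Schagen                 0         0       1
    #   Tynaarlo              NaN       NaN     NaN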
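
The NaN rows above come from origin cities whose year-2 value is always NA: table() drops NA by default, so those rows have zero counts and prop.table computes 0/0. The question does not say what NA means in the data, so the following is only a rough sketch of two ways you might handle it: drop the incomplete rows, or keep missing destinations as an explicit category. The "(unknown)" label and the names observed, p_observed, dest and p_all are my own choices for illustration, not something from the question.

```r
# Same sample data as above; stringsAsFactors = TRUE mimics the <fct>
# columns shown in the question.
df <- read.table(text = "City_year1 City_year2
1 Alphen_aan_den_Rijn NA
2 Tynaarlo NA
3 Eindhoven NA
4 Emmen Emmen
5 Emmen Emmen
6 Schagen Schagen
7 Bergen NA
8 Schagen Schagen
9 Schagen Schagen
10 Amsterdam Rotterdam", header = TRUE, stringsAsFactors = TRUE)

# Option 1: keep only rows with a recorded year-2 city. droplevels()
# removes origin levels that no longer occur, so no all-zero rows remain.
observed <- droplevels(df[!is.na(df$City_year2), ])
p_observed <- prop.table(table(observed), margin = 1)

# Option 2 (assumption: NA is worth tracking as its own destination):
# recode NA to a hypothetical "(unknown)" label before tabulating.
dest <- as.character(df$City_year2)
dest[is.na(dest)] <- "(unknown)"
p_all <- prop.table(table(City_year1 = df$City_year1, City_year2 = dest),
                    margin = 1)

# Look up a single transition probability by row and column name.
p_observed["Amsterdam", "Rotterdam"]  # 1 here: the sample has one Amsterdam row

# Every row of p_all now sums to 1, with no NaN entries.
rowSums(p_all)
```

Which option is appropriate depends on what a missing year-2 city means in your data, so treat both as starting points rather than the definitive answer.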