[SOLVED] Creating 1 step transition matrix, find probability that someone moves to a particular city


I'm looking for a way to build a transition matrix (in R) containing the probabilities of where someone moves. This is how my df looks:

    City_year1         City_year2
   <fct>               <fct>  
 1 Alphen aan den Rijn NA     
 2 Tynaarlo            NA     
 3 Eindhoven           NA     
 4 Emmen               Emmen  
 5 Emmen               Emmen  
 6 Schagen             Schagen
 7 Bergen              NA     
 8 Schagen             Schagen
 9 Schagen             Schagen
10 Amsterdam           Rotterdam      

# ... with 200,000 more rows

How do I easily create a transition matrix with the probabilities that someone moves from, for example, Amsterdam in year 1 to Rotterdam in year 2, based on the data in this df? Extra info: the number of unique values in year 1 is not necessarily equal to the number of unique values in year 2. I have tried to use Markov functions, but without success.

I hope someone can help me!

Solution

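table(df) cross-tabulates the two columns of df, giving a matrix of transition counts (rows are year-1 cities, columns are year-2 cities). By default table() leaves out NA values, so people with an unknown year-2 city are not counted in any cell. You can convert those counts to probabilities (proportions) with prop.table: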

prop.table(table(df), margin = 1)

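margin = 1 divides each count by its row total, so the probabilities in each row sum to 1: row i is the distribution of year-2 cities among the people who lived in city i in year 1.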
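Row normalization is nothing more than each count divided by its row total, P[i, j] = n[i, j] / (sum over j of n[i, j]). A minimal sketch that checks this on a small made-up table (the cities A and B and their counts are hypothetical, not taken from the question) by comparing prop.table() with a manual division by rowSums():

```r
# Hypothetical toy data: 3 people start in A, 1 person starts in B
counts <- table(
  City_year1 = c("A", "A", "A", "B"),
  City_year2 = c("A", "B", "B", "B")
)

# Row-wise proportions, as in the answer
probs <- prop.table(counts, margin = 1)

# Same thing by hand: divide every count by the total of its row
# (a matrix divided by a vector of length nrow recycles down each column,
#  so element [i, j] is divided by the i-th row total)
manual <- unclass(counts) / rowSums(counts)

all.equal(unclass(probs), manual)  # the two approaches agree
rowSums(probs)                     # every row sums to 1
```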

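Using the data from the question (with the spaces in "Alphen aan den Rijn" replaced by underscores, so that read.table can split the fields on whitespace):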

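# The header has one field fewer than the data rows, so read.table() uses the
# leading numbers as row names; the string NA is read as a missing value.
df <- read.table(text = 'City_year1 City_year2
1  Alphen_aan_den_Rijn NA
2  Tynaarlo            NA
3  Eindhoven           NA
4  Emmen               Emmen
5  Emmen               Emmen
6  Schagen             Schagen
7  Bergen              NA
8  Schagen             Schagen
9  Schagen             Schagen
10 Amsterdam           Rotterdam', header = TRUE)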

result = prop.table(table(df), margin = 1)
result
# City_year2
# City_year1            Emmen Rotterdam Schagen
# Alphen_aan_den_Rijn                        
# Amsterdam               0         1       0
# Bergen                                     
# Eindhoven                                  
# Emmen                   1         0       0
# Schagen                 0         0       1
# Tynaarlo                                   
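To answer the original question directly, a single probability can be read from the result by row and column name, and as.data.frame() turns the matrix into one row per (from, to) pair, which may be easier to filter when there are many cities. A minimal sketch, rebuilding the question's data with data.frame() instead of read.table(); the names long and prob are illustrative choices, not part of the answer:

```r
# Question data typed in directly (spaces in city names are fine here)
df <- data.frame(
  City_year1 = c("Alphen aan den Rijn", "Tynaarlo", "Eindhoven", "Emmen", "Emmen",
                 "Schagen", "Bergen", "Schagen", "Schagen", "Amsterdam"),
  City_year2 = c(NA, NA, NA, "Emmen", "Emmen",
                 "Schagen", NA, "Schagen", "Schagen", "Rotterdam"),
  stringsAsFactors = FALSE
)

result <- prop.table(table(df), margin = 1)

# P(year-2 city = Rotterdam | year-1 city = Amsterdam); the value is 1 here
result["Amsterdam", "Rotterdam"]

# Long format: one row per (City_year1, City_year2) pair, probability in "prob"
long <- as.data.frame(result, responseName = "prob")

# Keep only transitions that actually occur (drops 0 and the NaN rows)
long[!is.na(long$prob) & long$prob > 0, ]
```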

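The blank rows are the cities whose year-2 value is always NA: their row total is 0, so prop.table computes 0/0 = NaN, and the print method for tables shows NA/NaN entries as blanks. unclass(result) drops the table class, so the NaN values become visible:

unclass(result)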
# City_year2
# City_year1            Emmen Rotterdam Schagen
# Alphen_aan_den_Rijn   NaN       NaN     NaN
# Amsterdam               0         1       0
# Bergen                NaN       NaN     NaN
# Eindhoven             NaN       NaN     NaN
# Emmen                   1         0       0
# Schagen                 0         0       1
# Tynaarlo              NaN       NaN     NaN
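Because the set of year-1 cities can differ from the set of year-2 cities, table(df) is generally not square. If the goal is a Markov-chain transition matrix with the same states on rows and columns (an assumption about how the matrix will be used), one option is to give both columns one shared set of factor levels before tabulating. A sketch, assuming that people with an unknown year-2 city are simply dropped; the names known, cities, counts and P are illustrative:

```r
df <- data.frame(
  City_year1 = c("Alphen aan den Rijn", "Tynaarlo", "Eindhoven", "Emmen", "Emmen",
                 "Schagen", "Bergen", "Schagen", "Schagen", "Amsterdam"),
  City_year2 = c(NA, NA, NA, "Emmen", "Emmen",
                 "Schagen", NA, "Schagen", "Schagen", "Rotterdam"),
  stringsAsFactors = FALSE
)

# Assumption: an NA year-2 city carries no information about moves, so drop it
known <- df[!is.na(df$City_year2), ]

# One shared, sorted set of cities used for both rows and columns
cities <- sort(union(known$City_year1, known$City_year2))

counts <- table(
  City_year1 = factor(known$City_year1, levels = cities),
  City_year2 = factor(known$City_year2, levels = cities)
)

# Square transition matrix; a row with no observations becomes 0/0 = NaN
P <- prop.table(counts, margin = 1)
P
```

Here Rotterdam gets an all-NaN row because nobody in the sample lived there in year 1; how to fill such rows (leave them, drop them, or treat them as absorbing) is a modelling choice the data alone cannot settle.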