I'm looking for a way to compute a transition matrix (in R) with the probabilities that someone moves from one city to another. This is what my df looks like:
City_year1 City_year2
<fct> <fct>
1 Alphen aan den Rijn NA
2 Tynaarlo NA
3 Eindhoven NA
4 Emmen Emmen
5 Emmen Emmen
6 Schagen Schagen
7 Bergen NA
8 Schagen Schagen
9 Schagen Schagen
10 Amsterdam Rotterdam
# .... with 200.000 more rows
How do I easily create a transition matrix with the probabilities that someone moves from, for example, Amsterdam in year 1 to Rotterdam in year 2, based on the data available in this df? Extra info: the number of unique values in year 1 is not necessarily equal to the number of unique values in year 2. I have tried to use Markov functions, but without success.
I hope someone can help me!
table(df) will give you a matrix of counts of transitions, and you can convert those counts to probabilities (proportions) with prop.table:
prop.table(table(df), margin = 1)
The margin = 1 argument means that the probabilities in each row will sum to 1.
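To make the effect of margin = 1 concrete, here is a minimal sketch on a made-up 2 x 2 count matrix (the states "A" and "B" and the counts are illustrative assumptions, not taken from the question's data):

```r
# Hypothetical transition counts: rows are the origin, columns the destination.
counts <- matrix(
  c(3, 1,
    0, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(from = c("A", "B"), to = c("A", "B"))
)

# margin = 1 divides every entry by its row total,
# so each row becomes a probability distribution over destinations.
probs <- prop.table(counts, margin = 1)

probs["A", "B"]  # 1 of the 4 moves out of A went to B, i.e. 0.25
rowSums(probs)   # every row sums to 1
```

With margin = 2 the columns would sum to 1 instead, which answers a different question (where did the people who ended up in a city come from).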
Using the original data in the question:
df = read.table(text = 'City_year1 City_year2
1 Alphen_aan_den_Rijn NA
2 Tynaarlo NA
3 Eindhoven NA
4 Emmen Emmen
5 Emmen Emmen
6 Schagen Schagen
7 Bergen NA
8 Schagen Schagen
9 Schagen Schagen
10 Amsterdam Rotterdam', header = TRUE)
result = prop.table(table(df), margin = 1)
result
# City_year2
# City_year1 Emmen Rotterdam Schagen
# Alphen_aan_den_Rijn
# Amsterdam 0 1 0
# Bergen
# Eindhoven
# Emmen 1 0 0
# Schagen 0 0 1
# Tynaarlo
unclass(result)
# City_year2
# City_year1 Emmen Rotterdam Schagen
# Alphen_aan_den_Rijn NaN NaN NaN
# Amsterdam 0 1 0
# Bergen NaN NaN NaN
# Eindhoven NaN NaN NaN
# Emmen 1 0 0
# Schagen 0 0 1
# Tynaarlo NaN NaN NaN
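The NaN rows come from cities whose year-2 value is always NA, so their row total is 0. How to treat them depends on what NA means in your data. The following is only a sketch of two possible options, under the assumption that either NA can be counted as its own destination or that you want the same set of cities on both axes. The names with_na, all_cities, counts and probs are just illustrative.

```r
# Same toy data as above, built directly as a data frame.
df <- data.frame(
  City_year1 = c("Alphen_aan_den_Rijn", "Tynaarlo", "Eindhoven", "Emmen", "Emmen",
                 "Schagen", "Bergen", "Schagen", "Schagen", "Amsterdam"),
  City_year2 = c(NA, NA, NA, "Emmen", "Emmen",
                 "Schagen", NA, "Schagen", "Schagen", "Rotterdam")
)

# Option 1 (assumption: a missing year-2 city is a meaningful outcome):
# useNA = "ifany" adds an <NA> column, so no row has a total of 0.
with_na <- prop.table(table(df, useNA = "ifany"), margin = 1)

# Option 2: use the same city levels for rows and columns, which gives a
# square matrix even when the two years contain different sets of cities.
# sort() drops NA by default.
all_cities <- sort(unique(c(as.character(df$City_year1),
                            as.character(df$City_year2))))
counts <- table(
  City_year1 = factor(df$City_year1, levels = all_cities),
  City_year2 = factor(df$City_year2, levels = all_cities)
)
probs <- prop.table(counts, margin = 1)

# Rows without any observed destination are still NaN in probs;
# this flags the rows that do have at least one observed transition.
has_data <- rowSums(counts) > 0
```

In the second option, rows where has_data is FALSE have no observed transitions at all, so their probabilities cannot be estimated from this data. Whether you drop them or fill them in some other way is a modelling decision.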