While I'm using Ruby/Rails to solve this particular problem, the specific issue is not unique to Ruby.
I'm building an app that can send group/mms messages to multiple people, and then processes those texts when the others reply.
The app will have a different number for each record, and each record can be involved in multiple group conversations.
For example, record_1
can be involved in a conversation with user_1, user_2
, but can also be involved in a separate conversation with user_2, user_3
, and record_2
can have a separate conversation with user_1, user_2
.
When I send a message the fields resemble:
{
from: "1234566789",
to: [
"1111111111",
"2222222222",
...
],
body: "..."
}
Where the from
is my app number, and the to []
is an array of phone numbers for everyone else involved in the conversation.
When one of the other participants replies to the group message, I'll get a webhook from my text messaging provider that has the from
as that person's phone number and the to []
would include my app number and everyone else's numbers.
The identifier for a conversation is the unique combination of the phone numbers involved.
However, having an array of ["1234567890", "1111111111", "2222222222"]
is difficult to work with, and I would like a string representation that I can index in my database and quickly find.
If I have a to: ["1234567890", "1111111111", "2222222222]
array of the phone numbers, I'm thinking about using Digest::MD5.hexdigest to.sort.to_s
.
This would give me a unique identifier such as 49a5a960c5714c2e29dd1a7e7b950741
, that I can index in my DB and use to uniquely reference conversations.
Are there any concerns with using the MD5 hash to solve my specific problem? Anytime I have the same numbers involved in a conversation, I want it to produce the same hash. Does MD5 guarantee the same result given the same ordered input?
Is there another approach to uniquely identify conversations by the participants?
Yes, MD5 does give you that guarantee, unless someone is trying to attack your system. It is possible to create colliding MD5 hashes but they will never happen by accident.
So if in your situation the hash will only ever be benign (i.e. created by your code, not created by someone trying to mount an attack of some kind), then using MD5 is fine.
Or you could switch to using SHA256 instead of MD5 which doesn't have this risk associated with it.