What is the common approach to data sharding based on regions? a.k.a GDPR enforcement - EU data stays in EU.
If I were to save users' email
s in users
table - I would need to keep data for US and EU people separated somehow. Example mysql
table:
CREATE TABLE users(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
name VARCHAR(30),
email VARCHAR(30),
otherSensetiveData VARCHAR(30))
In general I just want to know how to solve this problem.
If you have a data residency requirement in the EU, then you either need two servers, or you need to store all the data in the EU.
If you shard the data (split it over multiple servers), then unique keys in general have some complications.
There are at least four popular solutions to generating globally unique id values:
Use auto-increment, but ensure they don't allocate the same id values by using auto_increment_increment
set to the number of shards, and auto_increment_offset
set to a distinct value between 0 and the number of shards. For example if you have 2 shards, auto_increment_increment
would be set to 2 on both shards, and auto_increment_offset
would be set to 0 on the US shard and 1 on the EU shard.
Use a compound primary key, one column being auto-increment, and the other column being constrained to a distinct shardid. It's up to you to define the table differently on each shard.
CREATE TABLE users(
id INT NOT NULL AUTO_INCREMENT,
shardid INT NOT NULL CHECK (shardid = 1),
PRIMARY KEY(id, shardid)
);
Do not use the built-in auto-increment features of MySQL, but instead create a globally unique id generator service, which both the US and the EU app instances call to get the next id. This is something the client app should call, and then pass the value as a query parameter to an INSERT statement. If it's too slow for the remote side to call this service on every INSERT, then the remote app may fetch a batch of id values in advance and store them locally, always keeping a "supply" of id values to use.
Use a UUID or globally unique string. This is in part encoded by the server id of the MySQL instance, so it's bound to be unique. You could use a trigger in your MySQL database to fill in the primary key with a UUID.
CREATE TRIGGER t BEFORE INSERT on users FOR EACH ROW SET id = UUID();
Sharding is a complex subject, and you need to choose the solution that works best for your app.
I'd recommend you first talk to a qualified legal professional familiar with the GDPR, to confirm that you really have a requirement for data residency. In some cases, you don't, according to articles like https://www.mcafee.com/blogs/enterprise/data-security/data-residency-a-concept-not-found-in-the-gdpr/ (though that article is not legal advice).