We have 6 Servers (4 Applications servers and 2 DB Servers)
We are using HAProxy to load balance between the Application and API servers (2/2)
Now the issue I'm having is that the system administrator setup a Master/Slave on the MySQL but it's always failing and until now we cannot use the slave since most data are always corrupted and we always need to fix it and each time we are getting different errors .
We tried to make some sort of load balancing for the read/write (write on master , read on slave) but we were not able to use that since slave data are not always correct .
What I'm wondering is how the big guys proceed when dealing with high load servers where you always need the data to be accurate and cannot take any risk?
Can someone tell me his own experience and what he used ?
What i found : Percona XtraDB Cluster , but before going into this direction need input ...
Thank's !!
The "table is full" error means your slave doesn't have enough space to perform the ALTER TABLE. You need to get larger disks to resolve that error.
But the subtext is that no one is monitoring your database servers, and that's a bigger problem. You need to get a database administrator, or else get a professional service to do it.
What I'm wondering is how the big guys proceed when dealing with high load servers where you always need the data to be accurate and cannot take any risk?
First, get it out of your head that any system has no risk. That's impossible, if you plan to use the system at all. You can't eliminate the possibility of errors, but you can be prepared to recover from them seamlessly.
The big guys do the following:
Hire operations staff including system administrators, network administrators, database administrators to take care of the servers.
Monitor everything. Use software to track system load, disk space, errors, and many other things continuously. The best option is New Relic. For MySQL slave integrity, use a tool like pt-table-checksum.
Redundancy. Create standby systems and data to take over when (not if) the primary system fails.
You probably want to learn about the field of high availability architecture. Check out this talk: Scalable Internet Architectures