mysqlperconagaleramysql-clusterpercona-xtradb-cluster

Designing XtraDB cluster


We have an application which consists of microservices all connected to the same Percona DB instance. Currently it is just one instance with 16 cores/32 GB memory without replication. One of our issues is that sometimes one of our microservices causes such high load on the database (even just reading) which makes all the microservices unusable.

We are thinking about creating a Percona cluster of three nodes with choosing nodes for each microservice. The services which mostly "write" would connect to one instance and the rest would connect to the other two instances. This way if some microservice causes high load with reads, it shouldn't completely overwhelm our infrastructure.

My questions:

  1. Is this even good idea? Shouldn't we rather let ProxySQL deal with splitting the traffic? ProxySQL would porbably mean no isolation.
  2. Should we rather have more instances with less CPU or rather less instances with more CPU? Having more instances would mean more isolation for running microservices in case of high load.
  3. Is it a good idea having nodes with different CPUs? For instance let the "writing instance" have more CPU in comparison with the "reading instances".
  4. If we direct microservices to "their Percona instance", can we still have some kind of HA when their instance completely dies?

Note: We would probably use Percona XtraDB click-to-deploy in GCE: https://console.cloud.google.com/marketplace/details/click-to-deploy-images/percona?project=goout-cloud&folder&organizationId=74390800864


Solution

    1. Yes, this is a good idea. Using ProxySQL with PXC is also a good idea. By using ProxySQL, you can: A) implement "writer" HA by putting two nodes into the same hostgroup, one with super high weight (10000000) and the other at low (10). If high-weight node goes offline, ProxySQL will seamlessly start sending traffic to the other node. B) put all nodes into a separate "reader" hostgroup with the same weights, thus load-balancing write traffic. C) If desired, create a 3rd hostgroup with just 1 node and create a query-rule to pattern match on schema, user, or query pattern for your "high load" query and direct execution to that specific node. ProxySQL will also let you cache some of those heavy-hitting queries.

    2. Personally, I would pick less instances with higher CPU unless you know your network is rock solid. In PXC, all nodes must synchronously ACK all transactions. The more nodes you have, the longer the latency those operations can take. The fastest you can commit is the time between the two slowest nodes. Please make sure you always have an odd number of nodes, unless you get advanced with the pc.weight setting (but that's super tricky to get right).

    3. With MySQL in general, all nodes should be the same configuration. If your master is more powerful than slaves, generally speaking, slaves will not be able to keep up with the volume. With PXC, this means you will experience flow-control events more frequently which can translate to application stalls. If node2 cannot write as fast node1, node2 sends out flow-control messages, (cries for help), asking the other nodes to slow down while it catches up.

    4. Yes, using ProxySQL as described in #1.

    Side note, query optimization is the #1 way to "speed things up." Don't always throw hardware at the problem. It is worth the time to examine your slow query log and attempt to improve queries. Sometimes, a single index can make a night/day difference.

    Disclaimer: I'm Percona's Senior Instructor and have delivered numerous full-day PXC and ProxySQL intensive tutorial sessions.