mysqlmysql-cluster

MySQL Cluster (NDB) spread data (rows) to 4 data-nodes servers


I want to spread my MySQL rows to 4 servers (Horizontal sharding). I searched, and I found that I have to use NDB cluster (MySQL cluster). I made a MySQL server/client with NDB manager server and 4 data nodes servers and this is my manager configuration :

[ndbd default]
NoOfReplicas=1

[ndb_mgmd]
NodeId=1
hostname=192.168.159.133
datadir=/var/lib/mysql-cluster

[ndbd]
NodeId=2
hostname=192.168.159.131
datadir=/home/db

[ndbd]
NodeId=3
hostname=192.168.159.132
datadir=/home/db

[ndbd]
NodeId=4
hostname=192.168.159.137
datadir=/home/db

[mysqld]
NodeId=20
hostname=192.168.159.133

I know replication means make a backup of each row to another server and I don't want it (I just want to spread my data (rows) to 4 servers, not making backup) so I set NoOfReplicas to 1 to have 4 nodegroups ...

Am I right ? Does my data (rows) spread on that 4 data-node servers ?

If I right, what is different between doing this, or just using Innodb and make foreign partitions (on remote servers) ? Because it's a lot easier to make partition (key) by my own and attach each partition to remote servers !!!

Is using NDB has more advantage (about what I want to do (spread data, not replication (making backup))) ?

And final question, is it right to have NDBCLUSTER table with also own partitions ???!!! What is that mean, anyway ? For example, I have 4 data-nodes and for a NDBCLUSTER table, I create 8 partitions too (key partition) ! How that's work? Does it create 8 partitions on each data-node ? Or just split those partitions on those 4 data-nodes server (for example, each data-node holds 2 partitions) ?


Solution

  • Row spread

    Yes, by using 4 data nodes (ndbmtd) and NoOfReplicas=1 you will have a 4 node groups and the rows will be spread out over the four nodes, and each row will only be on one data node.

    Worth mention for others reading this answer is that using NoOfReplicas=1 will not allow reconfiguring or upgrading the data nodes will cluster is online, also will cluster go off line if any of the nodes fail. In short the high availability properties that NDB Cluster supports are suppressed.

    8 partitions

    By explicitly create 8 partitions (for example by using PARTITION BY KEY() PARTITIONS 8) you will get 8 partitions, 2 on each data node.

    Using an explicit partition count is not recommended. There is typically no gain doing that, and it does not support reorganize the partitions if one later decides to add more data nodes.

    It is recommended to use the default partitioning, that is not using any partitioning clause at all. That will deduce the number of partitions from the cluster configuration. For example if a data nodes uses several threads the number of partitions will increase to match the number of so called LDM threads in the data nodes.

    NDB vs InnoDB

    NDB is by design a distributed database system and the application would not need to know about the distribution.

    In front of the NDB cluster one can have one MySQL server (mysqld) or more. If more than one are used application can still read and write any row from any of the MySQL servers. For InnoDB you need 4 MySQL servers and application need to know which server have which row and need to connect to the right server to access a row.

    With NDB cluster transactions are distributed that may access rows from different data nodes. For InnoDB application would need to use XA transactions.