I'm creating a new app and this time using MySQL instead of cassandra because it's much harder to learn and setup. I learned about MySQL clusters and chose to use this database because all my devs are very familiar with the MySQL query syntax and database structure.
But when MySQL is told to get a record off of multiple nodes, does it tell all the nodes to search for the record or is there some way it has to know what nodes the data is stored on?
This feature is called coordinator node and it's in cassandra but I couldn't find anything on it for MySQL
Note there is no way from MySQL to explicitly request a record from multiple nodes as in question:
mysql is told to get a record off of multiple nodes
Of course a Ndb cluster typically consists of several data nodes which have the same data, but one can not in a MySQL query request records from explicit data nodes.
Short answer:
A record is only fetched from one data node.
There is no asking for the same data from several data nodes occurring.
The details of the data distribution of a table is part of the Ndb internal definition of the table.
In MySQL Cluster there is no special coordinator node needed for row data retrieval or modification.
Slighlty more details:
All data nodes and also all NdbAPI nodes, such as MySQL servers (mysld), know about how what data is distributed on which data nodes.
The unit of distribution is a table partition, and each partition will have a replica (copy) on each data node within a node group. (There are also fully replicated tables there each partition have a replica on all data nodes.)
If you ask for a record by a primary key, the record will be fetched from one data node.
If you ask for a record by a unique key using the secondary index, the primary key will first be fetched from one data node using the secondary index, and then that primary key will be used internally by data node to fetch the record from one data node (which could be the same data node).
If you ask for a record by some other condition several scans of the tables partitions will be issued, but each partition will only be scanned by one data node.
Some more details:
The mysqld or any other NdbAPI node does not actually directly request the record from a data node that has it.
Rather it connects to a data node to act as a transaction coordinator, TC.
The TC data node in turn requests the actual record from suitable data node which will send the record directly back to the requesting mysqld without passing the TC.
In the case mysqld transaction starts with a key lookup, that information is used to select a transaction coordinator on a data node there the record is to avoid an extra network hop.
Note:
There is also a notion of proximity of nodes that Ndb can use to optimize its selection of TC data node. Like if mysqld is on same host as a data node, or configured to be in the same location domain as some data node(s).