I want to set up a local TiDB cluster for benchmarking. Here are my questions:
Can multiple TiDB instances connect to the same PD and TiKV cluster?
Yes. You can add as many tidb-servers as you need. The tidb-server is stateless, so every instance can connect to the same PD and TiKV cluster.
If so, do transactions submitted to different TiDB instances still satisfy snapshot isolation?
Yes. TiDB is a distributed database that provides snapshot isolation by default, and this holds for transactions submitted through different tidb-servers as well. TiDB implements distributed transactions with the Percolator transaction model. For more implementation details, you can refer to this article: https://pingcap.com/blog/2016-11-17-mvcc-in-tikv/
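To make the answer above more concrete, here is a minimal toy sketch of why two SQL-layer instances can share one snapshot-isolated store: both take their timestamps from one shared oracle and read multi-versioned data as of their start timestamp. This is not TiDB's or Percolator's actual code. All class names (TimestampOracle, VersionedStore, ToySqlLayer, Transaction) are hypothetical, and the sketch leaves out Percolator's locks and two-phase commit entirely.

```python
import itertools


class TimestampOracle:
    """Hypothetical stand-in for a shared timestamp service: strictly increasing integers."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_ts(self):
        return next(self._counter)


class VersionedStore:
    """Hypothetical stand-in for the shared storage layer: key -> [(commit_ts, value), ...]."""

    def __init__(self):
        self._versions = {}

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot timestamp.
        visible = [(ts, v) for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return max(visible, key=lambda pair: pair[0])[1] if visible else None

    def latest_commit_ts(self, key):
        return max((ts for ts, _ in self._versions.get(key, [])), default=0)

    def write(self, key, commit_ts, value):
        self._versions.setdefault(key, []).append((commit_ts, value))


class Transaction:
    def __init__(self, oracle, store):
        self._oracle, self._store = oracle, store
        self.start_ts = oracle.next_ts()  # the snapshot this transaction reads from
        self._writes = {}                 # buffered until commit

    def get(self, key):
        if key in self._writes:
            return self._writes[key]
        return self._store.read(key, self.start_ts)

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        # Write-write conflict check: abort if any key was committed after our snapshot.
        for key in self._writes:
            if self._store.latest_commit_ts(key) > self.start_ts:
                return False
        commit_ts = self._oracle.next_ts()
        for key, value in self._writes.items():
            self._store.write(key, commit_ts, value)
        return True


class ToySqlLayer:
    """Hypothetical stand-in for one stateless SQL-layer instance."""

    def __init__(self, oracle, store):
        self._oracle, self._store = oracle, store

    def begin(self):
        return Transaction(self._oracle, self._store)


oracle, store = TimestampOracle(), VersionedStore()
server_a = ToySqlLayer(oracle, store)  # two instances, one shared oracle and store
server_b = ToySqlLayer(oracle, store)

setup = server_a.begin()
setup.put("balance", 100)
setup.commit()

reader = server_a.begin()              # snapshot taken before the writer commits
writer = server_b.begin()
writer.put("balance", 50)
writer_committed = writer.commit()

reader_view = reader.get("balance")    # still sees the old snapshot
fresh_view = server_a.begin().get("balance")

reader.put("balance", 0)
reader_committed = reader.commit()     # conflicts with the writer's newer commit

print(reader_view, fresh_view, writer_committed, reader_committed)
```

The point of the sketch is that the SQL-layer objects keep no data of their own. Isolation comes from the shared timestamps and the versioned store, which is why it does not matter which instance a transaction is submitted to.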
At the storage layer, does each TiKV node keep the entire dataset? (That is, is the replication factor equal to the number of TiKV nodes?)
No. TiDB internally shards each table into small range-based chunks called "regions". Each region is approximately 100MiB in size by default, and the default replication factor is 3, so each region is stored on 3 TiKV nodes rather than on all of them. A tikv-server in a large cluster can hold hundreds of thousands of regions.
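As a rough back-of-the-envelope check of the numbers above, the following sketch estimates how many regions and region replicas a dataset produces and what share of the data lands on each TiKV node. The function name estimate_region_layout and the example figures (500GiB, 5 nodes) are made up for illustration, and the sketch assumes replicas are spread perfectly evenly across nodes, which a real cluster only approximates.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_region_layout(data_bytes, tikv_nodes, region_bytes=100 * MIB, replicas=3):
    """Estimate region counts, assuming evenly balanced replicas (an idealization)."""
    if tikv_nodes < replicas:
        raise ValueError("need at least as many TiKV nodes as replicas")
    regions = math.ceil(data_bytes / region_bytes)
    total_replicas = regions * replicas
    return {
        "regions": regions,
        "total_replicas": total_replicas,
        "replicas_per_node": total_replicas / tikv_nodes,
        # Each node stores replicas/tikv_nodes of the logical dataset, not all of it.
        "fraction_of_data_per_node": replicas / tikv_nodes,
    }


layout = estimate_region_layout(data_bytes=500 * GIB, tikv_nodes=5)
print(layout)
```

With 3 replicas on 5 nodes, each node holds about 60% of the logical data. Only when the node count equals the replication factor does every node hold a full copy.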
If not, how do I configure the replication factor?
PD reads its configuration file (conf/pd.yml) and uses the max-replicas setting in it. For more details, you can refer to https://github.com/pingcap/docs/blob/master/FAQ.md#is-the-number-of-replicas-in-each-region-configurable-if-yes-how-to-configure-it