I am learning Raft. So far, I understand that when a client makes a state-changing request to a Raft system, the leader node needs to replicate the change to a majority of nodes, and that it does so at a fixed heartbeat interval.
So I assumed any client would see a write latency larger than that interval.
Since etcd is a Raft implementation and its default heartbeat interval is 100ms, I expected a latency greater than 100ms when writing a key-value pair to it.
I wrote some code to test this (a Maven pom.xml followed by the test class):
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.westfarmer</groupId>
    <artifactId>raft-etcd-observation</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>raft-etcd-observation</name>
    <description>Project to prove concepts of raft via real world etcd cluster</description>
    <dependencies>
        <dependency>
            <groupId>io.etcd</groupId>
            <artifactId>jetcd-core</artifactId>
            <version>0.6.1</version>
        </dependency>
        <dependency>
            <groupId>io.etcd</groupId>
            <artifactId>jetcd-launcher</artifactId>
            <version>0.6.1</version>
        </dependency>
    </dependencies>
    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
    </properties>
</project>
package cn.westfarmer.raft.poc;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

import io.etcd.jetcd.ByteSequence;
import io.etcd.jetcd.Client;
import io.etcd.jetcd.KV;
import io.etcd.jetcd.launcher.Etcd;
import io.etcd.jetcd.launcher.EtcdCluster;

public class RaftETCDClusterTest {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        System.out.println("----------- Etcd Raft Observation Begin -----------");
        // Start an embedded 3-node etcd cluster; try-with-resources closes it on exit.
        try (EtcdCluster cluster = Etcd.builder().withNodes(3).build()) {
            cluster.start();
            // The client is also AutoCloseable, so close it before the cluster goes away.
            try (Client client = Client.builder().endpoints(cluster.clientEndpoints()).build()) {
                KV kvClient = client.getKVClient();
                final int times = 100; // number of iterations
                final long[] latencies = new long[times];
                for (int i = 0; i < times; i++) {
                    ByteSequence key = ByteSequence.from("test_key" + i, StandardCharsets.UTF_8);
                    ByteSequence value = ByteSequence.from("test_value" + i, StandardCharsets.UTF_8);
                    long start = System.nanoTime(); // monotonic clock, suited to measuring elapsed time
                    kvClient.put(key, value).get(); // block until the server has answered the put
                    latencies[i] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                    System.out.println("latency:" + latencies[i] + "ms");
                }
            }
            cluster.stop();
        }
    }
}
But the results were not what I expected:
----------- Etcd Raft Observation Begin -----------
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
SLF4J: Defaulting to no-operation MDCAdapter implementation.
SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
latency:254ms
latency:7ms
latency:4ms
latency:5ms
latency:11ms
latency:10ms
latency:5ms
latency:5ms
latency:7ms
latency:8ms
latency:7ms
latency:5ms
latency:7ms
latency:6ms
latency:6ms
latency:5ms
latency:6ms
latency:4ms
latency:5ms
latency:4ms
latency:5ms
latency:5ms
latency:6ms
latency:4ms
latency:4ms
latency:3ms
latency:5ms
latency:4ms
latency:4ms
latency:3ms
latency:5ms
latency:5ms
latency:4ms
latency:4ms
latency:4ms
latency:4ms
latency:4ms
latency:4ms
latency:4ms
latency:4ms
latency:4ms
latency:3ms
latency:4ms
latency:4ms
latency:3ms
latency:4ms
latency:3ms
latency:4ms
latency:4ms
latency:3ms
latency:3ms
latency:5ms
latency:4ms
latency:3ms
latency:3ms
latency:3ms
latency:4ms
latency:3ms
latency:3ms
latency:4ms
latency:3ms
latency:3ms
latency:3ms
latency:4ms
latency:4ms
latency:8ms
latency:5ms
latency:3ms
latency:6ms
latency:5ms
latency:2ms
latency:3ms
latency:4ms
latency:6ms
latency:3ms
latency:3ms
latency:4ms
latency:4ms
latency:3ms
latency:3ms
latency:3ms
latency:3ms
latency:4ms
latency:2ms
latency:3ms
latency:2ms
latency:3ms
latency:4ms
latency:3ms
latency:2ms
latency:3ms
latency:2ms
latency:3ms
latency:3ms
latency:3ms
latency:3ms
latency:3ms
latency:3ms
latency:5ms
latency:4ms
Why does the first write have 200ms+ latency? And how can etcd achieve such low latency for writes?
The first one is probably a cold-start issue: it takes time for data structures to warm up, connections to be established, and so on.
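One practical consequence is that, when benchmarking, the first few requests are best treated as warm-up and summarised separately. Below is a minimal sketch of such a helper; `LatencySummary`, `steadyState` and `percentile` are hypothetical names for illustration, not part of jetcd. It drops a warm-up prefix and reports nearest-rank percentiles, using the first ten samples from the run above as input.

```java
import java.util.Arrays;

// Hypothetical helper (not part of jetcd): summarises latencies while excluding warm-up samples.
public class LatencySummary {

    /** Returns the samples left after dropping the first {@code warmup} entries. */
    public static long[] steadyState(long[] latenciesMs, int warmup) {
        if (warmup < 0 || warmup > latenciesMs.length) {
            throw new IllegalArgumentException("warmup out of range: " + warmup);
        }
        return Arrays.copyOfRange(latenciesMs, warmup, latenciesMs.length);
    }

    /** Nearest-rank percentile, with 0 < p <= 100, of the given samples. */
    public static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // 1-based rank; multiply before dividing to limit floating-point error.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // The first ten latencies printed by the test program above.
        long[] latencies = {254, 7, 4, 5, 11, 10, 5, 5, 7, 8};
        long[] steady = steadyState(latencies, 1); // treat the first write as warm-up
        System.out.println("max incl. warm-up: " + percentile(latencies, 100) + "ms");
        System.out.println("max excl. warm-up: " + percentile(steady, 100) + "ms");
        System.out.println("median excl. warm-up: " + percentile(steady, 50) + "ms");
    }
}
```

How many samples count as warm-up is a judgment call; looking at the raw series, as in the output above, is usually the easiest way to pick it.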
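As for the latency, a leader does not typically wait before replicating data. Usually a new entry is sent to a follower as soon as that follower is ready to receive it (typically, once it has processed the previous request). A write therefore commits after roughly one round trip to a majority of the cluster plus the disk writes, not after a heartbeat interval.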
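To make the difference concrete, here is a toy model (not etcd code) comparing two behaviours: the leader sending a new entry immediately, versus holding it until its next heartbeat tick, which is the behaviour assumed in the question. The class and method names are hypothetical, and every timing (follower ack times, leader fsync time, proposal time) is a made-up input. The model assumes the leader counts itself toward the majority and writes to its own disk in parallel with replication.

```java
import java.util.Arrays;

// Toy model, not etcd code: when does a write commit in a Raft cluster?
public class CommitLatencyModel {

    /**
     * Time from sending the entry until a majority holds it. followerAckMs[i] is the
     * (hypothetical) round trip plus disk write for follower i; the leader counts itself.
     */
    public static double commitAfterSendMs(double leaderFsyncMs, double[] followerAckMs) {
        int clusterSize = followerAckMs.length + 1;
        int majority = clusterSize / 2 + 1;
        int followersNeeded = majority - 1; // the leader's own copy covers one vote
        double[] sorted = followerAckMs.clone();
        Arrays.sort(sorted);
        // The k-th fastest follower ack completes the majority (k = followersNeeded).
        double followersDone = followersNeeded == 0 ? 0.0 : sorted[followersNeeded - 1];
        // Assumed: the leader's disk write runs in parallel with replication.
        return Math.max(leaderFsyncMs, followersDone);
    }

    /** Extra delay if entries were only sent on heartbeat ticks (the question's assumption). */
    public static double waitForNextTickMs(double proposedAtMs, double heartbeatMs) {
        double sinceTick = proposedAtMs % heartbeatMs;
        return sinceTick == 0.0 ? 0.0 : heartbeatMs - sinceTick;
    }

    public static void main(String[] args) {
        double[] followerAcks = {2.0, 3.5}; // hypothetical, 3-node cluster
        double leaderFsync = 1.0;            // hypothetical
        double immediate = commitAfterSendMs(leaderFsync, followerAcks);
        double heartbeatOnly = waitForNextTickMs(30.0, 100.0) + immediate;
        System.out.println("send immediately:   " + immediate + "ms");
        System.out.println("wait for heartbeat: " + heartbeatOnly + "ms");
    }
}
```

With these made-up inputs the immediate path commits in 2.0ms, while waiting for the tick adds 70ms on top; the 3-5ms writes measured above are much closer to the first behaviour.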
The heartbeat interval is the upper limit on the time between AppendEntries RPCs to a follower; it keeps the follower hearing from the leader even when there are no writes, so that no unnecessary elections are triggered.
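The heartbeat rule can be sketched as a per-follower timer that any AppendEntries RPC resets, with an empty AppendEntries (a heartbeat) sent only after `heartbeatIntervalMs` of silence. This is a hypothetical, generic sketch of that rule (`HeartbeatScheduler` and its methods are invented for illustration); etcd's own raft package may schedule heartbeats differently.

```java
// Hypothetical sketch: leader-side heartbeat timer for a single follower.
public class HeartbeatScheduler {
    private final long heartbeatIntervalMs;
    private long lastSentMs;

    public HeartbeatScheduler(long heartbeatIntervalMs, long nowMs) {
        if (heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.lastSentMs = nowMs;
    }

    /** Call whenever an AppendEntries RPC goes to the follower, with or without entries. */
    public void onAppendEntriesSent(long nowMs) {
        lastSentMs = nowMs;
    }

    /** True once the follower has gone heartbeatIntervalMs without any AppendEntries RPC. */
    public boolean heartbeatDue(long nowMs) {
        return nowMs - lastSentMs >= heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        HeartbeatScheduler scheduler = new HeartbeatScheduler(100, 0);
        scheduler.onAppendEntriesSent(40);              // a client write is replicated right away at t=40
        System.out.println(scheduler.heartbeatDue(120)); // 80ms since the last RPC: not due yet
        System.out.println(scheduler.heartbeatDue(140)); // 100ms of silence: send an empty AppendEntries
    }
}
```

Note that the write at t=40 is not delayed by the timer at all; the interval only bounds how long a follower can go without hearing from the leader.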
By the way, https://raft.github.io/ has a list of libraries; you could look at how those are implemented.