The following manifest creates a Prometheus deployment with two shards and two replicas of each shard:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: prometheus
  name: prometheus
  namespace: default
spec:
  serviceAccountName: prometheus
  replicas: 2
  shards: 2
  serviceMonitorSelector:
    matchLabels:
      team: frontend
What is the difference between replicas and shards?
Sharding in Prometheus splits the metrics across multiple servers to improve scalability and performance (especially query performance, since each server holds less data). Each shard is responsible for collecting and storing a subset of the total metrics.
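To make the idea of "a subset per shard" concrete, here is a minimal Python sketch of hash-based sharding. It assumes targets are identified by an address string and assigned by hashing that address modulo the shard count. The function names, the sample addresses, and the MD5-based hash are illustrative assumptions and are not claimed to reproduce the exact assignment Prometheus makes.

```python
import hashlib


def shard_for(target: str, num_shards: int) -> int:
    """Map a target address to a shard index in [0, num_shards)."""
    # Illustrative hash: interpret the last 8 bytes of the MD5 digest as an integer.
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % num_shards


def assign_targets(targets, num_shards):
    """Group targets by shard; every target lands in exactly one shard."""
    shards = {i: [] for i in range(num_shards)}
    for target in targets:
        shards[shard_for(target, num_shards)].append(target)
    return shards


# Hypothetical scrape targets, for illustration only.
targets = [f"10.0.0.{i}:8080" for i in range(1, 9)]
assignment = assign_targets(targets, num_shards=2)
for shard, members in assignment.items():
    print(shard, members)
```

Because the assignment depends only on the target's address and the shard count, every server can work out its own subset without coordinating with the others.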
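Replication creates multiple copies of the data across multiple servers to increase availability and fault tolerance. Each replica holds a full copy of the data it is responsible for. In a database, changes made to one replica are eventually propagated to the others; Prometheus replicas do not synchronize with each other but scrape the same targets independently, so each one builds its own copy.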
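The way the two settings combine can be sketched in a few lines of Python. This assumes, as I understand the Prometheus Operator API, that `replicas` counts copies of each shard, so the total number of pods is `replicas` multiplied by `shards`. The `Pod` class and both helper functions are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    shard: int
    replica: int


def plan_pods(shards: int, replicas: int) -> list[Pod]:
    """One pod per (shard, replica) pair."""
    return [Pod(s, r) for s in range(shards) for r in range(replicas)]


def surviving_shards(pods, failed):
    """Shards that still have at least one healthy replica."""
    return {p.shard for p in pods if p not in failed}


pods = plan_pods(shards=2, replicas=2)
print(len(pods))  # 4

# Losing one replica of shard 0 leaves both shards covered.
failed = {Pod(shard=0, replica=0)}
print(surviving_shards(pods, failed) == {0, 1})  # True
```

A shard's metrics stop being collected only when every replica of that shard is down, which is the fault tolerance that replication adds on top of sharding.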
Neither concept is specific to Prometheus: sharding and replication are generic terms that apply to any distributed application, and both are widely used in databases.