When running a CockroachDB cluster, how can I view the disk bandwidth being consumed by nodes? This can be useful to tell if disk bandwidth is a bottleneck. Also, is there any visibility into the source of the disk writes?
CockroachDB collects write metrics both from the host's operating system and internally through its own accounting of writes. In the CockroachDB DB Console, the "Hardware" dashboard includes "Disk Write" and "Disk Write IOPS" graphs. These are the metrics reported by the operating system, so they also include write volume from outside the CockroachDB process.
These graphs are a good first step toward gaining visibility into the write volume on your nodes. If you're concerned that your nodes are hitting their bandwidth or IOPS limits, look for plateaus in the graphs. You can also drill down to a single node, view its maximum throughput, and compare that to the documented limits for your storage medium.
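The plateau check described above can also be done on exported numbers rather than by eye. The following is a minimal sketch, not part of CockroachDB: the function name `summarize_throughput`, the 90% "near the limit" threshold, and the three-sample minimum run are all illustrative assumptions, and the limit is whatever your storage vendor documents for the volume.

```python
def summarize_throughput(samples, limit, near=0.9, min_run=3):
    """Compare write-throughput samples (bytes/s) against a documented limit.

    samples: throughput readings for one node, in time order.
    limit:   documented maximum throughput for the storage medium.
    near:    fraction of the limit treated as "at the limit" (assumed value).
    min_run: consecutive near-limit samples that count as a plateau (assumed value).
    """
    threshold = near * limit
    longest = run = 0
    for sample in samples:
        # Extend the current run while samples stay near the limit, else reset it.
        run = run + 1 if sample >= threshold else 0
        longest = max(longest, run)
    return {
        "peak_utilization": max(samples) / limit,
        "longest_run_near_limit": longest,
        "plateau": longest >= min_run,
    }


if __name__ == "__main__":
    MB = 1_000_000
    # Hypothetical readings for a volume with a 125 MB/s documented limit.
    readings = [50 * MB, 120 * MB, 118 * MB, 119 * MB, 121 * MB, 60 * MB]
    print(summarize_throughput(readings, limit=125 * MB))
```

A sustained run near the limit suggests the disk is the bottleneck; a single brief spike usually does not.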
If you want to drill deeper, CockroachDB records write volume of many internal operations. In the DB Console, click on "Advanced Debug" on the left menu. Then select "Custom Time Series Chart." Here you can create custom graphs pulling in metrics that aren't surfaced in some of the premade dashboards. Some of the disk-bandwidth metrics available are:
rocksdb.compacted-bytes-written
: This metric records write volume for storage engine compactions. Compactions run in the background and keep the storage engine organized so that reads are fast. The more data being written to the database, the more compactions will need to write.

rocksdb.flushed-bytes
: This metric records write volume for storage engine flushes. All data written to CockroachDB is first written to an append-only write ahead log in the order the data is received, and added to an in-memory 'memtable'. When enough records are accumulated in the 'memtable,' they're flushed to a sorted format. This records that amount. If flushed bytes is high, then a lot of new data is being written to storage, either from queries, jobs or internal systems.

rocksdb.ingested-bytes
: This metric records write volume for bulk operations. These writes are primarily from node rebalancing, IMPORTs or RESTOREs.

sys.host.disk.write.bytes
: This is the same metric from the hardware dashboard, and captures the write volume as reported by the host operating system.

timeseries.write.bytes
: The metrics that CockroachDB collects are recorded within CockroachDB itself as well. This captures the volume of those writes.
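If you would rather compare these metrics outside the DB Console, one option is to take two snapshots of a node's Prometheus-format metrics output and compute a write rate per source. The sketch below rests on several assumptions: that the metrics appear under the same names with `.` and `-` replaced by `_`, that their values are cumulative byte counts, and that store-level metrics carry a label per store. The helper names `parse_totals` and `write_rates` are hypothetical. Check the names against your own cluster's output before relying on it.

```python
import re

# Assumed Prometheus-style spellings of the metrics listed above.
METRICS = (
    "rocksdb_compacted_bytes_written",
    "rocksdb_flushed_bytes",
    "rocksdb_ingested_bytes",
    "sys_host_disk_write_bytes",
    "timeseries_write_bytes",
)

# Matches "name value" or "name{labels} value" in the Prometheus text format.
_SAMPLE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")


def parse_totals(text):
    """Sum each metric of interest across all label sets (for example, stores)."""
    totals = dict.fromkeys(METRICS, 0.0)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = _SAMPLE.match(line)
        if match and match.group(1) in totals:
            totals[match.group(1)] += float(match.group(2))
    return totals


def write_rates(before, after, seconds):
    """Bytes per second for each metric between two snapshots of totals."""
    return {name: (after[name] - before[name]) / seconds for name in METRICS}


if __name__ == "__main__":
    # Made-up snapshots taken 10 seconds apart, for illustration only.
    first = 'rocksdb_flushed_bytes{store="1"} 1000\nsys_host_disk_write_bytes 10000\n'
    second = 'rocksdb_flushed_bytes{store="1"} 3000\nsys_host_disk_write_bytes 40000\n'
    rates = write_rates(parse_totals(first), parse_totals(second), seconds=10)
    for name, rate in rates.items():
        print(f"{name}: {rate:.0f} bytes/s")
```

Comparing the flushed, compacted, and ingested rates against the host-reported rate gives a rough picture of which internal source accounts for most of the write volume.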