I'm working on a project that fetches certain metrics from Solr, stores them in an Elasticsearch index, and then visualizes them in Grafana. I have some questions about garbage collection in Solr; they're as follows:
Thanks in advance!
GC should not alter application logic itself: metrics like hits depend on the contents of Solr's datastore, not on whether or how GC runs. Errors, however, can be influenced by GC behaviour, mainly through timeouts caused by long GC pauses, so that is what I'd track first.
I would focus on GC times (both young and old generation), as these will be the major disruption to the application (Solr). You can do this via JMX (there is extensive documentation here: https://wiki.apache.org/solr/SolrJmx; the GC beans are under java.lang:type=GarbageCollector). Just hook up your profiler or monitoring tool of choice.
On the console you can dump GC metrics using "jstat -gc $PID", where the most relevant columns are probably ([docs][1]):
YGC: Number of young generation garbage collection events.
YGCT: Young generation garbage collection time.
FGC: Number of full GC events.
FGCT: Full garbage collection time.
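Note that the counts and times are cumulative since JVM start, and the times are in seconds. If you add an interval to the jstat command (for example "jstat -gc $PID 1000" to sample every second), it will output stats continuously.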
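Because the values are cumulative, what you usually want to ship to your index is the difference between two samples. Here is a minimal sketch of that idea in Python. The function names `parse_jstat_gc` and `gc_delta` are my own, and the two sample outputs are made-up numbers trimmed to a few columns, so treat it as an illustration rather than a finished collector:

```python
def parse_jstat_gc(output):
    """Parse the header row and the last data row of `jstat -gc` output."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, last = rows[0], rows[-1]
    sample = {}
    for name, value in zip(header, last):
        try:
            sample[name] = float(value)
        except ValueError:
            continue  # skip any column that is not a plain number
    return sample


def gc_delta(previous, current):
    """Turn two cumulative samples into per-interval counts and times."""
    return {key: round(current[key] - previous[key], 3)
            for key in ("YGC", "YGCT", "FGC", "FGCT")}


# Made-up sample output (assumption), trimmed to a few columns for readability.
EARLIER = """
 EC       EU       OC        OU      YGC   YGCT   FGC   FGCT    GCT
69952.0  41231.5  174784.0  98211.3  120   1.500   3    0.900   2.400
"""
LATER = """
 EC       EU       OC        OU      YGC   YGCT   FGC   FGCT    GCT
69952.0  12044.0  174784.0  99802.7  125   1.560   4    1.350   2.910
"""

delta = gc_delta(parse_jstat_gc(EARLIER), parse_jstat_gc(LATER))
print(delta)  # young/full GC events and seconds spent in them during the interval
```

Parsing by column name rather than by position keeps this working when the set of columns differs between JDK versions, as long as YGC, YGCT, FGC and FGCT are present.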
If these numbers show GC pause spikes, and those spikes correlate with spikes in Solr errors, you can then look into reducing the pauses.
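While we're at it, another good way to track GC times is to enable these JVM flags (they apply to JDK 8 and earlier; JDK 9 and later replaced them with unified logging, e.g. -Xlog:gc* and -Xlog:safepoint):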
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDetails
These log more specific details for each GC, including times, as well as the duration of each pause (caused by GC but also by other reasons), in lines like this:
Total time for which application threads were stopped: 0.1135087 seconds
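If you want to turn those log lines into a metric, a small parser is enough. The following is a rough sketch in Python: the names `stopped_seconds` and `long_pauses` are hypothetical, and the 0.1 second threshold is an arbitrary assumption that you would tune to your own Solr timeouts:

```python
import re

# Matches the pause line shown above; any trailing text on the line is ignored.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def stopped_seconds(lines):
    """Extract every reported stop-the-world pause, in seconds."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return pauses


def long_pauses(pauses, threshold_seconds):
    """Keep only the pauses at or above the given threshold."""
    return [p for p in pauses if p >= threshold_seconds]


log_lines = [
    "Total time for which application threads were stopped: 0.1135087 seconds",
    "some unrelated log line",
]
pauses = stopped_seconds(log_lines)
print(pauses)
print(long_pauses(pauses, threshold_seconds=0.1))  # threshold is an assumption
```

In practice you would feed it the lines of your GC log file instead of the hard-coded list and index the resulting values alongside your Solr error counts.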
Let me know if that answers your question.