I am running fluentd as a DaemonSet in a Kubernetes cluster, and fluentd writes the log entries to OpenSearch. For reference, see https://github.com/fluent/fluentd-kubernetes-daemonset
Some background before my question: Kubernetes pods write to stdout, and the container runtime writes this output to files under /var/log/pods/<pod_specific_location>. The format of these log files is:
2023-12-31T12:00:00.123456Z stdout F my great log message
Now, fluentd is configured to pick these files up from there and, using the cri parser plugin, transforms each line into:
{
  "time": "2023-12-31T12:00:00.123456Z",
  "stream": "stdout",
  "logtag": "F",
  "message": "my great log message"
}
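As a rough illustration of what the cri parser extracts, here is a minimal Python sketch that splits one CRI log line into the same four fields. The regular expression and the function name parse_cri_line are my own assumptions, not the plugin's implementation. In the CRI format, logtag F marks a full line and P a partial one.

```python
import re

# Sketch of the CRI line layout "<time> <stream> <logtag> <message>".
# This is an approximation for illustration, not the cri parser plugin's code.
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<message>.*)$"
)


def parse_cri_line(line: str) -> dict:
    """Split a single CRI-formatted log line into time, stream, logtag and message."""
    match = CRI_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a CRI log line: {line!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_cri_line("2023-12-31T12:00:00.123456Z stdout F my great log message"))
```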
Now, say I run a pod in my cluster that writes the log message my great log message to stdout.
Further down the road, using the kubernetes metadata filter plugin, fluentd enriches this record with Kubernetes metadata such as the namespace name, pod name, etc., so it looks something like:
{
  "stream": "stdout",
  "logtag": "F",
  "time": "2023-12-31T12:00:00.123456Z",
  "message": "my great log message",
  "docker": {
    "container_id": "9077644273956d3f3e9d171240f412b3b6e959984a5fd99adfcb77f9b998a370"
  },
  "kubernetes": {
    "container_name": "demo-app",
    "namespace_name": "foo",
    "pod_name": "foo-ns-app",
    "container_image": "docker.io/yoavklein3/net-tools:latest",
    "container_image_id": "docker.io/yoavklein3/net-tools@sha256:3fd9646a14d97ecc2d236a5bebd88faf617bc6045f1e4f32c49409f1c930879a",
    "pod_id": "a69fb942-c0ab-457d-b752-ffa3fa27e574",
    "pod_ip": "10.0.2.224",
    "host": "ip-10-0-2-5.ec2.internal",
    "master_url": "https://172.20.0.1:443/api",
    "namespace_id": "6bdf5fe9-9a5a-4501-ab6c-deddd241e071",
    "namespace_labels": {"kubernetes.io/metadata.name": "foo"}
  }
}
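The metadata filter gets the pod, namespace and container names from the record's tag. The tag is built from the /var/log/containers/... file name, which is a symlink into /var/log/pods. The filter then looks up the remaining fields from the Kubernetes API server. The Python sketch below shows the tag-splitting step. TAG_PATTERN is modeled on the filter's default tag regex but is only an approximation. names_from_tag and enrich are hypothetical helpers, and api_lookup is a hypothetical stand-in for the API server query.

```python
import re
from typing import Callable, Dict

# Approximation of the filter's default tag regex (not the plugin's code).
TAG_PATTERN = re.compile(
    r"var\.log\.containers\."
    r"(?P<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace>[^_]+)"
    r"_(?P<container_name>.+)-(?P<docker_id>[a-z0-9]{64})\.log$"
)


def names_from_tag(tag: str) -> Dict[str, str]:
    """Extract pod name, namespace, container name and container ID from a tail tag."""
    match = TAG_PATTERN.search(tag)
    if match is None:
        return {}
    keys = ("pod_name", "namespace", "container_name", "docker_id")
    return {key: match.group(key) for key in keys}


def enrich(record: dict, tag: str, api_lookup: Callable[[str, str], dict]) -> dict:
    """Add docker/kubernetes sub-objects; api_lookup is a hypothetical API server query."""
    names = names_from_tag(tag)
    if not names:
        return dict(record)  # tag did not match: leave the record unchanged
    enriched = dict(record)
    enriched["docker"] = {"container_id": names["docker_id"]}
    kubernetes = {
        "container_name": names["container_name"],
        "namespace_name": names["namespace"],
        "pod_name": names["pod_name"],
    }
    # In the real filter, fields such as pod_ip, host and namespace_labels come from the API server.
    kubernetes.update(api_lookup(names["namespace"], names["pod_name"]))
    enriched["kubernetes"] = kubernetes
    return enriched


if __name__ == "__main__":
    tag = ("kubernetes.var.log.containers.foo-ns-app_foo_demo-app-"
           "9077644273956d3f3e9d171240f412b3b6e959984a5fd99adfcb77f9b998a370.log")
    record = {"stream": "stdout", "logtag": "F", "message": "my great log message"}
    print(enrich(record, tag, lambda ns, pod: {"pod_ip": "10.0.2.224"}))
```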
Finally, the opensearch output plugin sends the record to OpenSearch.
Now, when I open OpenSearch Dashboards, I can see a field called @timestamp, and I just can't figure out where this field comes from.
This is a document in OpenSearch (apologies for not sticking to the example above exactly, but the concept remains the same):
{
  "_index": "logstash-2023.06.06",
  "_type": "_doc",
  "_id": "sVHjj4gByMQm1Wc45hv2",
  "_version": 1,
  "_score": null,
  "_source": {
    "stream": "stdout",
    "logtag": "F",
    "time": "2023-06-06T08:47:35.874884092Z",
    "docker": {
      "container_id": "9077644273956d3f3e9d171240f412b3b6e959984a5fd99adfcb77f9b998a370"
    },
    "kubernetes": {
      "container_name": "demo-app",
      "namespace_name": "foo",
      "pod_name": "foo-ns-app",
      "container_image": "docker.io/yoavklein3/net-tools:latest",
      "container_image_id": "docker.io/yoavklein3/net-tools@sha256:3fd9646a14d97ecc2d236a5bebd88faf617bc6045f1e4f32c49409f1c930879a",
      "pod_id": "a69fb942-c0ab-457d-b752-ffa3fa27e574",
      "pod_ip": "10.0.2.224",
      "host": "ip-10-0-2-5.ec2.internal",
      "master_url": "https://172.20.0.1:443/api",
      "namespace_id": "6bdf5fe9-9a5a-4501-ab6c-deddd241e071",
      "namespace_labels": {
        "kubernetes.io/metadata.name": "foo"
      }
    },
    "data": "This is from FOO namespace",
    "@timestamp": "2023-06-06T08:47:35.882677347+00:00",
    "tag": "kubernetes.var.log.containers.foo-ns-app_foo_demo-app-9077644273956d3f3e9d171240f412b3b6e959984a5fd99adfcb77f9b998a370.log"
  },
  "fields": {
    "@timestamp": [
      "2023-06-06T08:47:35.882Z"
    ],
    "time": [
      "2023-06-06T08:47:35.874Z"
    ]
  },
  "sort": [
    1686041255882
  ]
}
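Two details of this document can be checked by hand. First, the index name logstash-2023.06.06 matches a daily logstash-style name built from the event's UTC date. This assumes the prefix logstash, the separator - and the date format %Y.%m.%d, which I am taking to be the plugin defaults. Second, the sort value 1686041255882 is @timestamp in epoch milliseconds. That is why fields and sort carry only millisecond precision, presumably because the field is mapped as a plain date rather than date_nanos. The helper names below are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def logstash_index_name(event_time: datetime, prefix: str = "logstash",
                        separator: str = "-", dateformat: str = "%Y.%m.%d") -> str:
    """Daily index name in the logstash_format style (assumed defaults, UTC date)."""
    return f"{prefix}{separator}{event_time.astimezone(timezone.utc).strftime(dateformat)}"


def from_epoch_millis(millis: int) -> datetime:
    """Convert a sort value (epoch milliseconds) to a UTC datetime without float rounding."""
    seconds, ms = divmod(millis, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=ms)


if __name__ == "__main__":
    ts = from_epoch_millis(1686041255882)
    print(ts.isoformat(timespec="milliseconds"))  # 2023-06-06T08:47:35.882+00:00
    print(logstash_index_name(ts))                # logstash-2023.06.06
```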
NOTE: the message field is missing, and there is a data field instead. This is because the message field is parsed as JSON. You can ignore this, since it's completely irrelevant to the question. I'm only mentioning it in case it's confusing.
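For readers who want to see that step, here is a hedged Python sketch of what parsing the message field as JSON does to the record. It assumes the application logged {"data": "This is from FOO namespace"}. It also assumes the fluentd parser filter is set up to keep the other fields and drop the original message, roughly what the parser filter's reserve_data and remove_key_name_field options do. The actual configuration isn't shown here, and parse_message_as_json is a hypothetical helper.

```python
import json


def parse_message_as_json(record: dict, key: str = "message") -> dict:
    """Replace a JSON-object string field with its decoded keys, keeping the rest of the record."""
    raw = record.get(key)
    try:
        parsed = json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return dict(record)  # missing or not JSON: keep the record as-is
    if not isinstance(parsed, dict):
        return dict(record)  # only JSON objects are merged into the record
    out = {k: v for k, v in record.items() if k != key}
    out.update(parsed)
    return out


if __name__ == "__main__":
    rec = {"stream": "stdout", "message": '{"data": "This is from FOO namespace"}'}
    print(parse_message_as_json(rec))
```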
I don't think the opensearch plugin is the source of this @timestamp field. Why? Because when I run fluentd with the opensearch output outside of a Kubernetes cluster, using other input plugins, this field doesn't appear.
"I can see a field called @timestamp, and I just can't figure out where this field comes from..."
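This field is added by the opensearch plugin. Its value is the point in time when the message is ingested (the fluentd event time of the record), not the container runtime's time field. In the document above, @timestamp is roughly 8 ms later than time.
The field is only added if either logstash_format or include_timestamp is true. The index name logstash-2023.06.06 shows that logstash_format is enabled in this Kubernetes setup. Presumably the non-Kubernetes setup enables neither option, which would explain why the field doesn't appear there.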
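To make that rule concrete, here is a hedged Python sketch of the decision. It is not the plugin's actual Ruby code. It assumes the event time is available as integer nanoseconds since the epoch and formats it with nine fractional digits, like the _source value above. It also assumes an existing @timestamp in the record is left untouched. Other plugin options that can change where the value comes from (for example time_key) are deliberately left out. add_timestamp and format_event_time are hypothetical names.

```python
from datetime import datetime, timezone

TIMESTAMP_FIELD = "@timestamp"


def format_event_time(event_time_ns: int) -> str:
    """Format epoch nanoseconds as ISO 8601 with 9 fractional digits and a +00:00 offset."""
    seconds, nanos = divmod(event_time_ns, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return f"{base}.{nanos:09d}+00:00"


def add_timestamp(record: dict, event_time_ns: int,
                  logstash_format: bool = False, include_timestamp: bool = False) -> dict:
    """Sketch of when the output adds @timestamp (simplified; options like time_key ignored)."""
    out = dict(record)
    if not (logstash_format or include_timestamp):
        return out  # neither option set: no @timestamp is added
    if TIMESTAMP_FIELD not in out:
        # Assumed: a record that already carries @timestamp keeps its own value.
        out[TIMESTAMP_FIELD] = format_event_time(event_time_ns)
    return out


if __name__ == "__main__":
    ingested_ns = 1686041255882677347  # event time matching the document above
    print(add_timestamp({"stream": "stdout"}, ingested_ns, logstash_format=True))
```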