elasticsearch filebeat

Strip an empty array off an NDJSON field using an Elasticsearch ingest pipeline


I have data being ingested into Elasticsearch (currently version 7.3), sent to it from Filebeat (7.3). The logs are in NDJSON format.

A section of the data is in the format {"tagset": {"username": {"domain\\username": []}}}. The array is empty in all cases.

("domain" and "username" stand for the user's actual domain and username, so the "domain\username" key is different for every log entry.)

The "domain\username" is really the value of the "username" key. But in this dataset it appears as a key whose value is an empty array, so it is indexed as a field name rather than a value, which is wrong for my purposes.

I am trying to strip the array off of the "domain\username" and make "domain\username" the value of "username" (or the value of another field that can be made/set).
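
To make the goal concrete, here is a minimal Python sketch of the transformation I am after, applied to one parsed NDJSON line. The function name `flatten_username` and the sample value `corp\jdoe` are made up for illustration; this is not how the data is actually processed, only a description of the desired result.

```python
import json


def flatten_username(doc: dict) -> dict:
    """Replace the object under tagset.username with its single key (modifies doc in place)."""
    username_obj = doc.get("tagset", {}).get("username")
    if isinstance(username_obj, dict) and len(username_obj) == 1:
        # The only key is the real "domain\username" string; its value is the empty array.
        (value,) = username_obj.keys()
        doc["tagset"]["username"] = value
    return doc


# One sample log line (hypothetical user). In the JSON text the backslash is escaped as "\\".
line = r'{"tagset": {"username": {"corp\\jdoe": []}}}'
doc = flatten_username(json.loads(line))
print(doc["tagset"]["username"])
```

After this, "username" holds a plain string instead of an object with a per-user key.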

Currently, I am not using Logstash; I am trying to handle this with an Elasticsearch ingest pipeline (though using Logstash is not totally out of the question). I have tried grok and other methods, to no avail, so I am probably doing something wrong.

Thanks in advance for any assistance.


Solution

  • The following solved the issue. It seems messy and may not be the best way of going about it, but it works:

    PUT _ingest/pipeline/my_pipeline
    {
        "description":"",
        "processors":[
            {
                "set":{
                    "field":"user_name",
                    "value":"{{ tagset.username }}"
                }
            },
            {
                "remove":{
                    "field":"tagset.username",
                    "ignore_failure":true
                }
            },
            {
                "convert":{
                    "field":"user_name",
                    "type":"string"
                }
            },
            {
                "gsub":{
                    "field":"user_name",
                    "pattern":"""\{DOMAIN\\\\""",
                    "replacement":""
                }
            },
            {
                "gsub":{
                    "field":"user_name",
                    "pattern":"=\\[\\]\\}",
                    "replacement":""
                }
            }
        ]
    }
    

    This produces a value that is just the username, without the domain name, backslashes, or brackets. DOMAIN in the first gsub pattern is a placeholder for the actual domain name.
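
To see what the two gsub patterns remove, here is a rough Python sketch that applies equivalent regular expressions to a sample string. It assumes the `set` processor renders the object as text of the form `{DOMAIN\\username=[]}` with a doubled backslash; that shape is inferred from the patterns above, not confirmed output. The helper name `strip_wrapping` and the user `jdoe` are hypothetical, and Python's `re` stands in for the Java regex engine Elasticsearch uses.

```python
import re


def strip_wrapping(rendered: str, domain: str = "DOMAIN") -> str:
    """Mimic the two gsub processors on the rendered user_name string."""
    # First gsub: remove the leading "{", the domain name, and two literal backslashes.
    without_prefix = re.sub(r"\{" + re.escape(domain) + r"\\\\", "", rendered)
    # Second gsub: remove the trailing "=[]}".
    return re.sub(r"=\[\]\}", "", without_prefix)


# Assumed rendering of tagset.username (raw string, so it holds two backslashes).
sample = r"{DOMAIN\\jdoe=[]}"
print(strip_wrapping(sample))
```

If the rendered value held only a single backslash, the first pattern would not match and the `{DOMAIN\` prefix would be left in place, so it is worth checking the intermediate value with the simulate API before relying on the patterns.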