How to overlay two graphs in Vega-Lite [Wazuh - ELK - OpenSearch]


I'm trying to create a single chart in Wazuh, using a Vega visualization, that shows two overlapping graphs. As input I have logs in which the date (date_id) is a string in YYYY-MM-DD format and month_total is an integer giving the number of monthly bans carried out on a Telegram channel. My aim is to overlay the monthly ban line graph and a linear regression line (for the same monthly bans) so that I can see the trend.

My problem is that I can build both graphs individually, but I can't make them overlap. I suspect the cause is that I can't get a single x axis with the same data format and range for both. As the screenshots below show, if I use two different date formats the graphs are at least shown next to each other (which is still not what I want), while if I use the same format the regression line takes over the chart and the other graph is no longer shown.
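A minimal sketch of one way to get a single x field with one format, assuming the `data` block stays as posted: parse `data._id` once in the top-level `transform` rather than only inside the regression layer, then encode x in every layer as that parsed `date` field with `type: temporal`. The `toNumber` cast and the `isValid` filter are additions of this sketch, not part of the original spec; `timeParse` is used here instead of `utcParse` so the default (local-time) axis formatting shows the same calendar day that was parsed.

```hjson
transform: [
  // Parse the YYYY-MM-DD string once, so every layer can use x: { field: "date", type: "temporal" }
  { calculate: "timeParse(datum._source.data._id, '%Y-%m-%d')", as: "date" }
  // The sample log stores month_total as a string (e.g. "2652"), so cast it to a number
  { calculate: "toNumber(datum._source.data.month_total)", as: "month_total" }
  // Drop documents where either value is missing or could not be parsed
  { filter: "isValid(datum.date) && isValid(datum.month_total)" }
]
```

With that field shared, the layers no longer mix a nominal `date_id` axis with a parsed `date` axis; they all read from the same continuous time scale.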

[Screenshot: the graph I have now]

[Screenshot: the graph I would like to have]

[Screenshot: the result when I use the same date format in both layers]

In this last graph, for example, I suspect the lack of overlap is caused by the regression line itself containing many dates, so many that they also show up on the axis. Is it possible to request that only the two extreme values of the regression line be shown, so that the x axis can be identical for the two graphs?

Or do you know of other ways to achieve this overlap? Thank you very much in advance for your help!

PS: This is my vega code:

{
  $schema: https://vega.github.io/schema/vega-lite/v5.json
  description: Linear Regression Line Graph for Telegram ban
  data: {
    url: {
      index: wazuh-alerts-*
      body: {
        query: {
          bool: {
            must: [
              {
                match: {
                  data.last_day_of_month: "true"
                }
              }
              %dashboard_context-must_clause%
              {
                range: {
                  data._id: {
                    %timefilter%: true
                  }
                }
              }
            ]
          }
        }
        sort: [
          {
            data._id: {
              order: asc
            }
          }
        ]
        size: 10000
        _source: [
          data
        ]
      }
    }
    format: {
      property: hits.hits
    }
  }
  transform: [
    {
      calculate: datum._source.data._id
      as: date_id
    }
    {
      calculate: datum._source.data.month_total
      as: month_total
    }
    {
      filter: datum.date_id != null && datum.month_total != null
    }
  ]
  layer: [
    {
      mark: point
      encoding: {
        x: {
          field: date_id
          type: nominal
          //title: Data
          axis: {
            grid: true
          }
        }
        y: {
          field: month_total
          type: quantitative
        }
        tooltip: [
          {
            field: date_id
            type: nominal
            title: Data
          }
          {
            field: month_total
            type: quantitative
            title: Totale mese
          }
        ]
      }
    }
    {
      mark: line
      encoding: {
        x: {
          field: date_id
          type: nominal
        }
        y: {
          field: month_total
          type: quantitative
        }
        color: {
          value: red
        }
      }
    }
    {
      transform: [
        {
          calculate: utcParse(datum.date_id, '%Y-%m-%d')
          as: date
        }
        {
          regression: month_total
          on: date
          method: linear
        }
      ]
      mark: line
      encoding: {
        /*
        // Code used when the regression line uses the YYYY-MM-DD format and does not allow the display of the other graph
        x: {
          field: date
          type: temporal
          format: %Y-%m-%d
          scale: {
            type: utc
          }
          axis: {
            labelExpr: timeFormat(datum.value, '%Y-%m-%d')
          }
        }
        */
        x: {
          field: date
          type: nominal
        }
        y: {
          field: month_total
          type: quantitative
        }
        color: {
          value: blue
        }
        tooltip: [
          {
            field: date
            type: temporal
            format: %Y-%m-%d
            scale: {
              type: utc
            }
            title: Data
          }
          {
            field: month_total
            type: quantitative
            title: Totale mese
          }
        ]
      }
    }
  ]
}

And this is an example of an input log:

{
  "_index": "wazuh-alerts-4.x-2024.12.16",
  "_id": "xKZOz5MBNpnkM_7VuEE0",
  "_version": 1,
  "_score": 0,
  "_source": {
    "input": {
      "type": "log"
    },
    "timestamp": "2024-12-16T11:50:43.536+0000",
    "source": "wazuh",
    "@version": "1",
    "manager": {
      "name": "wazuh.manager"
    },
    "data": {
      "_id": "2016-12-31",
      "last_day_of_month": "true",
      "month_total": "2652",
      "banned_today": "110"
    },
    "location": "API-Webhook",
    "full_log": "Dec 16 12:50:43 kali telegram: {\"_id\": \"2016-12-31\", \"banned_today\": \"110\", \"month_total\": \"2652\", \"last_day_of_month\": true}",
    "predecoder": {
      "program_name": "telegram",
      "timestamp": "Dec 16 12:50:43",
      "hostname": "kali"
    },
    "rule": {
      "firedtimes": 2893,
      "level": 3,
      "description": "Scraper Telegram per ban giornalieri canali",
      "groups": [
        "telegram"
      ],
      "mail": false,
      "id": "100004"
    },
    "@timestamp": "2024-12-16T11:50:43.536Z",
    "agent": {
      "id": "000",
      "name": "wazuh.manager"
    },
    "id": "1734349843.963034",
    "decoder": {
      "name": "telegram"
    }
  },
  "fields": {
    "rule.id": [
      "100004"
    ],
    "source": [
      "wazuh"
    ],
    "full_log": [
      "Dec 16 12:50:43 kali telegram: {\"_id\": \"2016-12-31\", \"banned_today\": \"110\", \"month_total\": \"2652\", \"last_day_of_month\": true}"
    ],
    "data.month_total": [
      "2652"
    ],
    "manager.name": [
      "wazuh.manager"
    ],
    "predecoder.timestamp": [
      "Dec 16 12:50:43"
    ],
    "@version": [
      "1"
    ],
    "agent.name": [
      "wazuh.manager"
    ],
    "id": [
      "1734349843.963034"
    ],
    "data.banned_today": [
      "110"
    ],
    "timestamp": [
      "2024-12-16T11:50:43.536Z"
    ],
    "data.last_day_of_month": [
      "true"
    ],
    "predecoder.program_name": [
      "telegram"
    ],
    "data._id": [
      "2016-12-31"
    ],
    "predecoder.hostname": [
      "kali"
    ],
    "input.type": [
      "log"
    ],
    "rule.description": [
      "Scraper Telegram per ban giornalieri canali"
    ],
    "rule.mail": [
      false
    ],
    "@timestamp": [
      "2024-12-16T11:50:43.536Z"
    ],
    "agent.id": [
      "000"
    ],
    "decoder.name": [
      "telegram"
    ],
    "location": [
      "API-Webhook"
    ],
    "rule.firedtimes": [
      2893
    ],
    "rule.groups": [
      "telegram"
    ],
    "rule.level": [
      3
    ]
  }
}

Solution

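  • The Vega-Lite code suggests your x axis is drawn this way because its type is nominal. A linear regression line has to be plotted against a quantitative (or temporal) variable; it won't behave as you expect when run against categories. On a nominal axis, the x values the regression layer generates do not match your date strings, so they appear as extra categories instead of lining up with the monthly points. Would it work to parse your dates as times or numbers instead of treating them as nominal categories?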

    Here are two public examples that may help you put the necessary pieces together
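A minimal sketch of the approach described above, assuming the same `wazuh-alerts-*` index and `data.*` fields as in the question (the query is carried over, with the duplicated `match` collapsed into one). The date string is parsed once with `timeParse`, which reads it as local time so the default axis and tooltip formatting show the same calendar day; `month_total` is cast with `toNumber` because the sample log stores it as a string; and a single temporal x encoding sits on the parent of the layers, so the points, the red line, and the blue regression line share one continuous axis. It has not been checked against a live Wazuh/OpenSearch Dashboards instance.

```hjson
{
  $schema: https://vega.github.io/schema/vega-lite/v5.json
  description: Monthly Telegram bans with a linear regression trend line
  data: {
    url: {
      index: wazuh-alerts-*
      body: {
        query: {
          bool: {
            must: [
              { match: { "data.last_day_of_month": "true" } }
              %dashboard_context-must_clause%
              { range: { "data._id": { "%timefilter%": true } } }
            ]
          }
        }
        sort: [{ "data._id": { order: "asc" } }]
        size: 10000
        _source: ["data"]
      }
    }
    format: { property: "hits.hits" }
  }
  transform: [
    // Parse the YYYY-MM-DD string once (local time) so every layer shares one date field
    { calculate: "timeParse(datum._source.data._id, '%Y-%m-%d')", as: "date" }
    // month_total is stored as a string in the log (e.g. "2652"), so cast it to a number
    { calculate: "toNumber(datum._source.data.month_total)", as: "month_total" }
    // Drop documents where either value is missing or could not be parsed
    { filter: "isValid(datum.date) && isValid(datum.month_total)" }
  ]
  // Shared encodings: every layer inherits the same temporal x and quantitative y
  encoding: {
    x: { field: "date", type: "temporal", title: "Data", axis: { format: "%Y-%m-%d", grid: true } }
    y: { field: "month_total", type: "quantitative", title: "Totale mese" }
  }
  layer: [
    {
      // Monthly totals as points, with a tooltip
      mark: point
      encoding: {
        tooltip: [
          { field: "date", type: "temporal", format: "%Y-%m-%d", title: "Data" }
          { field: "month_total", type: "quantitative", title: "Totale mese" }
        ]
      }
    }
    {
      // The same monthly totals joined by a red line
      mark: { type: "line", color: "red" }
    }
    {
      // Linear fit of month_total against date; output fields keep the names date and month_total
      transform: [
        { regression: "month_total", on: "date", method: "linear" }
      ]
      mark: { type: "line", color: "blue" }
    }
  ]
}
```

By default the regression transform names its output fields after the `on` and `regression` fields (here `date` and `month_total`), which is why the blue layer can simply inherit the shared x and y encodings. On a temporal axis the points generated by the transform are placed by value, so however many there are, they no longer add new entries to the axis.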