elasticsearch | database-design | application-design

ELK index design according to development goals


I have used Elasticsearch in the past to analyze logs, but I don't have any experience in Elasticsearch "architecture". I have an application that is deployed to multiple machines (200+). I want to connect to each machine and gather metadata like logs, metrics, db stats and so on.

With that data I want to be able to :

  1. Find problems on each machine and send notifications about them (finding problems requires joining data from different sources; for example, finding an exception in log1 requires me to also check the db)
  2. Analyze issues common to all machines and implement an ML model that will be able to predict issues.

I need to create indexes, and I thought about 2 options:

  1. Create one index per machine, so that all the data related to a machine is available in its own index.
  2. Create one index per data source. For example, all db logs from all machines will be available in one dedicated index, and another index will contain only data related to machine metrics (CPU/RAM usage, etc.)

What would be the best way to create those indices?


Solution

  • OK, now that I have a better understanding of your needs, here's my suggestion:

    I strongly recommend not creating an index per machine. I don't know much about your use case(s), but I assume you want to search the data either in Kibana or by implementing search requests in your application.

    Let's say you are interested in the RAM usage of every machine. You would need to execute 200 search requests against Elasticsearch, since the data (RAM usage) is spread over 200 indices (of course one could create aliases, but those would have to be updated for every new machine, as shown below). Furthermore, you wouldn't be able to run basic aggregations like "which machine has the highest RAM usage?" in a convenient way. In my opinion there are plenty more disadvantages, like index management, shard allocation, etc.
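
    For instance, keeping an alias up to date would mean running something like this every time a new machine comes online (the index and alias names here are made up):

    POST _aliases
    {
      "actions": [
        { "add": { "index": "machine-201", "alias": "all-machines" } }
      ]
    }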

    So what's a better solution?

    As you have already suggested, you should create an index per data source. With that, your indices have a dedicated "purpose", e.g. one index that stores database data, another that stores system metrics, and so on. Referring to my examples above, you would only need to execute one search request to determine a) the RAM usage of every machine and b) which machine has the highest RAM usage. However, this requires that every document contain a field that references the particular host, and that the metric values are stored as numbers (not strings like "45%") so they can be aggregated, like so:

    PUT metrics/_doc/1
    {
      "system": {
        "ram": {
          "usage": 45,
          "free": 55
        }
      },
      "host": {
        "name": "YOUR HOSTNAME",
        "ip": "192.168.17.100"
      }
    }
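
    With that in place, a single request can answer both questions at once. Here is a sketch (it assumes host.name is mapped as a keyword field, so it can be used in a terms aggregation):

    GET metrics/_search
    {
      "size": 0,
      "aggs": {
        "per_host": {
          "terms": { "field": "host.name", "size": 500 },
          "aggs": {
            "ram_usage": { "max": { "field": "system.ram.usage" } }
          }
        },
        "highest_ram": {
          "max_bucket": { "buckets_path": "per_host>ram_usage" }
        }
      }
    }

    per_host returns the RAM usage per machine, and the max_bucket pipeline aggregation picks out the machine with the highest value.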
    

    In addition to that, I recommend using daily indices. So instead of creating one huge index for the system metrics, you would create an index for every day, like metrics-2020.01.01, metrics-2020.01.02 and so on. This approach has the following advantages:

      • Old data can be removed cheaply by deleting whole indices instead of deleting individual documents.
      • Searches can be limited to the relevant time range by targeting only the matching indices.
      • Shard sizes stay bounded, since each index holds only one day of data.
      • Mapping changes can be introduced with the next day's index without reindexing old data.
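
    In practice that looks something like this (the index names are just examples):

    # search only January 2020, without touching any other index
    GET metrics-2020.01.*/_search

    # retention: drop a whole month of data by deleting its indices
    DELETE metrics-2019.12.*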

    I hope this helps!