I am setting up Logstash to ingest Airflow logs. The following config is giving me the output I need:
input {
file {
path => "/my_path/logs/**/*.log"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
if [path] =~ /\/my_path\/logs\/containers\/.*/ or [path] =~ /\/my_path\/logs\/scheduler\/.*/ {
drop{}
}
else {
grok {
"match" => [ "message", "\[%{TIMESTAMP_ISO8601:log_task_execution_datetime}\]%{SPACE}\{%{DATA:log_file_line}\}%{SPACE}%{WORD:log_level}%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}" ]
"remove_field" => [ "message" ]
}
date {
"match" => [ "log_task_execution_datetime", "ISO8601" ]
"target" => "log_task_execution_datetime"
"timezone" => "UTC"
}
dissect {
"mapping" => { "path" => "/my_path/logs/%{dag_id}/%{task_id}/%{dag_execution_datetime}/%{try_number}.%{}" }
"add_field" => { "log_id_template" => "{%{dag_id}}-{%{task_id}}-{%{dag_execution_datetime}}-{%{try_number}}" }
}
}
}
output {
stdout {codec => rubydebug{metadata => true}}
}
But I do not like having to specify the path "/my_path/logs/" multiple times. In my input section, I tried to use:
add_field => { "[@metadata][base_path]" => "/my_path/logs/" }
and then, in the filter section:
if [path] =~ /[@metadata][base_path].*/ or [path] =~ /[@metadata][base_path].*/ {
drop{}
}
...
dissect {
"mapping" => { "path" => "[@metadata][base_path]%{dag_id}/%{task_id}/%{dag_execution_datetime}/%{try_number}.%{}" }
But it doesn't seem to work for the regex in the filter or in the dissect mapping. I get a similar issue when trying to use an environment variable as described here.
I have the - maybe naïve - notion that I should be able to use one variable for all references to the base path. Is there a way?
Using an environment variable in a conditional is not supported. There has been a github issue requesting it as an enhancement open since 2016. The workaround is to use mutate+add_field to add a field to [@metadata] then test that.
"mapping" => { "path" => "${[@metadata][base_path]}%{dag_id}/%{task_id} ...
should work. The terms in a conditional are not sprintf'd, so you cannot use %{}, but you can do a substring match. If FOO is set to /home/user/dir then
mutate { add_field => { "[@metadata][base_path]" => "${FOO}" } }
mutate { add_field => { "[path]" => "/home/user/dir/file" } }
if [@metadata][base_path] in [path] { mutate { add_field => { "matched" => true } } }
results in the [matched] field getting added. I do not know of a way to anchor the string match, so if FOO were set to /dir/ then that would also match.