Tags: azure, terraform, kql

Applying a KQL query to the `transform_kql` parameter through Terraform


I have been asked to use Terraform to create a Data Collection Rule that includes a data source for a custom table, which requires a KQL query (`transformKql`) to format the incoming data. In short, I cannot figure out how to encode the KQL query. Here is the query, which works when entered directly in the Azure Portal:

source | extend d=todynamic(RawData) | extend TimeCreated = extract(@"\d*[-]\d*[-]\d*\s\d*[:]\d*[:]\d*[.]\d*",0,RawData) | extend AppPriority = tostring(split(d,"-",3)) | extend ReqId = tostring(split(d,"-",4)) | extend Prog = tostring(split(d,"-",5)) | extend Msg = tostring(split(d,"-",6))

Here's the same string with the escape characters added. This is how it appears in the .tf file:

source | extend d=todynamic(RawData) | extend TimeCreated = extract(@\"\\d*[-]\\d*[-]\\d*\\s\\d*[:]\\d*[:]\\d*[.]\\d*\",0,RawData) | extend AppPriority = tostring(split(d,\"-\",3)) | extend ReqId = tostring(split(d,\"-\",4)) | extend Prog = tostring(split(d,\"-\",5)) | extend Msg = tostring(split(d,\"-\",6))
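For comparison, a heredoc version of the same attribute would look roughly like this (a syntax sketch only, not a confirmed fix for the error below; HCL heredocs do not process backslash escape sequences, so the KQL quotes and regex backslashes can be pasted almost verbatim):

```hcl
# Hypothetical data_flow block; stream/destination names are placeholders.
data_flow {
  streams       = ["Custom-MyTableRawData"]
  destinations  = ["example-destination-log"]
  output_stream = "Microsoft-Syslog"
  # Inside <<-EOT ... EOT, double quotes need no \" escaping and \d, \s,
  # etc. are passed through to KQL unchanged.
  transform_kql = <<-EOT
    source
    | extend d = todynamic(RawData)
    | extend TimeCreated = extract(@"\d*[-]\d*[-]\d*\s\d*[:]\d*[:]\d*[.]\d*", 0, RawData)
    | extend AppPriority = tostring(split(d, "-", 3))
    | extend ReqId = tostring(split(d, "-", 4))
    | extend Prog = tostring(split(d, "-", 5))
    | extend Msg = tostring(split(d, "-", 6))
  EOT
}
```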

Any help/suggestion is appreciated.

I keep getting the following error:

azurerm_monitor_data_collection_rule.dcr_name: Creating...
╷
│ Error: creating Data Collection Rule (Subscription: "11111111-2222-3333-4444-555555555555"
│ Resource Group Name: "rg_name"
│ Data Collection Rule Name: "dcr-name"): datacollectionrules.DataCollectionRulesClient#Create: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidPayload" Message="Data collection rule is invalid" Details=[{"code":"InvalidTransformQuery","message":"Error occurred while compiling query in query: SemanticError:0x00000006 at 1:30 : Undefined symbol: RawData","target":"properties.dataFlows[0]"}]

I've also tried heredoc formatting, without success.


Solution

  • After testing different variations of the KQL query, I found that the issue was with the todynamic() function used in the source query. Once I removed that function, the query worked in every scenario shown below.

    If RawData is not valid JSON or an otherwise dynamic-compatible value, converting it with todynamic() can fail. Check what type of data RawData actually holds, and then apply the appropriate conversion from one type to another.

    Run `source | project gettype(RawData)` against the Log Analytics workspace to retrieve the type of the content, and then apply the matching type conversion in Terraform.

    Referring to the sample azurerm_monitor_data_collection_rule template and the sample data collection rules from MSDoc, I put together the scenarios below; each worked as expected with its respective type conversion.

    1.

    resource "azurerm_monitor_data_collection_rule" "example" {
      name                        = "jahflampul23"
      resource_group_name         = azurerm_resource_group.example.name
      location                    = azurerm_resource_group.example.location
      data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.example.id
    
      destinations {
        log_analytics {
          workspace_resource_id = azurerm_log_analytics_workspace.example.id
          name                  = "example-destination-log"
        }
    
        event_hub {
          event_hub_id = azurerm_eventhub.example.id
          name         = "example-destination-eventhub"
        }
    
        storage_blob {
          storage_account_id = azurerm_storage_account.example.id
          container_name     = azurerm_storage_container.example.name
          name               = "example-destination-storage"
        }
    
        azure_monitor_metrics {
          name = "example-destination-metrics"
        }
      }
      data_flow {
        streams      = ["Microsoft-InsightsMetrics"]
        destinations = ["example-destination-metrics"]
      }
    
      data_flow {
        streams      = ["Microsoft-InsightsMetrics", "Microsoft-Syslog", "Microsoft-Perf"]
        destinations = ["example-destination-log"]
      }
    
      data_flow {
        streams       = ["Custom-MyTableRawData"]
        destinations  = ["example-destination-log"]
        output_stream = "Microsoft-Syslog"
        transform_kql = "source | extend data = RawData"
      }
    
      data_sources {
        syslog {
          facility_names = ["*"]
          log_levels     = ["*"]
          name           = "example-datasource-syslog"
          streams        = ["Microsoft-Syslog"]
        }
    
        iis_log {
          streams         = ["Microsoft-W3CIISLog"]
          name            = "example-datasource-iis"
          log_directories = ["C:\\Logs\\W3SVC1"]
        }
    
        log_file {
          name          = "example-datasource-logfile"
          format        = "text"
          streams       = ["Custom-MyTableRawData"]
          file_patterns = ["C:\\JavaLogs\\*.log"]
          settings {
            text {
              record_start_timestamp_format = "ISO 8601"
            }
          }
        }
    
        performance_counter {
          streams                       = ["Microsoft-Perf", "Microsoft-InsightsMetrics"]
          sampling_frequency_in_seconds = 60
          counter_specifiers            = ["Processor(*)\\% Processor Time"]
          name                          = "example-datasource-perfcounter"
        }
    
        windows_event_log {
          streams        = ["Microsoft-WindowsEvent"]
          x_path_queries = ["*![System/Level=1]"]
          name           = "example-datasource-wineventlog"
        }
    
        extension {
          streams            = ["Microsoft-WindowsEvent"]
          input_data_sources = ["example-datasource-wineventlog"]
          extension_name     = "example-extension-name"
          extension_json = jsonencode({
            a = 1
            b = "hello"
          })
          name = "example-datasource-extension"
        }
      }
      stream_declaration {
        stream_name = "Custom-MyTableRawData"
        column {
          name = "TimeGenerated"
          type = "datetime"
        }
        column {
          name = "RawData"
          type = "string"
        }
        column {
          name = "Properties"
          type = "dynamic"
        }
      }
      depends_on = [
        azurerm_log_analytics_solution.example
      ]
    }
    


    2.

     data_flow {
      streams       = ["Custom-MyTableRawData"]
      destinations  = ["example-destination-log"]
      output_stream = "Microsoft-Syslog"
      transform_kql = "source | extend data = tostring(AdditionalContext) | project TimeGenerated = Time, computer"
    }
    

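Applying the same idea to the original query, the todynamic() step can be dropped by splitting RawData directly. This is a hypothetical adaptation, assuming RawData is declared as a string column in the stream declaration (as in scenario 1):

```hcl
# Sketch: split the raw string itself instead of a todynamic() result.
data_flow {
  streams       = ["Custom-MyTableRawData"]
  destinations  = ["example-destination-log"]
  output_stream = "Microsoft-Syslog"
  # In a quoted HCL string, " becomes \" and each regex \ becomes \\.
  transform_kql = "source | extend TimeCreated = extract(@\"\\d*[-]\\d*[-]\\d*\\s\\d*[:]\\d*[:]\\d*[.]\\d*\",0,RawData) | extend AppPriority = tostring(split(RawData,\"-\",3)) | extend ReqId = tostring(split(RawData,\"-\",4)) | extend Prog = tostring(split(RawData,\"-\",5)) | extend Msg = tostring(split(RawData,\"-\",6))"
}
```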