unicode kql azure-data-explorer azure-sentinel

Is there a replace multiple / parse unicode in string function?


I'm working with Azure and KQL, and I'm trying to find out whether there is a function that replaces multiple values in a string or, better still, one that converts Unicode escape sequences in a log line into the characters they represent.

For instance I have the log:

["\u0027\"\u003e\u003csvg/onload=alert(document.domain)\u003e"]

I need to replace escape sequences such as \u0027 with their literal characters, so that this log ideally ends up looking something like:

["'\"><svg/onload=alert(document.domain)>"]

For context, the log source is the AWS connector for Azure Sentinel, which doesn't seem to have a function for this (I thought perhaps I was missing it). To me it looks like CloudTrail can't handle the literal characters in the log file, so it defaults to Unicode escapes, but parsing them out on this side seems impossible at the moment.

I've tried unicode_codepoints_to_string(), replace_string(), and iff()/iif() followed by replace_string(), with the following code:

| extend parsd = iff(tolower(AdditionalEventData) contains @'\u003e', replace_string(tostring(AdditionalEventData),@'\u003e',@'>'), AdditionalEventData)
| extend parsd = iff(tolower(parsd) contains @'\u003c', replace_string(tostring(AdditionalEventData),@'\u003c',@'<'), parsd)

and

| extend parsd = replace_regex(AdditionalEventData,@'U+.\d{1,5}',unicode_codepoints_to_string(AdditionalEventData))

I also tried the deprecated make_string() function, but again, all of these mostly operate on the entire string. I am guessing I would need some kind of indexof(substring( combination, but I am not all that experienced with those functions.

I was even going to try replacing the escapes individually into separate columns and then joining them with strcat(), but that seems like a really extreme route: it would take forever to account for all the Unicode characters and then concatenate everything back into a single column.

Does anybody know of a good solution for this? Any help would be greatly appreciated. Thank you for reading.


Solution

  • I would strongly recommend fixing this issue pre-ingestion. If that is not an option, the escapes can be decoded at query time:

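    // Sample log line containing JSON-style \uXXXX escapes
    print log = @'["\u0027"\u003e\u003csvg/onload=alert(document.domain)\u003e"]'
    // Split the string into [literal text, following \uXXXX escape or end of string] pairs
    | mv-apply parts = extract_all(@"(.*?)(\\u[[:xdigit:]]{4}|$)", log) on 
      ( 
        // Decode each escape's four hex digits to a character, then stitch the pieces back together
        summarize fixed_log = array_strcat(make_list(strcat(parts[0], iff(isnotempty(parts[1]), unicode_codepoints_to_string(toint(strcat("0x", substring(parts[1], 2, 4)))), ""))), "")
      )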
    
    +----------------------------------------------------------------+--------------------------------------------+
    |                              log                               |                 fixed_log                  |
    +----------------------------------------------------------------+--------------------------------------------+
    | ["\u0027"\u003e\u003csvg/onload=alert(document.domain)\u003e"] | ["'"><svg/onload=alert(document.domain)>"] |
    +----------------------------------------------------------------+--------------------------------------------+
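
    For the "replace multiple" half of the question: if only a few known escapes ever appear, a fixed lookup list may be simpler than decoding every escape. A minimal sketch, assuming the replace_strings() function is available in your environment; the three lookup/replacement pairs are just the escapes from the sample log, not a complete list:

    ```kql
    // Sketch: swap a fixed, hand-picked set of \uXXXX escapes for their characters.
    // The lookup list is an assumption based on the sample log only.
    print log = @'["\u0027"\u003e\u003csvg/onload=alert(document.domain)\u003e"]'
    | extend fixed_log = replace_strings(
        log,
        dynamic([@'\u0027', @'\u003c', @'\u003e']),   // escapes to look for
        dynamic(["'", "<", ">"])                       // replacements, in the same order
      )
    ```

    Any escape that is not in the lookup list is left untouched, so the regex-based query above remains the more general option.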
    
