kqlazure-data-explorerazure-log-analytics

KQL: Every time a word is mentioned in a column append it to a new column as a list


lets say i want to create a new column fruits, where its all the fruits mentioned in the column description. how would i do this efficiently?

let FruitLogs = datatable(Timestamp: datetime, Description: string)
[
    datetime(2024-07-01T10:00:00Z), "I love eating Apple and Banana for breakfast.",
    datetime(2024-07-01T12:00:00Z), "Cherry and Date are great for snacks.",
    datetime(2024-07-01T15:00:00Z), "I had a Fig and Grape salad for lunch.",
    datetime(2024-07-01T18:00:00Z), "For dessert, I enjoyed Mango and Nectarine."
];

expected results: enter image description here


Solution

  • A possible solution is to define the keywords and use regex.

    Is this the fastest possible way to do this on large datasets ?

    // Define a dynamic array of possible fruits
    let possibleFruits = dynamic([
        "Apple", "Banana", "Cherry", "Date", "Elderberry", "Fig", 
        "Grape", "Honeydew", "Kiwi", "Lemon", "Mango", "Nectarine",
        "Orange", "Papaya", "Quince", "Raspberry", "Strawberry",
        "Tangerine", "Ugli fruit", "Vanilla", "Watermelon", "Xigua", "Yam", "Zucchini"
    ]);
    let FruitLogs = datatable(Timestamp: datetime, Description: string)
    [
        datetime(2024-07-01T10:00:00Z), "I love eating Apple and Banana for breakfast.",
        datetime(2024-07-01T12:00:00Z), "Cherry and Date are great for snacks.",
        datetime(2024-07-01T15:00:00Z), "I had a Fig and Grape salad for lunch.",
        datetime(2024-07-01T18:00:00Z), "For dessert, I enjoyed Mango and Nectarine."
    ];
    // Create the regex pattern from the possible fruits array
    let pattern = strcat('(?:\\b(', strcat_array(possibleFruits, '|'), ')\\b)');
    // Use the pattern to extract mentions of fruits from the Description column
    FruitLogs
    | extend mentionedFruits = extract_all(pattern, dynamic([]), Description)