lets say i want to create a new column fruits
, where its all the fruits mentioned in the column description. how would i do this efficiently?
let FruitLogs = datatable(Timestamp: datetime, Description: string)
[
datetime(2024-07-01T10:00:00Z), "I love eating Apple and Banana for breakfast.",
datetime(2024-07-01T12:00:00Z), "Cherry and Date are great for snacks.",
datetime(2024-07-01T15:00:00Z), "I had a Fig and Grape salad for lunch.",
datetime(2024-07-01T18:00:00Z), "For dessert, I enjoyed Mango and Nectarine."
];
A possible solution is to define the keywords and use regex.
Is this the fastest possible way to do this on large datasets ?
// Define a dynamic array of possible fruits
let possibleFruits = dynamic([
"Apple", "Banana", "Cherry", "Date", "Elderberry", "Fig",
"Grape", "Honeydew", "Kiwi", "Lemon", "Mango", "Nectarine",
"Orange", "Papaya", "Quince", "Raspberry", "Strawberry",
"Tangerine", "Ugli fruit", "Vanilla", "Watermelon", "Xigua", "Yam", "Zucchini"
]);
let FruitLogs = datatable(Timestamp: datetime, Description: string)
[
datetime(2024-07-01T10:00:00Z), "I love eating Apple and Banana for breakfast.",
datetime(2024-07-01T12:00:00Z), "Cherry and Date are great for snacks.",
datetime(2024-07-01T15:00:00Z), "I had a Fig and Grape salad for lunch.",
datetime(2024-07-01T18:00:00Z), "For dessert, I enjoyed Mango and Nectarine."
];
// Create the regex pattern from the possible fruits array
let pattern = strcat('(?:\\b(', strcat_array(possibleFruits, '|'), ')\\b)');
// Use the pattern to extract mentions of fruits from the Description column
FruitLogs
| extend mentionedFruits = extract_all(pattern, dynamic([]), Description)