duplicates kql azure-data-explorer kusto-explorer

Deduplicate Records in Kusto/Survive Non-Null Values in Case of Duplicates

I have a table in Kusto where I'm trying to deduplicate. Here is the table structure:

column1	column2	column3	column4
1234	business	orange	paypal
1234	business	orange
3456	business	apple
4845	business	banana	venmo

I'm struggling to deduplicate this table so in case of duplicates per combination of column1 , column2 , column3, only non-null values of the column4 survives. please note, in some cases, like column1 == 3456, there is a single records per column1, column2, and column3 where the column4 is in fact null and those records must survive.

tbl
| distinct column1 , column2 , column3, column4
| sort by column4  desc nulls last , column1 , column2 , column3
| extend priority_ = row_number(1, prev(column1) != column1 and prev(column2) != column2 and prev(column3) != column3 )
| where priority_ == 1

Because I don't know how to deprioritize the null values of column4, this returns incorrect results. Can you help me figure out how to correctly survive the non-null values in case there are duplicates?

Solution

Check if this works for you:

datatable(column1:string, column2:string, column3:string, column4:string)
[
    "1234", "business", "orange", "paypal",
    "1234", "business", "orange", "",
    "3456", "business", "apple", "",
    "4845", "business", "banana", "venmo"
]
| summarize make_set(column4) by column1, column2, column3 
// check whether there's only a single element for {column1,column2,column3} combination
| extend single_element = array_length(set_column4) == 1
| mv-expand colummn4 = set_column4 to typeof(string)
// include only non empty or combinations with only a single column4 value (even if it's empty)
| where single_element or isnotempty(colummn4)
| project-away set_column4, single_element

column1	column2	column3	colummn4
1234	business	orange	paypal
3456	business	apple
4845	business	banana	venmo