In extension to below url previous post
materialized view || Low memory condition || ADX || KQL
I am trying to create Materialized view on top of another materialized view but facing below error.
followed below document but not sure where I am doing wrong, need your help how to over come this issue.
Below is my code, creating materialized view on top another materialized view.
.create table MySourceTable(col1:int, col2:int, Value:int, EventUTCDateTime:datetime);
.set-or-append MySourceTable <|
let MySourceTable = datatable(col1:int, col2:int, Value:int, EventUTCDateTime:datetime)
[
1,2,10,'2022-01-01 01:00:01',
1,2,10,'2022-01-01 01:00:01',
1,2,30,'2022-01-01 01:05:01',
4,5,40,'2022-01-01 01:30:01'
];
MySourceTable;
creating Materialized view for removing duplicates from source table.
.create async materialized-view with ( backfill=true, effectiveDateTime=datetime(2022-01-01))
DeduplicatedTable on table MySourceTable
{
MySourceTable
| project col1, col2, Value, EventUTCDateTime
| distinct *
| extend EventUTCDate = bin(EventUTCDateTime, 30m)
| project col1, col2, Value, EventUTCDate
| summarize take_any(*) by col1, col2,Value, EventUTCDate
};
Once removed duplicates, need to apply aggregations by using materialized view on top of another materialized view
.create materialized-view DailyUsage on materialized-view DeduplicatedTable
{
DeduplicatedTable
| summarize count(), dcount(col1) by Day=bin(EventUTCDate, 1d)
};
Getting below error while creating above statement, creating materialized view on top of another materialized view.
Error
Cannot create materialized view 'DailyUsage': Materialized view can only be created on top of another materialized view which includes a single any()/anyif()/take_any()/take_anyif() aggregation.
This looks like a bug, caused when the take_any(*)
doesn't actually contain any columns, since all are included in the group by keys (so it's actually removed by the optimizer). In the meanwhile, as said in other post, if there's a reduced set of columns which still identify the record as unique, it's better to include only that minimal set. For example: if col1, col2, EventUTCDate
are sufficient for deduplication, then it's better to define the view as T | summarize take_any(Value) by col1, col2, EventUTCDate
. This will be much more efficient. If you must include all columns in group by keys, then you can workaround the current error by defining T | summarize take_any(1) by col1, col2, Value, EventUTCDate
. This will add another column (with constant value 1) to the view, which you can project-away
during query time. This should workaround the current issue.
In addition - you should remove the distinct *
from the 1st view. It's not needed since the deduplication is done in the take_any()
aggregation, and the view will perform much better without it.