I have a gigantic db with a json field object storing around 30 sub-objects per entry.
Looks like that:
{
"field1": {"someOther": "stuffInside", [...]},
"field2": {"someOther": "stuffInside", [...]},
[...]
}
All these fields contain one field with a bigint. My objective is to be able to match the closest entry from a given int, for that purpose I made a function that does exactly that:
def get_closest_match(
field: str,
match: int,
model: django.db.models.Model,
return_all_match: bool = False,
sub_filter: dict or None = None,
):
log.debug(f"Finding nearest in {model.__name__} with {field} {match}")
result = model.objects.annotate(abs_diff=Func(F(field) - match, function='ABS'))
if sub_filter:
result = result.filter(**sub_filter)
result = result.order_by('abs_diff')
if not return_all_match:
return result.first()
return result
While this function works great for int fields, I wasn't able to make it compatible with jsonfields, is there a way to make my function work with an int inside a jsonfield ?
My first idea was to make a separate table storing only those bigints so I could use my method, which in fact works, but as mentionned above, each entry has around 30 objects and I have millions of entries so performance wise it is not very optimal (It also make data duplication which is not good)
I am actually super dumb, before trying the second table fix I tried to use my method above with a cast on the json field, which in fact worked but didn't return the closest match because the ordering was false.
After some big debuging here is the correct version of the said method with json field support:
def get_closest_match(
field: str,
match: int,
model: django.db.models.Model,
return_all_match: bool = False,
sub_filter: dict or None = None,
):
log.debug(f"Finding nearest in {model.__name__} with {field} {match}")
result = model.objects.annotate(abs_diff=Func(Cast(F(field), models.BigIntegerField()) - match, function='ABS'))
if sub_filter:
result = result.filter(**sub_filter)
result = result.order_by('abs_diff')
if not return_all_match:
return result.first()
return result
The issue was just that my Cast
was casting an Int instead of a BigInt... So yep completely works with JSONField if you use the correct syntaxing aka jsonobject__key_1__key_2
more info on django docs
PS: The method above is fully modular and can be used to perform the queries I need with JSON Fields, however I haven't tested other uses cases with regular ints, it shouldn't have any issues but in case you could always add a new arg that contains the desired field for the cast (or just make a condition to not cast if field is an int or smth), I don't see a scenario where this would be an issue but you never know