Tags: python, django, django-models, django-orm

Can I reuse an output_field instance in the Django ORM, or should I always create a new one?


I have a Django codebase that makes heavy use of the Case/When/ExpressionWrapper/Coalesce/Cast ORM functions, and some of them need a field instance passed as the output_field argument.

from django.db.models import F, FloatField, Sum
some_param1=Sum(F('one_value')*F('second_value'), output_field=FloatField())
some_param2=Sum(F('one_value')*F('second_value'), output_field=FloatField())
some_param3=Sum(F('one_value')*F('second_value'), output_field=FloatField())
some_param4=Sum(F('one_value')*F('second_value'), output_field=FloatField())
some_param5=Sum(F('one_value')*F('second_value'), output_field=FloatField())

Sometimes I wonder why I keep creating the same instance of a Field subclass over and over again. Is there any difference if I create one instance and share it between expressions? For example:

from django.db.models import F, FloatField, Sum

float_field = FloatField()

some_param1=Sum(F('one_value')*F('second_value'), output_field=float_field)
some_param2=Sum(F('one_value')*F('second_value'), output_field=float_field)
some_param3=Sum(F('one_value')*F('second_value'), output_field=float_field)
some_param4=Sum(F('one_value')*F('second_value'), output_field=float_field)
some_param5=Sum(F('one_value')*F('second_value'), output_field=float_field)

I couldn't find this in the documentation, and the source code is not well documented regarding this parameter.

P.S. The example is fake, just imagine a big annotate function that does a lot of processing using Case/When/ExpressionWrapper/Coalesce/Cast and has a lot of duplicated Field instances as output_field.


Solution

  • You can reuse the field. The output_field=… [Django-doc] argument serves two purposes:

    1. the type sometimes requires specific formatting, typically for GIS columns, since a point, polygon, etc. needs to be converted to text so that Django can understand it; and
    2. it tells Django which lookups, transformations, etc. can be applied to the expression.

    Indeed, if we use:

    queryset = queryset.annotate(
        some_param1=Sum(
            F('one_value') * F('second_value'), output_field=CharField()
        )
    )

    then Django will assume that some_param1 is a CharField (which does not make much sense here), and thus you can use:

    queryset.filter(some_param1__lower='a')

    provided that Lower has been registered as a transform on CharField, for example with CharField.register_lookup(Lower); Django does not register it by default. For a FloatField, such a transform does not make much sense.

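    The expression does not specialize or alter the field instance, however. The field is thus more of a "signal" object that specifies what can be done with the result, so sharing one instance between expressions is fine.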

    That being said, I don't see much reason to rewrite code just to avoid constructing a FloatField. If we use %timeit, we get:

    In [1]: %timeit FloatField()
    3.82 µs ± 379 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    

    So the construction takes approximately 3.82 microseconds. A view typically has far more work to do than that, so writing a more efficient query, or saving a roundtrip to the database, will (very) likely outweigh the cost of constructing a FloatField by several orders of magnitude.