python repr

What are the best practices for __repr__ with a collection class in Python?


I have a custom Python class which essentially encapsulates a list of some kind of object, and I'm wondering how I should implement its __repr__ method. I'm tempted to go with the following:

class MyCollection:
   def __init__(self, objects = []):
      self._objects = []
      self._objects.extend(objects)

   def __repr__(self):
      return f"MyCollection({self._objects})"

This has the advantage of producing valid Python output which fully describes the class instance. However, in my real-world case, the object list can be rather large and each object may have a large repr by itself (they are arrays themselves).

What are the best practices in such situations? Accept that the repr might often be a very long string? Are there potential issues related to this (debugger UI, etc.)? Should I implement some kind of shortening scheme using an ellipsis? If so, is there a good/standard way to achieve this? Or should I skip listing the collection's content altogether?


Solution

  • The official documentation describes how __repr__ should behave:

    Called by the repr() built-in function to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned. The return value must be a string object. If a class defines __repr__() but not __str__(), then __repr__() is also used when an “informal” string representation of instances of that class is required.

    This is typically used for debugging, so it is important that the representation is information-rich and unambiguous.

    Python 3 __repr__ Docs

    Lists, strings, sets, tuples and dictionaries all print out the entirety of their collection in their __repr__ method.
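
As a quick illustration of the "valid Python expression" guideline quoted above, the sketch below checks that evaluating the repr recreates an equivalent object. The `__eq__` method is an assumption added only so the comparison works; it is not part of the class in the question.

```python
class MyCollection:
    def __init__(self, objects=None):
        # Copy the input so the instance owns its own list.
        self._objects = list(objects) if objects is not None else []

    def __repr__(self):
        return f"MyCollection({self._objects!r})"

    def __eq__(self, other):
        # Hypothetical helper, added only for the round-trip check below.
        if not isinstance(other, MyCollection):
            return NotImplemented
        return self._objects == other._objects


original = MyCollection([1, "two", 3.0])
text = repr(original)
print(text)  # MyCollection([1, 'two', 3.0])

# eval() is used here only to demonstrate the round trip on trusted input.
rebuilt = eval(text)
print(rebuilt == original)  # True
```

The round trip only works when every contained object also has an eval-able repr, which is one reason the guideline says "if at all possible".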

    Your current code follows what the documentation suggests. I would, however, change your __init__ method to look more like this:

    class MyCollection:
        def __init__(self, objects=None):
            # Use None as the default and create a fresh list per instance
            if objects is None:
                objects = []
            self._objects = objects

        def __repr__(self):
            return f"MyCollection({self._objects})"
    

    You generally want to avoid using mutable objects as default arguments. Because your method copies the items into a new list with extend and never modifies the default itself, it still works correctly, but Python's documentation nonetheless recommends avoiding the pattern:

    It is good programming practice to not use mutable objects as default values. Instead, use None as the default value and inside the function, check if the parameter is None and create a new list/dictionary/whatever if it is.

    https://docs.python.org/3/faq/programming.html#why-are-default-values-shared-between-objects
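
To make the FAQ's point concrete, here is a minimal sketch of the pitfall and of the `None` idiom. The function names `append_shared` and `append_fresh` are hypothetical and exist only for this illustration.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # A new list is created on each call that omits `bucket`.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(append_shared(1))  # [1]
print(append_shared(2))  # [1, 2]  (state leaked from the first call)
print(append_fresh(1))   # [1]
print(append_fresh(2))   # [2]
```

The class in the question avoids this problem only because it never mutates the default list; the `None` idiom removes the risk regardless of what the body does.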

    If you're interested in how another library handles this differently, the repr for NumPy arrays shows only the first three and the last three items when the array has more than 1,000 elements. It also pads the items so they all have the same width (in the example below, 1000 is four characters wide, so 0 is padded with three spaces to match).

    >>> import numpy as np
    >>> repr(np.array([i for i in range(1001)]))
    'array([   0,    1,    2, ...,  998,  999, 1000])'
    

    To mimic this NumPy style, you could implement a __repr__ method like this in your class:

    class MyCollection:
        def __init__(self, objects=None):
            if objects is None:
                objects = []
            self._objects = objects

        def __repr__(self):
            # With at most 1,000 items, return the full list
            # (like NumPy, truncate only above 1,000).
            if len(self._objects) <= 1000:
                return f"MyCollection({self._objects})"
            else:
                # Get the first and last three items
                items_to_display = self._objects[:3] + self._objects[-3:]
                # Find which item has the longest repr
                max_length_repr = max(items_to_display, key=lambda x: len(repr(x)))
                # Get the length of the item with the longest repr
                padding = len(repr(max_length_repr))
                # Create a list of the reprs of each item and apply the padding
                values = [repr(item).rjust(padding) for item in items_to_display]
                # Insert the '...' between the 3rd and 4th item
                values.insert(3, '...')
                # Convert the list to a string joined by commas
                array_as_string = ', '.join(values)
                return f"MyCollection([{array_as_string}])"
    
    >>> repr(MyCollection([1,2,3,4]))
    'MyCollection([1, 2, 3, 4])'
    
    >>> repr(MyCollection([i for i in range(1001)]))
    'MyCollection([   0,    1,    2, ...,  998,  999, 1000])'
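
As for a standard way to shorten the output, the standard library's `reprlib` module is one option worth considering instead of hand-written truncation. The sketch below is a variant of the class above, assuming the default `reprlib` limits are acceptable; it copies the input with `list(...)`, which is a choice made for this example rather than something taken from the code above.

```python
import reprlib


class MyCollection:
    def __init__(self, objects=None):
        # Copy the input so the instance owns its own list.
        self._objects = list(objects) if objects is not None else []

    def __repr__(self):
        # reprlib.repr() abbreviates long containers; by default a list
        # shows at most six items followed by '...'.
        return f"MyCollection({reprlib.repr(self._objects)})"


print(repr(MyCollection([1, 2, 3, 4])))
# MyCollection([1, 2, 3, 4])
print(repr(MyCollection(range(1001))))
# MyCollection([0, 1, 2, 3, 4, 5, ...])
```

Unlike the NumPy-style version, this keeps only the leading items and does no padding. The truncated form is no longer a valid Python expression, so if that matters more than showing sample contents, the angle-bracket form from the documentation (a short description such as the class name and item count) is the other obvious choice.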