pythoncollectionsdictionaryset

Python "set" with duplicate/repeated elements


Is there a standard way to represent a "set" that can contain duplicate elements.

As I understand it, a set has exactly one or zero of an element. I want functionality to have any number.

I am currently using a dictionary with elements as keys, and quantity as values, but this seems wrong for many reasons.

Motivation: I believe there are many applications for such a collection. For example, a survey of favourite colours could be represented by: survey = ['blue', 'red', 'blue', 'green']

Here, I do not care about the order, but I do about quantities. I want to do things like:

survey.add('blue')
# would give survey == ['blue', 'red', 'blue', 'green', 'blue']

...and maybe even

survey.remove('blue')
# would give survey == ['blue', 'red', 'green']

Notes: Yes, set is not the correct term for this kind of collection. Is there a more correct one?

A list of course would work, but the collection required is unordered. Not to mention that the method naming for sets seems to me to be more appropriate.


Solution

  • You are looking for a multiset.

    Python's closest datatype is collections.Counter:

    A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.

    For an actual implementation of a multiset, use the bag class from the collections-extended package on pypi. Note that this is for Python 3 only. If you need Python 2, here is a recipe for a bag written for Python 2.4.