Is there a standard way to represent a "set" that can contain duplicate elements.
As I understand it, a set has exactly one or zero of an element. I want functionality to have any number.
I am currently using a dictionary with elements as keys, and quantity as values, but this seems wrong for many reasons.
Motivation: I believe there are many applications for such a collection. For example, a survey of favourite colours could be represented by: survey = ['blue', 'red', 'blue', 'green']
Here, I do not care about the order, but I do about quantities. I want to do things like:
survey.add('blue')
# would give survey == ['blue', 'red', 'blue', 'green', 'blue']
...and maybe even
survey.remove('blue')
# would give survey == ['blue', 'red', 'green']
Notes: Yes, set is not the correct term for this kind of collection. Is there a more correct one?
A list of course would work, but the collection required is unordered. Not to mention that the method naming for sets seems to me to be more appropriate.
You are looking for a multiset.
Python's closest datatype is collections.Counter
:
A
Counter
is adict
subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. TheCounter
class is similar to bags or multisets in other languages.
For an actual implementation of a multiset, use the bag
class from the collections-extended package on pypi. Note that this is for Python 3 only. If you need Python 2, here is a recipe for a bag
written for Python 2.4.