When processing large amounts of data I often find myself doing the following:
HashSet<String> set = new HashSet<String> ();
//Adding elements to the set
ArrayList<String> list = new ArrayList<String> (set);
Something like "dumping" the contents of the set in the list. I usually do this since the elements I add often contain duplicates I want to remove, and this seems like an easy way to remove them.
With only that objective in mind (avoiding duplicates) I could also write:
ArrayList<String> list = new ArrayList<String> ();
// Processing here
if (! list.contains(element)) list.add(element);
//More processing here
And thus no need for "dumping" the set into the list. However, I'd be doing a small check before inserting each element (which I'm assuming HashSet does as well)
Is any of the two possibilities clearly more efficient?
The set will give much better performance (O(n)
vs O(n^2)
for the list), and that's normal because set membership (the contains
operation) is the very purpose of a set.
Contains for a HashSet
is O(1)
compared to O(n)
for a list, therefore you should never use a list if you often need to run contains
.