I'm wondering whether I can get a consensus on which is the better approach to creating a distinct set of elements: a C# HashSet or the LINQ extension method .Distinct() on IEnumerable?
Let's say I'm looping through query results from the DB with a DataReader, and my options are to add the objects I construct to a List<SomeObject> or to a HashSet<SomeObject>. With the List option, I would wind up having to do something like:
myList = myList.Distinct().ToList<SomeObject>();
With the HashSet, my understanding is that adding elements to it takes care of the de-duplication by itself, assuming you've overridden the GetHashCode() and Equals() methods in SomeObject. I'm concerned mainly with the risks and performance aspects of the two options.
Thanks.
"Better" is a tricky word to use - it can mean so many different things to different people.
For readability, I would go for Distinct() as I personally find this more comprehensible.
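For performance, I suspect that filling a HashSet directly might be slightly faster, since it avoids building the intermediate List, but I doubt the difference would be large: Distinct() itself uses a hash-based set internally. Note that both approaches depend on SomeObject having correct Equals() and GetHashCode() overrides (or on a custom IEqualityComparer being passed in).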
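To make the comparison concrete, here is a minimal sketch of both approaches side by side. The SomeObject class below is a hypothetical stand-in (the Id and Name properties are my assumptions, not something from the question), and the rows would really come from the DataReader loop rather than an in-memory sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's SomeObject; Id and Name are assumed fields.
public sealed class SomeObject : IEquatable<SomeObject>
{
    public int Id { get; }
    public string Name { get; }

    public SomeObject(int id, string name)
    {
        Id = id;
        Name = name;
    }

    // Value equality: two instances are equal when both fields match.
    public bool Equals(SomeObject other) =>
        other != null && Id == other.Id && Name == other.Name;

    public override bool Equals(object obj) => Equals(obj as SomeObject);

    // Must be consistent with Equals: equal objects return equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (Id * 397) ^ (Name != null ? Name.GetHashCode() : 0);
        }
    }
}

public static class DistinctDemo
{
    // Option 1: collect everything, then de-duplicate with LINQ.
    public static List<SomeObject> ViaDistinct(IEnumerable<SomeObject> rows) =>
        rows.Distinct().ToList();

    // Option 2: de-duplicate while collecting.
    public static HashSet<SomeObject> ViaHashSet(IEnumerable<SomeObject> rows)
    {
        var set = new HashSet<SomeObject>();
        foreach (var row in rows)
        {
            // Add returns false and leaves the set unchanged for a duplicate.
            set.Add(row);
        }
        return set;
    }
}
```

One practical difference: Distinct() yields elements in the order they were first seen, whereas HashSet does not document any enumeration order, so it is safer not to rely on one.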
For what I think of as the "best" implementation... I think you should use distinct but push it down to the database layer, i.e. change the underlying SELECT to a SELECT DISTINCT before you fill the DataReader, so the duplicates never leave the database.
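As a rough sketch of what that could look like, assuming the duplicates are whole duplicate rows: the table name SomeTable and the columns Id and Name are purely hypothetical, and the connection is assumed to be already open. Only the provider-neutral System.Data.Common types are used here, so adapt it to whichever ADO.NET provider you actually have.

```csharp
using System.Collections.Generic;
using System.Data.Common;

public static class DistinctQuery
{
    // Hypothetical table and column names, for illustration only.
    public const string Sql = "SELECT DISTINCT Id, Name FROM SomeTable";

    // Assumes 'connection' is already open; the caller owns its lifetime.
    public static List<(int Id, string Name)> Load(DbConnection connection)
    {
        var results = new List<(int Id, string Name)>();
        using (var command = connection.CreateCommand())
        {
            command.CommandText = Sql;
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // The database has already removed duplicate rows,
                    // so a plain List is enough on the C# side.
                    results.Add((reader.GetInt32(0), reader.GetString(1)));
                }
            }
        }
        return results;
    }
}
```

This only helps when "distinct" means the same thing to the database as it does to SomeObject's Equals(); if equality is defined on a subset of the columns, the query would need to reflect that.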