javadictionarysettime-complexitydisjoint-sets

LinkedHashMap vs LinkedHashSet for retrieving specific elements & retrieving in insertion order


I'm doing a problem that clearly requires usages of Set but I need to retrieve elements from a set in insertion order + retrieve specific elements.

But finding the specific element was too slow( I guess O(n) since I have to go through the entire set to find & return it ).

So I went with a LinkedHashMap<someClass,someClass> where the key-value mappings contained the same objects.

Although this was faster, it used up twice as much memory which is especially concerning if my key/value(both same anyways) happened to take up alot of space.

I'm hoping if anyone has a better solution to my problem or optimizations.

Edit: By the way the comments for this SO answer may help

Edit:

 public Set<CompanyDummy> run(int numberOfCompanies) 
 {
        Set<CompanyDummy> companies=new LinkedHashSet<>();
        
        //Create companies
        CompanyDummy company;
        for(int idx=0;idx<=numberOfCompanies-1;idx++)
        {
            company=new CompanyDummy(idx);
            companies.add(company);
        }
        
        
        //Some code here to fill up each CompanyDummy element with engineers
        
        //At this point,there will be input 
        //specifying which companies to merge via their index(not reference)
        //Problem guarantees those companies exist. Hence, my reason why 
        //I didn't  do something like
        //if (set.contains(value)) return value;
        
        //Do note when we merge companies u & v, v is closed down
        
        for(int idx=0;idx<=transactions-1;idx++)
        {
            companyID= scanner.nextInt();
            anotherCompanyID= scanner.nextInt();
            
            //This part is where I search through companies to find if one exists
            //which is O(n)
            //Replacing sets with maps somehow makes the program faster
            //despite LinkedHashSet being backed by LinkedHashMap
            company=findCompany(companies, companyID);
            anotherCompany=findCompany(companies, anotherCompanyID);
            
            if(company!=null && anotherCompany!=null)
            {
                company.union(anotherCompany);
                companies.remove(anotherCompany);
            }
            

        }
 }
 
 private CompanyDummy findCompany(Set<CompanyDummy> companies,int id)
 {
        
        for(CompanyDummy company : companies)
        {
            if(company.getIndex()==id)
            {
                return company;
            }
        }
        
        return null;
  }

}


class CompanyDummy
{
private int index;
private Set<Integer> engineers;//The Integer here just denotes the engineer

public  CompanyDummy(int index) 
{
    this.index=index;
}

public int getindex()
{
    return index;
}

public void union(CompanyDummy someCompany)
{
    this.engineers.addAll(someCompany.engineers);
}

}


Solution

  • OK, so now that I see the code, I can understand the problem well enough to give you a solution.

    You are not simply retrieving a specific element from the set. You are actually retrieving an element with a specific value in a specific field if it is a member of the set. That's why you can't use the set's (or map's) fast lookup operations1.

    Solution:

    If you want to do the retrieval in better than O(N), you need a Map<Integer, CompanyDummy>. The map must be populated so that it maps from an id to the CompanyDummy with that id as its index. Then you can replace the findCompany calls with Map.get calls.

    Note that replacing a Set<CompanyDummy> with a Map<CompanyDummy, CompanyDummy> will not help for this problem. Both will give you O(N) performance.


    You say this about your experiment with replacing a LinkedHashSet<CompanyDummy> with a LinkedHashMap<CompanyDummy, CompanyDummy>:

    Although this was faster, it used up twice as much memory which is especially concerning if my key/value(both same anyways) happened to take up a lot of space.

    I doubt that either of those things were actually true.

    1. Space utilization of the two data structures should be identical, given that that a LinkedHashSet is a actually LinkedHashMap under the covers.

    2. It is (IMO) unlikely that there was a significant real difference in performance. Certainly not a 2-fold difference. The most likely explanation is that your methodology for measuring and comparing performance does take proper account of possible timing distortions caused by (various) JVM startup overheads; e.g. class loading, heap resizing and garbage collection and JIT compilation.


    1 - Just for completeness, if it wasn't a requirement that your set / map is ordered in insertion order, you could use a TreeSet<CompanyDummy> with a custom Comparator that orders CompanyDummy objects by their index values. You could then use OrderedSet.ceiling with a dummy CompanyDummy instances to probe the set in O(logN).