javahibernatejpajoinjpql

How to use multiple JOIN FETCH in one JPQL query


I have below entities:

public class Category {
   private Integer id;

   @OneToMany(mappedBy = "parent")
   private List<Topic> topics;
}

public class Topic {
   private Integer id;

   @OneToMany(mappedBy = "parent")
   private List<Posts> posts;

   @ManyToOne
   @JoinColumn(name = "id")
   private Category parent;
}

public class Post {
   private Integer id;

   @ManyToOne
   @JoinColumn(name = "id")
   private Topic parent;
   /* Post fields */
}

and I want to fetch all categories with joined topics and joined posts using JPQL query. I wrote query like below:

SELECT c FROM Category c
JOIN FETCH c.topics t
JOIN FETCH t.posts p WHERE 

But I got the error

org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags

I found articles about this error, but these articles only describe situation where in one entity are two collections to join. My problem is a little different and I don't know how to solve it.

It is possible to do it in one query?


Solution

  • Considering we have the following entities:

    JPA entity domain model

    And you want to fetch some parent Post entities along with all the associated comments and tags collections.

    If you are using more than one JOIN FETCH directives:

    List<Post> posts = entityManager.createQuery("""
        select p
        from Post p
        left join fetch p.comments
        left join fetch p.tags
        where p.id between :minId and :maxId
        """, Post.class)
    .setParameter("minId", 1L)
    .setParameter("maxId", 50L)
    .getResultList();
    

    Hibernate will throw the MultipleBagFetchException:

    org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags [
      com.vladmihalcea.book.hpjp.hibernate.fetching.Post.comments,
      com.vladmihalcea.book.hpjp.hibernate.fetching.Post.tags
    ]
    

    The reason why Hibernate throws this exception is that it does not allow fetching more than one bag because that would generate a Cartesian product.

    The worst "solution" others might try to sell you

    Now, you will find lots of answers, blog posts, videos, or other resources telling you to use a Set instead of a List for your collections.

    That's terrible advice. Don't do that!

    Using Sets instead of Lists will make the MultipleBagFetchException go away, but the Cartesian Product will still be there, which is actually even worse, as you'll find out the performance issue long after you applied this "fix".

    To prove this, let's assume we have the following entity counts:

    public static final int POST_COUNT = 50;
    public static final int POST_COMMENT_COUNT = 20;
    public static final int TAG_COUNT = 10;
    

    And we changed the collections from List to Set and managed to run this query:

    Query jpqlQuery = entityManager.createQuery("""
        select p
        from Post p
        left join fetch p.comments
        left join fetch p.tags
        where p.id between :minId and :maxId
        """, Post.class)
    .setParameter("minId", 1L)
    .setParameter("maxId", 50L);
    
    List<Tuple> posts = entityManager.createNativeQuery(
        SQLExtractor.from(jpqlQuery)
        , Tuple.class)
    .setParameter(1, 1L)
    .setParameter(2, 50L)
    .getResultList();
    
    assertEquals(POST_COUNT * POST_COMMENT_COUNT * TAG_COUNT, posts.size());
    

    The SQL result set will contain 50 X 20 x 10 = 10,000 records. That's the implicit Cartesian Product that you should avoid.

    You can use this test case to see how fetching multiple Set collections generates the Cartesian Product.

    The one-query solution

    The only way to fetch multiple relations in a single SQL query without generating an implicit Cartesian Product is to use MULTISET.

    This feature is offered by jOOQ or Blaze Persistence.

    Sinc the question is about Hibernate, then you can use Blaze Persistence.

    First, you will have to define several EntityView interfaces that map the entity properties you are interested in fetching:

    @EntityView(Post.class)
    public interface PostView {
        @IdMapping
        Long getId();
    
        String getTitle();
    }
    
    @EntityView(PostComment.class)
    public interface PostCommentView {
        @IdMapping
        Long getId();
    
        String getReview();
    }
    
    @EntityView(Tag.class)
    public interface TagView {
        @IdMapping
        Long getId();
    
        String getName();
    }
    
    @EntityView(Post.class)
    public interface PostWithCommentsAndTagsView 
            extends PostView {
    
        @Mapping(fetch = MULTISET)
        List<PostCommentView> getComments();
    
        @Mapping(fetch = MULTISET)
        List<TagView> getTags();
    }
    

    Afterward, we need to define the Blaze Persistence CriteriaBuilderFactory, EntityViewConfiguration, and EntityViewManager:

    CriteriaBuilderConfiguration config = Criteria.getDefault();
    CriteriaBuilderFactory criteriaBuilderFactory = config
        .createCriteriaBuilderFactory(
            entityManagerFactory()
        );
    
    EntityViewConfiguration entityViewConfiguration = EntityViews
        .createDefaultConfiguration()
        .addEntityView(PostView.class)
        .addEntityView(PostCommentView.class)
        .addEntityView(TagView.class)
        .addEntityView(PostWithCommentsAndTagsView.class);
    
    EntityViewManager entityViewManager = entityViewConfiguration
        .createEntityViewManager(criteriaBuilderFactory);
    

    With these classes in place, our query looks as follows:

    List<PostWithCommentsAndTagsView> posts = doInJPA(entityManager -> {
        return entityViewManager.applySetting(
            EntityViewSetting.create(
                PostWithCommentsAndTagsView.class
            ),
            criteriaBuilderFactory.create(
                entityManager, 
                Post.class
            )
        )
        .where(Post_.ID)
        .betweenExpression(":minId")
        .andExpression(":maxId")
        .setParameter("minId", 1L)
        .setParameter("maxId", 50L)
        .getResultList();
    });
    
    for(PostWithCommentsAndTagsView post : posts) {
        assertEquals(POST_COMMENT_COUNT, post.getComments().size());
        assertEquals(TAG_COUNT, post.getTags().size());
    }
    

    When executing this Criteria query, Blaze Persistence generates the following SQL query on PostgreSQL:

    select 
        p1_0.id,
        (
            select json_agg(
                json_build_object(
                    'f0', '' || c1_0.id, 'f1', '' || c1_0.review
                )
            )
            from post_comment c1_0
            where p1_0.id = c1_0.post_id
        ),
       (
        select 
            json_agg(
                json_build_object(
                    'f0', '' || t1_0.tag_id, 'f1', '' || t1_1.name
                )
            )
            from post_tag t1_0
            join tag t1_1 on t1_1.id = t1_0.tag_id
            where p1_0.id = t1_0.post_id
        ),
       p1_0.title
    from post p1_0
    where p1_0.id between 1` and 50
    

    Compared to the Cartesian Product, this query will return a result set containing only 50 records because each post will have the comments and tags aggregated as JSON arrays.

    The reason why the Cartesian Product is avoided is due to the aggregation done in JSON, which allows us to return the comments and the tags as JSON arrays that Blaze Persistence will map on the List<PostCommentView> and List<TagView>.

    This test case on GitHub shows you how the MULTISET solution works.

    The two-query solution

    You can do the following trick:

    List<Post> posts = entityManager.createQuery("""
        select p
        from Post p
        left join fetch p.comments
        where p.id between :minId and :maxId
        """, Post.class)
    .setParameter("minId", 1L)
    .setParameter("maxId", 50L)
    .getResultList();
    
    posts = entityManager.createQuery("""
        select p
        from Post p
        left join fetch p.tags t
        where p.id between :minId and :maxId
        """, Post.class)
    .setParameter("minId", 1L)
    .setParameter("maxId", 50L)
    .getResultList();
    
    assertEquals(POST_COUNT, posts.size());
    

    Unlike the query that generated the Cartesian Product, this solution will use two queries that fetch 50 x 20 and 50 x 10 records. So, instead of fetching 10,000 records, we only fetch 1000 + 500 = 1,500 records.

    The two-query solution uses the Hibernate 6 syntax, which allows you to avoid the use of distinct when using a JOIN FETCH.

    For Hibernate 5, check out this article for more details about how to use it properly for such queries.

    As long as you fetch at most one collection using JOIN FETCH, you will be fine.

    By using multiple queries, you will avoid the Cartesian Product since any other collection but the first one is fetched using a secondary query.

    This test case on GitHub shows you how the two-query solution works.

    Always avoid the FetchType.EAGER strategy

    If you're using the FetchType.EAGER strategy at mapping time for @OneToMany or @ManyToMany associations, then you could easily end up with a MultipleBagFetchException.

    You are better off switching from FetchType.EAGER to Fetchype.LAZY since eager fetching is a terrible idea that can lead to critical application performance issues.

    What about multi-level collections that are nested?

    Both the MULTISET and the N-query solution will work for nested collections.

    Here's a test case on GitHub that creates:

    The demo for the N-query solution is available here and the multi-level fetching is one with 3 entity queries.

    The MULTISET solution is available here.

    So, these solutions work no matter where the collection is located relative to the parent entity.

    Conclusion

    Avoid FetchType.EAGER and don't switch from List to Set just because doing so will make Hibernate hide the MultipleBagFetchException under the carpet. Fetch just one collection at a time, and you'll be fine.

    As long as you do it with the same number of queries as you have collections to initialize, you are fine. Just don't initialize the collections in a loop, as that will trigger N+1 query issues, which are also bad for performance.