I've got an entity model like this:
class Article {
@OneToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY)
public Set<UserPropertyValue<?>> getUserPropertyValues() {
return userPropertyValues;
}
}
@Entity
@Inheritance(strategy = InheritanceType.JOINED)
public abstract class UserPropertyValue<T> {
}
@Entity
public class IntegerUserPropertyValue extends UserPropertyValue<Integer> {
}
@Entity
public class StringUserPropertyValue extends UserPropertyValue<String> {
}
// about 10 more inheritors like this, removed for clarity
Now when I run article.getUserPropertyValues()
in Hibernate 5, I get this PostgreSQL query (cut for clarity):
select long_list_of_fields
from Article_UserPropertyValue userproper0_
inner join
UserPropertyValue userproper1_ on userproper0_.userPropertyValues_id = userproper1_.id
left outer join
IntegerUserPropertyValue userproper1_2_ on userproper1_.id = userproper1_2_.id
left outer join
StringUserPropertyValue userproper1_10_ on userproper1_.id = userproper1_10_.id
where userproper0_.Article_id = ?
On a database with about 2M UserPropertyValue records, this runs in 50ms, and uses index scans.
Now we're upgrading to Hibernate 6, and I get this query:
select long_list_of_fields
from Article_UserPropertyValue upv1_0
join
(
UserPropertyValue upv1_1
left join
IntegerUserPropertyValue upv1_3 on upv1_1.id = upv1_3.id
left join
StringUserPropertyValue upv1_11 on upv1_1.id = upv1_11.id
)
on upv1_1.id = upv1_0.userPropertyValues_id
where upv1_0.Article_id = ?
Notice how Hibernate is now joining on a subquery, instead of joining all child tables directly to the main table.
Something in the PostgreSQL query planner decides to do a sequential scan on table UserPropertyValue.
Does anyone know it there is some magic setting to revert back to the old behaviour?
To answer my own question; I didn't find a way to alter the Hibernate query, but I did find a setting in PostgreSQL that fixed the performance of the new join syntax.
Increasing join_collapse_limit
from the default of 8 to 50 brought the execution time in line with the old syntax. I have 12 joined tables (examples were edited for brevity), I assume this led the optimizer to use a sequential scan.