javahibernateh2scrollableresults

ScrollableResultSet.next() gradually slows down when using Session.save() whilst scrolling


I am using a ScrollableResults object to scroll through about 500,000 to 1,000,000 rows from a table. Whilst scrolling I create a different entity using the resulting entity from each iteration and use session.save() to persist this object. Below is example code, where the real code is more complex but essentially doing the same thing.

Session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
ScrollableResults results = session.createQuery("from Foo_Table f join f.bars b")
    .scroll(ScrollMode.FORWARD_ONLY);
int i = 0;
while(results.next())
{
    Foo foo = (Foo) results.get(0);
    Bar bar = new Baz(foo);
    bar.setFoo(foo);

    session.save(bar)

    if(i % 50 == 0)
    {
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();

The important entities:

@Entity
@Table(name = "FOO_TABLE")
public class Foo_Entity implements Serializable {
    @Id    
    @Column(name = "Foo_ID", nullable=false)
    private String id;

    @OneToMany(fetch = FetchType.EAGER, //FetchType.LAZY fixes the slow down 
               mappedBy = "fooParent", cascade = CascadeType.ALL)
    private Set<Bar> bar_entities = new HashSet<>(0);
}

@Entity
@Table(name = "BAR_TABLE")
public class Bar_Entity implements Serializable {
    @Id
    @GeneratedValue
    @Column(name="Id")
    private Long id;

    @ManyToOne
    @JoinColumn(name="foo_pk")
    private Foo fooParent;

    // setFoo, getFoo...

}

When I time this transaction, the running time starts at about 100ms per 500 iterations but gradually rises to a few seconds per 500 iterations after about 20,000 iterations. As a result the transaction has extremely poor performance. The only line of code which is taking any time is the results.next(), which gradually takes longer and longer to execute.

The issue is resolved if I change the fetch type for Bar entities in Foo from eager to lazy. I don't understand why using an eager fetch type for a set that is not yet filled causes problem with scrolling through entities that contain the relationship. The set is indeed filled during scrolling on session.flush(), but in my scenario the set is typically filled with only one to two elements, which is why I would prefer to have this fetch type as eager.

Does anyone know why this slow down happens for this particular scenario?

Note that this question was first posted before I realised that changing the fetch type solved the problem, so the question has now shifted from "How can I fix this" to "why is this a problem?"


Solution

  • Missing index on the BAR_TABLE.foo_pk column would slow things down with an eager fetch since a full table scan would be performed to load the BAR entity associated with each FOO entity,