javabatch-processingjsr352jberet

jdbcItemReader support for partitioning?


Using jdbcItemReader from https://github.com/jberet/jberet-support and wanting to use partitioning to speed up processing.

The number of partition (here 16) has the side effect to duplicate data written. Each partition perform the same work on the same data instead of splitting up the input data set into n distinct partitions.

EDITED: the code below show the proper way to implement it. Attention: your SQL query need to returned ordered data: N partitions, here 16, means that 16 reader will run the query!

 <reader ref="jdbcItemReader">
                <properties>
                    <property name="beanType" value="java.util.Map"/>
                    <property name="sql" value="select ......"/>
                    <property name="url"
                              value="jdbc:oracle:thin:......"/>
                    <property name="user" value="......."/>
                    <property name="password" value="....."/>
                    <property name="columnMapping" value="xxxx, xxxx"/>
                    <property name="columnTypes" value="String,String"/>
                    <property name="start" value="#{partitionPlan['partition.start']}"/>
                    <property name="end" value="#{partitionPlan['partition.end']}"/>
                    <!--CONCUR_READ_ONLY: If you set this as a value of the concurrency while creating the ResultSet object you cannot update the contents of the ResultSet you can only read/retrieve them.-->
                    <!--CONCUR_UPDATABLE: If you set this as a value of the concurrency while creating the ResultSet object you can update the contents of the ResultSet.-->
                    <!--TYPE_SCROLL_SENSITIVE: ResultSet is sensitive to the changes that are made in the database i.e. the modifications done in the database are reflected in the ResultSet.-->
                    <property name="resultSetProperties"
                              value="fetchSize=5500, resultSetConcurrency=CONCUR_READ_ONLY,
                              fetchDirection=FETCH_REVERSE,
                              resultSetType=TYPE_SCROLL_SENSITIVE,
                              resultSetHoldability=HOLD_CURSORS_OVER_COMMIT"/>
                </properties>
            </reader>
            <processor ref="myProcessort"/>
            <writer ref="myWriter"/>
        </chunk>

<!-- run your sql with a count to define partitions evenly -->
        <partition>
            <plan partitions="16" threads="16">
                <properties partition="0">
                    <property name="partition.start" value="0"/>
                    <property name="partition.end" value="500"/>
                </properties>
                <properties partition="1">
                    <property name="partition.start" value="500"/>
                    <property name="partition.end" value="1000"/>
                </properties>
       <!-- ... -->
                <properties partition="15">
                    <property name="partition.start" value="5000"/>
                    <property name="partition.end" value="5500"/>
                </properties>

Solution

  • You need to define your step partition as you did in your 2nd XML snippet. Then define start and end properties in your jdbcItemReader, and these 2 properties reference the partition properties partition.start and partition.end respectively.

    The 2 partition properties can be named differently, as long as they are consistent in partition and item-reader elements.

    For example,

    <reader ref="jdbcItemReader">
      <properties>
        <property name="start" value="#{partitionPlan['partition.start']}"/>
        <property name="end" value="#{partitionPlan['partition.end']}"/>
    </properties>
    </reader>