sqlindexingbcpsybase-ase15

PCI IN slow with cluster index


We are working with ASE SYBASE 15.7

We have a huge historical table: 700 millions of records and one CLUSTER INDEX: c1,c2,c3,c4 with ALLOW_DUP_ROWS

Every month we load 700,000 records en the table via BCP IN These records had a lot of duplicate records because fields: c2 and c3 had the same value for all records The time of BCP IN was 6 hours

We changed the logical of INSERT's in the monthly table and now all of fields of CLUSTER INDEX: c1,c2,c3,c4 are diferents.

After this change, the BCP IN process takes 2 hours!

Why the time now is lower? We are not clear about the reasons

Thanks a lot!

Rod


Solution

  • I'm guessing your table's locking scheme is allpages.

    If a clustered index on a allpages table allows dup keys/rows, the dup key/row entries are maintained in what's known as overflow pages (while from an older manual, this info on overflow pages is still accurate).

    Each time a dup key/row is inserted into the table, the entire chain of overflow pages is scanned from beginning to end, and then the new entry is added on the end of the chain.

    As you can imagine, as this chain grows longer and longer, it takes more and more time for each successive insert to occur.

    If you have a history of your data loads I'm guessing you'll find that over time, as the volume of (dup) rows grew, so did the load times ... and this would've been attributable to the greater amount of time required to scan ever increasing overflow page chains.

    By making the new data 'unique' you've likely eliminated most of the overhead for scanning overflow page chains. I say 'likely' because it's not clear from your post if some of the new data could contain key values that already exist in the table (ie, the keys may be unique within a given batch of data to be inserted, but the keys aren't necessarily unique within the table).

    I usually recommend that clients do NOT place non-unique clustered indexes on allpages locked tables solely because of the overhead for overflow page chain processing.


    Chances are that if you eliminated the non-unique clustered index then you may see your data loads run even quicker, eg: