exact-online invantive-sql invantive-dap

Exact Online query with joins runs more than 15 minutes


I am using the following query:

set use-result-cache false

set use-http-cache false

create or replace table settings@inmemorystorage
as
select '29676ec4-61b5-45eb-a5a3-6feffe03d1d3' sor_id
,      '[[Exploded]]' exploded_signal_text
,      '{res:itgen_eol_sales_order}' entity_name_singular
,      '{res:itgen_eol_sales_orders}' entity_name_plural
from   me

select ...
from   settings@inmemorystorage stg
left
outer
join   ExactOnlineREST..salesorders sor 
on     sor.orderid = stg.sor_id
left 
outer
join   ExactOnlineREST..salesorderlines soe
on     soe.orderid = stg.sor_id
left                                                
outer
join   BillOfMaterialItemDetails bom 
on     bom.billofmaterialitemdetails_billofmaterial_item_id_attr_guid = soe.item
left 
outer
join   ExactOnlineREST..items itm
on     itm.ID = bom.item_id_attr_guid
left 
outer
join   ExactOnlineREST..itemsread itr
on     itr.code = bom.item_code_attr 
where  sor.orderid is not null
and    itm.class_10 in ('A', 'D', 'M', 'S', 'Z')

to retrieve data from Exact Online. In my test environment it takes approximately 1 second to apply the Bill of Material explosion to a sales order (approximately 5 reads on the XML and REST APIs of Exact Online). However, on a customer site it runs for more than 15 minutes. The slowdown seems related to the retrieval of the items (articles) used in the Bill of Material; my test environment has approximately 100 items, whereas the customer site has 250.000 items.

However, this query is used in an interactive program and should complete within 2,5 seconds.

I've tried combining itemsread and items to restrict the number of items retrieved, but different fields are needed from each table.

How can I optimize this query to run faster with a large volume of data?


Solution

  • The problem is in the second query: there are many items, and the Exact Online APIs have a throughput of maybe 300 items per second, so without changes this is going to take forever.

    There are two alternative routes:

    1. Optimize query
    2. Use caching

    The query optimization ensures great performance and little resource usage on first and subsequent use. The use of caching improves response time on second use, but requires more resources than an optimized query.

    Optimize Exact Online query

    To optimize the query, you need to instruct the optimizer how to handle the joins, since by default no statistics or referential data are available for Exact Online. I would add the following hints:

    select /*+ join_set(soe, orderid, 100) join_set(itm, id, 100) join_set(itr, code, 100) */ ...
    from   settings@inmemorystorage stg
    ...
    

    The first hint, join_set(soe, orderid, 100), instructs the optimizer to change the join algorithm from a hash join to a loop by index for the join with soe on orderid (the right-hand side) when at most 100 rows are returned from the previous step in the execution path. In this case, exactly one row is returned from settings. The same holds for the joins on itm and itr.

    For a large Exact Online environment, this ensures that you always have 5 lookups when a sales order has fewer than 60 lines. That typically takes 1 second.
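    Putting the hints into the original query gives roughly the following (a sketch; the select list and the remaining joins are unchanged from the question):

    ```sql
    select /*+ join_set(soe, orderid, 100)
               join_set(itm, id, 100)
               join_set(itr, code, 100)
           */
           ...
    from   settings@inmemorystorage stg
    left
    outer
    join   ExactOnlineREST..salesorders sor
    on     sor.orderid = stg.sor_id
    left
    outer
    join   ExactOnlineREST..salesorderlines soe
    on     soe.orderid = stg.sor_id
    ...
    ```

    With only one row coming out of settings, each hinted join then runs as a handful of indexed lookups instead of retrieving all 250.000 items.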

    Use caching (Data Cache)

    When you configure a PostgreSQL, SQL Server, Oracle or MySQL database as an Invantive Data Cache backing database provider, the outcome of parts of the queries can be memorized in a normal relational database. The optimizer automatically queries this database using ANSI SQL as long as the data cache is still sufficiently "fresh".

    For instance:

    select /*+ ods(true, interval '7 days') */ ...
    

    tells the optimizer to use the data cache for all Exact Online data that was placed in the cache no more than 7 days ago. When the data is older than 7 days, the optimizer creates a new version, stores it in the data cache and uses that.
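    The ods hint can be combined with the join hints from the optimization route; a sketch, assuming a data cache backing database has already been configured:

    ```sql
    select /*+ ods(true, interval '7 days')
               join_set(soe, orderid, 100)
               join_set(itm, id, 100)
               join_set(itr, code, 100)
           */
           ...
    from   settings@inmemorystorage stg
    ...
    ```

    A shorter interval trades more frequent cache rebuilds for fresher data; 7 days is just the example value used above.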

    When you need near real-time changes from Exact Online to be included, you will have to configure data replication. Replication retrieves all insert/update/delete events through webhooks and applies them to your data cache. Even then the cache can still be maybe 30 minutes old, since propagation can take some time.

    The performance is a lot better than without cache and without optimization. In general, the throughput with 250.000 items will be 1.000 times better, so comparable to using 250 items. Given Exact Online's typical page size of 60, it will feel like 5 + 5 + 3 = 13 I/Os, so approximately 2,6 seconds, which is close to the 2,5 second boundary given.

    Please note that the bill of materials table you are using will NEVER use a lookup by index, since no index is available on it at this moment. So if you have a large bill of materials across all products, you must use the data cache for reasonable performance.