When I load big tables into an IcCube schema, the number of lines being fetched per second differs strongly between different tables from the same database. Unfortunately I have no idea, how to optimize this, since I don't know, what this speed depends of.
For example I have one schema with 3 different tables, which have more than 20 million rows each. One tables is loaded with 15.000 rows/s, and another one with 100.000 rows/s
Are there any best practices, how to maximize that speed? Does it depend on the number of dimensions, on the number of measures, on the number of calculated measures, that are using this table or other things? And how severe are the different dependencies?
Assuming you're talking about the speed of processing of the tables used for building the cubes (indeed it is unlikely you want to process dimensions out of several millions of rows).
Schema Definition
Ensure there is no schema backup being activated and that the storage policy is the default (i.e., memory).
Dimension Processing
Dimension shouldn't be as big as fact but it might be the case. There is one constraint that might slow down it's 'Names unique in parent'. Just remove it if the dimensions isn't loading as fast as needed.
Facts Processing
icCube is building a facts (aka. measure group) by loading first the data from the underlying datasource and then processing those data. The LOADING and PROCESSING can happen at the same time and each one can be run in more than one thread to speed it up (you can see more details in icCube.xml
: loadReadingThreadCount, loadProcessingThreadCount, ...
).
In the end, icCube cannot build the facts faster than the data is delivered from the data source. So the very first point to check is the speed of the data source (in your case the speed of the SQL queries).
Data Source Speed
Since icCube v6.8, the configuration icCube.loadProcessingFactsMode = NONE
allows for processing the facts without building them. This way, looking at the schema statistics of the facts, you'll actually see the speed of the SQL queries. To make it easier, ensure you're loading the facts one by one using : icCube.loadReadingThreadCount = 1
. Then you can have a look to the schema statistics in the "Loading Details" for each facts. E.g.,
F : Cashflow.Facts
Rows Count : 100'000'000
Rows/Sec : 225'000
From the results, you can troubleshoot the SQL queries if the speed seems slow and/or contacts icCube, if the speed is far below what a native client for your data source would do.
The next step is to see if several facts can be loaded in parallel at the same speed. As you have 3 facts, you can try:
icCube.loadProcessingFactsMode = NONE
icCube.loadReadingThreadCount = 3
Again, check the resulting schema statistics and see if the datasource can deliver the data at the speed you like.
icCube Processing Speed
Once you know about the actual speed of the datasource and the number of loads in parallel you can use, you can determine the optimal number of processing threads; you can start with 2 (out of the box configuration):
icCube.loadProcessingFactsMode = FULL
icCube.loadReadingThreadCount = 3
icCube.loadProcessingThreadCount = 2
and increase it until you reach a limit where no more gain is obtained; we got for example a customer who reached 600'000 rows /sec using the following setup:
icCube.loadReadingThreadCount = 3
icCube.loadProcessingThreadCount = 6
During the processing, icCube is mainly building an internal index. The speed depends primarly on the number of hierarchies to index (you can see them in each facts statistics). Some extra information is available in the statistics:
F : Cashflow.Facts
Elapsed : 58m14s
- whole processing : 2h16m22s -- greater than elapsed because of several threads of processing
- resolve members : 1h25m49s -- from member "keys" to internal icCube members
- index & columns : 50m30s -- mainly about building the index
Page Lock Time : 25.45s -- the lower the better
Page Lock Count : 105258
Cache Hit : 8'135'974'783 -- the higher the better (used by resolve members above)
Cache Miss : 241'674'486
Some more information about the caches used for processing the "resolve members":
RM. Cache Dim : dim | type | nil | prev | hit | miss
: Stocks | LRU:2048 | 0 | 205'350 | 2'359'923 | 2'785'420
: ...
For each indexing dimension, you can see the 'prev/hit/miss' statistics. The best is to have all members resolved in the 'prev' column (a fast cache using prev. row information), then in the 'hit' column. Beware of high 'miss' as depending on your dimension this can be very costly.
Processing Queue
You can grep the logs for the pattern 'processing-queue' that is telling you if the processing is saturating the resource available. If the queue is full you can increase the number of processing thread and check again the queue and gain. Ideally the queue should remain empty.
Core Count
A quick note, the number of loadReading/Processing ThreadCount
should be lower then the number of cores available as the processing is very much CPU/RAM bound.
Java Garbage Collector
You can as well investigate the GC behavior (pauses) using grep "GC" in the log files. In case of issues, you can increase the amount of RAM and/or experiment with G1 using the following Java option when starting icCube: -XX:+UseG1GC
.
Hope that helps.