I am doing a performance comparison test for multiple select statements executed one after the other in a for
loop. The application tries to fetch all people having a certain age from an Ignite cache containing 20,000 randomly generated entries.
I've been planning to make it do one SELECT
query for every age, starting from 18
up to 98
, meaning 81 SELECTs.
However, it seems that Ignite is very, very slow, compared to Postgres. Postgres is able to perform 50 fetches fro those 81 entries in just 5 seconds, while Ignite performs 1 select for each age group in more than 11 seconds.
The Person
entity queried for is defined like this:
@Entity
public class Person {
@Id
@GeneratedValue(strategy = GenerationType.AUTO)
private Integer id;
private String name;
@QuerySqlField(index=true)
private Integer age;
// Getters & Setters...
}
I am also using Entity
and Id
, because my application may connect to two different datasources (Ignite and Postgres), based on the Spring profile.
The Ignite repository interface looks like this:
@RepositoryConfig(cacheName="Person")
@Repository
public interface IgnitePersonRepository extends IgniteRepository<Person, Integer> {
@Query("SELECT * FROM Person WHERE age = ?")
List<Person> findByAge(int age);
}
and the Query
annotation is the one from Ignite, not from JPA.
If necessary, this is also the logic of the Service class:
public double findByAge(int lowerAge, int upperAge, int totalSelects) {
List<Person> elements = new ArrayList<>();
stopwatch.start();
for (int i = lowerAge; i<upperAge; i++) {
for (int noSelects = 0; noSelects < totalSelects; noSelects++) {
elements.addAll(ignitePersonRepository.findByAge(i));
}
}
return stopwatch.getElapsedTime();
}
where in my initial tests, lowerAge=18
, upperAge=98
and totalSelects=1
.
What could be the problem here? I also explicitly specified that the age
field inside Person
should be indexed. To note that Ignite performs far better on my write tests (compared to Postgres). Is this how it should behave?
I see that you have an index on the age field. Note that an index is a b-tree index. A B-tree index will perform very well on a field that has high cardinality and high uniqueness. By contrast an age field for 20,000 entries will have low cardinality relative to number of entries. One item that might help is to store your data using affinity with the age field. This will force all entries with the same age to reside in the same partitions and therefore the same host. This will eliminate the network transfer of data to entries with the same age as these will already be on the same host. One other item is that you did not mention how many hosts your Ignite cluster has. Note that affinity would only help if you are using a multi-node cluster. Lastly be aware that each SQL statement in Ignite is a Map / Reduce operation which has an overhead to get each parsed and distributed.. If you need all ages from 18 to 98 then use 1 single statement with a between clause and compare that against PostGres. Hope that helps!