aerospikeaerospike-ce

Select random bin from a set in Aerospike Query Language?


I want to select a sample of random 'n' bins from a set in the namespace. Is there a way to achieve this in Aerospike Query Language?

In Oracle, we achieve something similar with the following query:

SELECT * FROM <table-name> sample block(10) where rownum < 101 

The above query fetches blocks of size of 10 rows from a sample size of 100.

Can we do something similar to this in Aerospike also?


Solution

  • Rows are like records in Aerospike, and columns are like bins. You don’t have a way to sample random columns from a table, do you?

    You can sample random records from a set using ScanPolicy.maxRecords added to a scan of that set. Note the new (optional) set indexes in Aerospike version 5.6 may accelerate that operation.

    Each namespace has its data partitioned into 4096 logical partitions, and the records in the namespace evenly distributed to each of those using the characteristics of the 20-byte RIPEMD-160 digest. Therefore, Aerospike doesn't have a rownum, but you can leverage the data distribution to sample data.

    Each partition is roughly 0.0244% of the namespace. That's a sample space you can use, similar to the SQL query above. Next, if you are using the ScanParition method of the client, you can give it the ScanPolicy.maxRecords to pick a specific number of records out of that partition. Further you can start after an arbitrary digest (see PartitionFilter.after) if you'd like.

    Ok, now let's talk data browsing. Instead of using the aql tool, you could be using the Aerospike JDBC driver, which works with any JDBC compatible data browser like DBeaver, SQuirreL, and Tableau. When you use LIMIT on a SELECT statement it will basically do what I described above - use partition scanning and a max-records sample on that scan. I suggest you try this as an alternative.