cassandra, datastax-astra

Is there a limit to the number of partition keys in a SELECT statement?


Is there a limit on the number of Cassandra partition keys that can be selected in a single query? The following query:

SELECT * FROM series_pfc_gas
  WHERE as_of='2024-07-25'
  AND name IN ('ZTP_FLX_B_LUX_ALL', 'ZTP_FLX_B_TCP_ALL', 'ZTP_FLX_B_EED_ALL', 'NCG_FLX_B_LUX_ALL', 'NCG_FLX_B_TCP_ALL', 'NCG_FLX_B_EED_ALL', 'PEG_FLX_B_LUX_ALL', 'PEG_FLX_B_TCP_ALL', 'PEG_FLX_B_EED_ALL', 'TTF_FLX_B_LUX_ALL', 'TTF_FLX_B_TCP_ALL', 'TTF_FLX_B_EED_ALL', 'ZTP_STC_B_LUX_ALL', 'ZTP_STC_B_TCP_ALL', 'ZTP_STC_B_EED_ALL', 'NCG_STC_B_LUX_ALL', 'NCG_STC_B_TCP_ALL', 'NCG_STC_B_EED_ALL', 'PEG_STC_B_LUX_ALL', 'PEG_STC_B_TCP_ALL', 'PEG_STC_B_EED_ALL', 'TTF_STC_B_LUX_ALL', 'TTF_STC_B_TCP_ALL', 'TTF_STC_B_EED_ALL')
  AND time >= '2024-01-01T00:00:00.000+01:00' AND time < '2029-01-01T00:00:00.000+01:00'

fails with the following error message:

InvalidRequest: Error from server: code=2200 [Invalid query] \
  message="Select query cannot be completed because it selects \
  24 partitions keys - more than the maximum allowed 20"

This is the structure of the tables:

CREATE TABLE py2api.series_op_gas (
    name text,
    as_of timestamp,
    time timestamp,
    day int,
    month int,
    quarter int,
    se_year int,
    season int,
    value double,
    week int,
    wk_year int,
    year int,
    PRIMARY KEY ((name, as_of), time)
) WITH CLUSTERING ORDER BY (time ASC)

An example of some rows:

 name              | as_of                           | time                            | day | month | quarter | se_year | season | value      | week | wk_year | year
-------------------+---------------------------------+---------------------------------+-----+-------+---------+---------+--------+------------+------+---------+------
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-12 05:00:00.000000+0000 |  12 |     9 |       3 |    2023 |      1 |          0 |   37 |    2023 | 2023
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-13 05:00:00.000000+0000 |  13 |     9 |       3 |    2023 |      1 | 4.4409e-16 |   37 |    2023 | 2023
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-14 05:00:00.000000+0000 |  14 |     9 |       3 |    2023 |      1 | 4.4409e-16 |   37 |    2023 | 2023
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-15 05:00:00.000000+0000 |  15 |     9 |       3 |    2023 |      1 |          0 |   37 |    2023 | 2023
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-16 05:00:00.000000+0000 |  16 |     9 |       3 |    2023 |      1 | 4.4409e-16 |   37 |    2023 | 2023

Does anyone have another structure in mind for composing the key in Cassandra?


EDIT 1

I tried moving name from the partition key into the clustering key, but then I cannot combine it with the range filter on the time clustering column.

CREATE TABLE pfmpy2api.series_op_gas_test_2 (
    name text,
    as_of timestamp,
    time timestamp,
    day int,
    month int,
    quarter int,
    se_year int,
    season int,
    value double,
    week int,
    wk_year int,
    year int,
    PRIMARY KEY ((as_of), time, name)
) WITH CLUSTERING ORDER BY (time ASC, name ASC)

But then I get the following error when executing the same query as above:

InvalidRequest: Error from server: code=2200 [Invalid query] \
 message="PRIMARY KEY column "name" cannot be restricted (preceding \
 column "time" is restricted by a non-EQ relation)

So one solution, in order to keep a single SELECT against a table in Cassandra without breaking those anti-patterns, would be to duplicate information that already lives in name into a new market column and put that column in the partition key (without it I would not get a unique primary key), while name stays a regular column. With that, my query works:

CREATE TABLE pfmpy2api.series_op_gas_test (
    market text,
    name text,
    as_of timestamp,
    time timestamp,
    day int,
    month int,
    quarter int,
    se_year int,
    season int,
    value double,
    week int,
    wk_year int,
    year int,
    PRIMARY KEY ((market, as_of), time)
) WITH CLUSTERING ORDER BY (time ASC)

This allows my query to run without hitting the limit on partition key selection.
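
For example, assuming market holds the hub prefix of name (ZTP, NCG, PEG, TTF), the IN() list shrinks from 24 names to 4 markets, with any unwanted name values filtered out client side:

SELECT * FROM pfmpy2api.series_op_gas_test
  WHERE market IN ('ZTP', 'NCG', 'PEG', 'TTF')
  AND as_of='2024-07-25'
  AND time >= '2024-01-01T00:00:00.000+01:00' AND time < '2029-01-01T00:00:00.000+01:00'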


EDIT 2

The answers from @Eric Ramirez and @stevenlacerda are helpful. I compared the performance of one big query like the one above, where the name field is IN (val1, val2, ..., valn), against running a separate query per individual name value:

# Run time 0.869622178026475 seconds - for the big query with name IN (...)
# Run time 0.7818770059966482 seconds - for the small per-book queries

    from time import monotonic
    start_time = monotonic()    

    op_power_query = """
                    SELECT * FROM series_op_power WHERE as_of=? AND name IN ? AND time >= ? AND time < ?
                    """

    prepared = db.session.prepare(op_power_query)
    bound_stmt = prepared.bind((
        parse_ts(as_of),
        books,
        start_date,
        end_date))

    execution = db.session.execute(bound_stmt, timeout=None)
    df_op = pd.DataFrame(list(execution))

    print(f"Run time {monotonic() - start_time} seconds")
    start_time = monotonic()

    op_simple_power_query = """    
                SELECT * FROM series_op_power WHERE as_of=? AND name = ? AND time >= ? AND time < ?
                """

    df_op = pd.DataFrame()
    # prepare once outside the loop; re-preparing the same statement on
    # every iteration only adds round trips and driver warnings
    prepared = db.session.prepare(op_simple_power_query)
    for book in books:
        bound_stmt = prepared.bind((
            parse_ts(as_of),
            book,
            start_date,
            end_date))

        # note: calling .result() straight away blocks, so these requests
        # still execute one after another rather than concurrently
        execution = db.session.execute_async(bound_stmt, timeout=None)
        df_op_temp = pd.DataFrame(execution.result())
        # concatenate row-wise; passing axis=1 as before would glue the
        # partial results side by side as columns instead of appending rows
        df_op = pd.concat([df_op, df_op_temp], ignore_index=True)

    print(f"Run time {monotonic() - start_time} seconds")

Thanks for the suggestions!


Solution

  • In Apache Cassandra 4.1, a new feature called Guardrails was added -- a framework that gives operators the ability to enforce limits for optimum operation of the database (CASSANDRA-17146). One of those guardrails limits the number of partition keys restricted with the IN() clause (CASSANDRA-17186).

    Cassandra itself does not impose a limit out-of-the-box because the guardrails for the IN() clause are disabled by default in cassandra.yaml:

    # Guardrail to warn or fail when querying with an IN restriction selecting more partition keys than threshold.
    # The two thresholds default to -1 to disable.
    # partition_keys_in_select_warn_threshold: -1
    # partition_keys_in_select_fail_threshold: -1
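
    To enable the guardrail on a self-managed Cassandra 4.1+ cluster, you would uncomment those settings and choose thresholds, for example (the values here are purely illustrative):

    partition_keys_in_select_warn_threshold: 10
    partition_keys_in_select_fail_threshold: 20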
    

    Guardrails is a feature that was designed in Astra DB several years ago and DataStax donated it to the Apache Cassandra project in 2020 (enhancement proposal CEP-3). Astra DB has this guardrail configured with a failure threshold of 20:

    partition_keys_in_select_failure_threshold: 20
    

    But even 20 partition keys is a very high threshold. As a general rule, we recommend using the IN() clause with only 2 or 3 partition keys because anything more puts a lot of pressure on the coordinator.

    When you use the IN() clause on partition keys, the coordinator has to issue a separate read request for each partition key in the list.

    In your case there are 24 partition keys, so the coordinator has to fire off 24 separate read requests instead of just 1. This is not recommended and should be avoided where possible. Instead, use IN() on a clustering column in a query that is restricted to a single partition key:

    SELECT ... FROM table_name
      WHERE partition_key = ?
      AND clustering_col IN (...)
    

    to filter the rows within a single partition.

    If you need to retrieve multiple partitions, we recommend making multiple asynchronous requests from your application and aggregating the results on the client side, as in the sketch below. Asynchronous requests are much more performant and give predictable latencies, since single-partition reads are what Cassandra is designed for.
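
    Here is a minimal sketch with the Python driver, reusing the names from your EDIT 2 code (db.session, books, parse_ts and the date variables): fire off all the requests first, then collect the futures and aggregate:

    # prepare one single-partition query and reuse it for every book
    prepared = db.session.prepare("""
        SELECT * FROM series_op_power
        WHERE as_of = ? AND name = ? AND time >= ? AND time < ?
        """)

    # send all the requests without blocking ...
    futures = [
        db.session.execute_async(
            prepared.bind((parse_ts(as_of), book, start_date, end_date)))
        for book in books
    ]

    # ... then gather the results and aggregate them client side
    df_op = pd.concat(
        (pd.DataFrame(f.result()) for f in futures),
        ignore_index=True)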

    I should point out that multi-partition reads are an anti-pattern in Cassandra. They usually indicate one of two things:

    1. You have an analytics query (instead of OLTP) so consider using Apache Spark for analytics workloads.
    2. The data model does not match the application's access pattern, so consider remodelling your data so tables are optimised for the app's queries (see the sketch below).
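
    For example, one possible remodel for your case (a sketch only; the table name is illustrative, and whether it fits depends on how big each as_of snapshot is) partitions by as_of alone and makes name the first clustering column, so the IN() lands on a clustering column inside a single partition:

    CREATE TABLE py2api.series_pfc_gas_by_asof (
        name text,
        as_of timestamp,
        time timestamp,
        value double,
        -- remaining columns as in the original table
        PRIMARY KEY ((as_of), name, time)
    ) WITH CLUSTERING ORDER BY (name ASC, time ASC);

    SELECT * FROM series_pfc_gas_by_asof
      WHERE as_of = '2024-07-25'
      AND name IN ('ZTP_FLX_B_LUX_ALL', 'TTF_STC_B_EED_ALL')
      AND time >= '2024-01-01T00:00:00.000+01:00'
      AND time < '2029-01-01T00:00:00.000+01:00';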