python, google-cloud-bigtable, bigtable

Is it possible to fetch multiple column family/qualifier combinations in Bigtable through the Python API?


I'm currently fetching multiple column qualifiers through the Bigtable Python API using filters, like:

row_filters.ColumnQualifierRegexFilter("c1|c2")

A column qualifier only needs to be unique within its column family, though, so I would like to reuse the same qualifier in different families and fetch only the combinations I want. For example, say families f1, f2, and f3 each contain the columns c1, c2, and c3, but I only want f1:c1, f1:c2, and f2:c3.

I can do this just fine with cbt:

cbt <...other args...> read columns="f1:c1,f1:c2,f2:c3"

But if I just build a RowFilterChain with FamilyNameRegexFilter("f1|f2") and ColumnQualifierRegexFilter("c1|c2|c3"), I get all three columns from both families, including the unwanted f1:c3, f2:c1, and f2:c2.

Is there a way to specify column family and qualifier together in the API, or will I only be able to achieve this through column qualifiers that are unique throughout the whole table?


Solution

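  • To fetch several specific column family and column qualifier combinations, you need to combine chain filters with an interleave filter. A chain applies its filters in sequence, so a cell must pass all of them; an interleave runs its filters in parallel and merges their outputs.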

    For your example to query f1:c1,f1:c2,f2:c3, you would do something like this:

    filter 1: Chain(column family f1, column qualifier c1|c2)
    filter 2: Chain(column family f2, column qualifier c3)

    your filter: Interleave(filter 1, filter 2)
    

    With the Python client, where the interleave filter is called RowFilterUnion, that looks like this:

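    from google.cloud import bigtable
    from google.cloud.bigtable import row_filters

    # project_id, instance_id, and table_id identify your own table.
    client = bigtable.Client(project=project_id, admin=True)
    instance = client.instance(instance_id)
    table = instance.table(table_id)

    # Chain = AND: keep only cells in family f1 whose qualifier is c1 or c2.
    filter1 = row_filters.RowFilterChain(
        filters=[
            row_filters.FamilyNameRegexFilter("f1"),
            row_filters.ColumnQualifierRegexFilter("c1|c2"),
        ]
    )
    # Chain = AND: keep only cells in family f2 whose qualifier is c3.
    filter2 = row_filters.RowFilterChain(
        filters=[
            row_filters.FamilyNameRegexFilter("f2"),
            row_filters.ColumnQualifierRegexFilter("c3"),
        ]
    )
    # Union (interleave) = OR: merge the cells that pass either chain.
    rows = table.read_rows(
        filter_=row_filters.RowFilterUnion(filters=[filter1, filter2])
    )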
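
The same pattern extends to any list of family:qualifier pairs. As a minimal sketch, the helpers below turn a cbt-style column spec into one chain per family and interleave the chains. The names group_columns, qualifier_regex, and build_column_filter are hypothetical and not part of the client library. The sketch assumes family and qualifier names are plain strings without unusual characters: re.escape follows Python's regex rules, which may not line up exactly with the RE2 syntax Bigtable uses.

```python
import re


def group_columns(spec):
    """Group a cbt-style spec such as "f1:c1,f1:c2,f2:c3" by column family.

    Returns a dict mapping each family to the list of its qualifiers,
    in the order they first appear in the spec.
    """
    grouped = {}
    for item in spec.split(","):
        family, qualifier = item.strip().split(":", 1)
        grouped.setdefault(family, []).append(qualifier)
    return grouped


def qualifier_regex(qualifiers):
    """Build an alternation that matches any one of the given qualifiers."""
    return "|".join(re.escape(q) for q in qualifiers)


def build_column_filter(spec):
    """Build one chain per family and interleave them (hypothetical helper)."""
    # Imported here so the pure helpers above work without the client library.
    from google.cloud.bigtable import row_filters

    chains = [
        row_filters.RowFilterChain(
            filters=[
                row_filters.FamilyNameRegexFilter(re.escape(family)),
                row_filters.ColumnQualifierRegexFilter(qualifier_regex(qualifiers)),
            ]
        )
        for family, qualifiers in group_columns(spec).items()
    ]
    # An interleave needs at least two filters, so a single family
    # is returned as its chain directly.
    if len(chains) == 1:
        return chains[0]
    return row_filters.RowFilterUnion(filters=chains)
```

With a table object like the one above, this could be used as table.read_rows(filter_=build_column_filter("f1:c1,f1:c2,f2:c3")). Grouping by family first keeps the filter small: each family contributes one chain no matter how many qualifiers it lists.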