sqlpostgresqlcountbytea

Count Zero and/or Non-Zero bytes in a bytea/string


I have a field in bytea format, and I'm trying to calculate how many zero bytes are in the field (postgresql).

Example String:

0x202ee0ed0000000000000000000000000000000000000000000000000000000000014370000000000000000000000000000000000000000000000000000000003f8affe7

I originally tried to do this by counting how often 00 occurred, but this can be inaccurate since that doesn't check if it's a byte or two 0s that happen to be next to each other.

I'm currently using this regex_replace (found from another question) to force a _ in between bytes so that I can then count 00 occurrences, but this slows down the query by multiple magnitudes vs a simple replace on the order of 100k-1m+ rows: regexp_replace(data::text, '(..)', E'\\1_', 'g') - Produces something like 20_2e_e0...

I'm wondering if anyone knows of a more performant way to count the number of zero or non-zero bytes in a bytea/string?


Solution

  • There is no very simple and efficient way in SQL, I think. The best I can come up with is:

    SELECT cardinality(
              string_to_array(
                 encode('\xDEADF00D0000', 'escape'),
                 '\000'
              )
           ) - 1;
    
     ?column? 
    ══════════
            2
    (1 row)