
Make sed regex alternations follow left to right precedence?


I'm trying to use a regex to format some binary from xxd -b, but to demonstrate this simply I'll show you what I expect to happen:

Regex to delete: /1x|1.*/

Text: 1x21y3333333313333 -> 2

Where all occurrences of 1x are deleted, then everything starting at the first 1 that shows up should be deleted. It should be immediately obvious what's going on, but if it's not, play with this. The key is that if 1x is matched, the rest of the pattern should be aborted.
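
For reference, here is a minimal way to reproduce the toy case in a shell. This is only a sketch: the variable names `text` and `result` are made up for illustration, and it assumes a sed that supports `-E`.

```shell
# Toy input from above; the result I expect after deletion is "2".
text='1x21y3333333313333'

# Delete every match of 1x|1.* using extended regular expressions.
result=$(printf '%s\n' "$text" | sed -E 's/1x|1.*//g')

# Brackets make an empty result visible.
printf 'result=[%s]\n' "$result"
```

With a POSIX-style sed this should print `result=[]` rather than `result=[2]`, which is the same surprise as in the xxd case below.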

Here is the output from echo "AA" | xxd -b (the bindump of AA\n):

0000000: 01000001 01000001 00001010                             AA.

My goal is to (1) delete the leading 0 of every byte (ASCII = 7 bits) and (2) delete the rest of the line so only the actual binary is kept. For the first step I piped it into sed 's/ 0//g':

0000000:100000110000010001010                             AA.

Adding the second step, sed -E 's/ 0| .*//g':

0000000:

Obviously, I expect to instead get:

0000000:100000110000010001010

Things I've tried that haven't done the job:

I will use perl instead in the meantime, but this behaviour baffles me and maybe there's a reason (lesson) here?


Solution

  • If I understand your question correctly, this produces what you want:

    $ echo "AA" | xxd -b | sed -E 's/ 0|  .*//g'
    00000000:100000110000010001010
    

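    The key change is the two blanks in front of .*, so that branch can only match at the run of blanks before the ASCII column, which is the part you want to remove.

    The reason your version fails is that sed uses POSIX leftmost-longest matching: when several alternatives can match at the same starting position, the longest match wins, regardless of the order in which the alternatives are written. At the first blank, ' .*' matches more text than ' 0', so it swallows the rest of the line. Perl tries alternatives from left to right and takes the first one that succeeds, which is why it behaves the way you expect.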

    Alternatively, we can remove every blank-zero first and then delete everything from the first remaining blank onwards:

    $ echo "AA" | xxd -b | sed -E 's/ 0//g; s/ .*//'
    00000000:100000110000010001010
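
To compare the original attempt with the two fixes side by side without needing xxd, here is a rough sketch that feeds sed the sample line from the question. The variable names (`line`, `attempt`, `fix_two_blanks`, `fix_ordered`) are made up for illustration, and the exact number of padding blanks in `line` is an assumption.

```shell
# Sample line copied from the question (xxd -b output for "AA\n").
line='0000000: 01000001 01000001 00001010                             AA.'

# Original attempt: at the first blank both ' 0' and ' .*' can match;
# the longer one (' .*') wins, so everything after the colon is deleted.
attempt=$(printf '%s\n' "$line" | sed -E 's/ 0| .*//g')

# Fix 1: require two blanks before .* so it cannot match at a single blank.
fix_two_blanks=$(printf '%s\n' "$line" | sed -E 's/ 0|  .*//g')

# Fix 2: run two substitutions in order instead of relying on alternation order.
fix_ordered=$(printf '%s\n' "$line" | sed -E 's/ 0//g; s/ .*//')

printf '%s\n' "$attempt" "$fix_two_blanks" "$fix_ordered"
```

The first line printed should be the truncated `0000000:` and the other two the desired `0000000:100000110000010001010`. The second fix does not depend on how many blanks separate the binary from the ASCII column, so it may be the more robust choice if the padding ever differs.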