
Generating words from 0.00.00_aa to 9.99.99_zz


I want to generate this word list with a bash script.

The desired output should be something like this:

0.00.0    
0.00.00
0.00.01
...
1.26.0
1.26.00
1.26.01
1.26.02
...
0.00.0_a
...
0.00.0_z
0.00.00_a
...
0.00.01_a
...
9.99.99_z
...
0.00.0_aa
...
0.00.00_aa
...
1.26.99_zz
...
9.99.99_zz

I found this:

printf "%03d\n" {0..999}

But the output of this script is:

000
001
002
...
997
998
999

So, how to modify this script to get my desired output?


Solution

  • Concatenate multiple brace expansions to build their Cartesian product. For example, to generate 00 01 ... 99 you can write {0..9}{0..9}. Since bash 4.0 you can also write {00..99}, but this zero-padded form only works for numbers. For letters, you still have to write {a..z}{a..z}.

    For the single 0 in 0 00 01 02 ... 99 you can nest brace expansions like so: {0,{00..99}}. The same goes for the optional letters, where the empty string serves as one of the alternatives: {,{a..z}}.
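As a quick illustration of these building blocks, here is a small sketch with reduced ranges so the output stays readable. The variable names and the `x` prefix are only illustrative.

```bash
# Adjacent brace expansions multiply: every left item is paired with every right item.
product=$(echo {0..1}{a..b})
echo "$product"        # 0a 0b 1a 1b

# Nesting adds alternatives: a lone 0 plus the zero-padded range (needs bash >= 4.0).
nested=$(echo {0,{00..02}})
echo "$nested"         # 0 00 01 02

# An empty alternative makes a suffix optional.
optional=$(echo x{,_{a..b}})
echo "$optional"       # x x_a x_b
```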


    WARNING: The following commands take up a lot of memory. The output might be "only" around 750 MB on disk, but the running bash processes used more than 16 GB of memory for me. If you have insufficient memory/swap, the command might just get killed (if you are lucky) or your system might freeze, requiring a hard reboot.

    For a better solution, see the end of this answer.


    Now let's put everything together:

    printf %s\\n {0..9}.{00..99}.{0,{00..99}}{,_{,{a..z}}{a..z}} > outputFile
    

    This brace expansion generates 71,003,000 lines. Printing them to stdout would take ages, so we redirect the output to the file outputFile instead. You can confirm that this generates at least the lines from your example by running grep -Fxf exampleAsAFile outputFile. Alternatively, run this simplified command, where 0..9 is replaced by 0..1 and a..z by a..b, then inspect the result manually:

    printf %s\\n {0..1}.{0..1}{0..1}.{0,{0..1}{0..1}}{,_{,{a..b}}{a..b}}
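The line count can be checked by multiplying the sizes of the individual expansions. The following sketch does that arithmetic for the full command and compares the same formula against the actual output of the simplified command. The variable names are only illustrative.

```bash
# Suffix variants: none, _a.._z, or _aa.._zz  ->  1 + (1 + 26) * 26
suffixes=$(( 1 + (1 + 26) * 26 ))
# {0..9} * {00..99} * {0,{00..99}} * suffix variants
full=$(( 10 * 100 * (1 + 100) * suffixes ))
echo "$full"     # 71003000

# Same formula for the simplified command with 0..1 and a..b
small_expected=$(( 2 * 4 * (1 + 4) * (1 + (1 + 2) * 2) ))
small_actual=$(printf '%s\n' {0..1}.{0..1}{0..1}.{0,{0..1}{0..1}}{,_{,{a..b}}{a..b}} | wc -l)
echo "expected $small_expected, got $small_actual"
```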
    

    Even though we just generated all the required lines, the order is different from your example. To fix the order you could run the result through a Schwartzian transform sort, but that would be a waste of resources. Instead you can use multiple brace expansions such that everything is generated in the right order:

    printf %s\\n \
      {0..9}.{00..99}.{0,{00..99}} \
      {0..9}.{00..99}.{0,{00..99}}_{a..z} \
      {0..9}.{00..99}.{0,{00..99}}_{a..z}{a..z} \
      > outputFile
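To see that the reordered command produces the same words as the single brace expansion and only changes their order, a scaled-down comparison can help. This is a sketch with ranges 0..1, 00..01 and a..b chosen purely to keep the output small.

```bash
# Reduced ranges keep the comparison small (84 lines each).
single=$(printf '%s\n' {0..1}.{00..01}.{0,{00..01}}{,_{,{a..b}}{a..b}})
ordered=$(printf '%s\n' \
  {0..1}.{00..01}.{0,{00..01}} \
  {0..1}.{00..01}.{0,{00..01}}_{a..b} \
  {0..1}.{00..01}.{0,{00..01}}_{a..b}{a..b})

# Same set of lines once both are sorted ...
if [ "$(sort <<< "$single")" = "$(sort <<< "$ordered")" ]; then
  same_set=yes
else
  same_set=no
fi
echo "same set: $same_set"

# ... but only the second variant lists all suffix-free words first.
head -n 3 <<< "$single"     # 0.00.0, 0.00.0_a, 0.00.0_b
head -n 3 <<< "$ordered"    # 0.00.0, 0.00.00, 0.00.01
```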
    

    Reducing Memory Footprint

    To reduce the memory footprint you can split off a prefix into a for loop. Where exactly to split depends on your preference and system. Fewer braces in the loop means more memory but faster execution (as long as you have enough memory). More braces in the loop means slower execution but less memory (as long as the prefix is shorter than half of the brace expansion; making it longer will only have negative effects).

    # use only if order doesn't matter. 
    # takes 1m30s and 24 MB of memory
    for prefix in {0..9}.{00..99}; do
        printf "$prefix.%s\n" {0,{00..99}}{,_{,{a..z}}{a..z}}
    done > outputFile
    

    or

    # takes 2m and 24 MB of memory
    rm -f part{1..3}  # >> appends, so remove leftovers from earlier runs
    for prefix in {0..9}.{00..99}; do
      printf "$prefix.%s\n" {0,{00..99}} >> part1
      printf "$prefix.%s\n" {0,{00..99}}_{a..z} >> part2
      printf "$prefix.%s\n" {0,{00..99}}_{a..z}{a..z} >> part3
    done
    cat part{1..3} > outputFile
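To convince yourself that splitting off a prefix does not change the result, the following sketch runs a scaled-down version of the three-part loop and compares it with the equivalent single printf. The reduced ranges, the temporary directory from mktemp, and the file names looped and direct are all illustrative choices, not part of the commands above.

```bash
tmpdir=$(mktemp -d)

# Scaled-down version of the loop that writes three parts.
for prefix in {0..1}.{00..01}; do
  printf "$prefix.%s\n" {0,{00..01}}              >> "$tmpdir/part1"
  printf "$prefix.%s\n" {0,{00..01}}_{a..b}       >> "$tmpdir/part2"
  printf "$prefix.%s\n" {0,{00..01}}_{a..b}{a..b} >> "$tmpdir/part3"
done
cat "$tmpdir"/part{1..3} > "$tmpdir/looped"

# The same words from one big brace expansion, for comparison.
printf '%s\n' \
  {0..1}.{00..01}.{0,{00..01}} \
  {0..1}.{00..01}.{0,{00..01}}_{a..b} \
  {0..1}.{00..01}.{0,{00..01}}_{a..b}{a..b} > "$tmpdir/direct"

# cmp -s is silent and only reports via its exit status.
if cmp -s "$tmpdir/looped" "$tmpdir/direct"; then
  result=identical
else
  result=different
fi
echo "$result"
rm -r "$tmpdir"
```

Once the full run has finished, wc -l outputFile should report 71003000 lines.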