shellawkreadline

text file -- how to sort adjacent lines that have the same level of indentation


UPDATE: The root problem has been solved by fixing a number of Sequelize migrations that always run before mysqldump is called, as discussed in the comments that are below the article that is linked in the next paragraph. However, the core technical challenge is still interesting.

I have a problem with mysqldump that might be solved by configuring mysqldump differently, but probably will be solved by just piping the output through a shell script.

Basically, mysqldump always outputs the tables in the same order, but it list all columns (other than id) for each table in random order.

So, the first run might output this...

create TABLE `ONE` ( 
  `id` int NOT NULL AUTO_INCREMENT,
  `column_a` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `column_b` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `column_c` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
create TABLE `TWO` ( 
  `id` int NOT NULL AUTO_INCREMENT,
  `column_x` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `column_y` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `column_z` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

...and on the second run, it might produce something like this:

create TABLE `ONE` ( 
  `id` int NOT NULL AUTO_INCREMENT,
  `column_c` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `column_b` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `column_a` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
create TABLE `TWO` ( 
  `id` int NOT NULL AUTO_INCREMENT,
  `column_y` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `column_x` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `column_z` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

I'd like to pipe the result through a shellscript that always sorts the lines in the same way. What script would achieve this? The script needs to run on a build agent that runs on Ubuntu, so if it is possible and practical to use standard GNU tools like awk then that would be superior to using custom tools.


Solution

  • Assumptions:

    One approach:

    One GNU awk implementation of this approach:

    awk '
    BEGIN                { PROCINFO["sorted_in"]="@val_str_asc"     # sort arrays by value, sorted as string in asc[ending] order
                           delete lines                             # designate variable "lines" as an array
                         }
    
    /^[[:space:]]/    &&                                            # if line starts with white space (ie, it is indented) and ...
    !/AUTO_INCREMENT/ &&                                            # line does not contain string "AUTO_INCREMENT" and ...
    / NULL/              { lines[++cnt] = $0                        # line contains a single white space + string "NULL", then save current line in array
                           next                                     # skip to next line of input
                         }
    
    cnt                  { for (i in lines)                         # loop through indices of array in @val_str_asc sorted order
                               print lines[i]                       # print array value to stdout
                           delete lines                             # reset array
                           cnt = 0                                  # reset counter (aka array index)
                         }
    1                                                               # print current line of input to stdout
    ' sample.sql
    

    NOTES:

    This generates:

    create TABLE `ONE` (
      `id` int NOT NULL AUTO_INCREMENT
      `column_a` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
      `column_b` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
      `column_c` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;
    create TABLE `TWO` (
      `id` int NOT NULL AUTO_INCREMENT,
      `column_x` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
      `column_y` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
      `column_z` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB;