
Swap two blocks of text programmatically


I have an XML file made up of multiple blocks that are very similar. Here are two:

    <Grid Name="EMFieldMany" GridType="Uniform">
        <Topology TopologyType="3DRectMesh" Dimensions="40 40 40 "/>
        <Attribute AttributeType="Scalar" Name="Er" Center="Node">
            <DataItem Dimensions="40 40 40 " NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/field/Er-0
            </DataItem>
        </Attribute>
        <Geometry GeometryType="VXVYVZ">
            <DataItem Name="r" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/r
            </DataItem>
            <DataItem Name="theta" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/theta
            </DataItem>
            <DataItem Name="z" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/z
            </DataItem>
        </Geometry>
    </Grid>
    <Grid Name="EMFieldMany" GridType="Uniform">
        <Topology TopologyType="3DRectMesh" Dimensions="40 40 40 "/>
        <Attribute AttributeType="Scalar" Name="Er" Center="Node">
            <DataItem Dimensions="40 40 40 " NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/field/Er-1
            </DataItem>
        </Attribute>
        <Geometry GeometryType="VXVYVZ">
            <DataItem Name="r" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/r
            </DataItem>
            <DataItem Name="theta" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/theta
            </DataItem>
            <DataItem Name="z" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/z
            </DataItem>
        </Geometry>
    </Grid>

Typically I have hundreds of similar <Grid> objects in a single file. I want to programmatically swap the positions of the <DataItem Name="r"> and <DataItem Name="z"> blocks in each <Grid> object, so that the <DataItem>s end up in the order z, theta, r. In addition, every Dimensions attribute that contains three values, Dimensions="x y z ", should be rewritten as Dimensions="z y x ".

I don't really mind the programming language being used to do this. I'm on a Linux workstation with bash, python, perl... all the standard stuff.

EDIT: This answer uses sed to match blocks of text, but I'm not sure how to manipulate the selected block afterwards. This other answer swaps single lines up and down, but I'm not sure how to generalize it to swapping whole blocks of text.


Solution

  • Since you mention sed, I suggest this Perl solution; however, it is better to parse XML with an XML parser.

    #!/usr/bin/perl
    use strict;
    use warnings;
    
    # Read one <Grid> block per record by changing the input record separator
    $/ = "</Grid>";
    
    while (<>) {
        # Reorder the coordinate DataItems from r, theta, z to z, theta, r
        s@(\s*<DataItem Name="r".*?</DataItem>)(\s*<DataItem Name="theta".*?</DataItem>)(\s*<DataItem Name="z".*?</DataItem>)@$3$2$1@s;
        # Reverse every three-value Dimensions attribute (Topology and DataItem alike)
        s@Dimensions="\K(\d+) (\d+) (\d+) @$3 $2 $1 @g;
        print;
    }
    

    Or the one-liner equivalent:

    perl -pe 'BEGIN{$/="</Grid>"}s@(\s*<DataItem Name="r".*?</DataItem>)(\s*<DataItem Name="theta".*?</DataItem>)(\s*<DataItem Name="z".*?</DataItem>)@$3$2$1@s;s@Dimensions="\K(\d+) (\d+) (\d+) @$3 $2 $1 @g;' <input.txt
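
For the parser-based route mentioned above, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module. It is not part of the original answer. It assumes the elements carry no XML namespace and that each <Geometry> contains exactly the three <DataItem> children named r, theta and z; any other <Geometry> is left untouched. The names transform, reorder_geometry, reverse_dimensions and SAMPLE are made up for illustration.

```python
import xml.etree.ElementTree as ET

NEW_ORDER = ["z", "theta", "r"]  # desired order of the coordinate DataItems


def reverse_dimensions(elem):
    """Reverse a Dimensions attribute if it holds exactly three values."""
    dims = elem.get("Dimensions")
    if dims is None:
        return
    parts = dims.split()
    if len(parts) == 3:
        # Keep the trailing space used in the sample file, if present.
        trailing = " " if dims.endswith(" ") else ""
        elem.set("Dimensions", " ".join(reversed(parts)) + trailing)


def reorder_geometry(geometry):
    """Put the DataItem children of one Geometry element into NEW_ORDER."""
    items = list(geometry)
    by_name = {item.get("Name"): item for item in items}
    # Assumption: only touch a Geometry made of exactly r, theta and z.
    if len(items) != 3 or sorted(by_name) != sorted(NEW_ORDER):
        return
    # Tails hold the whitespace after each child; reuse them by position
    # so the indentation of the file stays the same.
    tails = [item.tail for item in items]
    for item in items:
        geometry.remove(item)
    for name, tail in zip(NEW_ORDER, tails):
        by_name[name].tail = tail
        geometry.append(by_name[name])


def transform(root):
    """Apply both changes to every element below (and including) root."""
    for elem in root.iter():
        reverse_dimensions(elem)
    for geometry in root.iter("Geometry"):
        reorder_geometry(geometry)
    return root


# Small stand-in for one <Grid> block, with distinct sizes so the
# reversal is visible.
SAMPLE = """<Grid Name="EMFieldMany" GridType="Uniform">
  <Topology TopologyType="3DRectMesh" Dimensions="10 20 30 "/>
  <Geometry GeometryType="VXVYVZ">
    <DataItem Name="r" Dimensions="40">file.hdf5:/coordinates/r</DataItem>
    <DataItem Name="theta" Dimensions="40">file.hdf5:/coordinates/theta</DataItem>
    <DataItem Name="z" Dimensions="40">file.hdf5:/coordinates/z</DataItem>
  </Geometry>
</Grid>"""

if __name__ == "__main__":
    root = transform(ET.fromstring(SAMPLE))
    print(ET.tostring(root, encoding="unicode"))
```

For a real file, the same function can be applied to a parsed tree: tree = ET.parse("input.xmf"), then transform(tree.getroot()), then tree.write("output.xmf"), where both file names are placeholders. Note that ElementTree does not preserve comments or a DOCTYPE declaration by default, so those would have to be restored separately if the file relies on them.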