perl wsman

Break XML File into multiple XML Files


I want to split the XML output produced by WSMAN into multiple XML files so that I can parse it.

WSMAN gives me output like the following, which is really two distinct XML documents concatenated together, each with its own root node:

<?xml version="1.0" encoding="UTF-8"?>
  <s:Body>
    <wsen:PullResponse>
      <wsen:Items>
        <n1:DCIM_SoftwareIdentity>
          <n1:ComponentType>BIOS</n1:ComponentType>
          <n1:InstanceID>DCIM:CURRENT#741__BIOS.Setup.1-1</n1:InstanceID>
          <n1:VersionString>1.3.6</n1:VersionString>
        </n1:DCIM_SoftwareIdentity>
      </wsen:Items>
    </wsen:PullResponse>
  </s:Body>
<?xml version="1.0" encoding="UTF-8"?>
  <s:Body>
    <wsen:PullResponse>
      <wsen:Items>
        <n1:DCIM_SoftwareIdentity>
          <n1:ComponentType>BIOS</n1:ComponentType>
          <n1:InstanceID>DCIM:INSTALLED#741__BIOS.Setup.1-1</n1:InstanceID>
          <n1:VersionString>1.3.6</n1:VersionString>
        </n1:DCIM_SoftwareIdentity>
      </wsen:Items>
    </wsen:PullResponse>
  </s:Body>

I cannot parse the above output with XML::Simple because it contains two root elements, so as a single XML document it is not well-formed ("junk" after the first root element).

Question/Statement:

I want to split the above output into two distinct XML files, each containing its own root element, as below:

<?xml version="1.0" encoding="UTF-8"?>
  <s:Body>
    <wsen:PullResponse>
      <wsen:Items>
        <n1:DCIM_SoftwareIdentity>
          <n1:ComponentType>BIOS</n1:ComponentType>
          <n1:InstanceID>DCIM:CURRENT#741__BIOS.Setup.1-1</n1:InstanceID>
          <n1:VersionString>1.3.6</n1:VersionString>
        </n1:DCIM_SoftwareIdentity>
      </wsen:Items>
    </wsen:PullResponse>
  </s:Body>

......

<?xml version="1.0" encoding="UTF-8"?>
  <s:Body>
    <wsen:PullResponse>
      <wsen:Items>
        <n1:DCIM_SoftwareIdentity>
          <n1:ComponentType>BIOS</n1:ComponentType>
          <n1:InstanceID>DCIM:INSTALLED#741__BIOS.Setup.1-1</n1:InstanceID>
          <n1:VersionString>1.3.6</n1:VersionString>
        </n1:DCIM_SoftwareIdentity>
      </wsen:Items>
    </wsen:PullResponse>
  </s:Body>

My logic:

1) Read the output line by line.

2) When a line matches the ?xml version pattern, create a new XML file and write that line and the lines that follow to it, until the next line that matches the ?xml version pattern.

3) Repeat step 2 every time the ?xml version pattern appears.

Here is my code:

#!/usr/bin/perl -w
use strict;
use XML::Simple;
use Data::Dumper;

my $counter = 0;
my $fileName;

while (my $line = <DATA>)
{
    if ( $line =~ /\?xml version/ )
    {   
        $counter++;
        print "Creating the BIOS file \n";
        $fileName = "BIOS"."_".$counter;
    }   
    open (my $sub_xml_file, ">" , $fileName) or die "Cannot create $fileName: $!\n";
    print $sub_xml_file $line;
}

__DATA__
## omitting this part as this contains the XML info listed above.

My script does create the files BIOS_1 and BIOS_2, but each file contains only the last line of the XML output above:

# cat BIOS_1
  </s:Body>
# cat BIOS_2
  </s:Body>

Can you help me fix my script so that it creates two distinct XML files?


Solution

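  • Your script reopens the output file with ">" on every pass through the loop, which truncates it each time, so only the last line written to each file survives. Open a new file only when you see an XML declaration, and keep that file handle for the following loop passes.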
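To see why only one line survives, here is a minimal sketch of the same pattern in isolation. It assumes nothing beyond core Perl; the scratch file comes from File::Temp and the three sample lines are made up for the demonstration. Each pass reopens the file in ">" mode, which truncates it before writing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, used only for this demonstration.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Mimic the loop from the question: reopen with '>' for every line.
for my $line ("first\n", "second\n", "third\n") {
    open(my $out, '>', $path) or die "Cannot open $path: $!\n";   # truncates each time
    print $out $line;
    close $out;
}

# Read the file back to see what survived.
open(my $in, '<', $path) or die "Cannot read $path: $!\n";
my @survivors = <$in>;
close $in;

print "Lines left in file: ", scalar(@survivors), "\n";
print @survivors;
```

Only the line written on the final pass remains, which matches the single closing tag seen in BIOS_1 and BIOS_2.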

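    Load-everything-into-memory approach:

    my $count = 0;
    my $file; { local $/; $file = <>; }   # slurp all input at once
    # Split before each line that starts an XML declaration;
    # the lookahead keeps the declaration in its chunk.
    for my $xml (split /^(?=<\?xml)/m, $file) {
       my $fn = sprintf("BIOS_%d.xml", ++$count);
       open(my $fh, '>', $fn) or die "Cannot create $fn: $!\n";
       print $fh $xml;
       close($fh) or die "Cannot close $fn: $!\n";
    }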
    

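    Line-at-a-time approach:

    my $fh;
    my $count = 0;
    while (<>) {
       if (/^<\?xml/) {
          # New document: reopening $fh closes the previous file.
          my $fn = sprintf("BIOS_%d.xml", ++$count);
          open($fh, '>', $fn) or die "Cannot create $fn: $!\n";
       }

       print $fh $_ if $fh;   # skip anything before the first declaration
    }
    close($fh) if $fh;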