I am getting an empty result when unmarshalling XML in Go. I have looked through other SO questions, and the most common cause seems to be unexported fields. That is not the case here, since all my field names begin with an uppercase letter.
The XML looks like this (with nearly 1,000,000 ROW tags inside a single ROWDATA):
<ROWDATA>
  <ROW>
    <ПІБ> ПОПКО РУСЛАН ВАСИЛЬОВИЧ</ПІБ>
    <Місце_проживання>61112, Харківська обл., місто Харків, Московський район, ПРОСПЕКТ П'ЯТДЕСЯТИРІЧЧЯ ВЛКСМ, будинок 86, квартира 65</Місце_проживання>
    <Основний_вид_діяльності>45.32 Роздрібна торгівля деталями та приладдям для автотранспортних засобів</Основний_вид_діяльності>
    <Стан>зареєстровано</Стан>
  </ROW>
</ROWDATA>
And this is what I have done:
package main

import (
	"encoding/xml"
	"fmt"
	"golang.org/x/text/encoding/charmap"
	"golang.org/x/text/transform"
	"io/ioutil"
	"os"
	"strings"
)

type Rowdata struct {
	XMLName xml.Name `xml:"ROWDATA"`
	Rowdata []Row    `xml:"ROW"`
}

type Row struct {
	XMLName  xml.Name `xml:"ROW"`
	Location string   `xml:"Місце_проживання"`
	Director string   `xml:"ПІБ"`
	Activity string   `xml:"Основний_вид_діяльності"`
	City     string   `xml:"Стан"`
}

func main() {
	xmlFile, err := os.Open("FOP_1.xml")
	if err != nil {
		fmt.Println(err)
	}
	defer xmlFile.Close()
	byteValue, _ := ioutil.ReadAll(xmlFile)
	koi8rString := transform.NewReader(strings.NewReader(string(byteValue)), charmap.Windows1251.NewDecoder())
	decBytes, _ := ioutil.ReadAll(koi8rString)
	var entries Rowdata
	xml.Unmarshal(decBytes, &entries)
	for i := 0; i < len(entries.Rowdata); i++ {
		fmt.Println("Name: " + entries.Rowdata[i].Director)
	}
}
And the last for loop never runs, because the length is zero. However, I have a similar example where the file was already UTF8, so no encoding transformation was needed, and it went well. I wonder if I messed up something while decoding?
UPDATE: I tested a simpler version with a string rather than a file in the Go Play Space, and it works fine! However, my local version with the file still doesn't work, so I suspect it might have something to do with the actual reading of the file...
UPDATE2: I just realized that xml.Unmarshal returns an error:
xml: encoding "windows-1251" declared but Decoder.CharsetReader is nil
That might be the cause of this... but what does it mean?
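As for what the error means: encoding/xml reads only UTF-8 on its own. When the XML declaration names another encoding, the decoder asks Decoder.CharsetReader to convert the input, and xml.Unmarshal offers no way to set one, so it stops at the declaration and leaves the slice empty. Converting the bytes to UTF-8 beforehand does not help, because the declaration still says windows-1251. A minimal sketch that reproduces this, assuming a made-up in-memory document and a hypothetical ASCII Name tag:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Row uses a hypothetical ASCII tag to keep the sample short.
type Row struct {
	Director string `xml:"Name"`
}

type Rowdata struct {
	Rows []Row `xml:"ROW"`
}

// sample is a made-up, ASCII-only document that merely declares windows-1251.
const sample = `<?xml version="1.0" encoding="windows-1251"?>
<ROWDATA><ROW><Name>test</Name></ROW></ROWDATA>`

// parse returns the number of decoded rows and the error from xml.Unmarshal.
func parse(doc string) (int, error) {
	var entries Rowdata
	err := xml.Unmarshal([]byte(doc), &entries)
	return len(entries.Rows), err
}

func main() {
	n, err := parse(sample)
	fmt.Println("rows:", n)
	fmt.Println("error:", err)
}
```

Checking the error returned by xml.Unmarshal, instead of discarding it, surfaces the problem immediately.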
You mention nearly a million ROW tags, yet your code uses ioutil.ReadAll(xmlFile) to read everything into memory (twice!). That is unnecessary, and you might run out of memory. Use a streaming decoder instead and set its CharsetReader, which is what the error message is asking for. Something like:
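package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"os"

	"golang.org/x/net/html/charset"
)

// Row is the struct from the question.

func main() {
	xmlFile, err := os.Open("FOP_1.xml")
	if err != nil {
		log.Fatal(err)
	}
	defer xmlFile.Close()

	parser := xml.NewDecoder(xmlFile)
	// Convert the declared encoding (windows-1251) to UTF-8 on the fly.
	parser.CharsetReader = charset.NewReaderLabel
	for {
		t, err := parser.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		switch se := t.(type) {
		case xml.StartElement:
			// Decode one ROW at a time so only a single row is held in memory.
			if se.Name.Local == "ROW" {
				var item Row
				if err := parser.DecodeElement(&item, &se); err != nil {
					log.Fatal(err)
				}
				fmt.Println("Name: " + item.Director)
			}
		}
	}
}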