xml go bleve

Index XML with the Go bleve text indexing library


How can I use the bleve text-indexing library, https://github.com/blevesearch/bleve, to index XML content?

I considered using an XML parser such as https://github.com/dps/go-xml-parse, but how do I then pass the parsed content to bleve to be indexed?

Update: my XML looks like the following:

<page>
    <title>Title here</title>
    <image>image url here</image>
    <text>A sentence or two about the topic</text>
    <facts>
        <fact>Fact 1</fact>
        <fact>Fact 2</fact>
        <fact>Fact 3</fact>
    </facts>
</page>

Solution

  • Define a struct that mirrors the structure of your XML, then use the standard "encoding/xml" package to unmarshal the XML into that struct. From there you can index the struct with bleve like any other Go value.

    http://play.golang.org/p/IZP4nrOotW

    package main
    
    import (
        "encoding/xml"
        "fmt"
    )
    
    // Page mirrors a single <page> element.
    type Page struct {
        Title string `xml:"title"`
        Image string `xml:"image"`
        Text  string `xml:"text"`
        // "facts>fact" collects every <fact> nested inside <facts>.
        Facts []string `xml:"facts>fact"`
    }
    
    func main() {
        xmlData := []byte(`<page>
        <title>Title here</title>
        <image>image url here</image>
        <text>A sentence or two about the topic</text>
        <facts>
            <fact>Fact 1</fact>
            <fact>Fact 2</fact>
            <fact>Fact 3</fact>
        </facts>
    </page>`)
    
        inputStruct := &Page{}
        err := xml.Unmarshal(xmlData, inputStruct)
        if err != nil {
            fmt.Println("Error unmarshalling from XML.", err)
            return
        }
    
        fmt.Printf("%+v\n", inputStruct)
    }