solrdih

Why does the Solr Data Import Handler hash the uniqueKey?


I have a very strange problem with Solr 4.6.0.

The uniqueKey field "id" contains what looks like a hash for every document instead of my string value. If I add just one custom document with the update request handler in the Solr admin, I get, for example, the ID value "book_45" that I specified, so that is correct.

But when I do a full import with the DIH (Data Import Handler), the id field of every document contains a value like "[B@53bd370f" instead of my custom value. So the problem must be in the DIH.

My import script:

<dataConfig>
<dataSource
    type="JdbcDataSource"
    driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://host/database"
    user="user"
    password="password" />
<document name="project">
    <entity name="document" transformer="RegexTransformer"
        query="SELECT CONCAT('book_', b.id) AS book_id, b.slug, b.title, b.isbn,
                b.publisher, b.releaseYear AS release_year, b.language, b.pageCount AS page_count, b.description,
                b.print, b.addedBy_id AS added_by_id, b.dt AS created,
                GROUP_CONCAT(a.name SEPARATOR ';') AS authors
            FROM Book b
            LEFT JOIN author_book ab ON ab.book_id = b.id
            LEFT JOIN Author a ON a.id = ab.author_id
            GROUP BY b.id
            ">
        <field column="book_id" name="id" />
        <field column="slug" name="book_slug" />
        <field column="title" name="book_title" />
        <field column="isbn" name="book_isbn" />
        <field column="publisher" name="book_publisher" />
        <field column="release_year" name="book_release_year" />
        <field column="language" name="book_language" />
        <field column="page_count" name="book_page_count" />
        <field column="description" name="book_description" />
        <field column="print" name="book_print" />
        <field column="added_by_id" name="book_added_by_id" />
        <field column="created" name="book_created" />
        <field column="authors" splitBy=";" name="authors" />
    </entity>
</document>
</dataConfig>

The id field in my schema.xml (which is the same as in the default shipped core collection1):

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>id</uniqueKey>

Does anyone know what I am missing?


Solution

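  • The value [B@53bd370f is not a hash; it is what Java prints when toString() is called on a byte[] ("[B" is the JVM's name for the byte array class, followed by the object's identity hash code in hex). So whatever MySQL returns for that column is being treated as a byte[] instead of a String. A likely cause is that CONCAT() with a numeric argument yields a binary string on some MySQL versions, which the JDBC driver then hands over as raw bytes.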

    Try casting the concatenated id to CHAR (MySQL's CAST does not accept VARCHAR as a target type), like this:

     SELECT CAST(CONCAT('book_', b.id) AS CHAR) AS book_id...
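
    To see why the value looks like a hash, here is a minimal Java sketch of the difference between stringifying a byte[] and decoding it. This is only an illustration, not the DIH's actual code; the class name ByteArrayToStringDemo and its two helper methods are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayToStringDemo {

    // Calls Object.toString() on the array: yields "[B@" plus the
    // identity hash code in hex, not the text the bytes represent.
    static String naive(byte[] raw) {
        return raw.toString();
    }

    // Decodes the bytes into text (UTF-8 is an assumption here).
    static String decoded(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for a column value that arrives as binary data.
        byte[] raw = "book_45".getBytes(StandardCharsets.UTF_8);

        System.out.println(naive(raw));   // something like [B@53bd370f (varies per run)
        System.out.println(decoded(raw)); // book_45
    }
}
```

    The part after the @ changes from run to run and from object to object, which is why every document ended up with a different, meaningless id. Casting to CHAR in the query avoids the problem at the source, so the value already arrives as a String.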