solr solr5

Solr sorting text in Polish


I have Solr 5.2.1 and the following definition of the field type used for sorting:

<fieldType name="polishSortVarchar" class="solr.ICUCollationField" locale="pl_PL" strength="secondary" />

After reindexing, sorting almost works the way I want:

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "fl": "name_varchar",
      "sort": "sort_name_varchar asc",
      "indent": "true",
      "q": "*:*",
      "_": "1454575147254",
      "wt": "json",
      "rows": "10"
    }
  },
  "response": {
    "numFound": 5250,
    "start": 0,
    "docs": [
      {
        "name_varchar": "\"Europą\" na Antarktydę"
      },
      {
        "name_varchar": "1:0 dla Korniszonka"
      },
      {
        "name_varchar": "1001 faktów o roślinach"
      }
    ]
  }
}

As you can see, the first result is a phrase that starts with a double quote ("). I want to filter out special characters and sort only by letters, so that this phrase is sorted by the 'E' in its first position.

Anybody?


Solution

  • I couldn't find a solution directly in Solr, so I strip the unwanted characters from the sort value during indexing:

    // Keep only ASCII letters, digits, spaces, hyphens and Polish letters; strip everything else.
    $sortValue = preg_replace('/[^A-Za-z0-9 \-żźćńółęąśŻŹĆĄŚĘŁÓŃ]/u', '', $sortValue);
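
A possible Solr-side alternative to cleaning the value in PHP: ICUCollationField has an expert attribute, alternate, which the Solr reference guide describes as useful for ignoring punctuation and whitespace when set to "shifted". Below is a minimal sketch of the field type from the question with that attribute added. The type name polishSortVarcharShifted is made up for illustration, and I have not verified this against 5.2.1, so treat it as an assumption to test rather than a confirmed fix.

```xml
<!-- Hypothetical variant of the field type from the question.
     alternate="shifted" asks the ICU collator to treat punctuation and
     whitespace as ignorable at strengths below quaternary, so a leading
     double quote should no longer decide the sort position. -->
<fieldType name="polishSortVarcharShifted"
           class="solr.ICUCollationField"
           locale="pl_PL"
           strength="secondary"
           alternate="shifted" />
```

Two caveats: unlike the regex above, this also makes spaces ignorable, so multi-word titles may order slightly differently, and the collation keys change, so a full reindex is needed after switching the field type.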