
How to apply custom analyzers on a field in Vespa schema

I have the following schema:

schema product {
    document product {
        field brand_name type string {
            indexing: summary | index
        }
        field brand_name_tokens type array<string> {    #computed field
        }
    }
}

I want to have a field called brand_name_tokens of type array<string> in the document which is derived from brand_name field as follows

I can do this processing before writing to Vespa. I would like to know if it's possible to define this in the schema so that Vespa computes it automatically.


  • You can do this: the indexing language has a "split by regex" expression. There's no "remove substring", but if you don't mind splitting on ™/® as well, you can do it with a slightly fancier regex.

    It looks a little different because you can't input Unicode characters directly in the schema, so you have to replace ™ with its UTF-8 byte sequence, \xe2\x84\xa2:

    field brand_name_tokens type array<string> {
        indexing: input brand_name | split "([. ®]|\xe2\x84\xa2)+" | summary
    }

    As this is a computed field, it should be defined outside the document product { block.

    See the split expression in the Vespa indexing language reference, and the rest of that page, for what you can do at indexing time. If you need to do more, you can write a custom Document Processor.
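
    Put together, the whole schema might look roughly like the sketch below. It is untested and only combines the question's brand_name field with the computed field from this answer; the comment lines are added for illustration.

    schema product {
        document product {
            field brand_name type string {
                indexing: summary | index
            }
        }

        # Computed field: declared outside the document block,
        # filled at indexing time from brand_name.
        field brand_name_tokens type array<string> {
            indexing: input brand_name | split "([. ®]|\xe2\x84\xa2)+" | summary
        }
    }

    The tokens are only written to the document summary here. If you also want to search or filter on them, you would presumably need to add index or attribute to the indexing statement.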