regex, elasticsearch, nest, elasticsearch-net

Elasticsearch pattern_syntax_exception Illegal repetition near index


I'm trying to create a pattern analyzer using the .NET client (NEST 7.17.4 against Elasticsearch 8.4), and I'm getting the following exception:

Elasticsearch.Net.ElasticsearchClientException: 'Request failed to execute. Call: Status code 400 from: PUT /temp-index-for-integration-tests?pretty=true&error_trace=true. ServerError: Type: pattern_syntax_exception Reason: "Illegal repetition near index 80
([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])
                                                                          ^"'

The same regex works fine when the index is created from the Dev Tools console, so I'm not sure what I'm missing.

The following works:

PUT test-index-3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "camel": {
          "type": "pattern",
          "pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
        }
      }
    }
  }
}

This does not work; it throws the exception on Indices.CreateAsync:

public async Task IndexCreateAsync(IEnumerable<IFieldDefinition> fieldDefinitions, string indexName)
{
    Dictionary<PropertyName, IProperty> indexFields = fieldDefinitions
        .Select(f => _elasticsearchMapper.Map(f))
        .ToDictionary(p => p.Name, p => p);

    PutMappingRequest mappings = new PutMappingRequest(indexName)
    {
        Properties = new Properties(indexFields)
    };

    var patternAnalyzer = new PatternAnalyzer
    {
        Pattern = @"([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])",
        Lowercase = true
    };

    IndexState indexSettings = new IndexState
    {
        Mappings = mappings,
        Settings = new IndexSettings
        {
            Analysis = new Analysis
            {
                Analyzers = new Analyzers
                {
                    {
                        ElasticsearchConstants.TextAnalysis.CustomAnalyzers.CustomPatternRegexCasesAndSpecialChars,
                        patternAnalyzer
                    }
                }
            }
        }
    };

    await _elasticClient.Indices.CreateAsync(indexName, s => s.InitializeUsing(indexSettings));
}

Solution

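  • OK, I was able to find a solution. The Dev Tools console and the .NET client treat the backslashes differently. In the console the request body is raw JSON, where \\ is the JSON escape for a single backslash, so the regex engine receives \p{L}. In C#, a verbatim string (@"...") does not process escapes, so \\ stays as two backslash characters; the client then escapes each of them again when it serializes the request, and the regex engine receives \\p{L}, which it reads as a literal backslash followed by the plain text p{L}. Outside a character class, the { after p is parsed as the start of a repetition, which appears to be where "Illegal repetition near index 80" comes from.

    So what I needed to do was replace every \\ with a single backslash \.

    After changing the pattern in the .NET code to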

    var patternAnalyzer = new PatternAnalyzer
    {
        Pattern = @"([^\p{L}\d]+)|(?<=\D)(?=\d)|(?<=\d)(?=\D)|(?<=[\p{L}&&[^\p{Lu}]])(?=\p{Lu})|(?<=\p{Lu})(?=\p{Lu}[\p{L}&&[^\p{Lu}]])"
    };
    

    Everything started to work!
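
To make the escaping visible, here is a minimal sketch. It assumes System.Text.Json as a stand-in serializer (NEST 7 uses its own internal serializer, but any JSON serializer has to write a backslash as \\), and the names PatternEscapingDemo, ToJsonLiteral, FromJsonLiteral and PrintDemo are made up for illustration. It uses only a short fragment of the pattern, \p{Lu}, rather than the full regex.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper class, not part of NEST or Elasticsearch.Net.
public static class PatternEscapingDemo
{
    // Returns the JSON string literal (quotes included) that a JSON
    // serializer writes for the given C# string value.
    public static string ToJsonLiteral(string pattern) =>
        JsonSerializer.Serialize(pattern);

    // Decodes a JSON string literal back into the value the server sees.
    public static string FromJsonLiteral(string jsonLiteral) =>
        JsonSerializer.Deserialize<string>(jsonLiteral);

    public static void PrintDemo()
    {
        // Failing variant: a verbatim string keeps BOTH backslashes.
        string doubled = @"\\p{Lu}";
        // Working variant: one backslash in the C# value.
        string single = @"\p{Lu}";

        Console.WriteLine(doubled.Length);          // 7
        Console.WriteLine(single.Length);           // 6

        // What goes over the wire for each value.
        Console.WriteLine(ToJsonLiteral(doubled));  // "\\\\p{Lu}"
        Console.WriteLine(ToJsonLiteral(single));   // "\\p{Lu}"

        // What the console request contains: the JSON text "\\p{Lu}",
        // which decodes to a single backslash followed by p{Lu}.
        string consoleBody = @"""\\p{Lu}""";
        Console.WriteLine(FromJsonLiteral(consoleBody)); // \p{Lu}
    }
}
```

Calling PatternEscapingDemo.PrintDemo() from a console app shows that the single-backslash verbatim string produces the same JSON text as the console request, while the doubled one puts four backslashes on the wire, which decode to two.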