I'm trying to create a pattern analyzer using the .NET client (NEST 7.17.4 against Elasticsearch 8.4), and I'm getting the following exception:
Elasticsearch.Net.ElasticsearchClientException: 'Request failed to execute. Call: Status code 400 from: PUT /temp-index-for-integration-tests?pretty=true&error_trace=true. ServerError: Type: pattern_syntax_exception Reason: "Illegal repetition near index 80
([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])
^"'
The same regex works fine when the index is created from the Dev Console, so I'm not sure what I'm missing.
This works:
    PUT test-index-3
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "camel": {
              "type": "pattern",
              "pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
            }
          }
        }
      }
    }
This doesn't; it throws the exception on Indices.CreateAsync:
    public async Task IndexCreateAsync(IEnumerable<IFieldDefinition> fieldDefinitions, string indexName)
    {
        Dictionary<PropertyName, IProperty> indexFields = fieldDefinitions
            .Select(f => _elasticsearchMapper.Map(f))
            .ToDictionary(p => p.Name, p => p);

        PutMappingRequest mappings = new PutMappingRequest(indexName)
        {
            Properties = new Properties(indexFields)
        };

        var patternAnalyzer = new PatternAnalyzer
        {
            Pattern = @"([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])",
            Lowercase = true
        };

        IndexState indexSettings = new IndexState
        {
            Mappings = mappings,
            Settings = new IndexSettings
            {
                Analysis = new Analysis
                {
                    Analyzers = new Analyzers
                    {
                        {
                            ElasticsearchConstants.TextAnalysis.CustomAnalyzers.CustomPatternRegexCasesAndSpecialChars,
                            patternAnalyzer
                        }
                    }
                }
            }
        };

        await _elasticClient.Indices.CreateAsync(indexName, s => s.InitializeUsing(indexSettings));
    }
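OK, I was able to find a solution. The difference is in how the pattern is escaped when it comes from the Developer Console in Elastic Cloud versus the .NET client. The console body is raw JSON, where \\ is the escape sequence for a single backslash. A C# verbatim string (@"...") does no escape processing, so \\ stays as two backslashes, and the client escapes each of them again when it serializes the request. The regex engine then sees a literal backslash followed by p{Lu}, and it reads {Lu} as a malformed repetition, hence "Illegal repetition".
Basically, what I needed to do was replace every \\ with a single backslash \.
So I changed the regex in the .NET code to: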
    var patternAnalyzer = new PatternAnalyzer
    {
        Pattern = @"([^\p{L}\d]+)|(?<=\D)(?=\d)|(?<=\d)(?=\D)|(?<=[\p{L}&&[^\p{Lu}]])(?=\p{Lu})|(?<=\p{Lu})(?=\p{Lu}[\p{L}&&[^\p{Lu}]])"
    };
Everything started to work!
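For anyone who wants to see the escaping difference without a cluster, here is a minimal sketch using one fragment of the pattern. The class name PatternEscapingDemo and its members are made up for illustration, and System.Text.Json is only a stand-in for the client's own serializer (NEST 7 does not use it internally). The assumption is just that any JSON serializer writes a backslash as \\.

```csharp
using System;
using System.Text.Json;

public static class PatternEscapingDemo
{
    // Verbatim literal written like the failing code: both backslashes are kept.
    public const string Doubled = @"(?=\\p{Lu})";

    // Verbatim literal written like the working code: a single backslash.
    public const string Single = @"(?=\p{Lu})";

    // Regular literal: the C# compiler turns \\ into one backslash.
    public const string Regular = "(?=\\p{Lu})";

    // Serializes a string as a JSON value, roughly what a client does
    // to the pattern before sending it in the request body.
    public static string ToJson(string value) => JsonSerializer.Serialize(value);

    public static void Main()
    {
        Console.WriteLine(Doubled.Length);     // 11: two backslash characters
        Console.WriteLine(Single.Length);      // 10: one backslash character
        Console.WriteLine(Single == Regular);  // True

        // JSON form that matches the Dev Console body:
        Console.WriteLine(ToJson(Single));     // "(?=\\p{Lu})"

        // JSON form of the failing pattern, which decodes to \\p{Lu} on the server:
        Console.WriteLine(ToJson(Doubled));    // "(?=\\\\p{Lu})"
    }
}
```

So in C#, either keep the verbatim string and use single backslashes, or drop the @ and keep the doubled ones. Both give the same string at runtime.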