
Is it possible to set a new field value while analyzing a document being indexed in Elasticsearch?

For example:

  1. I index a document into Elasticsearch;
  2. I want the field named description to be analyzed with the uax_url_email tokenizer/analyzer;
  3. if description contains any URLs, put them into another array field named urls;
  4. finish indexing the document.

Then I can check whether the urls field is empty to know whether description contains any URLs.

Is this possible? Or does an analyzer only contribute to the inverted index, and not to other fields?


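  • An analyzer only produces the tokens that go into the inverted index; it cannot write values to other fields. You can instead use an ingest pipeline with a script processor and a Painless script. The example below reads from a field named content, so replace it with description for your case. I hope this helps.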

    POST _ingest/pipeline/_simulate?verbose
    {
      "pipeline": {
        "processors": [
          {
            "script": {
              "description": "Extract 'tags' from 'env' field",
              "lang": "painless",
              "source": """
                def m = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/.matcher(ctx["content"]);
                ArrayList urls = new ArrayList();
                // collect every URL the regex finds in the field
                while (m.find()) { urls.add(m.group()); }
                ctx['urls'] = urls;
              """,
              "params": {
                "delimiter": "-",
                "position": 1
              }
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
          }
        }
      ]
    }

    The pipeline above produces a result like this:

    {
      "docs": [
        {
          "processor_results": [
            {
              "processor_type": "script",
              "status": "success",
              "description": "Extract 'tags' from 'env' field",
              "doc": {
                "_index": "_index",
                "_id": "_id",
                "_source": {
                  "urls": ["https://apple.com", "https://google.com"],
                  "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
                },
                "_ingest": {
                  "pipeline": "_simulate_pipeline",
                  "timestamp": "2022-07-13T12:45:00.3655307Z"
                }
              }
            }
          ]
        }
      ]
    }