I have installed Elasticsearch 6.2.3 in Docker.
I get the following error when trying to install this Elasticsearch plugin:
org.wikimedia.search:extra
Exception in thread "main" java.lang.IllegalArgumentException: plugin [extra] is incompatible with version [6.2.3]; was designed for version [5.5.2]
I tried to install the plugin using the following command:
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install org.wikimedia.search:extra:5.5.2.3
I want this plugin so that I can load the Wikipedia dictionary into Elasticsearch, but the latest version of the plugin is 5.5.2.
Two years ago Wikimedia made dumps of its production Elasticsearch indices available, so loading Wikipedia (or Wiktionary) into Elasticsearch is now very simple.
The indices are exported every week, and for each wiki there are two exports:
The content index, called content, which contains only article pages;
The general index, called general, which contains all pages, including talk pages, templates, etc.
You can find them here: http://dumps.wikimedia.org/other/cirrussearch/current/
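As a rough sketch of how the download URL for one export could be put together: the helper name dump_url, the file naming pattern and the date 20180326 below are assumptions for illustration, so check them against the directory listing at the URL above before relying on them.

```shell
#!/bin/sh
# Base URL taken from the answer above.
BASE_URL="http://dumps.wikimedia.org/other/cirrussearch/current"

# Build the URL of one dump file.
# ASSUMPTION: files are named <wiki>-<date>-cirrussearch-<kind>.json.gz;
# verify this against the directory listing before use.
dump_url() {
    wiki="$1"; date="$2"; kind="$3"
    case "$kind" in
        content|general) ;;
        *) echo "kind must be 'content' or 'general'" >&2; return 1 ;;
    esac
    printf '%s/%s-%s-cirrussearch-%s.json.gz\n' "$BASE_URL" "$wiki" "$date" "$kind"
}

# Example (not run here, the date is hypothetical):
#   curl -O "$(dump_url enwiki 20180326 content)"
```

The content export is the smaller of the two, so it is probably the better one to try first.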
Create a mapping according to your needs. For example:
{
  "mappings": {
    "page": {
      "properties": {
        "auxiliary_text": {
          "type": "text"
        },
        "category": {
          "type": "text"
        },
        "coordinates": {
          "properties": {
            "coord": {
              "properties": {
                "lat": {
                  "type": "double"
                },
                "lon": {
                  "type": "double"
                }
              }
            },
            "country": {
              "type": "text"
            },
            "dim": {
              "type": "long"
            },
            "globe": {
              "type": "text"
            },
            "name": {
              "type": "text"
            },
            "primary": {
              "type": "boolean"
            },
            "region": {
              "type": "text"
            },
            "type": {
              "type": "text"
            }
          }
        },
        "defaultsort": {
          "type": "boolean"
        },
        "external_link": {
          "type": "text"
        },
        "heading": {
          "type": "text"
        },
        "incoming_links": {
          "type": "long"
        },
        "language": {
          "type": "text"
        },
        "namespace": {
          "type": "long"
        },
        "namespace_text": {
          "type": "text"
        },
        "opening_text": {
          "type": "text"
        },
        "outgoing_link": {
          "type": "text"
        },
        "popularity_score": {
          "type": "double"
        },
        "redirect": {
          "properties": {
            "namespace": {
              "type": "long"
            },
            "title": {
              "type": "text"
            }
          }
        },
        "score": {
          "type": "double"
        },
        "source_text": {
          "type": "text"
        },
        "template": {
          "type": "text"
        },
        "text": {
          "type": "text"
        },
        "text_bytes": {
          "type": "long"
        },
        "timestamp": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        },
        "title": {
          "type": "text"
        },
        "version": {
          "type": "long"
        },
        "version_type": {
          "type": "text"
        },
        "wiki": {
          "type": "text"
        },
        "wikibase_item": {
          "type": "text"
        }
      }
    }
  }
}
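One way to create the index from that mapping, as a minimal sketch: it assumes the JSON above has been saved to a local file (mapping.json is a hypothetical name), that Elasticsearch listens on localhost:9200, and that the index is called enwiki. The helper create_index is only an illustration.

```shell
#!/bin/sh
# Create an index from a mapping file.
# ASSUMPTIONS: the mapping is stored in a local JSON file and
# Elasticsearch is reachable at the URL passed as the first argument.
create_index() {
    es_url="$1"; index="$2"; mapping_file="$3"
    if [ ! -r "$mapping_file" ]; then
        echo "cannot read mapping file: $mapping_file" >&2
        return 1
    fi
    # Elasticsearch 6.x rejects request bodies sent without an explicit
    # Content-Type header, so it is set here.
    curl -s -X PUT "$es_url/$index" \
        -H 'Content-Type: application/json' \
        --data-binary "@$mapping_file"
}

# Example (not run here):
#   create_index http://localhost:9200 enwiki mapping.json
```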
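Once you have created the index, load the dump with the following command (Elasticsearch 6.x requires the Content-Type header on bulk requests):
zcat enwiki-current-cirrussearch-general.json.gz | parallel --pipe -L 2 -N 2000 -j3 'curl -s -H "Content-Type: application/x-ndjson" http://localhost:9200/enwiki/_bulk --data-binary @- > /dev/null'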
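The dump is already in bulk format, where each document takes two lines (an action line followed by the document source), so any batching has to keep those pairs together. As far as I understand the GNU parallel options, that is what -L 2 (two lines per record) and -N 2000 (2000 records per request) do. If parallel is not available, the same idea can be sketched with split; the helpers split_bulk and post_chunks and the chunk file prefix are hypothetical names, and this version posts sequentially rather than with three jobs.

```shell
#!/bin/sh
# Split a bulk-format stream on stdin into chunk files holding at most
# docs_per_batch documents each. Every document occupies two lines
# (action line + source line), so the line count per chunk is doubled
# and a pair is never split across two chunks.
split_bulk() {
    docs_per_batch="$1"; out_prefix="$2"
    split -l "$((docs_per_batch * 2))" - "$out_prefix"
}

# Post every chunk to the _bulk endpoint, one after another.
# ASSUMPTION: a cluster is running at the given URL.
post_chunks() {
    es_url="$1"; index="$2"; out_prefix="$3"
    for chunk in "$out_prefix"*; do
        curl -s -H 'Content-Type: application/x-ndjson' \
            "$es_url/$index/_bulk" --data-binary "@$chunk" > /dev/null
    done
}

# Example (not run here):
#   zcat enwiki-current-cirrussearch-general.json.gz | split_bulk 2000 /tmp/enwiki-chunk-
#   post_chunks http://localhost:9200 enwiki /tmp/enwiki-chunk-
```

Note that the uncompressed chunks are written to disk, so this needs considerably more free space than the streaming pipeline.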
Enjoy!