pythonelasticsearchpyes

Elasticsearch clients for python, no solution


I am having a very bad week having chosen elasticsearch with graylog2. I am trying to run queries against the data in ES using Python.

I have tried following clients.

  1. ESClient - Very weird results, I think its not maintained, query_body has no effect it returns all the results.
  2. Pyes - Unreadable, undocumented. I have browsed sources and cant figure out how to run a simple query, maybe i am just not that smart. I can even run base queries in json format and then simply use the Python object/iterators to do my analysis on the results. But Pyes does not make it easy.
  3. Elasticutils - Another documented, but without a complete sample. I get the following error with code attached. I don't even know how it uses this S() to connect to the right host?

    es = get_es(hosts=HOST, default_indexes=[INDEX])

    basic_s = S().indexes(INDEX).doctypes(DOCTYPE).values_dict()

results:

 print basic_s.query(message__text="login/delete")
  File "/usr/lib/python2.7/site-packages/elasticutils/__init__.py", line 223, in __repr__
    data = list(self)[:REPR_OUTPUT_SIZE + 1]
  File "/usr/lib/python2.7/site-packages/elasticutils/__init__.py", line 623, in __iter__
    return iter(self._do_search())
  File "/usr/lib/python2.7/site-packages/elasticutils/__init__.py", line 573, in _do_search
    hits = self.raw()
  File "/usr/lib/python2.7/site-packages/elasticutils/__init__.py", line 615, in raw
    hits = es.search(qs, self.get_indexes(), self.get_doctypes())
  File "/usr/lib/python2.7/site-packages/pyes/es.py", line 841, in search
    return self._query_call("_search", body, indexes, doc_types, **query_params)
  File "/usr/lib/python2.7/site-packages/pyes/es.py", line 251, in _query_call
    response = self._send_request('GET', path, body, querystring_args)
  File "/usr/lib/python2.7/site-packages/pyes/es.py", line 208, in _send_request
    response = self.connection.execute(request)
  File "/usr/lib/python2.7/site-packages/pyes/connection_http.py", line 167, in _client_call
    return getattr(conn.client, attr)(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/pyes/connection_http.py", line 59, in execute
    response = self.client.urlopen(Method._VALUES_TO_NAMES[request.method], uri, body=request.body, headers=request.headers)
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 294, in urlopen
    return self.urlopen(method, url, body, headers, retries-1, redirect) # Try again
  File "/usr/lib/python2.7/site-packages/pyes/urllib3/connectionpool.py", line 255, in urlopen
    raise MaxRetryError("Max retries exceeded for url: %s" % url)
pyes.urllib3.connectionpool.MaxRetryError: Max retries exceeded for url: /graylog2/message/_search

I wish the devs of this good projects would provide some complete examples. Even looking at sources I am t a complete loss.

Is there any solution, help out there for me with elasticsearch and python or should I just drop all of this and pay for a nice splunk account and end this misery.

I am proceeding with using curl, download the entire json result and json load it. Hope that works, though curl downloading 1 million messages from elasticsearch may not just happen.


Solution

  • Honestly, I've had the most luck with just CURLing everything. ES has so many different methods, filters, and queries that various "wrappers" have a hard time recreating all the functionality. In my view, it is similar to using an ORM for databases...what you gain in ease of use you lose in flexibility/raw power.

    Except most of the wrappers for ES aren't really that easy to use.

    I'd give CURL a try for a while and see how that treats you. You can use external JSON formatters to check your JSON, the mailing list to look for examples and the docs are ok if you use JSON.