I'm implementing a recommender system running on a Ubuntu 12.4 Server using Titan Rexster (titan-server-0.4.4.zip) with the Elasticsearch backend. In order to connect to the Rexster Server I use the Bulbflow library for python.
Beta seemed to run fine for 3 weeks, but with the load "increasing" (only a couple of users ~10) the Rexster server stopped responding. I don't know whether my rexster configuration is wrong or I don't use the Bulbflow library correctly.
Here is my rexster-cassandra-es.xml:
<?xml version="1.0" encoding="UTF-8"?>
<rexster>
<http>
<server-port>8182</server-port>
<server-host>0.0.0.0</server-host>
<base-uri>http://MY_IP</base-uri>
<web-root>public</web-root>
<character-set>UTF-8</character-set>
<enable-jmx>false</enable-jmx>
<enable-doghouse>true</enable-doghouse>
<max-post-size>2097152</max-post-size>
<max-header-size>8192</max-header-size>
<upload-timeout-millis>30000</upload-timeout-millis>
<thread-pool>
<worker>
<core-size>20</core-size>
<max-size>40</max-size>
</worker>
<kernal>
<core-size>10</core-size>
<max-size>20</max-size>
</kernal>
</thread-pool>
<io-strategy>leader-follower</io-strategy>
</http>
<rexpro>
<server-port>8184</server-port>
<server-host>0.0.0.0</server-host>
<session-max-idle>1790000</session-max-idle>
<session-check-interval>3000000</session-check-interval>
<connection-max-idle>180000</connection-max-idle>
<connection-check-interval>3000000</connection-check-interval>
<enable-jmx>false</enable-jmx>
<thread-pool>
<worker>
<core-size>8</core-size>
<max-size>8</max-size>
</worker>
<kernal>
<core-size>4</core-size>
<max-size>4</max-size>
</kernal>
</thread-pool>
<io-strategy>leader-follower</io-strategy>
</rexpro>
<shutdown-port>8183</shutdown-port>
<shutdown-host>127.0.0.1</shutdown-host>
<script-engines>
<script-engine>
<name>gremlin-groovy</name>
<reset-threshold>-1</reset-threshold>
<imports>com.tinkerpop.gremlin.*,com.tinkerpop.gremlin.java.*,com.tinkerpop.gremlin.pipes.filter.*,com.tinkerpop.gremlin.pipes.sideeffect.*,com.tinkerpop.gremlin.pipes.transform.*,com.tinkerpop.blueprints.*,com.tinkerpop.blueprints.impls.*,com.tinkerpop.blueprints.impls.tg.*,com.tinkerpop.blueprints.impls.neo4j.*,com.tinkerpop.blueprints.impls.neo4j.batch.*,com.tinkerpop.blueprints.impls.orient.*,com.tinkerpop.blueprints.impls.orient.batch.*,com.tinkerpop.blueprints.impls.dex.*,com.tinkerpop.blueprints.impls.rexster.*,com.tinkerpop.blueprints.impls.sail.*,com.tinkerpop.blueprints.impls.sail.impls.*,com.tinkerpop.blueprints.util.*,com.tinkerpop.blueprints.util.io.*,com.tinkerpop.blueprints.util.io.gml.*,com.tinkerpop.blueprints.util.io.graphml.*,com.tinkerpop.blueprints.util.io.graphson.*,com.tinkerpop.blueprints.util.wrappers.*,com.tinkerpop.blueprints.util.wrappers.batch.*,com.tinkerpop.blueprints.util.wrappers.batch.cache.*,com.tinkerpop.blueprints.util.wrappers.event.*,com.tinkerpop.blueprints.util.wrappers.event.listener.*,com.tinkerpop.blueprints.util.wrappers.id.*,com.tinkerpop.blueprints.util.wrappers.partition.*,com.tinkerpop.blueprints.util.wrappers.readonly.*,com.tinkerpop.blueprints.oupls.sail.*,com.tinkerpop.blueprints.oupls.sail.pg.*,com.tinkerpop.blueprints.oupls.jung.*,com.tinkerpop.pipes.*,com.tinkerpop.pipes.branch.*,com.tinkerpop.pipes.filter.*,com.tinkerpop.pipes.sideeffect.*,com.tinkerpop.pipes.transform.*,com.tinkerpop.pipes.util.*,com.tinkerpop.pipes.util.iterators.*,com.tinkerpop.pipes.util.structures.*,org.apache.commons.configuration.*,com.thinkaurelius.titan.core.*,com.thinkaurelius.titan.core.attribute.*,com.thinkaurelius.titan.core.util.*,com.thinkaurelius.titan.example.*,org.apache.commons.configuration.*,com.tinkerpop.gremlin.Tokens.T,com.tinkerpop.gremlin.groovy.*</imports>
<static-imports>com.tinkerpop.blueprints.Direction.*,com.tinkerpop.blueprints.TransactionalGraph$Conclusion.*,com.tinkerpop.blueprints.Compare.*,com.thinkaurelius.titan.core.attribute.Geo.*,com.thinkaurelius.titan.core.attribute.Text.*,com.thinkaurelius.titan.core.TypeMaker$UniquenessConsistency.*,com.tinkerpop.blueprints.Query$Compare.*</static-imports>
</script-engine>
</script-engines>
<security>
<authentication>
<type>none</type>
<configuration>
<users>
<user>
<username>rexster</username>
<password>rexster</password>
</user>
</users>
</configuration>
</authentication>
</security>
<metrics>
<reporter>
<type>jmx</type>
</reporter>
<reporter>
<type>http</type>
</reporter>
<reporter>
<type>console</type>
<properties>
<rates-time-unit>SECONDS</rates-time-unit>
<duration-time-unit>SECONDS</duration-time-unit>
<report-period>10</report-period>
<report-time-unit>MINUTES</report-time-unit>
<includes>http.rest.*</includes>
<excludes>http.rest.*.delete</excludes>
</properties>
</reporter>
</metrics>
<graphs>
<graph>
<graph-name>newspaper</graph-name>
<graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
<!-- <graph-location>/tmp/titan</graph-location> -->
<graph-read-only>false</graph-read-only>
<properties>
<storage.backend>cassandra</storage.backend>
<storage.index.search.backend>elasticsearch</storage.index.search.backend>
<storage.index.search.hostname>localhost</storage.index.search.hostname>
<storage.index.search.client-only>true</storage.index.search.client-only>
<storage.index.search.local-mode>false</storage.index.search.local-mode>
</properties>
<extensions>
<allows>
<allow>tp:gremlin</allow>
</allows>
</extensions>
</graph>
</graphs>
</rexster>
I have changed the core-size and max-size of the threadpool for the worker and kernal, without that change the Rexster Server would hang / not respond even quicker.
What are appropriate values for the core-size and max-size?
For the use of bulbflow I create a new Graph object every time I need to perform a request. There are a lot of requests, so those objects are created very often.
Should I really create a new Graph object for every new request?
Is it possible to only create one Graph object and use it whenever a new request is sent to the graph database or do I run into session issues?
When everything is stuck and I forcefully terminate the program (ctrl-c), I get the following stacktrace:
Exception happened during processing of request from ('my_ip', 57489)
Traceback (most recent call last):
File "/usr/lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 310, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python2.7/SocketServer.py", line 638, in __init__
self.handle()
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 200, in handle
rv = BaseHTTPRequestHandler.handle(self)
File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
self.handle_one_request()
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 235, in handle_one_request
return self.run_wsgi()
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 177, in run_wsgi
execute(self.server.app)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/werkzeug/serving.py", line 165, in execute
application_iter = app(environ, start_response)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/user/dir/recommender/project/api/start.py", line 65, in put_user
graphdb.insert_user(user_id)
File "project/api/graphdb.py", line 14, in insert_user
user_with_id = g.users.index.lookup(user_sqlid=user_id)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/titan/index.py", line 270, in lookup
resp = self.client.lookup_vertex(self.index_name,key,value)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/titan/client.py", line 348, in lookup_vertex
return self.request.get(path,params)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/rest.py", line 101, in get
return self.request(GET, path, params)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/bulbs/rest.py", line 184, in request
http_resp = self.http.request(uri, method, body, headers)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1593, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1335, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/user/dir/env/venv_python/local/lib/python2.7/site-packages/httplib2/__init__.py", line 1291, in _conn_request
response = conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.7/socket.py", line 430, in readline
data = recv(1)
In order to recover, I have to shut down rexster / titan and restart it. Whenever I stop the Rexster server (./bin/titan -c cassandra-es stop) I receive the following output:
Killing Titan + Rexster (pid 26779)...
Rexster shutdown timeout exceeded (60 seconds)
Killing Cassandra (pid 26201)...
Rexster is completely stuck.
Looking forward to receiving some helpful guidance.
The following thread on the Titan mailing list could be useful to you: Rexster REST API stops responding. However, I don't think they ever managed to solve that issue for the Titan and Rexster developers couldn't reproduce it.
This being said, I strongly suggest upgrading to Titan v1.0.0 which uses TinkerPop 3.0+ Gremlin server instead of TinkerPop 2.x Rexster. You'll get less bugs, more features and, especially, much more expressive Gremlin queries (see TinkerPop 3.0.1 documentation used by Titan v1.0.0). Titan v0.4.4 is a very old release and I don't think it's worth the trouble fixing this specific issue, especially if you're new to graphs.