Description
Hue does not appear to set a timeout when it attempts to get a connection from the Thrift connection pool. This shows up in stack traces that look like this:
Thread CP WSGIServer Thread-4 140060258342656 (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 504, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 1294, in run
    conn.communicate()
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 1196, in communicate
    req.respond()
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 568, in respond
    self._respond()
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/desktop/core/src/desktop/lib/wsgiserver.py", line 580, in _respond
    response = self.wsgi_app(self.environ, self.start_response)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/wsgi.py", line 206, in __call__
    response = self.get_response(request)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/views.py", line 592, in install_examples
    beeswax.management.commands.beeswax_install_examples.Command().handle(app_name=app_name, user=request.user)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/management/commands/beeswax_install_examples.py", line 68, in handle
    self._install_tables(user, app_name, tables)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/management/commands/beeswax_install_examples.py", line 96, in _install_tables
    table.install(django_user)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/management/commands/beeswax_install_examples.py", line 135, in install
    if self.create(django_user):
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/management/commands/beeswax_install_examples.py", line 156, in create
    results = db.execute_and_wait(query)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/server/dbms.py", line 479, in execute_and_wait
    handle = self.client.query(query)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 863, in query
    return self._client.execute_async_query(query, statement)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 650, in execute_async_query
    return self.execute_async_statement(statement=query_statement, confOverlay=configuration)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py", line 668, in execute_async_statement
    res = self.call(self._client.ExecuteStatement, req)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/desktop/core/src/desktop/lib/thrift_util.py", line 320, in __getattr__
    superclient = _connection_pool.get_client(self.conf)
  File "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.969/lib/hue/desktop/core/src/desktop/lib/thrift_util.py", line 208, in get_client
    block=True, timeout=this_round_timeout)
  File "/usr/lib64/python2.6/Queue.py", line 168, in get
    self.not_empty.wait()
  File "/usr/lib64/python2.6/threading.py", line 239, in wait
    waiter.acquire()
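After some debugging, we found that while the thrift_util.get_client code supports a timeout when fetching a connection from the connection pool, no timeout is actually passed in this stack trace: the call at thrift_util.py line 320 is get_client(self.conf) with no get_client_timeout, and the thread ends up in Queue.get at self.not_empty.wait() with no timeout argument. It appears that the timeout argument was accidentally removed back in 2010 with this patch: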
- superclient = _connection_pool.get_client(self.klass, self.host, self.port,
-     kerberos_principal=self.kerberos_principal,
-     get_client_timeout=self.timeout_seconds,
-     service_name=self.service_name)
+ superclient = _connection_pool.get_client(self.conf)
I propose we add back the get_client_timeout. This may introduce some risk since this code has been in production for 5 years, but the default timeout_seconds is 2 minutes, so it seems like a pretty reasonable time to wait.
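To illustrate the difference the timeout makes, here is a minimal sketch of a Queue-backed pool. It is not Hue's thrift_util code: TinyPool, PoolExhausted, return_client and demo are hypothetical names made up for this example, and only the get_client_timeout parameter name is borrowed from the patch above. It uses the Python 3 queue module (the Python 2.6 equivalent is Queue). With a timeout the caller gets an error it can report; without one it waits until some other thread returns a client.

```python
import queue
import threading


class PoolExhausted(Exception):
    """Hypothetical error: no client was returned to the pool in time."""


class TinyPool(object):
    """Hypothetical stand-in for a connection pool backed by a Queue."""

    def __init__(self, clients):
        self._free = queue.Queue()
        for client in clients:
            self._free.put(client)

    def get_client(self, get_client_timeout=None):
        # timeout=None makes Queue.get wait until an item is available;
        # a number makes it raise queue.Empty after that many seconds.
        try:
            return self._free.get(block=True, timeout=get_client_timeout)
        except queue.Empty:
            raise PoolExhausted(
                "no client available after %s seconds" % get_client_timeout)

    def return_client(self, client):
        self._free.put(client)


def demo():
    pool = TinyPool(["client-1"])
    held = pool.get_client()  # the pool is now empty

    # With a timeout, the second caller fails fast instead of hanging.
    try:
        pool.get_client(get_client_timeout=0.1)
        timed_out = False
    except PoolExhausted:
        timed_out = True

    # Without a timeout, the caller stays blocked inside Queue.get.
    got = []
    waiter = threading.Thread(target=lambda: got.append(pool.get_client()))
    waiter.daemon = True
    waiter.start()
    waiter.join(0.2)
    still_blocked = waiter.is_alive()

    # Returning the held client is the only thing that releases the waiter.
    pool.return_client(held)
    waiter.join(1.0)
    return timed_out, still_blocked, got
```

In this sketch, if the thread holding the client never returned it, the waiter thread would stay blocked for good, which matches the shape of the stack trace above.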
Issue Links
- relates to HUE-2526 Deadlock possible if there are too many concurrent hive/impala requests (Resolved)