Details
-
Type:
Bug
-
Status: Open
-
Priority:
Major
-
Resolution: Unresolved
-
Affects Version/s: search-1.1.0
-
Fix Version/s: None
-
Component/s: ZooKeeper
-
Labels:None
-
Environment:CDH4.5 - 2 masters + 4 slaves
SOLR 1.1.0-1.cdh4.3.0.p0.21
Solr is replicated on 4 machines but no sharding is used.
Description
We got 2 Solr servers (over 4) stopping to index new data. Even if they were still responding to queries (with old data) they were reported as down on the solr cloud ui.
Looking at the errors in the logs it looks similar to this bug report:
https://issues.apache.org/jira/browse/SOLR-5325
If I increase zookeeper 'syncLimit' parameter, would it help in that case?
Do you have any roadmap about moving to new releases of solr with the fix?
Here is the stack trace on one of the slaves (the other slaves have the same kind of logs but a bit more of all of them):
3:29:08.967 PM WARN org.apache.zookeeper.ClientCnxn
Session 0x244cfc074c8f0ce for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
3:29:21.134 PM WARN org.apache.hadoop.ipc.Client
Address change detected. Old: hadoopmaster01.prod.ps/172.16.0.18:8020 New: hadoopmaster01.prod.ps:8020
3:29:21.135 PM ERROR org.apache.solr.common.cloud.DefaultConnectionStrategy
Reconnect to ZooKeeper failed:java.net.UnknownHostException: hadoopmaster01.prod.ps
at java.net.InetAddress.getAllByName0(InetAddress.java:1157)
at java.net.InetAddress.getAllByName(InetAddress.java:1083)
at java.net.InetAddress.getAllByName(InetAddress.java:1019)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:40)
at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
3:29:21.135 PM WARN org.apache.hadoop.io.retry.RetryInvocationHandler
Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB. Trying to fail over immediately.
3:29:21.137 PM WARN org.apache.hadoop.io.retry.RetryInvocationHandler
Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 1253ms.
9:55:31.497 AM ERROR org.apache.solr.servlet.SolrDispatchFilter
null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:252)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:249)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:249)
at org.apache.solr.common.cloud.ZkStateReader.updateAliases(ZkStateReader.java:556)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:169)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.solr.servlet.ProxyUserFilter.doFilter(ProxyUserFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.solr.servlet.SolrHadoopAuthenticationFilter$2.doFilter(SolrHadoopAuthenticationFilter.java:140)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:384)
at org.apache.solr.servlet.SolrHadoopAuthenticationFilter.doFilter(SolrHadoopAuthenticationFilter.java:145)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.solr.servlet.HostnameFilter.doFilter(HostnameFilter.java:86)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
9:55:31.511 AM ERROR org.apache.solr.servlet.SolrDispatchFilter
null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /aliases.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)