Details
-
Type: Bug
-
Status: Closed
-
Priority: Major
-
Resolution: Incomplete
-
Affects Version/s: 4.1.0, 4.2.0, 4.3.0, 4.4.0, 5.0.0
-
Fix Version/s: None
-
Component/s: app.editor, con.impala
-
Labels: None
-
Environment:
Impala F5 best practices document
http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf
recommends "Persistence Type: Source Address Affinity"
(see page 20 of that PDF).
Most of our Impala queries come from a single server (a BI application), plus the Hue application.
As a result, the F5 load balancer "balances" all Impala connections onto the same one or two impalad backends, because all (or 99.9%) of the connections come from the main application (one IP address) and Hue (another application).
This defeats the purpose of balancing load across Impala query coordinators. It seems that the only reason Cloudera recommends source IP address session persistence is a Hue requirement:
From http://www.cloudera.com/documentation/other/reference-architecture/PDF/Impala-HA-with-F5-BIG-IP.pdf
Hue requires persistent (or “sticky”) sessions, meaning its requests need to be serviced by the same node when possible. So in addition to the default pool, you also configure a Persistence Profile for this Virtual Server. Without persistent sessions, Hue can be disconnected from long-running queries.
Can somebody please explain why this is required for Hue, and whether there is any way to relax this requirement? Are there any tunable parameters in Hue/Impala, such as a required session persistence timeout?
It would be nice to disable session persistence on the load balancer altogether, so that connections would be balanced across all impalads in the cluster.
Any ideas on how we can improve this?
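To make the skew concrete, here is a minimal simulation sketch. It is not F5's actual implementation: it assumes that source-address persistence places the first connection from each IP round-robin and pins later connections from that IP to the same backend. The backend names, IP addresses, and the 900/100 connection split are hypothetical values chosen for illustration.

```python
from itertools import cycle


def distribute(connections, backends, sticky):
    """Assign each connection (identified by source IP) to a backend.

    sticky=True  models source-address persistence (assumed behaviour):
                 first contact is placed round-robin, later connections
                 from the same IP reuse that backend.
    sticky=False is plain per-connection round-robin.
    """
    next_backend = cycle(backends)
    persistence = {}  # source IP -> backend chosen on first contact
    load = {b: 0 for b in backends}
    for src_ip in connections:
        if sticky:
            if src_ip not in persistence:
                persistence[src_ip] = next(next_backend)
            backend = persistence[src_ip]
        else:
            backend = next(next_backend)
        load[backend] += 1
    return load


# Hypothetical workload: one BI server and one Hue server as the only clients.
backends = ["impalad-1", "impalad-2", "impalad-3", "impalad-4"]
connections = ["10.0.0.1"] * 900 + ["10.0.0.2"] * 100

sticky_load = distribute(connections, backends, sticky=True)
plain_load = distribute(connections, backends, sticky=False)

print("with persistence:   ", sticky_load)
print("without persistence:", plain_load)
```

With only two client IPs, the persistent case uses at most two of the four coordinators, while per-connection round-robin spreads the same connections evenly.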