  1. RecordService (READ-ONLY)
  2. RS-138

Set RPC timeout before sending get protocol version request

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.3.0
    • Component/s: None
    • Labels:
      None

      Description

      The following error occurred when running 8 concurrent MapReduce jobs on a Parquet table:

      2016-03-17 10:15:02,130 INFO [main] com.cloudera.recordservice.core.ThriftUtils: Connecting to RecordServiceWorker at vd0220.halxg.cloudera.com:13050, with timeout: 10000ms
      2016-03-17 10:15:02,130 INFO [main] com.cloudera.recordservice.core.ThriftUtils: Connected to RecordServiceWorker at vd0220.halxg.cloudera.com:13050
      2016-03-17 10:15:12,141 WARN [main] com.cloudera.recordservice.core.RecordServiceWorkerClient: Could not get service protocol version from RecordServiceWorker at vd0220.halxg.cloudera.com:13050. com.cloudera.recordservice.shade.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
      2016-03-17 10:15:12,213 INFO [main] com.cloudera.recordservice.core.RecordServiceWorkerClient: Closing RecordServiceWorker task: TUniqueId(hi:8666625902207997972, lo:8192009521421407619)
      2016-03-17 10:15:16,168 INFO [main] com.cloudera.recordservice.core.RecordServiceWorkerClient: Closing RecordServiceWorker connection.
      2016-03-17 10:15:16,168 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
      2016-03-17 10:15:16,182 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
      2016-03-17 10:15:16,239 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:impala (auth:SIMPLE) cause:java.io.IOException: Could not get service protocol version from RecordServiceWorker at vd0220.halxg.cloudera.com:13050. 
      2016-03-17 10:15:16,240 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Could not get service protocol version from RecordServiceWorker at vd0220.halxg.cloudera.com:13050. 
      	at com.cloudera.recordservice.core.RecordServiceWorkerClient.connect(RecordServiceWorkerClient.java:482)
      	at com.cloudera.recordservice.core.RecordServiceWorkerClient.access$1200(RecordServiceWorkerClient.java:45)
      	at com.cloudera.recordservice.core.RecordServiceWorkerClient$Builder.connect(RecordServiceWorkerClient.java:234)
      	at com.cloudera.recordservice.mr.RecordReaderCore.<init>(RecordReaderCore.java:68)
      	at com.cloudera.recordservice.mapreduce.RecordServiceInputFormatBase$RecordReaderBase.initialize(RecordServiceInputFormatBase.java:94)
      	at com.cloudera.recordservice.mapreduce.RecordServiceInputFormat$RecordServiceRecordReader.initialize(RecordServiceInputFormat.java:107)
      	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Caused by: com.cloudera.recordservice.shade.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
      	at com.cloudera.recordservice.shade.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
      	at com.cloudera.recordservice.shade.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
      	at com.cloudera.recordservice.shade.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
      	at com.cloudera.recordservice.shade.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
      	at com.cloudera.recordservice.shade.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
      	at com.cloudera.recordservice.shade.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
      	at com.cloudera.recordservice.thrift.RecordServiceWorker$Client.recv_GetProtocolVersion(RecordServiceWorker.java:103)
      	at com.cloudera.recordservice.thrift.RecordServiceWorker$Client.GetProtocolVersion(RecordServiceWorker.java:91)
      	at com.cloudera.recordservice.core.RecordServiceWorkerClient.connect(RecordServiceWorkerClient.java:441)
      	... 13 more
      Caused by: java.net.SocketTimeoutException: Read timed out
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.read(SocketInputStream.java:152)
      	at java.net.SocketInputStream.read(SocketInputStream.java:122)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
      	at com.cloudera.recordservice.shade.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
      	... 21 more
      

      This can be fixed either by increasing recordservice.worker.connection.timeoutMs to 60 seconds or by setting the RPC timeout before sending the get protocol version request.
      Both changes work because, until the RPC timeout is set, the client uses the connection timeout as the RPC timeout (10000 ms in the log above, where the read times out 10 seconds after connecting). Getting the protocol version is itself an RPC, so the RPC timeout should be set before this request as well.
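
      The effect described above can be reproduced with plain sockets. The following is a minimal sketch, not the RecordService client code: the class RpcTimeoutSketch, its methods, and the timeout values are hypothetical, and a local server that delays its reply stands in for a busy worker. It shows that a short connection timeout reused as the read timeout makes the first call fail, while setting a larger read timeout before the first call lets it succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class RpcTimeoutSketch {
  /** Starts a local server that accepts one client and replies with one byte after a delay. */
  static ServerSocket startSlowServer(int replyDelayMs) throws IOException {
    ServerSocket server = new ServerSocket(0); // ephemeral port
    Thread t = new Thread(() -> {
      try (Socket s = server.accept()) {
        Thread.sleep(replyDelayMs); // simulate a worker that is slow to answer
        s.getOutputStream().write(1);
        s.getOutputStream().flush();
      } catch (IOException | InterruptedException e) {
        // The client gave up or the server was closed; nothing to do in this sketch.
      }
    });
    t.setDaemon(true);
    t.start();
    return server;
  }

  /**
   * Connects with connectTimeoutMs and waits for the first reply.
   * If rpcTimeoutMs is positive, it replaces the read timeout before the first call.
   * Returns true if the reply arrived, false if the read timed out.
   */
  static boolean firstCallSucceeds(int port, int connectTimeoutMs, int rpcTimeoutMs)
      throws IOException {
    try (Socket socket = new Socket()) {
      // Behaviour described in the report: the connection timeout doubles as the read timeout.
      socket.setSoTimeout(connectTimeoutMs);
      socket.connect(new InetSocketAddress("127.0.0.1", port), connectTimeoutMs);
      if (rpcTimeoutMs > 0) {
        // Proposed change: apply the RPC timeout before the first request.
        socket.setSoTimeout(rpcTimeoutMs);
      }
      try {
        return socket.getInputStream().read() == 1;
      } catch (SocketTimeoutException e) {
        return false;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket server = startSlowServer(500)) {
      // Read timeout stays at 200 ms, the reply takes 500 ms.
      System.out.println(firstCallSucceeds(server.getLocalPort(), 200, 0));
    }
    try (ServerSocket server = startSlowServer(500)) {
      // Read timeout raised to 2000 ms before the first call.
      System.out.println(firstCallSucceeds(server.getLocalPort(), 200, 2000));
    }
  }
}
```

      In this sketch the first call fails and the second succeeds, which corresponds to the two fixes proposed above: a larger connection timeout or an RPC timeout applied before the first request.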

      In addition, when a read timeout occurs, the log message should include a more useful suggestion, e.g. asking users to increase the RPC timeout.
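
      One possible shape for such a message is sketched below. The class TimeoutHint and its methods are hypothetical, and the key recordservice.worker.rpc.timeoutMs is an assumed name for the RPC timeout setting; only recordservice.worker.connection.timeoutMs appears in this report. The helper walks the cause chain and appends a hint only when a SocketTimeoutException is found.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class TimeoutHint {
  // Assumed configuration key for the worker RPC timeout (not confirmed by this report).
  static final String RPC_TIMEOUT_KEY = "recordservice.worker.rpc.timeoutMs";

  /** Returns true if any exception in the cause chain is a socket read timeout. */
  static boolean causedByReadTimeout(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof SocketTimeoutException) {
        return true;
      }
    }
    return false;
  }

  /** Wraps the failure in an IOException, adding a suggestion when the cause was a read timeout. */
  static IOException withHint(String baseMessage, Exception cause, int currentTimeoutMs) {
    String message = baseMessage;
    if (causedByReadTimeout(cause)) {
      message += " The read timed out after " + currentTimeoutMs
          + "ms; consider increasing " + RPC_TIMEOUT_KEY + ".";
    }
    return new IOException(message, cause);
  }
}
```

      A caller could pass the exception caught around the get protocol version request to withHint and log or rethrow the result, so that the suggestion appears next to the original error.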

            People

            • Assignee:
              lilicn Li Li
              Reporter:
              lilicn Li Li