  1. RecordService (READ-ONLY)
  2. RS-77

RecordService Plan requests time out due to slow response from HDFS on getFileBlockStorageLocations()

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.1.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      We have seen very slow response times for getBlockStorageLocations (on the order of many minutes), even for smallish data sets.

       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
      	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:426)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:204)
      	at java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:289)
      	at org.apache.hadoop.hdfs.BlockStorageLocationUtil.queryDatanodesForHdfsBlocksMetadata(BlockStorageLocationUtil.java:145)
      	at org.apache.hadoop.hdfs.DFSClient.getBlockStorageLocations(DFSClient.java:1311)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockStorageLocations(DistributedFileSystem.java:257)
      

      I also see corresponding threads that appear to be waiting on HDFS connections:

       - waiting to lock <0x000000058d4d6db8> (a org.apache.hadoop.ipc.Client$Connection)
      	at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:368)
      	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1515)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1438)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
      	at com.sun.proxy.$Proxy14.getHdfsBlockLocations(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getHdfsBlocksMetadata(ClientDatanodeProtocolTranslatorPB.java:260)
      	at org.apache.hadoop.hdfs.BlockStorageLocationUtil$VolumeBlockLocationCallable.call(BlockStorageLocationUtil.java:348)
      	at org.apache.hadoop.hdfs.BlockStorageLocationUtil$VolumeBlockLocationCallable.call(BlockStorageLocationUtil.java:312)
      

      There is one thread that looks like it is doing some work (it has locked the connection and is trying to read), but it is taking much longer than the 10s default timeout.
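
      To illustrate the pattern the traces suggest, here is a minimal JDK-only sketch, not HDFS code. It assumes that each pooled task must lock one shared object before making its call, which is how the "waiting to lock" frames read, though the actual locking in org.apache.hadoop.ipc.Client may differ. The class name SharedConnectionSketch, the SHARED_CONNECTION object and the slowCall helper are all hypothetical stand-ins.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.Callable;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;
      import java.util.concurrent.TimeUnit;

      public class SharedConnectionSketch {
          // Hypothetical stand-in for a connection object that callers must lock.
          static final Object SHARED_CONNECTION = new Object();

          // Simulates one slow metadata call; this is not a real HDFS call.
          static void slowCall(long millis) throws InterruptedException {
              Thread.sleep(millis);
          }

          /** Runs the given number of tasks on an equally sized pool; returns elapsed ms. */
          public static long run(int tasks, long callMillis, boolean shareConnection)
                  throws Exception {
              ExecutorService pool = Executors.newFixedThreadPool(tasks);
              try {
                  List<Callable<Void>> work = new ArrayList<>();
                  for (int i = 0; i < tasks; i++) {
                      // Each task locks either the one shared object or a private one.
                      final Object lock = shareConnection ? SHARED_CONNECTION : new Object();
                      work.add(() -> {
                          synchronized (lock) {
                              slowCall(callMillis);
                          }
                          return null;
                      });
                  }
                  long start = System.nanoTime();
                  // The caller blocks here until every task has finished.
                  List<Future<Void>> results = pool.invokeAll(work);
                  for (Future<Void> f : results) {
                      f.get(); // surfaces any exception thrown inside a task
                  }
                  return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
              } finally {
                  pool.shutdownNow();
              }
          }

          public static void main(String[] args) throws Exception {
              System.out.println("private locks: " + run(4, 100, false) + " ms");
              System.out.println("shared lock:   " + run(4, 100, true) + " ms");
          }
      }
      ```

      With a shared lock the tasks run one at a time, so the caller waits roughly the sum of the individual call times instead of the longest one. If something similar is happening here, a single slow read on the connection would stall every other queued request behind it.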

            People

            • Assignee: Unassigned
            • Reporter: Lenni Kuff (lskuff)