Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.1.0
Fix Version/s: None
Component/s: None
Labels: None
Description
We have seen very slow response times for getBlockStorageLocations (on the order of many minutes), even for small data sets. The calling thread is parked in:
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:426)
    at java.util.concurrent.FutureTask.get(FutureTask.java:204)
    at java.util.concurrent.AbstractExecutorService.invokeAll(AbstractExecutorService.java:289)
    at org.apache.hadoop.hdfs.BlockStorageLocationUtil.queryDatanodesForHdfsBlocksMetadata(BlockStorageLocationUtil.java:145)
    at org.apache.hadoop.hdfs.DFSClient.getBlockStorageLocations(DFSClient.java:1311)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockStorageLocations(DistributedFileSystem.java:257)
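For context, the executor frames above follow a standard JDK pattern: `AbstractExecutorService.invokeAll` submits every callable and then waits on each `Future` in turn, and a `LockSupport.parkNanos` frame under `FutureTask.awaitDone` indicates a timed wait, which suggests the timed `invokeAll(tasks, timeout, unit)` overload. The following is a minimal sketch of that fan-out-and-wait pattern, not the `BlockStorageLocationUtil` code; the `FanOutSketch` class, the task durations, and the 500 ms timeout are hypothetical. Tasks still running when the timeout elapses come back cancelled rather than completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the fan-out/wait pattern seen in the caller's stack trace.
class FanOutSketch {

    /** Runs one task per entry in sleepsMillis; returns how many finished before the deadline. */
    static int countCompleted(long[] sleepsMillis, long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(sleepsMillis.length);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (long sleep : sleepsMillis) {
                // Each task stands in for one per-datanode metadata query (assumed, not Hadoop code).
                tasks.add(() -> {
                    Thread.sleep(sleep);
                    return sleep;
                });
            }
            // Timed invokeAll: waits until every task is done or the timeout elapses,
            // then cancels whatever is still running.
            List<Future<Long>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            int completed = 0;
            for (Future<Long> f : futures) {
                if (!f.isCancelled()) {
                    completed++;
                }
            }
            return completed;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Two fast tasks and one slow task; only the fast ones should finish within 500 ms.
        System.out.println(countCompleted(new long[] {50, 100, 5_000}, 500));
    }
}
```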
I also see corresponding threads that appear to be waiting on an HDFS IPC connection:
    - waiting to lock <0x000000058d4d6db8> (a org.apache.hadoop.ipc.Client$Connection)
    at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1515)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy14.getHdfsBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getHdfsBlocksMetadata(ClientDatanodeProtocolTranslatorPB.java:260)
    at org.apache.hadoop.hdfs.BlockStorageLocationUtil$VolumeBlockLocationCallable.call(BlockStorageLocationUtil.java:348)
    at org.apache.hadoop.hdfs.BlockStorageLocationUtil$VolumeBlockLocationCallable.call(BlockStorageLocationUtil.java:312)
One thread appears to be doing actual work: it has locked the connection and is trying to read, but it is taking much longer than the 10-second default timeout.
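The second dump shows threads blocked entering the monitor of one shared `Client$Connection` while another thread holds it. The sketch below is a hypothetical model of that contention, not Hadoop's `ipc.Client` code: a plain `Object` stands in for the connection, and each task holds its monitor for an assumed fixed time. Because only one task can hold the monitor at once, the tasks run one after another even though the pool has a thread per task, so total latency grows with the number of tasks instead of staying near a single task's duration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model: N tasks that all synchronize on one shared "connection" object.
class SharedConnectionSketch {

    /** Stand-in for a single shared connection; not Hadoop's Client$Connection. */
    private static final Object CONNECTION = new Object();

    /** Returns the wall-clock milliseconds needed for all tasks to finish. */
    static long elapsedMillis(int taskCount, long holdMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(taskCount);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                tasks.add(() -> {
                    // Every task must take the same monitor, so only one can run this block at a time.
                    synchronized (CONNECTION) {
                        Thread.sleep(holdMillis); // models a slow read while the lock is held
                    }
                    return null;
                });
            }
            long start = System.nanoTime();
            pool.invokeAll(tasks); // untimed overload: waits for every task to finish
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Four tasks holding the lock for 200 ms each need roughly 800 ms in total, not 200 ms.
        System.out.println(elapsedMillis(4, 200) + " ms");
    }
}
```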