Affects Version/s: CDH4.4.0
Fix Version/s: CDH4.6.0
Environment:
- CDH version: CDH 4.4.0-1.cdh4.4.0.p0.39
- HA is configured for HDFS and JobTracker
- OS info: Linux version 2.6.32-358.el6.x86_64 (email@example.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) )
- Other info: ulimit -n: open files (-n) 65535
When running a MapReduce job, the TaskTracker requests a block of one of the job's files (in the .staging directory) from a DataNode on port 50010. If the request fails with "java.io.IOException: Got error for OP_READ_BLOCK" (because that replica of the block was placed in InvalidateBlocks and has already been removed from the DataNode the TaskTracker contacted), the TCP socket the TaskTracker was using is not closed. Over time the leaked sockets accumulate, and eventually the Cloudera Manager web UI shows this warning:
"Open file descriptors: xxxxxx. File descriptor limit : xxxxxx...."
I think the problem is in DatanodeInfo blockSeekTo(long target), at line 503 of class DFSInputStream.
The connection the TaskTracker uses is a BlockReader, which is created on line 538:
blockReader = getBlockReader(targetAddr, chosenNode, src, blk,
accessToken, offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
buffersize, verifyChecksum, dfsClient.clientName);
If this connection fails, the TaskTracker sends the request to another DataNode, and the old connection is not closed here.
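To make the suspected leak easier to reason about, here is a minimal, self-contained sketch of the pattern described above. It is not the HDFS code: FakeReader, openReaders, and simulate are hypothetical names that stand in for BlockReader, the process's open sockets, and the retry logic in blockSeekTo. It assumes each retry creates a new reader and overwrites the previous reference.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderRetrySketch {
    // Stand-in for the number of sockets currently held open by the process.
    static final AtomicInteger openReaders = new AtomicInteger();

    // Hypothetical stand-in for a BlockReader that owns one socket.
    static final class FakeReader implements Closeable {
        private boolean closed;

        FakeReader() {
            openReaders.incrementAndGet();
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                openReaders.decrementAndGet();
            }
        }
    }

    // Simulates `failures` failed attempts followed by one successful attempt.
    // Returns how many readers are still open after the caller closes the
    // final reader, i.e. how many were leaked.
    public static int simulate(int failures, boolean closeOld) {
        openReaders.set(0);
        FakeReader blockReader = null;
        for (int attempt = 0; attempt <= failures; attempt++) {
            // Suggested guard: release the previous reader before replacing it.
            if (closeOld && blockReader != null) {
                blockReader.close();
                blockReader = null;
            }
            // Stand-in for the getBlockReader(...) call; attempts before the
            // last one are assumed to fail later and trigger a retry.
            blockReader = new FakeReader();
        }
        blockReader.close(); // the stream is closed normally at the end
        return openReaders.get();
    }

    public static void main(String[] args) {
        System.out.println("leaked without guard: " + simulate(3, false)); // 3
        System.out.println("leaked with guard: " + simulate(3, true));     // 0
    }
}
```

In this model every failed attempt leaves one reader open unless the guard runs first, which matches the slow growth of open file descriptors reported above. Whether the real DFSInputStream leaks at exactly this point would still need to be confirmed against the CDH 4.4.0 source.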
I think a small piece of code is needed before the getBlockReader call to close the old connection, for example:
if (blockReader != null)