Description
We hit an issue with HBase BucketCache in CDH 5.1. Quite a few of our objects are too large to fit into the BucketCache, producing log entries like:
2014-08-28 00:04:03,721 WARN [main-BucketCacheWriter-1] bucket.BucketCache: Failed allocating for block 02d4446744484285b8421af85c9ad3cd_13289593183
org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException: Allocation too big size=552293
    at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.allocateBlock(BucketAllocator.java:400)
    at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1156)
    at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:706)
    at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:678)
    at java.lang.Thread.run(Thread.java:744)
After the regionserver has been up for around half an hour, it OOMs. The heap graph shows a steady increase, starting at the time the regions with the oversized objects were opened on the regionserver.
Some digging in a heap dump showed that the BucketCache ramCache is the culprit. In CDH 5.1, if an exception is thrown in BucketAllocator.allocateBlock, the corresponding ramCache entry is never removed, so the block's on-heap copy is retained indefinitely and the heap keeps growing.
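To illustrate the pattern we believe is happening (this is a simplified sketch, not the actual HBase source; class and method names such as SimpleRamCache, CacheEntry and allocateBlock are hypothetical stand-ins for BucketCache.ramCache, RAMQueueEntry and BucketAllocator.allocateBlock):

    // Simplified sketch of the leak pattern described above.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class SimpleRamCache {

        static class AllocationTooBigException extends Exception {
            AllocationTooBigException(String msg) { super(msg); }
        }

        static class CacheEntry {
            final byte[] data;
            CacheEntry(byte[] data) { this.data = data; }
        }

        // Stand-in for BucketCache.ramCache: blocks waiting for the writer threads.
        private final Map<String, CacheEntry> ramCache = new ConcurrentHashMap<>();

        // Stand-in for BucketAllocator.allocateBlock(): rejects oversized blocks.
        private long allocateBlock(int size) throws AllocationTooBigException {
            if (size > 512 * 1024) {
                throw new AllocationTooBigException("Allocation too big size=" + size);
            }
            return 0L; // offset into the bucket cache
        }

        // Leaky drain loop mirroring the CDH 5.1 behaviour: the entry is removed
        // from ramCache only after a successful write, so blocks whose allocation
        // throws stay referenced forever and the heap grows until OOM.
        void drainLeaky(String key) {
            CacheEntry entry = ramCache.get(key);
            if (entry == null) return;
            try {
                long offset = allocateBlock(entry.data.length);
                // ... write entry.data to the bucket at 'offset' ...
                ramCache.remove(key);          // reached only on success
            } catch (AllocationTooBigException e) {
                // WARN is logged, but 'entry' is still held by ramCache -> leak
            }
        }

        // Drain loop in the spirit of the upstream fix: the ramCache entry is
        // released whether or not the allocation succeeded.
        void drainFixed(String key) {
            CacheEntry entry = ramCache.get(key);
            if (entry == null) return;
            try {
                long offset = allocateBlock(entry.data.length);
                // ... write entry.data to the bucket at 'offset' ...
            } catch (AllocationTooBigException e) {
                // WARN is logged; the oversized block simply is not cached
            } finally {
                ramCache.remove(key);          // always drop the on-heap copy
            }
        }
    }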
https://issues.apache.org/jira/browse/HBASE-11678 appears to be the upstream fix for this issue and has landed in trunk. Could it be backported for the next CDH 5.1 release, please?