[DISTRO-290] DataBlockScanner spewing log messages for blocks not found - Cloudera Open Source

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: CDH2u3
Fix Version/s: CDH4.0.0
Component/s: HDFS
Labels: None

Description

Block verification failure messages are filling up the data node logs whenever DataBlockScanner can't find a block ID:
INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification failed for blk_xx_xx. Its ok since it not in datanode dataset anymore.

On 4/17, this cluster experienced a name node outage. The root volume had some I/O errors severe enough to require a reboot. Unfortunately, we were temporarily running in a configuration that wasn't dual-writing edits to an NFS mount. We wound up with a corrupt edits file on the root volume, and we had to restore from the secondary name node's snapshot, which is up to 5 minutes old. The bottom line is that we lost all inodes that were created after that last snapshot.

The timing of the outage lines up pretty well on a 3-week boundary with these messages, so maybe what we're seeing is something like:

1. A new block gets created on a data node and gets enqueued for block verification 3 weeks later.
2. Name node dies.
3. Name node recovers from a stale snapshot, so a few inodes are lost, including the inode corresponding to the block in step 1.
4. Data node doesn't know that name node lost track of this block, so 3 weeks later, it tries to verify it.
5. The error handling logic doesn't quite handle this edge case, so the data node keeps logging the verification failure.
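
The scenario above can be modeled with a minimal sketch. This is not the actual DataBlockScanner code or the fix that shipped; the class `StaleBlockScanSketch` and all of its methods are hypothetical, and it assumes the block is still in the scanner's queue but no longer in the data node's dataset. It shows one way to handle the edge case: log the missing block once and drop it from the queue, so later scan passes stay quiet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of a block scanner; not the Hadoop implementation.
public class StaleBlockScanSketch {
    private final Queue<Long> scanQueue = new ArrayDeque<>(); // blocks awaiting verification
    private final Set<Long> dataset = new HashSet<>();        // blocks the data node still holds
    private final List<String> log = new ArrayList<>();       // stand-in for the data node log

    // Steps 1: a new block is stored and enqueued for verification.
    public void addBlock(long id) {
        dataset.add(id);
        scanQueue.add(id);
    }

    // Assumed effect of steps 2-4: the block leaves the dataset,
    // but the scanner's queue is never told about it.
    public void removeFromDataset(long id) {
        dataset.remove(id);
    }

    // One scan pass over everything currently queued.
    public void scanOnce() {
        int pending = scanQueue.size();
        for (int i = 0; i < pending; i++) {
            long id = scanQueue.remove();
            if (!dataset.contains(id)) {
                // Log once, then forget the block instead of re-enqueueing it.
                log.add("Skipping blk_" + id + ": not in dataset anymore");
                continue;
            }
            scanQueue.add(id); // healthy block: verify again next period
        }
    }

    public List<String> logLines() {
        return log;
    }

    public int queuedBlocks() {
        return scanQueue.size();
    }

    public static void main(String[] args) {
        StaleBlockScanSketch scanner = new StaleBlockScanSketch();
        scanner.addBlock(1L);
        scanner.addBlock(2L);
        scanner.removeFromDataset(2L);
        for (int pass = 0; pass < 3; pass++) {
            scanner.scanOnce();
        }
        scanner.logLines().forEach(System.out::println);
    }
}
```

In this model, the missing block produces a single log line on the first pass and none afterwards, because it is never put back on the queue. Whether the real scanner can safely drop the block at that point is an open question this sketch does not answer.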

BTW, we're back to dual-writing edits to an NFS mount now, after correcting some issues with the NFS servers in our infrastructure.

People

Assignee: Unassigned
Reporter: Kathleen Ting
Votes: 0
Watchers: 2

Dates

Created: 05/Aug/11 8:32 PM
Updated: 09/Sep/16 10:36 PM
Resolved: 09/Sep/16 10:36 PM