[DISTRO-673] Getting Error: flush failed for required journal Node (QJM) - Cloudera Open Source

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Not A Bug
Affects Version/s: CDH4.4.0
Fix Version/s: None
Component/s: HDFS
Labels:
- performance

Description

Hi Guys,

We are currently running a CDH4.4.0 cluster with HA enabled. Lately the NameNode goes down about once a week, and we noticed it is caused by the QJM JournalNodes:

2014-12-13 07:44:51,212 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [172.16.30.122:8485, 172.16.30.123:8485, 172.16.30.124:8485], stream=QuorumOutputStream starting at txid 1342968975))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
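To line these failures up against monitoring data and the JournalNode logs, it may help to pull the timestamp, JournalNode addresses, starting txid, and timeout out of the NameNode log. Below is a minimal sketch in Python. The regular expressions are assumptions based only on the two log lines shown above, and the function name `parse_flush_failures` and the `SAMPLE` list are hypothetical, introduced purely for illustration.

```python
import re
from datetime import datetime

# Assumed layout of the FATAL line, derived from the single sample in this report.
FLUSH_FAILED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) FATAL "
    r"org\.apache\.hadoop\.hdfs\.server\.namenode\.FSEditLog: "
    r"Error: flush failed for required journal "
    r"\(JournalAndStream\(mgr=QJM to \[(?P<nodes>[^\]]*)\], "
    r"stream=QuorumOutputStream starting at txid (?P<txid>\d+)\)\)"
)
# Assumed layout of the exception line that follows the FATAL line.
TIMEOUT = re.compile(r"Timed out waiting (?P<ms>\d+)ms for a quorum of nodes to respond")


def parse_flush_failures(lines):
    """Return one dict per 'flush failed' event found in the given log lines."""
    events = []
    for line in lines:
        m = FLUSH_FAILED.match(line)
        if m:
            events.append({
                "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f"),
                "journal_nodes": [n.strip() for n in m.group("nodes").split(",")],
                "segment_start_txid": int(m.group("txid")),
                "timeout_ms": None,
            })
            continue
        # Attach the timeout to the most recent event that does not have one yet.
        t = TIMEOUT.search(line)
        if t and events and events[-1]["timeout_ms"] is None:
            events[-1]["timeout_ms"] = int(t.group("ms"))
    return events


# The two log lines quoted in this report, used as sample input.
SAMPLE = [
    "2014-12-13 07:44:51,212 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: "
    "Error: flush failed for required journal (JournalAndStream(mgr=QJM to "
    "[172.16.30.122:8485, 172.16.30.123:8485, 172.16.30.124:8485], "
    "stream=QuorumOutputStream starting at txid 1342968975))",
    "java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.",
]

if __name__ == "__main__":
    for event in parse_flush_failures(SAMPLE):
        print(event)
```

Running the same function over the full NameNode log (for example `parse_flush_failures(open(path))`, with `path` pointing at that log) would give a list of failure times to compare against whatever the monitoring tools and the JournalNode logs recorded for the same moments.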

The flush to the JournalNodes failed, and after the timeout the NameNode went down.

In our case we are running three JournalNodes and nine ZooKeeper instances.

Steps I took to debug:

1. At the time the NameNode went down there was no machine load and no memory-related issue; I verified this with monitoring tools.
2. At the same time, nothing was written to the JournalNode logs.

Before it went down, I observed on the dfshealth page that the Journal Manager state showed the same txid on all three nodes (as I understand it, matching txids mean the nodes are syncing properly).

Please let me know if you need any further information; I am happy to help.

Please guide me on how to debug this.

Attachments

NN_QJM.log
8 kB
16/Dec/14 1:24 AM

People

Assignee:

Unassigned

Reporter:

Dhanasekaran

Votes:

0

Watchers:

3

Dates

Created:

16/Dec/14 1:24 AM

Updated:

09/Sep/16 10:56 PM

Resolved:

09/Sep/16 10:56 PM