Flume (READ-ONLY) / FLUME-503

Use HDFS sync API instead of rolling for durability

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      Some versions of Hadoop (CDH3 releases after beta 2, or the 0.20-append branch) support a sync() API that guarantees data has been flushed to all of the nodes in the write pipeline. This should be as durable as closing an HDFS file.

      Flume should allow the use of sync() to make data durable on a regular basis without having to create lots of tiny files on HDFS.
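
      As a rough illustration of the idea above, the sketch below calls sync() after a fixed number of events instead of closing the file. SyncableStream, SyncingWriter, and the event-count threshold are hypothetical names invented for this example; they are not part of Flume or Hadoop, and the interface only stands in for an HDFS output stream that exposes sync().

```java
import java.io.IOException;

// Hypothetical stand-in for an HDFS output stream that supports sync().
// It is NOT a real Hadoop type; it only models the two calls used here.
interface SyncableStream {
    void write(byte[] data) throws IOException;
    void sync() throws IOException;
}

// Keeps one file open and calls sync() every `syncEveryEvents` events,
// rather than closing the file and opening a new one for durability.
final class SyncingWriter {
    private final SyncableStream out;
    private final int syncEveryEvents;
    private int unsynced = 0;

    SyncingWriter(SyncableStream out, int syncEveryEvents) {
        if (syncEveryEvents < 1) {
            throw new IllegalArgumentException("syncEveryEvents must be >= 1");
        }
        this.out = out;
        this.syncEveryEvents = syncEveryEvents;
    }

    // Writes one event and syncs once the threshold is reached.
    void append(byte[] event) throws IOException {
        out.write(event);
        unsynced++;
        if (unsynced >= syncEveryEvents) {
            flush();
        }
    }

    // Forces a sync if any events have been written since the last one.
    void flush() throws IOException {
        if (unsynced > 0) {
            out.sync();
            unsynced = 0;
        }
    }

    // Number of events written since the last sync.
    int unsyncedEvents() {
        return unsynced;
    }
}
```

      A time-based trigger (sync every N milliseconds) could be added in the same way; the count-based version is used here only because it is the simplest to show.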

      A related feature is the ability to use the getNumCurrentReplicas() API to detect when the number of replicas falls below the desired replication factor, and to roll the file at that point so that the new file picks up a new DataNode (DN).
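
      The replica check could be sketched as a small roll policy, shown below. ReplicaAwareStream and ReplicaRollPolicy are hypothetical names for this example, and the interface is only an assumed view of an open HDFS file; how getNumCurrentReplicas() is actually reached on a given Hadoop version is not shown here.

```java
// Hypothetical view of an open HDFS file that can report how many
// replicas are currently in its write pipeline. Not a real Hadoop type.
interface ReplicaAwareStream {
    int getNumCurrentReplicas();
}

// Decides whether the current file should be rolled because the write
// pipeline has fewer replicas than the desired replication factor.
final class ReplicaRollPolicy {
    private final int desiredReplication;

    ReplicaRollPolicy(int desiredReplication) {
        if (desiredReplication < 1) {
            throw new IllegalArgumentException("desiredReplication must be >= 1");
        }
        this.desiredReplication = desiredReplication;
    }

    // True when the pipeline is under-replicated, so a new file
    // (and with it a new pipeline) should be opened.
    boolean shouldRoll(ReplicaAwareStream stream) {
        return stream.getNumCurrentReplicas() < desiredReplication;
    }
}
```

      A sink using this policy would check shouldRoll() after each sync and close and reopen the file only when it returns true, so healthy pipelines keep writing to the same file.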

              People

              • Assignee:
                jon Jonathan Hsieh
                Reporter:
                todd Todd Lipcon
              • Votes:
                2
                Watchers:
                3
