  1. Flume (READ-ONLY)
  2. FLUME-540

Large number of duplicates with agentE2ESink due to old log files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v0.9.3
    • Fix Version/s: v0.9.4
    • Component/s: Sinks+Sources
    • Labels:
      None
    • Environment:
      Ubuntu 8.04

      Description

      About one third of my events are duplicates. I have a 3-master/3-collector configuration in which each agent uses a syslogTcp source feeding an agentE2EChain sink, and each collector uses a collectorSink writing to S3.

      My config looks like this (the real one has about 60 more agent nodes, all configured identically):

      log1 : collectorSource | collectorSink("s3n://bucket/aarontest/dt=%Y-%m-%d","ue",3600000);
      log2 : collectorSource | collectorSink("s3n://bucket/aarontest/dt=%Y-%m-%d","ue",3600000);
      log3 : collectorSource | collectorSink("s3n://bucket/aarontest/dt=%Y-%m-%d","ue",3600000);
      node1 : syslogTcp(5140) | agentE2EChain("log1","log2","log3");
      node2 : syslogTcp(5140) | agentE2EChain("log1","log2","log3");
      node3 : syslogTcp(5140) | agentE2EChain("log1","log2","log3");

      On one day, 2.5 million out of 6.5 million events were duplicates. As the config above shows, the roll time is set to 1 hour (3600000 ms), and my flume.agent.logdir.retransmit value is set to 8 hours (28800000 ms).

      I understand that with E2E there is a possibility of duplication, but this seems excessive. The problem does not occur with DFO chains.

      There is also a thread on this topic at https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/78af6c9cff03c42c#

      I am trying to determine when the retransmits happen, but the large number of events makes this difficult.
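
      One possible way to narrow down when retransmits happen is to record where each event first appears and where it shows up again. The following is a minimal sketch, not part of Flume: the function name duplicate_pairs and the labels such as "hour-00" are hypothetical, and it assumes that each delivered event can be read back as a string together with a label for the hourly output file it landed in, and that byte-identical events are retransmits (genuinely repeated syslog lines would be counted too).

```python
from collections import Counter


def duplicate_pairs(records):
    """Count repeated events by (label of first sighting, label of repeat).

    `records` is an iterable of (label, event) pairs in delivery order.
    `label` is a hypothetical tag for the output file or hour an event
    was found in; `event` is the event body as a string.
    """
    first_seen = {}    # event -> label where it first appeared
    pairs = Counter()  # (first_label, repeat_label) -> number of repeats
    for label, event in records:
        if event in first_seen:
            pairs[(first_seen[event], label)] += 1
        else:
            first_seen[event] = label
    return pairs


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("hour-00", "evt-a"),
        ("hour-00", "evt-b"),
        ("hour-01", "evt-c"),
        ("hour-08", "evt-a"),  # repeat of an event first seen in hour-00
        ("hour-08", "evt-b"),  # repeat of an event first seen in hour-00
    ]
    for (first, repeat), n in sorted(duplicate_pairs(sample).items()):
        print(f"{n} duplicate(s): first seen in {first}, repeated in {repeat}")
```

      If most repeats land a fixed number of hours after the first sighting, that gap can be compared against the configured retransmit interval. For millions of events, storing a hash of each event instead of the full body would reduce memory use.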

            People

            • Assignee:
              Jonathan Hsieh (jon)
            • Reporter:
              Aaron Brown (aaronbbrown)