CDH (READ-ONLY) / DISTRO-357

Flume collector example from Cloudera's UserGuide does not work as expected

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: CDH3u2
    • Fix Version/s: None
    • Component/s: Docs, Flume
    • Labels:
    • Environment:
      Ubuntu 10.04.3 LTS (Lucid Lynx), CentOS 5 (Cloudera's own demo VM) both running inside VirtualBox with 2GB RAM.

      Description

      The section of the UserGuide that shows how to set up a collector and write to it (http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_tiering_flume_nodes_agents_and_collectors) has this configuration:

      host : console | agentSink("localhost",35853) ;
      collector : collectorSource(35853) | console ;

      I changed this to:

      dataSource : console | agentSink("localhost") ;
      dataCollector : collectorSource() | console ;
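
      My understanding (an assumption on my part, based on flume.collector.port defaulting to 35853) is that leaving the port out should make this equivalent to the explicit form:

      dataSource : console | agentSink("localhost",35853) ;
      dataCollector : collectorSource(35853) | console ;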

      I spawned the nodes as:

      flume node_nowatch -n dataSource
      flume node_nowatch -n dataCollector
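
      For completeness, my understanding is that the same mapping of node names to configurations could also be pushed to the master through the Flume shell, roughly like this (a sketch from memory of the UserGuide's shell syntax, so the exact quoting may be off):

      flume shell
      > connect localhost
      > exec config dataSource 'console' 'agentSink("localhost")'
      > exec config dataCollector 'collectorSource()' 'console'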

      I have tried this on two systems:

      1. Cloudera's own demo VM running inside VirtualBox with 2GB RAM.
      It comes with Flume 0.9.4-cdh3u2

      2. Ubuntu 10.04 LTS (Lucid) with the Debian package and OpenJDK (without any Hadoop packages installed), running as a VM inside VirtualBox with 2GB RAM.
      I followed the steps here: https://ccp.cloudera.com/display/CDHDOC/Flume+Installation#FlumeInstallation-InstallingtheFlumeRPMorDebianPackages
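
      That page boils down to something like the following on Ubuntu (the package names are as I remember them from the doc, so treat them as approximate):

      sudo apt-get update
      sudo apt-get install flume flume-node flume-master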

      Here is what I did:

      flume dump 'collectorSource()' leads to

      $ sudo netstat -anp | grep 35853
      tcp6 0 0 :::35853 :::* LISTEN 3520/java
      $ ps aux | grep java | grep 3520
      1000 3520 0.8 2.3 1050508 44676 pts/0 Sl+ 15:38 0:02 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c dump: collectorSource() | console;

      My assumption is that:

      flume dump 'collectorSource()'

      is the same as running the config:

      dump : collectorSource() | console ;

      and starting the node with

      flume node -1 -n dump -c "dump: collectorSource() | console;" -s

      `dataSource : console | agentSink("localhost")` leads to

      $ sudo netstat -anp | grep 35853
      tcp6 0 0 :::35853 :::* LISTEN 3520/java
      tcp6 0 0 127.0.0.1:44878 127.0.0.1:35853 ESTABLISHED 3593/java
      tcp6 0 0 127.0.0.1:35853 127.0.0.1:44878 ESTABLISHED 3520/java

      $ ps aux | grep java | grep 3593
      1000 3593 1.2 3.0 1130956 57644 pts/1 Sl+ 15:41 0:07 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -n dataSource

      The observed behaviour *is exactly the same in both* the VirtualBox VMs:

      An unending flow of this at *dataSource*:

      2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
      durability.NaiveFileWALManager: File lives in
      /tmp/flume-cloudera/agent/dataSource/writing/20111215-152748172-0500.1116926245855.00000034
      2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
      hdfs.SeqfileEventSink: constructed new seqfile event sink:
      file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
      2011-12-15 15:27:58,254 [naive file wal consumer-35] INFO
      durability.NaiveFileWALManager: opening log file
      20111215-152748172-0500.1116926245855.00000034
      2011-12-15 15:27:58,254 [Roll-TriggerThread-1] INFO
      endtoend.AckListener$Empty: Empty Ack Listener began
      20111215-152758253-0500.1127006668855.00000034
      2011-12-15 15:27:58,256 [naive file wal consumer-35] INFO
      agent.WALAckManager: Ack for
      20111215-152748172-0500.1116926245855.00000034 is queued to be checked
      2011-12-15 15:27:58,257 [naive file wal consumer-35] INFO
      durability.WALSource: end of file NaiveFileWALManager
      (dir=/tmp/flume-cloudera/agent/dataSource )
      2011-12-15 15:28:07,874 [Heartbeat] INFO agent.WALAckManager:
      Retransmitting 20111215-152657736-0500.1066489868855.00000034 after
      being stale for 60048ms
      2011-12-15 15:28:07,875 [naive file wal consumer-35] INFO
      durability.NaiveFileWALManager: opening log file
      20111215-152657736-0500.1066489868855.00000034
      2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
      agent.WALAckManager: Ack for
      20111215-152657736-0500.1066489868855.00000034 is queued to be checked
      2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
      durability.WALSource: end of file NaiveFileWALManager
      (dir=/tmp/flume-cloudera/agent/dataSource )
      2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
      hdfs.SeqfileEventSink: closed
      /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
      2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
      endtoend.AckListener$Empty: Empty Ack Listener ended
      20111215-152758253-0500.1127006668855.00000034

      2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
      durability.NaiveFileWALManager: File lives in
      /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
      2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
      hdfs.SeqfileEventSink: constructed new seqfile event sink:
      file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
      2011-12-15 15:28:08,336 [naive file wal consumer-35] INFO
      durability.NaiveFileWALManager: opening log file
      20111215-152758253-0500.1127006668855.00000034
      2011-12-15 15:28:08,337 [Roll-TriggerThread-1] INFO
      endtoend.AckListener$Empty: Empty Ack Listener began
      20111215-152808335-0500.1137089135855.00000034
      2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
      agent.WALAckManager: Ack for
      20111215-152758253-0500.1127006668855.00000034 is queued to be checked
      2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
      durability.WALSource: end of file NaiveFileWALManager
      (dir=/tmp/flume-cloudera/agent/dataSource )
      2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
      hdfs.SeqfileEventSink: closed
      /tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
      2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
      endtoend.AckListener$Empty: Empty Ack Listener ended
      20111215-152808335-0500.1137089135855.00000034

      ..

      2011-12-15 15:35:24,763 [Heartbeat] INFO agent.WALAckManager:
      Retransmitting 20111215-152707823-0500.1076576334855.00000034 after
      being stale for 60277ms
      2011-12-15 15:35:24,763 [Heartbeat] INFO
      durability.NaiveFileWALManager: Attempt to retry chunk
      '20111215-152707823-0500.1076576334855.00000034' in LOGGED state.
      There is no need for state transition.

      An unending flow of this at *dataCollector*:

      localhost [INFO Thu Dec 15 15:31:09 EST 2011]

      { AckChecksum : (long)1323981059821 (string) ' 4Ck��' (double)6.54133557402E-312 }

      { AckTag : 20111215-153059819-0500.1308572847855.00000034 }

      { AckType : end }

      How do I get the console <-> console communication via a collector working correctly?

      People

      • Assignee: Paul Battaglia (paul)
      • Reporter: Henry Larson (newtoflume)
      • Votes: 1
      • Watchers: 0
