Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: v0.9.1
Component/s: Sinks+Sources
Labels: None
Environment: Linux (Ubuntu Jaunty, Gentoo)
Description
A race condition can cause TailSource to reset to the beginning of the input file incorrectly.
To test the issue, I used this script to generate a log file:
#!/usr/bin/env python
import time

delay = 0.023  # seconds to sleep between lines
count = 0
filler = "12345678901234567890123456789012345678901234567890123456789012334567890"

# Emit one numbered line per iteration, forever.
while True:
    count += 1
    print("%s delay=%f %s" % (count, delay, filler))
    time.sleep(delay)
This outputs lines like:
1 delay=0.023000 12345678901234567890123456789012345678901234567890123456789012334567890
2 delay=0.023000 12345678901234567890123456789012345678901234567890123456789012334567890
3 delay=0.023000 12345678901234567890123456789012345678901234567890123456789012334567890
I tried several different Flume configurations and they all exhibited
the same behavior. This is the simplest:
nodeA default-flow tail("/tmp/test_src.log","true")
text("/tmp/test_dst.log","raw")
When delay is > 0.022, the src and dst files are exactly the same.
When delay is <= 0.022, the dst file suddenly deviates dramatically from
the src. Digging around in the dst file reveals that Flume is
re-reading from the beginning of the src file repeatedly.
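
The re-reads can probably be located without digging through the dst file by hand. The following is a minimal sketch, not part of the original test setup: the function name find_resets is hypothetical, and it assumes every dst line still begins with the integer counter written by the generator script. It reports each place where the counter does not increase by exactly one, which is where a reset to the start of the src file would show up.

```python
def find_resets(lines):
    """Return (line_number, previous_count, count) for each break in the sequence.

    Assumes each line starts with the integer counter from the generator script.
    """
    resets = []
    prev = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines and lines that do not start with an integer counter.
        if not fields or not fields[0].isdigit():
            continue
        count = int(fields[0])
        # In an intact copy, every counter is exactly one more than the last.
        if prev is not None and count != prev + 1:
            resets.append((lineno, prev, count))
        prev = count
    return resets


# Small in-memory demonstration: the counter restarts at 1 on the fourth line.
sample = [
    "1 delay=0.022000 filler",
    "2 delay=0.022000 filler",
    "3 delay=0.022000 filler",
    "1 delay=0.022000 filler",
    "2 delay=0.022000 filler",
]
print(find_resets(sample))  # prints [(4, 3, 1)]
```

To check a real run, an open file can be passed in place of the sample list, for example find_resets(open("/tmp/test_dst.log")). An empty result would suggest the dst file is an intact copy; any entry whose third value is 1 would be consistent with a restart from the beginning of the src file.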