FLUME-160

Event.TAG_REGEX does not match necessary special characters

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: v0.9.0, v0.9.1
    • Fix Version/s: v0.9.1, v0.9.2
    • Component/s: Sinks+Sources
    • Labels: None

      Description

      I tried to use output bucketing based on the scribe category by specifying
      collectorSink("hdfs://localhost:9000/somepath/%{scribe.category}", "somefile-")
      as the sink.

      However, %{scribe.category} does not get replaced, and shows up literally in the path name.

      After some poking around, it turns out that the regular expression used to match tags is too restrictive in what it matches:
      final public static String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{(\\w+)\\}";

      The \w character class is equivalent to [a-zA-Z0-9_], so it will never match tags including a dot.
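      The mismatch can be reproduced outside Flume with a few lines of java.util.regex code. This is a minimal sketch, not Flume's implementation: the class name TagRegexCheck and the helper findsBracedTag are hypothetical, and the pattern assumes the opening brace is escaped as \\{, which Java requires for a literal brace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Flume.
class TagRegexCheck {
    // Pattern as quoted in this report, assuming the opening brace is escaped.
    static final String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{(\\w+)\\}";

    // Returns true if the pattern finds a complete %{...} tag in the input,
    // i.e. a match where capture group 2 (the braced tag name) is set.
    static boolean findsBracedTag(String input) {
        Matcher m = Pattern.compile(TAG_REGEX).matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Underscores are covered by \w, so this tag is recognized.
        System.out.println(findsBracedTag("/somepath/%{scribe_category}")); // true
        // The dot is not covered by \w, so this tag is left in the path as-is.
        System.out.println(findsBracedTag("/somepath/%{scribe.category}")); // false
    }
}
```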

      The regex should be expanded to match dots, and possibly also hyphens. Maybe even any character that is not a closing curly bracket:
      final public static String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{([\\w\\.-]+)\\}";
      or
      final public static String TAG_REGEX = "\\%(\\w|\\%)|\\%\\{([^\\}]+)\\}";
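      The two proposals differ in how permissive they are, which the following sketch illustrates. It assumes both braces are escaped in each pattern; the class TagRegexProposals, its constant names, and the helper firstBracedTag are hypothetical and exist only for this comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Flume.
class TagRegexProposals {
    // Proposal 1: word characters, dots, and hyphens inside the braces.
    static final String WORD_DOT_DASH = "\\%(\\w|\\%)|\\%\\{([\\w\\.-]+)\\}";
    // Proposal 2: any character except a closing curly bracket.
    static final String ANY_BUT_BRACE = "\\%(\\w|\\%)|\\%\\{([^\\}]+)\\}";

    // Returns the name of the first %{...} tag found, or null if there is none.
    static String firstBracedTag(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                return m.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String dotted = "/somepath/%{scribe.category}";
        String spaced = "/somepath/%{my tag}";
        System.out.println(firstBracedTag(WORD_DOT_DASH, dotted)); // scribe.category
        System.out.println(firstBracedTag(ANY_BUT_BRACE, dotted)); // scribe.category
        System.out.println(firstBracedTag(WORD_DOT_DASH, spaced)); // null
        System.out.println(firstBracedTag(ANY_BUT_BRACE, spaced)); // my tag
    }
}
```

      Both variants handle the dotted scribe tag; only the second would also accept tag names containing spaces or other punctuation.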

      It could be even more elaborate (e.g. it could allow single or double quotes so the tags themselves could contain curly brackets), but I guess it's a much better idea to just keep things reasonable.

            People

            • Assignee: David Zuelke (dzuelke)
            • Reporter: David Zuelke (dzuelke)