Kite SDK (READ-ONLY) / KITE-306

Crunch dataset target fails with CrunchRuntimeException: Path already exists

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: None
    • Component/s: Data Module
    • Labels: None

      Description

      Passing a leaf partition of a partitioned dataset to CrunchDatasets.asTarget(Dataset) and writing to the resulting target produces this error:

      org.apache.crunch.CrunchRuntimeException: Path already exists: hdfs://localhost:58425/kite/%2Fsource%3Aint64%2Ftype_0%3Aint64%2Fpayload%3Aint64/source=source_0/batch=2014_02_04_14_25_12
      	at org.apache.crunch.io.impl.FileTargetImpl.handleExisting(FileTargetImpl.java:257)
      	at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:212)
      	at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:200)
              ...
      

      The dataset stores Avro data. The cause appears to be that creating a partitioned dataset [1] also creates the partition directory. Crunch then uses that directory as its target directory, but the file target expects the directory not to exist. I have to create the dataset at this point as well, because omitting the autoCreate boolean returns null.
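
      The sequence described above can be mimicked with a plain-Java sketch that uses neither Kite nor Crunch. The class and method names (PathExistsSketch, createPartition, writeToTarget) are hypothetical stand-ins, and the exists-check is an assumption about what the file target does, based only on the stack trace:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PathExistsSketch {

    // Hypothetical stand-in for creating a partition with autoCreate enabled:
    // the partition directory is created eagerly on disk.
    static Path createPartition(Path datasetRoot, String partition) throws IOException {
        return Files.createDirectories(datasetRoot.resolve(partition));
    }

    // Hypothetical stand-in for the file target's default handling of an
    // existing output path (assumed): refuse to write if the path exists.
    static void writeToTarget(Path target) throws IOException {
        if (Files.exists(target)) {
            throw new IllegalStateException("Path already exists: " + target);
        }
        Files.createDirectories(target);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("kite-sketch");

        // Step 1: the partition directory is created up front.
        Path partition = createPartition(root, "source=source_0");

        // Step 2: the same directory is handed over as the write target.
        try {
            writeToTarget(partition);
            System.out.println("write succeeded");
        } catch (IllegalStateException e) {
            System.out.println("write failed: " + e.getMessage());
        }
    }
}
```

      In this sketch the write fails only because the directory was created in step 1. A target path that does not exist yet passes the check.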

      There are likely other combinations where this bug exists (e.g. Parquet data, non-partitioned datasets).

      I also noticed that the method is deprecated in master, but only in the implementation, not on the interface.

      [1] - https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/filesystem/FileSystemDataset.java#L181-L225

              People

              • Assignee: Tom White (tom)
              • Reporter: Bryan Baugher (bbaugher)