Kite SDK (READ-ONLY) / KITE-1155

Deleting an already deleted empty path should not fail the job

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Data Module
    • Labels:
      None

      Description

      In the final phase of generating a dataset, the MR framework calls DatasetKeyOutputFormat.MergeOutputCommitter.commitJob(), which cleans up the temporary path. A problem arises when multiple applications run on different paths under the same namespace, such as:
      hdfs://ns/path/dataset1
      hdfs://ns/path/dataset2

      The cleanlyDelete method [1] can then throw a FileNotFoundException when it reaches an empty parent path that has already been cleaned up, which fails the whole job:

      17/03/19 08:21:21 INFO mapreduce.Job: Job job_1488289274600_188649 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Could not cleanly delete path:hdfs://ns/path/.temp/job_1488289274600_188649
      at org.kitesdk.data.spi.filesystem.FileSystemUtil.cleanlyDelete(FileSystemUtil.java:239)
      at org.kitesdk.data.spi.filesystem.TemporaryFileSystemDatasetRepository.delete(TemporaryFileSystemDatasetRepository.java:61)
      at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:395)
      at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
      at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.FileNotFoundException: File hdfs://ns/path/.temp does not exist.
      at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:705)
      at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:106)
      at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:763)
      at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:759)
      at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:759)
      at org.kitesdk.data.spi.filesystem.FileSystemUtil.cleanlyDelete(FileSystemUtil.java:226)

      Since the purpose of this method is only to remove empty directories, it should not fail the job when those directories have already been deleted.
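
The requested behavior can be sketched as follows. This is a minimal illustration, not the Kite fix: it uses java.nio.file on a local filesystem as a stand-in for the Hadoop FileSystem API, and the class name TolerantCleanup and its helper methods are hypothetical. The idea is that a missing parent directory (NoSuchFileException here, FileNotFoundException in the trace above) is treated as "already cleaned" rather than as a failure.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: local-filesystem analogue of a tolerant cleanlyDelete.
final class TolerantCleanup {

  /**
   * Deletes path recursively, then removes empty parent directories
   * up to (but not including) root. Parents that have already been
   * removed by someone else are skipped instead of raising an error.
   */
  static boolean cleanlyDelete(Path root, Path path) throws IOException {
    boolean deleted = deleteRecursively(path);
    Path current = path.getParent();
    while (current != null && !current.equals(root)) {
      try {
        if (!isEmpty(current)) {
          break; // every remaining ancestor is non-empty as well
        }
        deleted = Files.deleteIfExists(current) || deleted;
      } catch (NoSuchFileException e) {
        // Already removed by a concurrent job: nothing to clean at this level.
      } catch (DirectoryNotEmptyException e) {
        break; // another job wrote into it between the check and the delete
      }
      current = current.getParent();
    }
    return deleted;
  }

  // Throws NoSuchFileException if dir does not exist.
  private static boolean isEmpty(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      return !entries.iterator().hasNext();
    }
  }

  // Returns false if there was nothing to delete.
  private static boolean deleteRecursively(Path path) throws IOException {
    if (Files.notExists(path)) {
      return false;
    }
    List<Path> toDelete;
    try (Stream<Path> walk = Files.walk(path)) {
      // Reverse order so children are deleted before their parents.
      toDelete = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : toDelete) {
      Files.deleteIfExists(p);
    }
    return true;
  }
}
```

With this shape, a job whose .temp directory was already removed by a sibling job simply gets false back from cleanlyDelete instead of an exception, and the commit can proceed.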

      [1] https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/filesystem/FileSystemUtil.java#L241-L242

            People

            • Assignee: Szabolcs Vasas
            • Reporter: Xiaomin Zhang
            • Votes: 0
            • Watchers: 2
