Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 1.1.0
    • Component/s: Command-line Interface
    • Labels:
      None

      Description

      There are a few alternatives for compaction (herringbone, filecrusher) that work roughly as the CLI does: they copy content in place with a MapReduce job and then delete the old data. Kite can almost be used to do this, but the process requires copying to a different dataset with the copy command, removing the original data by hand, and copying the new files back.
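A toy model of the manual workaround may make the three steps concrete. This is a minimal illustration, not Kite code. It assumes a dataset is a local directory of partition subdirectories holding many small text files. The helper names (`write_small_files`, `copy_compacted`, `remove_data`, `compact_manually`) are hypothetical. `copy_compacted` stands in for the copy command's MapReduce job by rewriting each partition as a single file.

```python
import os
import shutil
import tempfile

# Toy layout (an assumption): a dataset is a directory of partition
# subdirectories, each holding data files. This is not Kite code.


def write_small_files(dataset, partition, records):
    """Write one record per file, mimicking many small appends."""
    part_dir = os.path.join(dataset, partition)
    os.makedirs(part_dir, exist_ok=True)
    for i, record in enumerate(records):
        with open(os.path.join(part_dir, "part-%05d.txt" % i), "w") as f:
            f.write(record + "\n")


def copy_compacted(src, dst):
    """Stand-in for the copy command's job: one output file per partition."""
    for partition in sorted(os.listdir(src)):
        src_dir = os.path.join(src, partition)
        dst_dir = os.path.join(dst, partition)
        os.makedirs(dst_dir, exist_ok=True)
        with open(os.path.join(dst_dir, "part-00000.txt"), "w") as out:
            for name in sorted(os.listdir(src_dir)):
                with open(os.path.join(src_dir, name)) as f:
                    out.write(f.read())


def remove_data(dataset):
    """Delete every partition directory, as a user would by hand."""
    for partition in os.listdir(dataset):
        shutil.rmtree(os.path.join(dataset, partition))


def compact_manually(dataset):
    scratch = tempfile.mkdtemp()  # the "different dataset"
    try:
        copy_compacted(dataset, scratch)  # 1. copy to another dataset
        remove_data(dataset)              # 2. remove the originals by hand
        copy_compacted(scratch, dataset)  # 3. copy the new files back
    finally:
        shutil.rmtree(scratch)
```

The source dataset is empty between `remove_data` and the second copy. That gap is the window of missing data the proposals in this issue aim to shrink.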

      First, I think we should update the delete command to accept view URIs and call deleteAll(), so users don't have to remove files by hand to compact data with Kite.
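Here is a rough model of the first proposal: a delete that takes a view URI and removes only the partitions the view selects, in the spirit of `deleteAll()`. The sketch rests on several assumptions:

- a simplified URI shape (`view:<path>?<key>=<value>`)
- single-level `key=value` partition directories
- deletes must align with whole partitions

`parse_view_uri` and `delete_all` are hypothetical helpers, not Kite's API or CLI.

```python
import os
import shutil
from urllib.parse import parse_qsl

# Toy model: a dataset is a directory of single-level "key=value" partition
# directories. The URI shape and helpers are illustrative, not Kite's API.


def parse_view_uri(uri):
    """Split "view:<path>?<key>=<value>&..." into (path, constraints)."""
    if not uri.startswith("view:"):
        raise ValueError("not a view URI: " + uri)
    path, _, query = uri[len("view:"):].partition("?")
    return path, dict(parse_qsl(query))


def delete_all(uri):
    """Remove every partition the view selects; return how many were removed.

    Only partition-aligned views are accepted: a constraint on any other
    field would require rewriting files rather than deleting directories.
    """
    path, constraints = parse_view_uri(uri)
    removed = 0
    for part in sorted(os.listdir(path)):
        key, _, value = part.partition("=")
        if set(constraints) - {key}:
            raise ValueError("view is not aligned with partitions: %r" % constraints)
        if constraints.get(key, value) == value:
            shutil.rmtree(os.path.join(path, part))
            removed += 1
    return removed
```

Rejecting constraints on non-partition fields keeps the delete a cheap directory removal. Supporting them would mean rewriting files, which is the compaction problem again.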

      Second, I think we should implement in-place compaction that creates a temporary dataset, runs a copy job into it, then deletes the source data and merges the temporary dataset back into the source (perhaps adding a replaceMerge() that does the work of delete and merge by partition). This would still leave the data in an inconsistent state for a short period, but affected queries can be resubmitted.
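The second proposal can be sketched the same way. This minimal model assumes local directories, single-level partitions, and a temporary dataset on the same filesystem, so `os.rename` is a cheap move. `compact_partition` plays the copy job. `replace_merge` is a hypothetical version of the suggested `replaceMerge()`: it swaps in compacted data one partition at a time, so only the partition currently being replaced is briefly missing.

```python
import os
import shutil
import tempfile


def compact_partition(src_dir, dst_dir):
    """Stand-in for the copy job: rewrite one partition as a single file."""
    os.makedirs(dst_dir)
    with open(os.path.join(dst_dir, "part-00000.txt"), "w") as out:
        for name in sorted(os.listdir(src_dir)):
            with open(os.path.join(src_dir, name)) as f:
                out.write(f.read())


def replace_merge(target, temp):
    """Hypothetical replaceMerge(): delete and merge one partition at a time."""
    for part in sorted(os.listdir(temp)):
        dest = os.path.join(target, part)
        if os.path.exists(dest):
            shutil.rmtree(dest)  # this partition is missing from here...
        os.rename(os.path.join(temp, part), dest)  # ...until this move lands


def compact_in_place(dataset):
    dataset = os.path.abspath(dataset)
    # Temporary dataset next to the source, so os.rename stays on one filesystem.
    temp = tempfile.mkdtemp(dir=os.path.dirname(dataset))
    try:
        for part in sorted(os.listdir(dataset)):
            compact_partition(os.path.join(dataset, part), os.path.join(temp, part))
        replace_merge(dataset, temp)
    finally:
        shutil.rmtree(temp)
```

Working partition by partition does not remove the inconsistent window. It does bound it to a delete plus a rename for a single partition, rather than the whole copy.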

      Last, I think we should integrate with Hive's locking mechanism so that we can do this safely. We can copy the data, lock the directory, replace the content, then unlock.
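For the locking proposal, the sketch below uses an in-process shared/exclusive lock as a stand-in for Hive's lock manager. It is not Hive's API, only the same idea. Queries hold a shared lock, the expensive copy runs against a snapshot, and the exclusive lock is held only while contents are swapped. The dataset is modeled as an in-memory dict from partition name to a list of files, each a list of records.

As an added assumption beyond the description, the swap skips any partition that changed after the snapshot, so data written during the copy is not lost.

```python
import threading
from contextlib import contextmanager


class SharedExclusiveLock:
    """In-process stand-in for a Hive-style shared/exclusive lock on a
    dataset directory; it is not Hive's lock manager."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def shared(self):
        # Many readers may hold the lock together, but not while a swap runs.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        # The swap waits until no reader or other writer holds the lock.
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


def query(lock, dataset):
    """A reader: count records while holding a shared lock."""
    with lock.shared():
        return sum(len(f) for files in dataset.values() for f in files)


def compact_with_lock(lock, dataset):
    """Copy the data, lock, replace the content, then unlock."""
    # Copy: snapshot under a shared lock so queries keep running.
    with lock.shared():
        snapshot = {p: [list(f) for f in files] for p, files in dataset.items()}
    # Compact outside any lock: one file per partition.
    compacted = {p: [[r for f in files for r in f]] for p, files in snapshot.items()}
    # Lock, replace, and unlock when the with block exits.
    with lock.exclusive():
        for p, files in compacted.items():
            if dataset.get(p) == snapshot[p]:  # skip partitions changed since the copy
                dataset[p] = files
```

Readers and the swap never hold the lock at the same time. A query therefore sees either the old layout or the compacted one, never a partially replaced partition.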

              People

              • Assignee:
                Ryan Blue
                Reporter:
                Ryan Blue
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: