• Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 1.1.0
    • Component/s: Command-line Interface
    • Labels:


      Several compaction tools (herringbone, filecrusher) work roughly the way the CLI does: they rewrite content in place with a MapReduce job and then delete the old data. Kite can almost be used to do this, but the process currently requires copying to a different dataset with the copy command, removing the original data by hand, and copying the new files back.

      First, I think we should update the delete command to work with view URIs and call deleteAll(), so users don't have to remove files by hand to do this with Kite.

      Second, I think we should implement in-place compaction that creates a temporary dataset, runs a copy job, then deletes the source data and merges the temporary dataset (maybe add a replaceMerge() to do the work of delete and merge by partition). This would still leave the data in an inconsistent state for a short period of time, but queries that fail during that window can be resubmitted.
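
The in-place compaction steps above can be sketched at the filesystem level. This is only an illustration under stated assumptions: partitions are modeled as plain directories of newline-terminated text files, the "copy job" is a simple file concatenation rather than a MapReduce job, and `compact_partition`, `replace_merge`, and `compact_in_place` are hypothetical names, not Kite APIs.

```python
import shutil
import tempfile
from pathlib import Path


def compact_partition(partition: Path, staging_root: Path) -> Path:
    """Concatenate every data file in `partition` into one file in a staging dir."""
    staged = staging_root / partition.name
    staged.mkdir(parents=True)
    merged = staged / "compacted.txt"
    with merged.open("wb") as out:
        for part in sorted(p for p in partition.iterdir() if p.is_file()):
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return staged


def replace_merge(partition: Path, staged: Path) -> None:
    """Hypothetical replaceMerge: delete the old files, then move the staged ones in."""
    # The window of inconsistency starts here: a reader may see a partial partition.
    for old in list(partition.iterdir()):
        if old.is_file():
            old.unlink()
    for new in list(staged.iterdir()):
        shutil.move(str(new), str(partition / new.name))


def compact_in_place(dataset: Path) -> None:
    """Compact each partition directory of `dataset` via a temporary staging area."""
    with tempfile.TemporaryDirectory(prefix="compact-") as tmp:
        staging_root = Path(tmp)
        for partition in sorted(p for p in dataset.iterdir() if p.is_dir()):
            staged = compact_partition(partition, staging_root)
            replace_merge(partition, staged)
```

Doing the delete and the merge one partition at a time keeps the inconsistent window limited to a single partition, which is the motivation for a per-partition replaceMerge() in the proposal.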

      Last, I think we should integrate with Hive's locking mechanism so that we can do this safely. We can copy the data, lock the directory, replace the content, then unlock.
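
The copy, lock, replace, unlock ordering can be illustrated with a small sketch. It assumes the compacted files have already been written to a staging directory, and it uses an exclusively created lock file as a stand-in for the lock; it does not talk to Hive's lock manager, and `exclusive_lock` and `locked_replace` are hypothetical names.

```python
import os
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(directory: Path):
    """Stand-in for an exclusive lock on a directory (NOT Hive's lock manager)."""
    lock_file = directory.parent / (directory.name + ".lock")
    # O_EXCL makes creation fail with FileExistsError if the lock is already held.
    fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        lock_file.unlink()


def locked_replace(live: Path, staged: Path) -> None:
    """Swap the files in `live` for the files in `staged` while holding the lock."""
    # The expensive copy has already happened; only the swap runs under the lock.
    with exclusive_lock(live):
        for old in list(live.iterdir()):
            if old.is_file():
                old.unlink()
        for new in list(staged.iterdir()):
            shutil.move(str(new), str(live / new.name))
```

Holding the lock only for the delete-and-move step keeps the locked interval short, since the copy job runs before the lock is taken.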


              • Assignee:
                blue Ryan Blue


                • Created: