Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.17.1
Fix Version/s: 1.1.0
Component/s: Command-line Interface
Labels: None
Description
There are a few alternatives for compaction (herringbone, filecrusher) that work roughly the way the CLI does: copying content in place with an MR job and then deleting the old data. Kite can almost be used to do this, but the process requires copying to a different dataset with the copy command, removing the original data by hand, and copying the new files back.
First, I think we should update the delete command to work with view URIs and call deleteAll(), so users don't have to remove files by hand to do this with Kite.
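A minimal sketch of what the updated command could do internally, assuming the argument is resolved with Datasets.load(); the URI and record class here are illustrative, not what the CLI actually passes:

  import org.apache.avro.generic.GenericRecord;
  import org.kitesdk.data.Datasets;
  import org.kitesdk.data.View;

  public class DeleteViewSketch {
    public static void main(String[] args) {
      // a view URI selects a subset of a dataset, e.g. one partition:
      //   view:hive:events?year=2014&month=10
      String uri = args[0];

      View<GenericRecord> view = Datasets.load(uri, GenericRecord.class);

      // deleteAll() removes everything covered by the view, so users no
      // longer need to delete the underlying files by hand
      view.deleteAll();
    }
  }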
Second, I think we should implement in-place compaction that creates a temporary dataset, runs a copy job, then deletes the source data and merges the temporary dataset back in (maybe adding a replaceMerge() that does the delete and merge partition by partition). This would still leave the dataset in an inconsistent state for a short period, but affected queries could be resubmitted.
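A rough sketch of that flow. Only Datasets.load(), Datasets.create(), and Datasets.delete() are existing Kite calls; runCopyJob() stands in for the copy MR job the CLI already submits, and replaceMerge() is the proposed API, shown here as a stub:

  import org.apache.avro.generic.GenericRecord;
  import org.kitesdk.data.Dataset;
  import org.kitesdk.data.Datasets;

  public class CompactSketch {

    public static void compact(String sourceUri, String tempUri) {
      Dataset<GenericRecord> source = Datasets.load(sourceUri, GenericRecord.class);

      // 1. create a temporary dataset with the same descriptor
      //    (schema, format, partition strategy) as the source
      Dataset<GenericRecord> temp =
          Datasets.create(tempUri, source.getDescriptor(), GenericRecord.class);

      // 2. run the existing copy MR job from source to temp
      runCopyJob(source, temp);

      // 3. replace the source content with the compacted files; a
      //    replaceMerge() that works partition by partition would keep the
      //    window where data is missing as small as possible
      replaceMerge(source, temp);

      // 4. clean up the temporary dataset
      Datasets.delete(tempUri);
    }

    // placeholder for the job the CLI copy command already runs
    private static void runCopyJob(Dataset<GenericRecord> from,
                                   Dataset<GenericRecord> to) {
      throw new UnsupportedOperationException("existing copy job, omitted");
    }

    // placeholder for the proposed delete-and-merge-by-partition API
    private static void replaceMerge(Dataset<GenericRecord> target,
                                     Dataset<GenericRecord> update) {
      throw new UnsupportedOperationException("proposed API, not implemented");
    }
  }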
Last, I think we should integrate with Hive's locking mechanism so that we can do this safely: copy the data, lock the directory, replace the content, then unlock.
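A minimal sketch of that lock/replace/unlock sequence against the Hive metastore lock API; the database and table names and the replaceContent() step are placeholders, and a real implementation would also have to poll checkLock() if the lock request comes back in the WAITING state:

  import java.net.InetAddress;
  import java.util.Collections;
  import org.apache.hadoop.hive.conf.HiveConf;
  import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
  import org.apache.hadoop.hive.metastore.api.LockComponent;
  import org.apache.hadoop.hive.metastore.api.LockLevel;
  import org.apache.hadoop.hive.metastore.api.LockRequest;
  import org.apache.hadoop.hive.metastore.api.LockResponse;
  import org.apache.hadoop.hive.metastore.api.LockType;

  public class HiveLockSketch {
    public static void main(String[] args) throws Exception {
      HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());

      // exclusive lock on the table whose files are about to be replaced
      LockComponent component =
          new LockComponent(LockType.EXCLUSIVE, LockLevel.TABLE, "default");
      component.setTablename("events");

      LockRequest request = new LockRequest(
          Collections.singletonList(component),
          System.getProperty("user.name"),
          InetAddress.getLocalHost().getHostName());

      LockResponse response = client.lock(request);
      try {
        // swap the compacted files in while readers are blocked by the lock
        replaceContent();
      } finally {
        client.unlock(response.getLockid());
      }
    }

    private static void replaceContent() {
      // placeholder for the copy/replace step described above
    }
  }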