Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 1.1.0
- Fix Version/s: 1.2.0
- Component/s: None
- Labels: None
Description
Due to various historical changes in the way Kite works with its own InputFormat, Crunch's CrunchCombineFileInputFormat is no longer used automatically when reading file-based datasets via Crunch.
This means that each file in an input dataset results in its own input split, and therefore its own map task, when the dataset is read. For datasets made up of many small files, the overhead of these extra map tasks can hurt performance.
It would be very useful if Kite were to automatically use CombineFileInputFormat's ability to combine multiple small files into a single input split when processing data via Crunch or MapReduce.
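To illustrate the idea of combining small files into fewer input splits, here is a minimal sketch in plain Java. It is not Kite, Crunch, or Hadoop code: `SplitPacker` and `pack` are hypothetical names, and the greedy size-based packing is a simplification that ignores the node and rack locality a real CombineFileInputFormat takes into account.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Kite, Crunch, or Hadoop.
class SplitPacker {

    /**
     * Greedily packs files, given by their sizes in bytes and taken in order,
     * into combined splits whose total size does not exceed maxSplitSize.
     * A single file larger than maxSplitSize is placed in a split of its own
     * (this sketch never breaks a file apart).
     */
    static List<List<Long>> pack(long[] fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;

        for (long size : fileSizes) {
            // Close the current split if adding this file would exceed the cap.
            if (!current.isEmpty() && currentSize + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }

        // Emit the last, partially filled split.
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }
}
```

With six files of 10, 20, 5, 100, 40 and 60 MB and a 128 MB cap, `pack` returns three combined splits, so a job would run three map tasks rather than one per file.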