Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Data Module
    • Labels: None

Description

When using a combination of YARN + Crunch + Kite + S3 (org.apache.hadoop.fs.s3a.S3AFileSystem), we observed that data records are copied a number of times after being written to S3, via S3 copy requests. This happens because the objects are moved first from /path/.temp/job_X/mr/attempt_X to /path/.temp/job_X/mr/job_X, and then to their final destination in /path. In each location they are also renamed from .filename.tmp to filename. This adds up to a total of 5 renames after the initial write.

Because S3 does not natively support renaming objects, the filesystem must implement rename as a copy operation, which is very slow. The first 3 copies happen concurrently across all reduce tasks. The last 2 are particularly painful, as they are performed serially for all output files by the application master.
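
For illustration, a minimal sketch of what one of these renames looks like at the FileSystem API level (bucket name illustrative, paths taken from the layout above). S3 has no rename primitive, so S3AFileSystem services the call by copying the object to the destination key and then deleting the source key, i.e. the whole object is rewritten server-side for every "rename":

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3ARenameCost {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("s3a://bucket/"), new Configuration());

        // One of the five post-write renames described above. Under S3AFileSystem
        // this is not a metadata-only operation: it issues an S3 COPY request to
        // the destination key followed by a DELETE of the source key.
        Path src = new Path("s3a://bucket/path/.temp/job_X/mr/job_X/.filename.tmp");
        Path dst = new Path("s3a://bucket/path/.temp/job_X/mr/job_X/filename");
        fs.rename(src, dst);
      }
    }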

To make matters worse, S3AFileSystem does not set the multipart copy part size and threshold in its transfer manager configuration [1], and the default threshold is 5 GB, which is larger than our Crunch output files, so a full-object remote copy is performed. We are using snappy-compressed Avro.
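
For reference, these knobs are exposed by the AWS SDK's TransferManagerConfiguration, which S3AFileSystem could set when it builds its transfer manager. A minimal sketch, with purely illustrative threshold and part-size values (not what HADOOP-12891 proposes):

    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.TransferManagerConfiguration;

    public class TransferManagerCopyTuning {
      public static void main(String[] args) {
        TransferManager transfers = new TransferManager(new AmazonS3Client());

        // The SDK default multipart copy threshold is 5 GB, so anything smaller
        // (e.g. our snappy-compressed Avro output files) is copied with a single
        // full-object COPY request.
        TransferManagerConfiguration tmConf = new TransferManagerConfiguration();
        tmConf.setMultipartCopyThreshold(128L * 1024 * 1024); // illustrative: 128 MB
        tmConf.setMultipartCopyPartSize(64L * 1024 * 1024);   // illustrative: 64 MB
        transfers.setConfiguration(tmConf);

        // Copies initiated through this TransferManager are now split into
        // multipart copy-part requests for objects above the configured threshold.
      }
    }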

As far as a long-term solution goes, the Hadoop project is working on a substantial enhancement [2] to enable improved interactions with "Object Store" (non-)filesystems [3] such as S3. As is evident from the Jira activity, this is a well-known deficiency. Until that work is done, Kite can inspect the filesystem URI and, when writing to a known object store, elect to skip all of the renaming (i.e., copying) overhead. A new configuration property can be introduced to override this inference, and the resulting change in write behavior, if desired; a rough sketch follows.
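
One possible shape of that inference plus override, assuming a hypothetical property name and scheme list (neither is an existing Kite option):

    import java.net.URI;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;

    public class ObjectStoreWriteBehavior {

      // Hypothetical property name, used only for illustration.
      public static final String SKIP_TEMP_RENAME_PROP = "kite.writer.skip-temp-rename";

      // URI schemes known to be backed by object stores rather than
      // real (rename-capable) filesystems.
      private static final Set<String> OBJECT_STORE_SCHEMES =
          new HashSet<>(Arrays.asList("s3", "s3a", "s3n", "swift"));

      /**
       * Returns true if the temp-directory + rename commit sequence should be
       * skipped for this output location. The URI-based inference can be
       * overridden explicitly via the configuration property above.
       */
      public static boolean skipTempAndRename(URI outputUri, Configuration conf) {
        boolean looksLikeObjectStore =
            OBJECT_STORE_SCHEMES.contains(outputUri.getScheme());
        return conf.getBoolean(SKIP_TEMP_RENAME_PROP, looksLikeObjectStore);
      }
    }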

We noticed a recent Crunch patch [4] that parallelized some of the renaming. However, that code is not actually consumed by Kite, and skipping the temp file + rename step entirely is much preferable anyway.

[1] https://issues.apache.org/jira/browse/HADOOP-12891
[2] https://issues.apache.org/jira/browse/HADOOP-9565
[3] https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md#warning-object-stores-are-not-filesystems
[4] https://issues.apache.org/jira/browse/CRUNCH-580

People

    • Assignee: Unassigned
    • Reporter: Andrew Olson (noslowerdna)
    • Votes: 0
    • Watchers: 2
