
DISTRO-39: DistributedCache issues with CDH 0.20.2+320

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: CDH3b3
    • Fix Version/s: CDH3b4
    • Component/s: HDFS, MapReduce

      Description

      On Tue, Oct 5, 2010 at 2:54 PM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

      > Hi Kim,
      >
      > We didn't fix it in the end. I just ended up manually writing the
      > files to the cluster using the FileSystem class, and then reading them
      > back out again on the other side. Not terribly efficient, as I guess
      > the point of DistributedCache is that the files get distributed to
      > every node, whereas I'm only writing to two or three nodes, and every
      > map-task then tries to read back from those two or three nodes the
      > data are stored on.
      >
      > Unfortunately I didn't have the will or inclination to investigate it
      > any further as I had some pretty tight deadlines to keep to and it
      > hasn't caused me any significant problems yet...
      >
      > Thanks,
      >
      > Jamie
      >
      > On 5 October 2010 22:30, Kim Vogt <kim@simplegeo.com> wrote:
      > > I'm experiencing the same problem. I was hoping there would be a reply to
      > > this. Anyone? Bueller?
      > >
      > > -Kim
      > >
      > > On Fri, Jul 16, 2010 at 1:58 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
      > >
      > >> Dear All,
      > >>
      > >> We recently upgraded from CDH3b1 to b2 and ever since, all our
      > >> mapreduce jobs that use the DistributedCache have failed. Typically,
      > >> we add files to the cache prior to job startup, using
      > >> addCacheFile(URI, conf) and then get them on the other side, using
      > >> getLocalCacheFiles(conf). I believe the hadoop-core versions for these
      > >> are 0.20.2+228 and +320 respectively.
      > >>
      > >> We then open the files and read them in using a standard FileReader,
      > >> passing the toString of the path object as the constructor parameter,
      > >> which has worked fine up to now. However, we're now getting
      > >> FileNotFound exceptions when the file reader tries to open the file.
      > >>
      > >> Unfortunately the cluster is on an airgapped network, but the
      > >> FileNotFound line comes out like:
      > >>
      > >> java.io.FileNotFoundException:
      > >>
      > >>
      > /tmp/hadoop-hadoop/mapred/local/taskTracker/archive/master/path/to/my/file/filename.txt/filename.txt
      > >>
      > >> Note, the duplication of filename.txt is deliberate. I'm not sure if
      > >> that's strange or not as this has previously worked absolutely fine.
      > >> Has anyone else experienced this? Apologies if this is known, I've
      > >> only just joined the list.
      > >>
      > >> Many thanks,
      > >>
      > >> Jamie
      > >>
      > >
      >
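      The failing usage pattern described in the report is: register the file with
      DistributedCache.addCacheFile(URI, conf) on the driver, resolve the localized
      copies with DistributedCache.getLocalCacheFiles(conf) in the task, then open
      one with a plain FileReader on the returned Path's toString(). A minimal
      sketch of that pattern against the old org.apache.hadoop.mapred API; the file
      path, class names, and key/value types here are illustrative, not taken from
      the report.

      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.IOException;
      import java.net.URI;

      import org.apache.hadoop.filecache.DistributedCache;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.MapReduceBase;
      import org.apache.hadoop.mapred.Mapper;
      import org.apache.hadoop.mapred.OutputCollector;
      import org.apache.hadoop.mapred.Reporter;

      public class CachePatternSketch {

          // Driver side: register an HDFS file with the cache before the job
          // is submitted. The path is hypothetical.
          public static void registerCacheFile(JobConf conf) throws Exception {
              DistributedCache.addCacheFile(
                      new URI("/path/to/my/file/filename.txt"), conf);
          }

          // Task side: look up the localized copies and open one.
          public static class SketchMapper extends MapReduceBase
                  implements Mapper<LongWritable, Text, Text, Text> {

              private String cachedLine;

              @Override
              public void configure(JobConf job) {
                  try {
                      Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
                      // toString() on the Path is passed straight to FileReader;
                      // this is the open that throws FileNotFoundException
                      // under 0.20.2+320.
                      BufferedReader reader = new BufferedReader(
                              new FileReader(localFiles[0].toString()));
                      cachedLine = reader.readLine();
                      reader.close();
                  } catch (IOException e) {
                      throw new RuntimeException("could not read cache file", e);
                  }
              }

              public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> out, Reporter reporter)
                      throws IOException {
                  out.collect(new Text(cachedLine), value);
              }
          }
      }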
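      Jamie's workaround bypasses the cache entirely and goes through HDFS with
      the FileSystem API: the driver writes the side file into HDFS, and every
      map task reads it back from the handful of datanodes holding its blocks,
      which is what makes it less efficient than a working DistributedCache. A
      rough sketch of that approach; the HDFS location and method names are made
      up for illustration.

      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStreamReader;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class FsWorkaroundSketch {

          // Hypothetical HDFS location for the side data.
          private static final Path SIDE_FILE = new Path("/shared/lookup.txt");

          // Driver side: write the side data straight into HDFS.
          public static void writeSideFile(Configuration conf, String contents)
                  throws IOException {
              FileSystem fs = FileSystem.get(conf);
              FSDataOutputStream out = fs.create(SIDE_FILE, true); // overwrite
              out.writeBytes(contents);
              out.close();
          }

          // Task side: each map task reads the file back over HDFS rather
          // than from a node-local cache directory.
          public static String readSideFile(Configuration conf) throws IOException {
              FileSystem fs = FileSystem.get(conf);
              BufferedReader reader = new BufferedReader(
                      new InputStreamReader(fs.open(SIDE_FILE)));
              StringBuilder sb = new StringBuilder();
              String line;
              while ((line = reader.readLine()) != null) {
                  sb.append(line).append('\n');
              }
              reader.close();
              return sb.toString();
          }
      }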

    People

    • Assignee:
      Tom White
    • Reporter:
      Patrick Angeles
    • Votes:
      0
    • Watchers:
      1
