
DISTRO-39: DistributedCache issues with CDH 0.20.2+320

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: CDH3b3
    • Fix Version/s: CDH3b4
    • Component/s: HDFS, MapReduce

      Description

      On Tue, Oct 5, 2010 at 2:54 PM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

      > Hi Kim,
      >
      > We didn't fix it in the end. I just ended up manually writing the
      > files to the cluster using the FileSystem class, and then reading them
      > back out again on the other side. Not terribly efficient, as I guess
      > the point of DistributedCache is that the files get distributed to
      > every node, whereas I'm only writing to two or three nodes, and every
      > map-task then tries to read back from those two or three nodes the
      > data are stored on.
      >
      > Unfortunately I didn't have the will or inclination to investigate it
      > any further as I had some pretty tight deadlines to keep to and it
      > hasn't caused me any significant problems yet...
      >
      > Thanks,
      >
      > Jamie
      >
      > On 5 October 2010 22:30, Kim Vogt <kim@simplegeo.com> wrote:
      > > I'm experiencing the same problem. I was hoping there would be a reply to
      > > this. Anyone? Bueller?
      > >
      > > -Kim
      > >
      > > On Fri, Jul 16, 2010 at 1:58 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
      > >
      > >> Dear All,
      > >>
      > >> We recently upgraded from CDH3b1 to b2 and ever since, all our
      > >> mapreduce jobs that use the DistributedCache have failed. Typically,
      > >> we add files to the cache prior to job startup, using
      > >> addCacheFile(URI, conf) and then get them on the other side, using
      > >> getLocalCacheFiles(conf). I believe the hadoop-core versions for these
      > >> are 0.20.2+228 and +320 respectively.
      > >>
      > >> We then open the files and read them in using a standard FileReader,
      > >> passing the toString of the path object as the constructor parameter,
      > >> which has worked fine up to now. However, we're now getting
      > >> FileNotFound exceptions when the file reader tries to open the file.
      > >>
      > >> Unfortunately the cluster is on an airgapped network, but the
      > >> FileNotFound line comes out like:
      > >>
      > >> java.io.FileNotFoundException:
      > >>
      > >>
      > /tmp/hadoop-hadoop/mapred/local/taskTracker/archive/master/path/to/my/file/filename.txt/filename.txt
      > >>
      > >> Note, the duplication of filename.txt is deliberate. I'm not sure if
      > >> that's strange or not as this has previously worked absolutely fine.
      > >> Has anyone else experienced this? Apologies if this is known, I've
      > >> only just joined the list.
      > >>
      > >> Many thanks,
      > >>
      > >> Jamie
      > >>
      > >
      >
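      The failing usage pattern described in the report is: register the file with
      DistributedCache.addCacheFile(URI, conf) on the driver, resolve the localized
      copies with DistributedCache.getLocalCacheFiles(conf) in the task, then open
      one with a plain FileReader on the returned Path's toString(). A minimal
      sketch of that pattern against the old org.apache.hadoop.mapred API; the file
      path, class names, and key/value types here are illustrative, not taken from
      the report.

      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.IOException;
      import java.net.URI;

      import org.apache.hadoop.filecache.DistributedCache;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.MapReduceBase;
      import org.apache.hadoop.mapred.Mapper;
      import org.apache.hadoop.mapred.OutputCollector;
      import org.apache.hadoop.mapred.Reporter;

      public class CachePatternSketch {

          // Driver side: register an HDFS file with the cache before the job
          // is submitted. The path is hypothetical.
          public static void registerCacheFile(JobConf conf) throws Exception {
              DistributedCache.addCacheFile(
                      new URI("/path/to/my/file/filename.txt"), conf);
          }

          // Task side: look up the localized copies and open one.
          public static class SketchMapper extends MapReduceBase
                  implements Mapper<LongWritable, Text, Text, Text> {

              private String cachedLine;

              @Override
              public void configure(JobConf job) {
                  try {
                      Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
                      // toString() on the Path is passed straight to FileReader;
                      // this is the open that throws FileNotFoundException
                      // under 0.20.2+320.
                      BufferedReader reader = new BufferedReader(
                              new FileReader(localFiles[0].toString()));
                      cachedLine = reader.readLine();
                      reader.close();
                  } catch (IOException e) {
                      throw new RuntimeException("could not read cache file", e);
                  }
              }

              public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> out, Reporter reporter)
                      throws IOException {
                  out.collect(new Text(cachedLine), value);
              }
          }
      }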
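      Jamie's workaround bypasses the cache entirely and goes through HDFS with
      the FileSystem API: the driver writes the side file into HDFS, and every
      map task reads it back from the handful of datanodes holding its blocks,
      which is what makes it less efficient than a working DistributedCache. A
      rough sketch of that approach; the HDFS location and method names are made
      up for illustration.

      import java.io.BufferedReader;
      import java.io.IOException;
      import java.io.InputStreamReader;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class FsWorkaroundSketch {

          // Hypothetical HDFS location for the side data.
          private static final Path SIDE_FILE = new Path("/shared/lookup.txt");

          // Driver side: write the side data straight into HDFS.
          public static void writeSideFile(Configuration conf, String contents)
                  throws IOException {
              FileSystem fs = FileSystem.get(conf);
              FSDataOutputStream out = fs.create(SIDE_FILE, true); // overwrite
              out.writeBytes(contents);
              out.close();
          }

          // Task side: each map task reads the file back over HDFS rather
          // than from a node-local cache directory.
          public static String readSideFile(Configuration conf) throws IOException {
              FileSystem fs = FileSystem.get(conf);
              BufferedReader reader = new BufferedReader(
                      new InputStreamReader(fs.open(SIDE_FILE)));
              StringBuilder sb = new StringBuilder();
              String line;
              while ((line = reader.readLine()) != null) {
                  sb.append(line).append('\n');
              }
              reader.close();
              return sb.toString();
          }
      }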

    People

    • Assignee:
      Tom White
    • Reporter:
      Patrick Angeles
    • Votes:
      0
    • Watchers:
      1
