Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: CDH4.4.0
Fix Version/s: CDH5.0.0
Component/s: Hadoop Common
Labels: None
Description
This content is taken directly from the forums at this link: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/CDH4-4-globbing-inside-HAR-files-bug/m-p/2677#M116
===========================================
We found a bug in CDH4.4.0.1 (and possibly earlier) in the globbing functionality inside .HAR files, implemented in the methods FileSystem::globStatus() and FileSystem::globStatusInternal().
The bug results in an exception:
hdfs@h01:~$ hadoop fs -ls har:/ddd/hhh.har/H*
-ls: Can not create a Path from an empty string
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]
whereas the Hadoop 2.0.0 implementation yields:
hadoop@h01:~$ hadoop fs -ls har:/ddd/hhh.har/H*
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-09-05 06:55 har:///ddd/hhh.har/Hhh/Fff
The bug occurs in globStatusInternal() on this line:
matches.add(getFileStatus(new Path(baseDir)));
This happens because the loop before it fails to collect the HAR file name into the baseDir variable. Further down the line,
HarFileSystem::getHarInPath() executes:
Path tmp = new Path(harPath.getName());
This fails because new Path("/").getName() returns an empty string, i.e. "".equals(new Path("/").getName()) is true.
Replacing both methods with their Hadoop 2.0.0 versions gives a client-side workaround.
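
The failure mode described above can be illustrated in isolation. The following is a minimal sketch that does not use Hadoop at all: the class HarGlobSketch and its methods are hypothetical stand-ins that mimic only two behaviours of org.apache.hadoop.fs.Path (getName() returning the text after the last slash, and the String constructor rejecting an empty string). It assumes, as the report suggests, that the path reaching getHarInPath() is effectively the root.

```java
// Stand-in sketch only: mimics two behaviours of org.apache.hadoop.fs.Path
// without depending on Hadoop. All names here are hypothetical.
final class HarGlobSketch {

    /** Mimics Path.getName(): the text after the last '/' in the path. */
    static String getName(String path) {
        int slash = path.lastIndexOf('/');
        return path.substring(slash + 1);
    }

    /** Mimics the argument check performed when a Path is built from a String. */
    static String newPath(String pathString) {
        if (pathString == null) {
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        if (pathString.isEmpty()) {
            throw new IllegalArgumentException("Can not create a Path from an empty string");
        }
        return pathString;
    }

    /** Simulates the statement `new Path(harPath.getName())` quoted in the report. */
    static String harComponent(String harPath) {
        return newPath(getName(harPath));
    }

    public static void main(String[] args) {
        // A path that still carries the archive name works as expected.
        System.out.println(harComponent("/ddd/hhh.har"));

        // If only "/" arrives, getName() yields "" and construction fails.
        try {
            harComponent("/");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second call fails with the same message as the -ls output quoted earlier, which is consistent with baseDir having lost the archive name before the lookup.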
I cannot see why these methods were changed in this way, since I don't see how globbing inside a .HAR file can work with them.
Is this a known issue? Which CDH version resolves this issue?
=======================================