Description
hadoop distcp from cdh4.1.2 to cdh5.1.0 fails (Caused by: java.io.IOException: Check-sum mismatch between) even with -Ddfs.checksum.type=CRC32 or CRC32C.
I distcp'ed the file with -skipcrccheck and then fetched the checksum of that same file from both clusters. The checksums differ, even though the file size, block size, and md5sum of the contents match:
cdh4.1.2
========
stat
[sc-app1:~]$ hdfs dfs -stat "%b/%o" /user/vigith/test
15/134217728
version
[vigith@btsc-nn1 ~]$ hadoop version
Hadoop 2.0.0-cdh4.1.2
GETFILECHECKSUM
[vigith@btsc-nn1 ~]$ curl "http://btsc-wh1.example.com:14000/webhdfs/v1/user/vigith/test?op=GETFILECHECKSUM&user.name=ops"
{"FileChecksum":{"algorithm":"MD5-of-0MD5-of-512CRC32","bytes":"0000020000000000000000009044aa7dbdf5696d046adc0dafe825aa00000000","length":28}}
unix md5sum
[vigith@btsc-nn1 ~]$ hdfs dfs -cat /user/vigith/test | md5sum
3469cdaffd9cd5aaa9f579649dd77a3d -
cdh5.1.0
========
checksum
[vigith@science-app1 ~]$ hdfs dfs -checksum /user/vigith/test
/user/vigith/test MD5-of-0MD5-of-512CRC32 000002000000000000000000acc13345fde37a24ba5cf776e6cdff9c
stat
[vigith@science-app1 ~]$ hdfs dfs -stat "%b/%o" /user/vigith/test
15/134217728
version
[vigith@science-app1 ~]$ hadoop version
Hadoop 2.3.0-cdh5.1.0
unix md5sum
[vigith@science-app1 ~]$ hdfs dfs -cat /user/vigith/test | md5sum
3469cdaffd9cd5aaa9f579649dd77a3d -
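To make the difference easier to see, the two checksum strings above can be split into their fields. This is only a sketch: it assumes the 28-byte layout written by Hadoop's MD5MD5CRC32FileChecksum (4-byte bytesPerCRC, 8-byte crcPerBlock, 16-byte MD5, all big-endian), and it assumes the 4 extra bytes in the WebHDFS response are padding beyond the reported length of 28. The names FileChecksum and decode_checksum are hypothetical helpers, not part of any Hadoop API.

```python
from typing import NamedTuple


class FileChecksum(NamedTuple):
    bytes_per_crc: int   # bytes covered by each CRC
    crc_per_block: int   # CRCs per block (0 in both outputs above)
    md5: str             # MD5 over the per-block MD5s, as hex


def decode_checksum(hex_bytes: str, length: int = 28) -> FileChecksum:
    """Split a hex-encoded file checksum into its three fields.

    Assumed layout: int bytesPerCRC, long crcPerBlock, 16-byte MD5.
    Anything past `length` bytes is treated as padding and dropped.
    """
    raw = bytes.fromhex(hex_bytes)[:length]
    if len(raw) != 28:
        raise ValueError(f"expected 28 checksum bytes, got {len(raw)}")
    return FileChecksum(
        bytes_per_crc=int.from_bytes(raw[0:4], "big"),
        crc_per_block=int.from_bytes(raw[4:12], "big"),
        md5=raw[12:28].hex(),
    )


# Values copied from the cdh4.1.2 (WebHDFS) and cdh5.1.0 (hdfs dfs -checksum) output.
cdh4 = decode_checksum("0000020000000000000000009044aa7dbdf5696d046adc0dafe825aa00000000")
cdh5 = decode_checksum("000002000000000000000000acc13345fde37a24ba5cf776e6cdff9c")

print(cdh4)
print(cdh5)
print("same CRC parameters:", cdh4[:2] == cdh5[:2])
print("same MD5:", cdh4.md5 == cdh5.md5)
```

Under these assumptions both clusters report the same bytesPerCRC and crcPerBlock, so the mismatch is confined to the MD5 field.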
Please let me know if I am doing something really stupid.
The distcp command I tried:
hadoop distcp -pb hftp://btsc-nn1.example.com:50070/user/vigith/test hdfs://haHdfs/user/vigith/test