CDH (READ-ONLY) / DISTRO-637

hadoop distcp md5 checksum failure even with same checksum type (cdh4.1.2 to cdh5.1.0)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: CDH 5.1.0
    • Fix Version/s: None
    • Component/s: HDFS
    • Labels:

      Description

      hadoop distcp from cdh4.1.2 to cdh5.1.0 fails (Caused by: java.io.IOException: Check-sum mismatch between) even with -Ddfs.checksum.type=CRC32 or CRC32C.

      After fetching the checksum of the same file from both clusters (I distcp'ed the file with -skipcrccheck), I see that the two checksums differ.

      cdh4.1.2
      ========

      stat

      [sc-app1:~]$ hdfs dfs -stat "%b/%o" /user/vigith/test
      15/134217728
      

      version

      [vigith@btsc-nn1 ~]$ hadoop version
      Hadoop 2.0.0-cdh4.1.2
      

      GETFILECHECKSUM

      [vigith@btsc-nn1 ~]$ curl "http://btsc-wh1.example.com:14000/webhdfs/v1/user/vigith/test?op=GETFILECHECKSUM&user.name=ops"
      {"FileChecksum":{"algorithm":"MD5-of-0MD5-of-512CRC32","bytes":"0000020000000000000000009044aa7dbdf5696d046adc0dafe825aa00000000","length":28}}
      

      unix md5sum

      [vigith@btsc-nn1 ~]$ hdfs dfs -cat /user/vigith/test | md5sum
      3469cdaffd9cd5aaa9f579649dd77a3d  -
      

      cdh5.1.0
      ========

      checksum

      [vigith@science-app1 ~]$ hdfs dfs -checksum /user/vigith/test
      /user/vigith/test       MD5-of-0MD5-of-512CRC32 000002000000000000000000acc13345fde37a24ba5cf776e6cdff9c
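To compare the two checksum values field by field, here is a minimal sketch that decodes the hex strings shown above. It assumes the serialized layout used by Hadoop's MD5MD5CRC32FileChecksum: a 4-byte big-endian bytesPerCRC, an 8-byte big-endian crcPerBlock, then a 16-byte MD5 digest (28 bytes in total, matching the "length":28 in the WebHDFS response). The function name decode_hdfs_checksum is hypothetical, and treating the 4 extra bytes in the WebHDFS output as padding is an assumption.

```python
import struct

def decode_hdfs_checksum(hex_bytes, length=28):
    """Split a serialized HDFS file checksum into its fields.

    Assumed layout: int32 bytesPerCRC, int64 crcPerBlock, 16-byte MD5.
    Bytes beyond `length` are ignored (assumed to be padding).
    """
    raw = bytes.fromhex(hex_bytes)[:length]
    bytes_per_crc, crc_per_block = struct.unpack(">iq", raw[:12])
    return {
        "bytes_per_crc": bytes_per_crc,
        "crc_per_block": crc_per_block,
        "md5": raw[12:28].hex(),
    }

# Values copied from the WebHDFS (cdh4.1.2) and `hdfs dfs -checksum` (cdh5.1.0) output.
cdh4 = decode_hdfs_checksum(
    "0000020000000000000000009044aa7dbdf5696d046adc0dafe825aa00000000"
)
cdh5 = decode_hdfs_checksum(
    "000002000000000000000000acc13345fde37a24ba5cf776e6cdff9c"
)

for name, fields in (("cdh4.1.2", cdh4), ("cdh5.1.0", cdh5)):
    print(name, fields)
```

Under this layout both headers decode to the same bytesPerCRC and crcPerBlock (consistent with the "MD5-of-0MD5-of-512CRC32" algorithm name), so the mismatch is confined to the 16-byte digest.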
      

      stat

      [vigith@science-app1 ~]$ hdfs dfs -stat "%b/%o" /user/vigith/test
      15/134217728
      

      version

      [vigith@science-app1 ~]$ hadoop version
      Hadoop 2.3.0-cdh5.1.0
      

      unix md5sum

      [vigith@science-app1 ~]$ hdfs dfs -cat /user/vigith/test | md5sum
      3469cdaffd9cd5aaa9f579649dd77a3d  -
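Identical md5sum output alongside different HDFS checksums is not contradictory, because the HDFS value is an MD5 over per-block MD5s of per-chunk CRCs rather than an MD5 of the file content. The following is a simplified model of that scheme, not the actual HDFS implementation, and it is not expected to reproduce the hex values above. The function names, the sample content, and the single-process computation are all assumptions for illustration.

```python
import hashlib
import struct
import zlib

def _make_crc32c_table():
    # Table for CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table

_CRC32C_TABLE = _make_crc32c_table()

def crc32c(data):
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def md5_of_md5_of_crc(data, crc_func, bytes_per_crc=512, block_size=134217728):
    """Simplified model: MD5 over the per-block MD5s of the chunk CRCs."""
    block_md5s = b""
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # One 4-byte big-endian CRC per chunk of bytes_per_crc bytes.
        crcs = b"".join(
            struct.pack(">I", crc_func(block[i:i + bytes_per_crc]) & 0xFFFFFFFF)
            for i in range(0, len(block), bytes_per_crc)
        )
        block_md5s += hashlib.md5(crcs).digest()
    return hashlib.md5(block_md5s).hexdigest()

# Hypothetical 15-byte content; the real file's bytes are not in the report.
sample = b"hello, hdfs!!!\n"

print("content md5      :", hashlib.md5(sample).hexdigest())
print("composite, CRC32 :", md5_of_md5_of_crc(sample, zlib.crc32))
print("composite, CRC32C:", md5_of_md5_of_crc(sample, crc32c))
```

In this model the content MD5 does not depend on the CRC type, while the composite checksum changes whenever the stored CRCs were computed with a different polynomial. Whether the two clusters actually stored different CRC types for this file is not confirmed by the output in this report.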
      

      Please let me know if I am doing something really stupid.

      distcp command I tried:

      hadoop distcp -pb  hftp://btsc-nn1.example.com:50070/user/vigith/test hdfs://haHdfs/user/vigith/test
      

            People

            • Assignee: Unassigned
            • Reporter: Vigith Maurice (vigith)