Description
Some good comments from eric.xkcd@gmail.com on cdh-user@ we should incorporate into the upgrade docs:
I'm upgrading a cluster right now, and these are the things I ran into so far:
Upgrading the namenode with /etc/init.d/hadoop-0.20-namenode upgrade does not work: eventually I just (temporarily) hard-coded the -upgrade option in the start script.
If you defined nonexistent dfs.data.dir entries, CDH3b3 will no longer silently ignore them; the datanode shuts down instead.
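For the docs, a pre-upgrade check along these lines might help users catch missing dfs.data.dir entries before the datanode refuses to start. This is only a sketch: check_data_dirs is a hypothetical helper, not part of CDH, and it assumes the user pastes the comma-separated dfs.data.dir value from hdfs-site.xml with no spaces around the commas.

```shell
# Hypothetical helper (not shipped with CDH): report dfs.data.dir entries
# that do not exist as directories on this host.
# Assumption: $1 is the comma-separated dfs.data.dir value, without
# whitespace around the commas.
check_data_dirs() {
    missing=0
    old_ifs=$IFS
    IFS=','
    for dir in $1; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    IFS=$old_ifs
    # Return non-zero if at least one entry is missing.
    return $missing
}
```

Example use on a datanode, with illustrative paths: check_data_dirs "/data/1/dfs/dn,/data/2/dfs/dn" prints one "missing: ..." line per absent directory and returns a non-zero status if any are absent.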
Upgrading the namenode image is fast, but upgrading the data nodes can take a long time. My cluster is roughly using 50TB of disk space.
Looking at the progress in the log files, I estimate that this can take 7 hours or more to complete.
I had to manually extend my hadoop-env.sh scripts with:
export HADOOP_NAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
The last point is understandable since the package manager won't blindly upgrade custom settings, but it should be added to the upgrade instructions. I had to find this by using Google and reading a mailing list.