Last modified: 2014-07-17 12:07:35 UTC
When trying to bring up a namenode in labs, puppet fails with

  Error: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist
  Error: /Stage[main]/Cdh::Hadoop::Namenode/File[/var/lib/hadoop/name]/ensure: change from absent to directory failed: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist

With puppet at commit ebcbef50568960d424fcb95fc79ba3be945a905e, everything is working, and setting up a cluster in labs works.

With 87bd718e678d290b80b0916d255f1bae8666e7d7 (i.e. the child of the above ebcbef commit) plus a38770013716dd39ee5df90380473b734e0cebbb cherry-picked on top [1], puppet fails to set up the namenode. Puppet runs fail with the above error message.

So it seems 87bd718e678d290b80b0916d255f1bae8666e7d7 is the culprit. But as this commit does a lot of reshuffling (~800 lines changed), I'll leave it to CDH+puppet experts to dig deeper.

* Steps to Reproduce
  * Add a new instance 'demo-master' (m1.small, ubuntu-12.04-precise)
  * Wait for the instance to come up.
  * Configure the instance by adding role role::analytics::hadoop::master and setting hadoop_namenodes to demo-master.eqiad.wmflabs
  * Wait for the next puppet run

* Expected result

  Puppet passes without errors

* Actual result

  Puppet fails with

  Error: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist
  Error: /Stage[main]/Cdh::Hadoop::Namenode/File[/var/lib/hadoop/name]/ensure: change from absent to directory failed: Cannot create /var/lib/hadoop/name; parent directory /var/lib/hadoop does not exist

[1] Plain 87bd718e678d290b80b0916d255f1bae8666e7d7 fails with

  Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate parameter 'mapreduce_output_compression' for on Class[Cdh::Hadoop] at /etc/puppet/manifests/role/analytics/hadoop.pp:201 on node qchris-master-87bd718.eqiad.wmflabs

which was fixed upstream in commit a38770013716dd39ee5df90380473b734e0cebbb.
Btw., bringing up hadoop workers with current puppet also fails, because a (different) directory under /var/lib/hadoop does not exist. (Again, when using puppet at ebcbef50568960d424fcb95fc79ba3be945a905e, hadoop workers are brought up by puppet without issues.)
It seems the part that creates /var/lib/hadoop [1] has been lost in translation for commit 87bd718e678d290b80b0916d255f1bae8666e7d7.

[1] Search for "unlikely" on https://git.wikimedia.org/blobdiff/operations%2Fpuppet/87bd718e678d290b80b0916d255f1bae8666e7d7/manifests%2Frole%2Fanalytics%2Fhadoop.pp
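
For illustration only, restoring the missing piece could look roughly like the sketch below. This is not the code that was removed in 87bd718e678d290b80b0916d255f1bae8666e7d7. Where the resource should live (role manifest or cdh module) and which owner, group, or mode it needs are assumptions left open here.

```puppet
# Hypothetical sketch: manage the parent directory so that
# File['/var/lib/hadoop/name'] (declared in Cdh::Hadoop::Namenode)
# has somewhere to be created.
# Owner, group and mode are deliberately not set; the right values
# are not known from this report.
file { '/var/lib/hadoop':
  ensure => 'directory',
}
```

Puppet's file type autorequires a parent directory that is itself a managed file resource, so once /var/lib/hadoop is in the catalog, it should be created before /var/lib/hadoop/name without an explicit require. The same would apply to the directory that the hadoop workers are missing.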