Thursday, August 25, 2011

Hadoop 0.20.2 tips

1) If passwordless ssh login does not work with an RSA key, try a DSA key.

2) The ssh public key can be copied to a remote host with a command such as

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@dnjsrcha01

3) Force the NameNode to leave safe mode:

bin/hadoop dfsadmin -safemode leave

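4) Rebalance HDFS disk usage after adding or removing nodes. The threshold is the allowed deviation, in percent, of each datanode's utilization from the cluster average. For example,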

bin/start-balancer.sh -threshold 10

5) A larger HDFS block size, such as 128 MB, works better for very large input files. For other workloads the benefit is not obvious.
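
To get a feel for what the block size changes, the sketch below counts how many blocks a file occupies at 64 MB and at 128 MB. The 10 GB file size and the helper names block_bytes and block_count are made up for illustration. The commented-out command at the end assumes the 0.20.x property name dfs.block.size and uses placeholder paths.

```shell
#!/bin/sh
# Block size in bytes for a size given in MB (128 -> 134217728).
block_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Number of HDFS blocks for a file of $1 bytes with a block size
# of $2 bytes, rounding up.
block_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical 10 GB input file.
FILE_BYTES=$(( 10 * 1024 * 1024 * 1024 ))
echo "64 MB blocks:  $(block_count "$FILE_BYTES" "$(block_bytes 64)")"
echo "128 MB blocks: $(block_count "$FILE_BYTES" "$(block_bytes 128)")"

# To write a single file with 128 MB blocks (placeholder paths):
# bin/hadoop fs -D dfs.block.size=134217728 -put big-input.txt /user/hadoop/input/
```

For the 10 GB example this prints 160 blocks at 64 MB and 80 at 128 MB. With the default input formats on splittable input, fewer blocks usually means fewer map tasks, so there is less per-task startup overhead on very large files.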

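6) ChainMapper and ChainReducer let you chain several mappers around a single reducer within one job, which avoids storing intermediate files between what would otherwise be separate jobs.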

7) HADOOP_MASTER in hadoop-env.sh can be set to a host:path so that file changes are distributed to the slaves: each daemon rsyncs the Hadoop directory from that location when it starts.

8) Set HADOOP_PID_DIR in hadoop-env.sh. Otherwise the pid files are saved to /tmp by default, and if the Linux system removes them from /tmp, the stop scripts can no longer find the daemons when you shut down the cluster.
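
A minimal sketch of the corresponding line in conf/hadoop-env.sh. The directory /var/hadoop/pids is only an assumed example. Any path that is not cleaned automatically will do, as long as it exists on every node and is writable by the user that runs Hadoop.

```shell
# Fragment for conf/hadoop-env.sh.
# /var/hadoop/pids is an example path, not a Hadoop default.
export HADOOP_PID_DIR=/var/hadoop/pids
```

Restart the daemons after changing this setting. It is safest to stop them before editing the file, while the stop scripts can still find the old pid files.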

9) Kill a running job:

bin/hadoop job -kill job_id
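
The job ID can be looked up with bin/hadoop job -list. The wrapper below is a hypothetical convenience, not part of Hadoop: it checks that its argument looks like a cluster job ID before passing it to the kill command. The pattern is an assumption based on IDs such as job_201108251530_0007, so it would reject local-mode IDs.

```shell
#!/bin/sh
# List running jobs to find the job ID:
# bin/hadoop job -list

# Hypothetical helper: succeeds only if $1 looks like job_<digits>_<digits>.
is_job_id() {
  printf '%s\n' "$1" | grep -Eq '^job_[0-9]+_[0-9]+$'
}

# Hypothetical wrapper: kill the job only when the ID passes the check.
# Run it from the Hadoop install directory, like the other commands here.
kill_job() {
  if is_job_id "$1"; then
    bin/hadoop job -kill "$1"
  else
    echo "not a job ID: $1" >&2
    return 1
  fi
}
```

For example, kill_job job_201108251530_0007 would run the kill command, while kill_job with a task attempt ID or a typo returns an error without calling Hadoop.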
