1) If passwordless ssh login does not work with an rsa key, try a dsa key.
2) The ssh public key can be copied to a remote host with a command such as
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@dnjsrcha01
3) Force the NameNode to leave safe mode:
bin/hadoop dfsadmin -safemode leave
4) Rebalance HDFS disk usage after adding or removing nodes, for example,
bin/start-balancer.sh -threshold 10
5) A larger HDFS block size, such as 128 MB, works better for very large input files. For smaller inputs the benefit is not obvious.
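6) ChainMapper and ChainReducer save intermediate file storage, because chained mappers pass records directly to the next stage within a single job instead of writing them out between separate jobs.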
7) HADOOP_MASTER in hadoop-env.sh can be set so that file changes on the master are distributed to the slaves.
8) HADOOP_PID_DIR in hadoop-env.sh should be set. Otherwise the pid files are saved to /tmp by default, which becomes a problem when you shut down the cluster if Linux has removed the pid files from /tmp in the meantime.
9) Kill a running job:
bin/hadoop job -kill job_id
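
The hadoop-env.sh settings from tips 7 and 8 can be scripted. What follows is only a minimal sketch and not part of Hadoop: set_env_var is a hypothetical helper that sets or replaces an "export NAME=value" line in a file, so running it twice does not leave duplicate entries.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Hadoop): set or replace an
# "export NAME=value" line in a hadoop-env.sh-style file.
# Usage: set_env_var FILE NAME VALUE
set_env_var() {
    sev_file=$1
    sev_name=$2
    sev_value=$3
    sev_tmp="${sev_file}.tmp.$$"

    # Create the file if it does not exist yet.
    [ -f "$sev_file" ] || : > "$sev_file"

    # Copy every line except an existing export of this variable.
    # grep -v exits non-zero when nothing is left, which is fine here.
    grep -v "^export ${sev_name}=" "$sev_file" > "$sev_tmp" || true

    # Append the new setting.
    printf 'export %s=%s\n' "$sev_name" "$sev_value" >> "$sev_tmp"

    # Write back in place so the original file permissions are kept.
    cat "$sev_tmp" > "$sev_file"
    rm -f "$sev_tmp"
}
```

For example, set_env_var conf/hadoop-env.sh HADOOP_PID_DIR /var/hadoop/pids would keep the pid files out of /tmp. Both the conf/hadoop-env.sh location and the /var/hadoop/pids directory are assumptions here; the directory must exist and be writable by the user that runs Hadoop.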