-- Installation
Follow the instructions from this article:
https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-in-stand-alone-mode-on-ubuntu-16-04

-- Execution

---- Hadoop with Java

------ make input directory
mkdir ~/input

------ make output directory
mkdir ~/output

------ copy config files (for testing) into the input directory
cp /usr/local/hadoop/etc/hadoop/*.xml ~/input/

------ run the example grep job
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep ~/input ~/output 'hadoop[.]*'

---- Hadoop with Python

------ create workspace
mkdir -p ~/workspace/hadoop

------ create mapper.py inside the workspace
vim ~/workspace/hadoop/mapper.py

------ create reducer.py
vim ~/workspace/hadoop/reducer.py

------ make mapper.py and reducer.py executable
chmod +x ~/workspace/hadoop/mapper.py
chmod +x ~/workspace/hadoop/reducer.py

------ create a data directory and copy the data (book*.txt) into it
mkdir ~/workspace/hadoop/data

------ run the job without the Hadoop environment (the reducer expects sorted input, so pipe through sort)

-------- command-prompt test
echo "foo foo quux labs foo bar quux" | ./mapper.py | sort -k1,1 | ./reducer.py

-------- book example
cat data/book1.txt | ./mapper.py | sort -k1,1 | ./reducer.py

------ run the job in the Hadoop environment

-------- copy local text files to HDFS (hadoop dfs is deprecated; use hdfs dfs)
/usr/local/hadoop/bin/hdfs dfs -copyFromLocal data/ ./input

-------- check the input in HDFS
/usr/local/hadoop/bin/hdfs dfs -ls

-------- run the MapReduce job
/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.0.0.jar -file /home/ckpcetco/workspace/hadoop/mapper.py -mapper /home/ckpcetco/workspace/hadoop/mapper.py -file /home/ckpcetco/workspace/hadoop/reducer.py -reducer /home/ckpcetco/workspace/hadoop/reducer.py -input /home/ckpcetco/workspace/hadoop/input/* -output /home/ckpcetco/workspace/hadoop/output

-------- check the output
/usr/local/hadoop/bin/hdfs dfs -cat /home/ckpcetco/workspace/hadoop/output/part-00000
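------ sketch of mapper.py and reducer.py
The notes above create mapper.py and reducer.py but do not show their contents. A minimal word-count sketch of the two scripts follows, assuming the standard Hadoop Streaming convention of tab-separated `word\t1` pairs on stdin/stdout; in the real setup the `mapper` and `reducer` functions below would live in two separate executable scripts, each reading `sys.stdin` line by line.

```python
# wordcount.py -- sketch of the logic inside mapper.py and reducer.py.
# Assumption: Hadoop Streaming word count with tab-separated key/value pairs.
import sys

def mapper(lines):
    """Like mapper.py: emit '<word>\t1' for every word on stdin."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(pairs):
    """Like reducer.py: sum counts for consecutive identical keys.
    Requires its input to be sorted by key (hence `sort -k1,1`,
    which Hadoop's shuffle phase performs for you)."""
    current_word, current_count = None, 0
    for pair in pairs:
        word, _, count = pair.partition("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield f"{current_word}\t{current_count}"

if __name__ == "__main__":
    # Mirrors: echo "foo foo quux labs foo bar quux" | ./mapper.py | sort -k1,1 | ./reducer.py
    text = ["foo foo quux labs foo bar quux"]
    for line in reducer(sorted(mapper(text))):
        print(line)  # bar 1, foo 3, labs 1, quux 2
```

When split into two real scripts, each needs a `#!/usr/bin/env python3` shebang line so that `chmod +x` makes them directly runnable by the streaming jar.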