Saturday, December 19, 2015

Steps to run the WordCount program in a Hadoop setup for big data analytics

Hello friends...

I am writing this blog about how to interact with an installed Hadoop environment by running a first program, WordCount. I am giving a step-by-step procedure of all the terminal commands, together with the results as executed on my node (my Hadoop version is 2.6.0), right from starting the node. The commands are highlighted in blue and some of the important results in green.


omesh@omesh-HP-240-G3-Notebook-PC:~$ sudo su hduser
[sudo] password for omesh:
hduser@omesh-HP-240-G3-Notebook-PC:/home/omesh$ cd /usr/local/hadoop/sbin/
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ ls
distribute-exclude.sh    start-all.cmd        stop-balancer.sh
hadoop-daemon.sh         start-all.sh         stop-dfs.cmd
hadoop-daemons.sh        start-balancer.sh    stop-dfs.sh
hdfs-config.cmd          start-dfs.cmd        stop-secure-dns.sh
hdfs-config.sh           start-dfs.sh         stop-yarn.cmd
httpfs.sh                start-secure-dns.sh  stop-yarn.sh
kms.sh                   start-yarn.cmd       yarn-daemon.sh
mr-jobhistory-daemon.sh  start-yarn.sh        yarn-daemons.sh
refresh-namenodes.sh     stop-all.cmd
slaves.sh                stop-all.sh
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/12/19 11:52:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-omesh-HP-240-G3-Notebook-PC.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-omesh-HP-240-G3-Notebook-PC.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-omesh-HP-240-G3-Notebook-PC.out
15/12/19 11:52:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-omesh-HP-240-G3-Notebook-PC.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-omesh-HP-240-G3-Notebook-PC.out
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ jps
25596 NodeManager
24829 DataNode
25694 Jps
25351 ResourceManager
25166 SecondaryNameNode
24591 NameNode

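The jps output above confirms that all five Hadoop daemons are up: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (the sixth entry, Jps, is the listing tool itself). If any of these is missing, check its log file under /usr/local/hadoop/logs/. As an optional extra check, you can also ask HDFS for a cluster report; this command was not part of my session above, but it is a standard HDFS admin command:

hdfs dfsadmin -report    # shows live datanodes, their capacity and remaining space

If the report shows one live datanode, HDFS is ready. Next, let us see what is already stored in HDFS: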
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ hdfs dfs -ls /
15/12/19 11:58:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 6 items
-rw-r--r--   1 hduser supergroup     661808 2015-12-18 20:25 /hadoop-projectfile.txt
drwxr-xr-x   - hduser supergroup          0 2015-12-18 20:32 /om
drwxr-xr-x   - hduser supergroup          0 2015-12-18 20:14 /output
drwxr-xr-x   - hduser supergroup          0 2015-12-18 20:40 /output2
drwxr-xr-x   - hduser supergroup          0 2015-12-18 20:13 /user
-rw-r--r--   1 hduser supergroup     661808 2015-12-18 19:21 /wordcount

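The listing shows that my input file hadoop-projectfile.txt already sits inside the /om directory in HDFS. If you have not loaded an input file yet, you can do it with something like the following two commands (your-textfile.txt is just a placeholder for any local text file you want to count):

hdfs dfs -mkdir /om                    # create an input directory in HDFS
hdfs dfs -put your-textfile.txt /om/   # copy the local file into that directory

Now we run the wordcount example that ships with Hadoop. The general form is hadoop jar <examples-jar> wordcount <input path> <output path>, and note that the output directory must not already exist, which is why I use the fresh /output3 below. In the next two lines I am only using tab completion to locate the examples jar inside /usr/local/hadoop/share/: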
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ hadoop jar /usr/local/hadoop/share/
doc/    hadoop/
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/
hadoop-mapreduce-client-app-2.6.0.jar              hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
hadoop-mapreduce-client-common-2.6.0.jar           hadoop-mapreduce-client-shuffle-2.6.0.jar
hadoop-mapreduce-client-core-2.6.0.jar             hadoop-mapreduce-examples-2.6.0.jar
hadoop-mapreduce-client-hs-2.6.0.jar               lib/
hadoop-mapreduce-client-hs-plugins-2.6.0.jar       lib-examples/
hadoop-mapreduce-client-jobclient-2.6.0.jar        sources/
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /om /output3
15/12/19 12:00:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/19 12:00:31 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/12/19 12:00:31 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/12/19 12:00:31 INFO input.FileInputFormat: Total input paths to process : 1
15/12/19 12:00:31 INFO mapreduce.JobSubmitter: number of splits:1
15/12/19 12:00:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local964714568_0001
15/12/19 12:00:31 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/12/19 12:00:31 INFO mapreduce.Job: Running job: job_local964714568_0001
15/12/19 12:00:31 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/12/19 12:00:31 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/12/19 12:00:32 INFO mapred.LocalJobRunner: Waiting for map tasks
15/12/19 12:00:32 INFO mapred.LocalJobRunner: Starting task: attempt_local964714568_0001_m_000000_0
15/12/19 12:00:32 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/12/19 12:00:32 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/om/hadoop-projectfile.txt:0+661808
15/12/19 12:00:32 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/12/19 12:00:32 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/12/19 12:00:32 INFO mapred.MapTask: soft limit at 83886080
15/12/19 12:00:32 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/12/19 12:00:32 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/12/19 12:00:32 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/12/19 12:00:32 INFO mapred.LocalJobRunner:
15/12/19 12:00:32 INFO mapred.MapTask: Starting flush of map output
15/12/19 12:00:32 INFO mapred.MapTask: Spilling map output
15/12/19 12:00:32 INFO mapred.MapTask: bufstart = 0; bufend = 1086544; bufvoid = 104857600
15/12/19 12:00:32 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25775024(103100096); length = 439373/6553600
15/12/19 12:00:32 INFO mapreduce.Job: Job job_local964714568_0001 running in uber mode : false
15/12/19 12:00:32 INFO mapreduce.Job:  map 0% reduce 0%
15/12/19 12:00:33 INFO mapred.MapTask: Finished spill 0
15/12/19 12:00:33 INFO mapred.Task: Task:attempt_local964714568_0001_m_000000_0 is done. And is in the process of committing
15/12/19 12:00:33 INFO mapred.LocalJobRunner: map
15/12/19 12:00:33 INFO mapred.Task: Task 'attempt_local964714568_0001_m_000000_0' done.
15/12/19 12:00:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local964714568_0001_m_000000_0
15/12/19 12:00:33 INFO mapred.LocalJobRunner: map task executor complete.
15/12/19 12:00:33 INFO mapred.LocalJobRunner: Waiting for reduce tasks
15/12/19 12:00:33 INFO mapred.LocalJobRunner: Starting task: attempt_local964714568_0001_r_000000_0
15/12/19 12:00:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/12/19 12:00:33 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@4748aec5
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334063200, maxSingleShuffleLimit=83515800, mergeThreshold=220481728, ioSortFactor=10, memToMemMergeOutputsThreshold=10
15/12/19 12:00:33 INFO reduce.EventFetcher: attempt_local964714568_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
15/12/19 12:00:33 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local964714568_0001_m_000000_0 decomp: 267009 len: 267013 to MEMORY
15/12/19 12:00:33 INFO reduce.InMemoryMapOutput: Read 267009 bytes from map-output for attempt_local964714568_0001_m_000000_0
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 267009, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->267009
15/12/19 12:00:33 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
15/12/19 12:00:33 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
15/12/19 12:00:33 INFO mapred.Merger: Merging 1 sorted segments
15/12/19 12:00:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 267004 bytes
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: Merged 1 segments, 267009 bytes to disk to satisfy reduce memory limit
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: Merging 1 files, 267013 bytes from disk
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
15/12/19 12:00:33 INFO mapred.Merger: Merging 1 sorted segments
15/12/19 12:00:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 267004 bytes
15/12/19 12:00:33 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/12/19 12:00:33 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/12/19 12:00:33 INFO mapreduce.Job:  map 100% reduce 0%
15/12/19 12:00:34 INFO mapred.Task: Task:attempt_local964714568_0001_r_000000_0 is done. And is in the process of committing
15/12/19 12:00:34 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/12/19 12:00:34 INFO mapred.Task: Task attempt_local964714568_0001_r_000000_0 is allowed to commit now
15/12/19 12:00:34 INFO output.FileOutputCommitter: Saved output of task 'attempt_local964714568_0001_r_000000_0' to hdfs://localhost:54310/output3/_temporary/0/task_local964714568_0001_r_000000
15/12/19 12:00:34 INFO mapred.LocalJobRunner: reduce > reduce
15/12/19 12:00:34 INFO mapred.Task: Task 'attempt_local964714568_0001_r_000000_0' done.
15/12/19 12:00:34 INFO mapred.LocalJobRunner: Finishing task: attempt_local964714568_0001_r_000000_0
15/12/19 12:00:34 INFO mapred.LocalJobRunner: reduce task executor complete.
15/12/19 12:00:34 INFO mapreduce.Job:  map 100% reduce 100%
15/12/19 12:00:34 INFO mapreduce.Job: Job job_local964714568_0001 completed successfully
15/12/19 12:00:35 INFO mapreduce.Job: Counters: 38
    File System Counters
        FILE: Number of bytes read=1075078
        FILE: Number of bytes written=1845581
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1323616
        HDFS: Number of bytes written=196183
        HDFS: Number of read operations=15
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=12761
        Map output records=109844
        Map output bytes=1086544
        Map output materialized bytes=267013
        Input split bytes=113
        Combine input records=109844
        Combine output records=18039
        Reduce input groups=18039
        Reduce shuffle bytes=267013
        Reduce input records=18039
        Reduce output records=18039
        Spilled Records=36078
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=9
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=429260800
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=661808
    File Output Format Counters
        Bytes Written=196183

hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$
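The job completed successfully, and the counters tell the story: 12761 input lines were read, the mapper emitted 109844 words, and after combining and reducing we end up with 18039 distinct words. To see the actual result, read the part file the reducer wrote into /output3 (part-r-00000 is the standard name for the first reducer's output):

hdfs dfs -ls /output3                  # shows the _SUCCESS marker and the part file
hdfs dfs -cat /output3/part-r-00000    # each line is: word <TAB> count

When you are done experimenting, you can stop all the daemons with stop-dfs.sh and stop-yarn.sh (stop-all.sh also works but is deprecated, just like start-all.sh).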


Now, if you want to run some benchmark tests on your Hadoop cluster, please follow the link.


For more frequent updates on big data analytics using Hadoop, please like this page: https://www.facebook.com/coebda/
