Monday, December 21, 2015

Running benchmark tests in Hadoop

Dear friends...


Today, I am writing this blog to explain how to run benchmark tests on a Hadoop cluster.

Benchmark applications are already included in the Hadoop distribution; you just need to run them to test the performance of your cluster.

The command to run the TestDFSIO is as shown below:
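The original screenshot with the exact command is not reproduced here, so below is a typical TestDFSIO invocation, assuming Hadoop 2.6.0 installed under /usr/local/hadoop (adjust the jar version, -nrFiles and -fileSize to suit your cluster):

    # write test: create 10 files of 100 MB each in HDFS and measure write throughput
    $ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100

    # read test: read the same files back and measure read throughput
    $ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 100

    # clean up the benchmark files from HDFS when you are finished
    $ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -clean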


This is how you can run the test on your Hadoop cluster. When the command finishes, it prints a summary of the final results, including the throughput and the average I/O rate.


Another benchmark test that you can run on your Hadoop cluster is mrbench. This test creates and runs many small jobs to check how your cluster handles them, as shown below.
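The screenshot with the exact command is again missing here; a typical mrbench run uses the same jobclient tests jar, with -numRuns controlling how many small jobs are launched:

    # run 50 small MapReduce jobs and report the average time per job
    $ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar mrbench -numRuns 50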


The results of this benchmark test report the average time taken per job (in milliseconds).



If you want to spend more time on benchmark tests, then visit the next page.

Hope it helps you...

For more updates, keep visiting this blog or our Facebook page.

Sunday, December 20, 2015

Steps to install Hadoop in Ubuntu

Hello Friends...


In this blog I explain the procedure to install a single-node Hadoop cluster on Linux. I installed Hadoop 2.6.0 on Ubuntu 12.04. Hadoop installation needs a basic working knowledge of Linux; if you want a refresher, have a look at this post first: Linux administration.

The steps for installing Hadoop are as follows:

1. First open the terminal by Ctrl+Alt+T.
2. Run the update command: sudo apt-get update
First it will prompt for your password, and then it may take time depending upon your internet
speed and system update status.
3. Then install Java in your system using $ sudo apt-get install openjdk-6-jdk
Note: I used Java version 6; you can opt for a higher version such as 7 or 8.
To change the Java version in your system you can run the command:
            $ update-alternatives --config java


4. Check the Java version using: $ java -version

5. Add a new group named hadoop: $ sudo addgroup hadoop
6. Then make a new user hduser in that group: $ sudo adduser --ingroup hadoop hduser
It may ask for some details like name, address, etc. Fill in these details, although you may skip some of them.
7. Now install SSH, which Hadoop uses for communication between its daemons: $ sudo apt-get install ssh
8. Generate an RSA public/private key pair using ssh-keygen and append the public key to authorized_keys, as shown in the following steps:
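The screenshots for this step are missing; the usual commands, run as hduser, are:

    # generate an RSA key pair with an empty passphrase
    $ ssh-keygen -t rsa -P ""
    # append the public key to authorized_keys so that ssh to localhost needs no password
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys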




9. Verify the passwordless SSH connection to localhost: $ ssh localhost
10. Now download a freely available Hadoop release from the Apache download mirrors (I downloaded 2.6.0).
11. Untar the downloaded package using the command: $ tar xvzf hadoop-2.6.0.tar.gz

Now make the hadoop directory inside /usr/local with the command: $ sudo mkdir -p /usr/local/hadoop

12. Now change directory to this folder using: $ cd hadoop-2.6.0

13. Now move all the content of this directory to /usr/local/hadoop, as shown below.
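The command from the original screenshot is missing; from inside the hadoop-2.6.0 directory it would typically be:

    $ sudo mv * /usr/local/hadoop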
   
14. This may throw an error like:
hduser is not in the sudoers file. This incident will be reported......
15. To deal with this error, add hduser to the sudoers file (or to the sudo group), as shown below.
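The screenshot is missing; on Ubuntu the simplest way is to add hduser to the sudo group (alternatively, run sudo visudo and add the line hduser ALL=(ALL:ALL) ALL):

    # run this from your original user account that already has sudo rights, not from hduser
    $ sudo adduser hduser sudo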

16. Now move the contents again, as tried previously, and change the ownership of /usr/local/hadoop to hduser, as shown:
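Assuming you are still inside the extracted hadoop-2.6.0 directory, the commands would typically be:

    $ sudo mv * /usr/local/hadoop
    $ sudo chown -R hduser:hadoop /usr/local/hadoop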

17. Now we are almost done and just need to change the configuration files. The following files need to be changed:
1. ~/.bashrc
2. hadoop-env.sh
3. core-site.xml
4. mapred-site.xml
5. hdfs-site.xml

18. Open .bashrc with the command vim ~/.bashrc and add the Hadoop paths to it as shown below. (If vim is not already installed on your system, install it with sudo apt-get install vim, and then open the file again with vim ~/.bashrc.)
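The screenshot with the exact lines is missing; the entries I would expect at the end of ~/.bashrc for this layout are sketched below. The JAVA_HOME path assumes the 64-bit OpenJDK 6 package on Ubuntu 12.04, so check your own directory under /usr/lib/jvm/ first:

    # Hadoop environment variables (append at the end of ~/.bashrc)
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

After saving the file, run source ~/.bashrc so that the new variables take effect in the current shell.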


19. Now open hadoop-env.sh ($ vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh) and update it as shown:
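The change that is normally needed in hadoop-env.sh is to replace the ${JAVA_HOME} placeholder with the real JDK path; the path below is the same assumed OpenJDK 6 location as above:

    # in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64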



20. Now first make a tmp directory for Hadoop, as shown in the following step:
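The screenshot is missing; a commonly used location, which must match the hadoop.tmp.dir value set in core-site.xml in the next step, is:

    $ sudo mkdir -p /app/hadoop/tmp
    $ sudo chown hduser:hadoop /app/hadoop/tmp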



Now open and update core-site.xml as shown:
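A minimal core-site.xml for this setup is sketched below; port 54310 matches the HDFS URI that appears in the wordcount logs later in this blog, and hadoop.tmp.dir points to the directory created in the previous step (adjust both if yours differ):

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
      </property>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:54310</value>
      </property>
    </configuration>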


Now first copy the content of mapred-site.xml.template to mapred-site.xml with the command shown below:
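The command from the original image would be along these lines:

    $ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
    $ vim /usr/local/hadoop/etc/hadoop/mapred-site.xml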



21. Now open and update mapred-site.xml as shown below (the opening command is listed above, and the necessary changes follow):
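A typical configuration for this kind of single-node setup is sketched below; the localhost:54311 address is an assumption taken from common single-node tutorials, so adjust it if you configured yours differently. (If you prefer to run MapReduce on YARN instead, set mapreduce.framework.name to yarn.)

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
      </property>
    </configuration>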



22. Now make two directories, one for the namenode and one for the datanode, and then make the corresponding updates in hdfs-site.xml:
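The screenshot is missing; a common choice of paths, owned by hduser, is:

    $ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
    $ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
    $ sudo chown -R hduser:hadoop /usr/local/hadoop_store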


Updates in hdfs-site.xml
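Assuming the directories created above, the hdfs-site.xml changes would look like this; dfs.replication is 1 because this is a single-node cluster:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
      </property>
    </configuration>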



23. Now we are done...!!
24. Let's start Hadoop now.
25. First format the namenode:
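The formatting command is the standard one for Hadoop 2.x (run it only once, before the very first start, as it wipes the namenode metadata):

    $ hdfs namenode -format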

26. Then start Hadoop:
27. Change to the directory where the start-all.sh file resides: $ cd /usr/local/hadoop/sbin/
28. Now start Hadoop: $ start-all.sh and check the status of the node using the command $ jps

An error-free start of the Hadoop environment will show NameNode, SecondaryNameNode, NodeManager, DataNode, ResourceManager and Jps itself as running processes. So we are done.
29. Let's see the web interfaces of the NameNode and secondary NameNode:
NameNode at port 50070 of localhost (http://localhost:50070):




We are done.... All components are working fine.

30. Last one.... Don't leave the Hadoop cluster without stopping the services, using the following commands:
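The stop commands mirror the start commands (stop-all.sh is deprecated in the same way as start-all.sh, so you can also stop the layers separately):

    $ stop-all.sh
    # or, equivalently:
    $ stop-dfs.sh
    $ stop-yarn.sh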


If you wish to make a multi-node Hadoop cluster, please refer to the instructions given in the following post: hadoop multinode installation

*****************************************************************************

Now, to run the first program on your Hadoop cluster, please follow this blog: Running first program in hadoop

For configuring HBase in your Hadoop cluster, visit this post

For configuration of Pig in your Hadoop cluster, go to this pig-installation-page


For more frequent updates about big data analytics using Hadoop, please visit and like: DataioticsHub


Thanks and Regards



Saturday, December 19, 2015

Steps to run the wordcount program in a Hadoop setup for big data analytics

Hello friends...

I am writing this blog about how to interact with an installed Hadoop environment by running the first program, wordcount. I am giving a step-by-step procedure of all the terminal commands, with the results, as executed on my node (my Hadoop version is 2.6.0). Right from starting my node, the commands are highlighted in blue and some important results in green.


omesh@omesh-HP-240-G3-Notebook-PC:~$ sudo su hduser
[sudo] password for omesh:
hduser@omesh-HP-240-G3-Notebook-PC:/home/omesh$ cd /usr/local/hadoop/sbin/
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ ls
distribute-exclude.sh    start-all.cmd        stop-balancer.sh
hadoop-daemon.sh         start-all.sh         stop-dfs.cmd
hadoop-daemons.sh        start-balancer.sh    stop-dfs.sh
hdfs-config.cmd          start-dfs.cmd        stop-secure-dns.sh
hdfs-config.sh           start-dfs.sh         stop-yarn.cmd
httpfs.sh                start-secure-dns.sh  stop-yarn.sh
kms.sh                   start-yarn.cmd       yarn-daemon.sh
mr-jobhistory-daemon.sh  start-yarn.sh        yarn-daemons.sh
refresh-namenodes.sh     stop-all.cmd
slaves.sh                stop-all.sh
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/12/19 11:52:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-omesh-HP-240-G3-Notebook-PC.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-omesh-HP-240-G3-Notebook-PC.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-omesh-HP-240-G3-Notebook-PC.out
15/12/19 11:52:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-omesh-HP-240-G3-Notebook-PC.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-omesh-HP-240-G3-Notebook-PC.out
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ jps
25596 NodeManager
24829 DataNode
25694 Jps
25351 ResourceManager
25166 SecondaryNameNode
24591 NameNode

hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ hdfs dfs -ls /
15/12/19 11:58:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 6 items
-rw-r--r--   1 hduser supergroup     661808 2015-12-18 20:25 /hadoop-projectfile.txt
drwxr-xr-x   - hduser supergroup          0 2015-12-18 20:32 /om
drwxr-xr-x   - hduser supergroup          0 2015-12-18 20:14 /output
drwxr-xr-x   - hduser supergroup          0 2015-12-18 20:40 /output2
drwxr-xr-x   - hduser supergroup          0 2015-12-18 20:13 /user
-rw-r--r--   1 hduser supergroup     661808 2015-12-18 19:21 /wordcount

hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ hadoop jar /usr/local/hadoop/share/
doc/    hadoop/
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/
hadoop-mapreduce-client-app-2.6.0.jar              hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
hadoop-mapreduce-client-common-2.6.0.jar           hadoop-mapreduce-client-shuffle-2.6.0.jar
hadoop-mapreduce-client-core-2.6.0.jar             hadoop-mapreduce-examples-2.6.0.jar
hadoop-mapreduce-client-hs-2.6.0.jar               lib/
hadoop-mapreduce-client-hs-plugins-2.6.0.jar       lib-examples/
hadoop-mapreduce-client-jobclient-2.6.0.jar        sources/
hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /om /output3
15/12/19 12:00:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/19 12:00:31 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/12/19 12:00:31 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/12/19 12:00:31 INFO input.FileInputFormat: Total input paths to process : 1
15/12/19 12:00:31 INFO mapreduce.JobSubmitter: number of splits:1
15/12/19 12:00:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local964714568_0001
15/12/19 12:00:31 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/12/19 12:00:31 INFO mapreduce.Job: Running job: job_local964714568_0001
15/12/19 12:00:31 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/12/19 12:00:31 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
15/12/19 12:00:32 INFO mapred.LocalJobRunner: Waiting for map tasks
15/12/19 12:00:32 INFO mapred.LocalJobRunner: Starting task: attempt_local964714568_0001_m_000000_0
15/12/19 12:00:32 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/12/19 12:00:32 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/om/hadoop-projectfile.txt:0+661808
15/12/19 12:00:32 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/12/19 12:00:32 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/12/19 12:00:32 INFO mapred.MapTask: soft limit at 83886080
15/12/19 12:00:32 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/12/19 12:00:32 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/12/19 12:00:32 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/12/19 12:00:32 INFO mapred.LocalJobRunner:
15/12/19 12:00:32 INFO mapred.MapTask: Starting flush of map output
15/12/19 12:00:32 INFO mapred.MapTask: Spilling map output
15/12/19 12:00:32 INFO mapred.MapTask: bufstart = 0; bufend = 1086544; bufvoid = 104857600
15/12/19 12:00:32 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25775024(103100096); length = 439373/6553600
15/12/19 12:00:32 INFO mapreduce.Job: Job job_local964714568_0001 running in uber mode : false
15/12/19 12:00:32 INFO mapreduce.Job:  map 0% reduce 0%
15/12/19 12:00:33 INFO mapred.MapTask: Finished spill 0
15/12/19 12:00:33 INFO mapred.Task: Task:attempt_local964714568_0001_m_000000_0 is done. And is in the process of committing
15/12/19 12:00:33 INFO mapred.LocalJobRunner: map
15/12/19 12:00:33 INFO mapred.Task: Task 'attempt_local964714568_0001_m_000000_0' done.
15/12/19 12:00:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local964714568_0001_m_000000_0
15/12/19 12:00:33 INFO mapred.LocalJobRunner: map task executor complete.
15/12/19 12:00:33 INFO mapred.LocalJobRunner: Waiting for reduce tasks
15/12/19 12:00:33 INFO mapred.LocalJobRunner: Starting task: attempt_local964714568_0001_r_000000_0
15/12/19 12:00:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/12/19 12:00:33 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@4748aec5
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334063200, maxSingleShuffleLimit=83515800, mergeThreshold=220481728, ioSortFactor=10, memToMemMergeOutputsThreshold=10
15/12/19 12:00:33 INFO reduce.EventFetcher: attempt_local964714568_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
15/12/19 12:00:33 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local964714568_0001_m_000000_0 decomp: 267009 len: 267013 to MEMORY
15/12/19 12:00:33 INFO reduce.InMemoryMapOutput: Read 267009 bytes from map-output for attempt_local964714568_0001_m_000000_0
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 267009, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->267009
15/12/19 12:00:33 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
15/12/19 12:00:33 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
15/12/19 12:00:33 INFO mapred.Merger: Merging 1 sorted segments
15/12/19 12:00:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 267004 bytes
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: Merged 1 segments, 267009 bytes to disk to satisfy reduce memory limit
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: Merging 1 files, 267013 bytes from disk
15/12/19 12:00:33 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
15/12/19 12:00:33 INFO mapred.Merger: Merging 1 sorted segments
15/12/19 12:00:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 267004 bytes
15/12/19 12:00:33 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/12/19 12:00:33 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
15/12/19 12:00:33 INFO mapreduce.Job:  map 100% reduce 0%
15/12/19 12:00:34 INFO mapred.Task: Task:attempt_local964714568_0001_r_000000_0 is done. And is in the process of committing
15/12/19 12:00:34 INFO mapred.LocalJobRunner: 1 / 1 copied.
15/12/19 12:00:34 INFO mapred.Task: Task attempt_local964714568_0001_r_000000_0 is allowed to commit now
15/12/19 12:00:34 INFO output.FileOutputCommitter: Saved output of task 'attempt_local964714568_0001_r_000000_0' to hdfs://localhost:54310/output3/_temporary/0/task_local964714568_0001_r_000000
15/12/19 12:00:34 INFO mapred.LocalJobRunner: reduce > reduce
15/12/19 12:00:34 INFO mapred.Task: Task 'attempt_local964714568_0001_r_000000_0' done.
15/12/19 12:00:34 INFO mapred.LocalJobRunner: Finishing task: attempt_local964714568_0001_r_000000_0
15/12/19 12:00:34 INFO mapred.LocalJobRunner: reduce task executor complete.
15/12/19 12:00:34 INFO mapreduce.Job:  map 100% reduce 100%
15/12/19 12:00:34 INFO mapreduce.Job: Job job_local964714568_0001 completed successfully
15/12/19 12:00:35 INFO mapreduce.Job: Counters: 38
    File System Counters
        FILE: Number of bytes read=1075078
        FILE: Number of bytes written=1845581
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1323616
        HDFS: Number of bytes written=196183
        HDFS: Number of read operations=15
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=12761
        Map output records=109844
        Map output bytes=1086544
        Map output materialized bytes=267013
        Input split bytes=113
        Combine input records=109844
        Combine output records=18039
        Reduce input groups=18039
        Reduce shuffle bytes=267013
        Reduce input records=18039
        Reduce output records=18039
        Spilled Records=36078
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=9
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=429260800
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=661808
    File Output Format Counters
        Bytes Written=196183

hduser@omesh-HP-240-G3-Notebook-PC:/usr/local/hadoop/sbin$
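To actually see the word counts, you can list and print the job's output directory; assuming a single reducer, the result will be in a file named part-r-00000 (this part is not copied from my session, it is the standard output layout):

    $ hdfs dfs -ls /output3
    $ hdfs dfs -cat /output3/part-r-00000 | head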


Now, if you want to run some benchmark tests on your Hadoop cluster, please follow the link.


For more frequent updates on big data analytics using Hadoop please like this page https://www.facebook.com/coebda/