Apache Hadoop is an open source framework for storing and distributed batch processing of huge datasets on clusters of commodity hardware. Hadoop can be used on a single machine (Standalone Mode) as well as on a cluster of machines (Distributed Mode - Pseudo & Fully). One of the striking features of Hadoop is that it efficiently distributes large amounts of work across a cluster of commodity machines.

Through this tutorial I will try to throw light on how to configure Apache Hadoop in Standalone Mode. Before I get to that, it is important to understand that Hadoop can be run in any of the following three modes: Standalone Mode, Pseudo-Distributed Mode and Fully Distributed Mode.

Standalone Mode - In standalone mode, we configure Hadoop on a single machine. The configuration in standalone mode is quite straightforward and does not require major changes.

Pseudo-Distributed Mode - In a pseudo-distributed environment, we configure more than one node, one of them acting as a master and the rest as slave machines/nodes. In addition, we will have more than one Ubuntu machine running as a VM on the host.

Fully Distributed Mode - This is quite similar to a pseudo-distributed environment, with the exception that instead of VMs the machines/nodes sit in a real distributed environment.

Following are some of the prerequisites for configuring Hadoop: a working Java installation (using Java 1.6 is recommended for running Hadoop) and a supported operating system. Hadoop can run on both Windows & Unix, but Linux/Unix best supports the production environment; working with Hadoop on Windows also requires a Cygwin installation.

Installing & Configuring Hadoop in Standalone Mode

You might want to create a dedicated user for running Apache Hadoop, but it is not a prerequisite. In our demonstration, we will be using a default user for running Hadoop. Follow these steps for installing and configuring Hadoop on a single node.

First, install Java. In this tutorial we will use Java 1.6, so its installation is described in detail. Use the below command to begin the installation of Java:

$ sudo apt-get install openjdk-6-jdk

This will install the full JDK under the /usr/lib/jvm/java-6-sun directory.

Hadoop needs to know the Java installation path; for this we will set the JAVA_HOME environment variable to point to our Java installation directory. JAVA_HOME can be configured in the ~/.bash_profile or ~/.bashrc file. Alternatively, you can let Hadoop know about it by setting JAVA_HOME in Hadoop's conf/hadoop-env.sh file. Use the below command to set JAVA_HOME on Ubuntu:

export JAVA_HOME=/usr/lib/jvm/java-6-sun

JAVA_HOME can be verified with the command echo $JAVA_HOME.

Next, configure passwordless ssh by appending your public key to the $HOME/.ssh/authorized_keys file:

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Verify the ssh configuration with the command ssh localhost. Pressing yes will add localhost to the known hosts.

Download the latest stable release of Apache Hadoop, then unpack the release:

tar -zxvf hadoop-1.0.3.tar.gz

Save the extracted folder to an appropriate location; HADOOP_HOME will point to this directory. Check that the following directories exist under HADOOP_HOME: bin, conf, lib.

Use the following command to create an environment variable that points to the Hadoop installation directory (HADOOP_HOME):

export HADOOP_HOME=/home/user/hadoop

Now place the Hadoop binary directory on your command-line path by executing the command:

export PATH=$PATH:$HADOOP_HOME/bin

You can then verify your Hadoop installation from the command line.

An advantage of using Hadoop is that with just a limited number of directories you can set it up to work correctly. Let us create a directory with the name hdfs and three sub-directories: name, data and tmp. Since the Hadoop user requires read-write access to these directories, you need to change their permissions to 755 or 777 for the Hadoop user.

Finally, format the namenode. Execute the below command from the Hadoop home directory:

$ ~/hadoop/bin/hadoop namenode -format

Hadoop configuration files are in the HADOOP_HOME/conf directory. The following image gives an overview of the Hadoop Distributed File System architecture.
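The environment variables used in the steps above can be collected into a single snippet suitable for ~/.bashrc; the /usr/lib/jvm/java-6-sun and /home/user/hadoop paths are the tutorial's examples, not universal defaults, so adjust them to your own machine:

```shell
# Example Hadoop environment setup -- the two paths below are the
# tutorial's examples; point them at your own Java and Hadoop directories.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/home/user/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

# Sanity check: both variables should echo back the configured paths.
echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
```

Placing these lines in ~/.bashrc (or ~/.bash_profile) makes them survive new shell sessions, which is why the tutorial points at those files.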
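As an alternative to the shell profile, the same JAVA_HOME setting can live in Hadoop's own environment file, as the tutorial mentions. A minimal conf/hadoop-env.sh fragment (the path is again the tutorial's example) would be:

```shell
# Fragment of $HADOOP_HOME/conf/hadoop-env.sh. The file ships with a
# commented-out JAVA_HOME line; setting it here lets Hadoop find Java
# regardless of the invoking user's shell profile.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```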
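The ssh step above assumes an RSA key pair already exists. A fuller sketch, including key generation with an empty passphrase (a common choice for single-node setups, though the tutorial does not show this part), might look like:

```shell
# Passwordless ssh to localhost for the Hadoop user.
mkdir -p "$HOME/.ssh"
# Generate an RSA key pair with an empty passphrase if none exists yet.
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -q -t rsa -P "" -f "$HOME/.ssh/id_rsa"
# Append the public key to authorized_keys (>> preserves existing entries).
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
# To verify, run: ssh localhost   (answering "yes" adds localhost to known_hosts)
```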
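The directory layout and permission change described above can be sketched as follows; placing hdfs directly under the home directory is an assumption, since the tutorial does not pin down the parent path:

```shell
# Create the hdfs working directory with its name, data and tmp sub-directories.
mkdir -p "$HOME/hdfs/name" "$HOME/hdfs/data" "$HOME/hdfs/tmp"
# Give the Hadoop user read-write access; use 777 only if 755 proves insufficient.
chmod -R 755 "$HOME/hdfs"
ls -ld "$HOME/hdfs/name" "$HOME/hdfs/data" "$HOME/hdfs/tmp"
```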
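For the installation-verification step, Hadoop 1.x ships a version subcommand that prints the installed release. The guard below keeps the sketch runnable even in a shell where the PATH update has not taken effect yet:

```shell
# Once $HADOOP_HOME/bin is on the PATH, `hadoop version` reports the release.
if command -v hadoop >/dev/null 2>&1; then
    hadoop version
else
    echo "hadoop not found on PATH yet"
fi
```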