Apache Oozie Configuration with Hadoop 2.6.0

I'm writing this post to help people who are trying to install Oozie in a Hadoop 2+ environment, since I had to consult several different sources to fix the errors I encountered during the process. Here it goes.

Step 1: Download the Oozie 4.1.0 tarball from the Apache site and save it to any directory (~/Downloads in the commands below), then extract it and move it to /usr/local.

cd ~/Downloads
tar -zxf oozie-4.1.0.tar.gz
sudo mv oozie-4.1.0 /usr/local/oozie-4.1.0

Step 2: Make sure Maven is installed. If it is not, refer to the Maven installation instructions.
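A quick way to confirm that the build tools are available before continuing is a small check like the one below. This is only a sketch: require_tool is a hypothetical helper name used for illustration, not something shipped with Maven or Oozie.

```shell
# require_tool is a hypothetical helper, not part of Maven or Oozie.
# It succeeds if the named command is on PATH and fails otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required tool: $1" >&2
    return 1
}
```

For example, running require_tool mvn && require_tool tar before Step 3 would stop early if either tool is missing.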

Step 3: Update pom.xml to change the default Hadoop version to 2.3.0. We are not setting it to 2.6.0 because 2.3.0-oozie-4.1.0.jar is the latest jar file available. Luckily, it works with higher versions in the 2.x series.

cd /usr/local/oozie-4.1.0
vim pom.xml
--Search for
<hadoop.version>1.1.1</hadoop.version>
--Replace it with
<hadoop.version>2.3.0</hadoop.version>
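If you would rather not edit the file by hand, the same search-and-replace can be sketched with sed. The function name set_hadoop_version is an assumption made for this example; it only rewrites the exact 1.1.1 line shown above and keeps a backup copy next to the original.

```shell
# set_hadoop_version is a hypothetical helper name used for illustration.
# $1 is the path to pom.xml.
set_hadoop_version() {
    pom_file="$1"
    # -i.bak edits the file in place and saves the original as pom.xml.bak.
    # Only the default 1.1.1 entry is rewritten; other versions are untouched.
    sed -i.bak 's|<hadoop\.version>1\.1\.1</hadoop\.version>|<hadoop.version>2.3.0</hadoop.version>|' "$pom_file" || return 1
    # Fail if the new version is not present after the edit.
    grep -q '<hadoop.version>2.3.0</hadoop.version>' "$pom_file"
}
```

It would be called as set_hadoop_version /usr/local/oozie-4.1.0/pom.xml, and the pom.xml.bak file makes it easy to revert.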

Step 4: Build the Oozie executable

mvn clean package assembly:single -P hadoop-2 -DskipTests

Step 5: The build generates the distribution tarball in the target subdirectory under the distro directory. Extract it and move it to a new folder under /usr/local/.

cd ~/Downloads/oozie-4.1.0/distro/target
tar -zxf oozie-4.1.0-distro.tar.gz
sudo mv oozie-4.1.0 /usr/local/oozie-4.1.0
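Before extracting, it can help to confirm that the build actually produced the tarball. The sketch below assumes you pass in the directory where you ran the mvn command; distro_tarball is a hypothetical helper name, and the file name is the one used in the commands above.

```shell
# distro_tarball is a hypothetical helper used for illustration.
# $1 is the directory in which the mvn build was run.
distro_tarball() {
    tarball="$1/distro/target/oozie-4.1.0-distro.tar.gz"
    if [ -f "$tarball" ]; then
        # Print the full path so the caller can pass it to tar.
        echo "$tarball"
        return 0
    fi
    echo "build output not found: $tarball" >&2
    return 1
}
```

If the function fails, the build in Step 4 most likely did not finish, and the Maven output is the place to look.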


How do I begin with Hadoop?

“Tell me and I forget. Teach me and I remember. Involve me and I learn.”

- Benjamin Franklin

I’m a big fan of practical learning; “implement as you learn” is my mantra for learning anything. Because Hadoop is open source, it gives you the best opportunity to get your hands dirty as you read about it. There are plenty of free resources online to help you get started, and in this post I’m going to list some of the good ones I’ve come across.

Getting Started with Hadoop

Depending on your level of interest in learning and exploring Hadoop, you can enroll in any of the free online fundamentals courses offered by Big Data University or watch the video tutorials from edureka on YouTube. Neither source requires signing in with a corporate email ID, and both give a basic overview of what Hadoop is. And of course the documentation provided by Apache helps in understanding it in detail; alternatively, you can read the Yahoo Hadoop tutorial.