Controlling Hadoop Jobs using Oozie
Learn how to schedule and manage Hadoop jobs using Apache Oozie. Explore its workflow automation capabilities to optimize big data processing.
At a Glance
This Apache Oozie course teaches you how to control Hadoop jobs in your Big Data projects.
About This Course
This course gives an overview of Oozie and how it controls Hadoop jobs. It begins by looking at the components required to code a workflow, as well as optional control nodes such as decisions (switch/case statements), forks, and joins. It then covers using the Oozie coordinator to schedule a workflow.
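For a taste of what that looks like in practice, here is a minimal workflow.xml sketch showing these control nodes. The node names, paths, and ${...} parameters are hypothetical placeholders, not material from the course:

```xml
<!-- Illustrative sketch only; node names and paths are assumed placeholders -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="check-input"/>

    <!-- Decision node: a switch/case over an Oozie EL expression -->
    <decision name="check-input">
        <switch>
            <case to="fork-work">${fs:exists(inputDir)}</case>
            <default to="fail"/>
        </switch>
    </decision>

    <!-- Fork runs two actions in parallel; the join waits for both -->
    <fork name="fork-work">
        <path start="clean-staging"/>
        <path start="make-refs"/>
    </fork>
    <action name="clean-staging">
        <fs><delete path="${nameNode}/tmp/staging"/></fs>
        <ok to="join-work"/>
        <error to="fail"/>
    </action>
    <action name="make-refs">
        <fs><mkdir path="${nameNode}/tmp/refs"/></fs>
        <ok to="join-work"/>
        <error to="fail"/>
    </action>
    <join name="join-work" to="end"/>

    <kill name="fail">
        <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```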
Students quickly notice that workflows are coded in XML, which tends to be verbose. The last lesson of this course presents a graphical workflow editor tool designed to simplify the work of generating a workflow.
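The coordinator mentioned above is likewise written in XML. A minimal sketch that runs a workflow once a day might look like the following; the dates, frequency, and application path are assumed for illustration:

```xml
<!-- Illustrative sketch only; dates, frequency, and paths are assumed -->
<coordinator-app name="demo-coord" frequency="${coord:days(1)}"
                 start="2015-01-01T00:00Z" end="2015-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- HDFS directory containing the workflow.xml shown earlier -->
            <app-path>${nameNode}/user/${user.name}/apps/demo-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```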
Course Syllabus
After completing this course, you should be able to:
- Describe the MapReduce model v1
- List the limitations of Hadoop 1 and MapReduce 1
- Review the Java code required to handle the Mapper class, the Reducer class, and the program driver needed to access MapReduce (a sketch follows this list)
- Describe the YARN model
- Compare YARN / Hadoop 2 / MR2 vs Hadoop 1 / MR1
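For readers who want a concrete picture of that Mapper/Reducer/driver structure, here is a minimal sketch of the classic word-count job using the org.apache.hadoop.mapreduce API. It illustrates the pattern only and is not material taken from the course:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures the job and submits it to the cluster.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a JAR, a job like this would typically be launched with something like `hadoop jar wordcount.jar WordCount <input-dir> <output-dir>`.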
Requirements
Have taken the Hadoop Fundamentals course on Big Data University.
Recommended skills prior to taking this course
- Basic understanding of Apache Hadoop and Big Data.
- Basic Linux Operating System knowledge.
- Basic understanding of the Scala, Python, or Java programming languages.
Course Staff
Glen R.J. Mules
Glen R.J. Mules is a Senior Instructor and Principal Consultant with IBM Information Management World-Wide Education and works from New Rochelle, NY. Glen joined IBM in 2001 as a result of IBM's acquisition of Informix Software. He has worked at IBM, and previously at Informix Software, as an instructor, a course developer, and in the enablement of instructors worldwide. He teaches courses in Big Data (BigInsights and Streams), Optim, Guardium, and the DB2 and Informix databases. He has a BSc in Mathematics from the University of Adelaide, South Australia; an MSc in Computer Science from the University of Birmingham, England; and recently completed a PhD in Education (Educational Technology) at Walden University. His early working life was as a high school teacher in Australia. In the 1970s he designed, programmed, and managed banking systems in Manhattan and Boston. In the 1980s he was a VP in Electronic Payments for Bank of America in San Francisco and New York. In the early 1990s he was an EVP in Marketing for a software development company and chaired the ANSI X12C Standards Committee on Data Security for Electronic Data Interchange (EDI).
Warren Pettit
Warren Pettit has been with IBM for over 30 years. For the last 16 years, he has worked in Information Management education, where he has been both an instructor and a course developer in the Data Warehouse and Big Data curricula. For the nine years before joining IBM, he was an application programmer and was responsible for developing a training program for newly hired programmers.