I have been using Oozie for a while now and was a little dissatisfied with it, both for managing Hadoop jobs and, worse, for debugging its vague errors. While evaluating alternative workflow engines, Airflow by Airbnb caught my eye. I’ll skip the introduction for now; you can read more about it here. This post highlights its key features and demonstrates a Hadoop job.
Before I begin with the example, I’d like to mention the key advantages of Airflow over other tools:
- Amazing UI for viewing the job flow (DAG), run stats, logs, etc.
- You write an actual Python program instead of ugly configuration files
- Exceptional monitoring options for batch jobs
- Ability to query metadata and generate custom charts
- Many contributors in the developer community have worked with or evaluated the other similar tools, so Airflow picks up the best ideas from them as it evolves.
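To give a flavor of what "an actual Python program instead of ugly configuration files" means, here is a toy sketch of the pipeline-as-code idea. This is not Airflow's real API — the `Task` class and `run_order` helper are made up for illustration — but it mimics the `>>` bitshift syntax that Airflow DAG files use to declare task dependencies:

```python
# Toy sketch (NOT Airflow's actual API) of declaring a workflow in
# plain Python: tasks are objects, and "a >> b" means b runs after a,
# mirroring the dependency syntax used in an Airflow DAG file.

class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # Record that `other` depends on self, then return `other`
        # so dependencies can be chained: a >> b >> c.
        self.downstream.append(other)
        return other

def run_order(start):
    # Walk a simple linear chain of dependencies from the first task.
    order, node = [], start
    while node:
        order.append(node.task_id)
        node = node.downstream[0] if node.downstream else None
    return order

extract = Task("extract")
transform = Task("transform")
load = Task("load")

# Declare the pipeline: extract, then transform, then load.
extract >> transform >> load

print(run_order(extract))  # ['extract', 'transform', 'load']
```

Because the workflow is ordinary Python, you can generate tasks in loops, parameterize them, and test them like any other code — something that is awkward to do with static XML configuration in Oozie.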