Designing workflows with Airflow

I have been using Oozie for a while now and have been a little dissatisfied with how it manages Hadoop jobs, not to mention having to debug its vague errors. While I was evaluating alternative workflow engines, Airflow by Airbnb caught my eye. I’ll skip the introduction for now; you can read more about it here. This post highlights its key features and demonstrates a Hadoop job.

Before I begin with the example, I’d like to mention the key advantages of Airflow over other tools:

  • Amazing UI for viewing the job flow (DAG), run stats, logs, etc.
  • You write an actual Python program instead of ugly configuration files
  • Exceptional monitoring options for batch jobs
  • Ability to query metadata and generate custom charts
  • Most contributors in the developer community have worked with or evaluated similar tools, so Airflow brings together the best of them as it evolves.
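
To give a rough feel for the "workflow as a Python program" idea, here is a minimal sketch. It does not use Airflow's API; it relies only on Python's standard `graphlib` module (Python 3.9+), and the task names are hypothetical, made up purely for illustration. The point is only that dependencies between jobs can be declared as ordinary Python data and resolved into a run order.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks for a small Hadoop-style pipeline (not from this post).
# Each key maps to the set of tasks that must finish before it can start.
dependencies = {
    "ingest_logs": set(),
    "run_mapreduce": {"ingest_logs"},
    "archive_raw": {"ingest_logs"},
    "load_to_hive": {"run_mapreduce"},
    "send_report": {"load_to_hive", "archive_raw"},
}


def execution_order(deps):
    """Return one valid run order for the task graph.

    graphlib raises CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

A real workflow engine adds scheduling, retries, logging, and a UI on top of this, but the dependency graph itself stays plain code that you can version, review, and test.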
