27-29 November, Vilnius

Conference about Big Data, High Load, Data Science, Machine Learning & AI

GERARD TOONSTRA

BigData Republic, The Netherlands

Biography

Gerard Toonstra is an Apache Airflow enthusiast and has been excited about it ever since it was announced as open source. He was the initial contributor of the HttpHook and HttpOperator and set up the site “ETL with airflow”, which is one of the richest practical sources of information about Apache Airflow. Gerard has a background in nautical engineering, but has worked in information technology since 1998, holding different engineering positions in the UK, The Netherlands and Brazil.
He now works at BigData Republic in The Netherlands as a BigData Architect / Engineer. BigData Republic is a multidisciplinary team of experienced and business-oriented Data Scientists, Data Engineers and Architects. Irrespective of an organization’s data maturity level, they help translate business goals into the design, implementation and utilization of innovative solutions. In his spare time Gerard enjoys oil painting, and on holidays he visits a beautiful beach in Brazil to read spy novels or psychology books.

Workshop

Apache Airflow hands-on

Apache Airflow is attracting more attention worldwide as a de facto ETL platform. As the author of the site “ETL with airflow”, I’d like to share this knowledge and get novices up to speed with Apache Airflow as their ETL platform. You will learn how to write your first DAG in Python, set up email notifications, configure the scheduler, write your own hooks and operators, and follow the important principles to maintain when composing your DAGs.
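
As a taste of what the first sessions build up to, here is a minimal sketch of a first DAG for Airflow 1.x. The DAG id, schedule and bash command are illustrative placeholders, not the workshop's actual exercise:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Default settings shared by every task in this DAG.
default_args = {
    "owner": "airflow",
    "start_date": datetime(2018, 11, 1),   # first execution date
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# One run per day; "my_first_dag" is just an illustrative id.
dag = DAG(
    dag_id="my_first_dag",
    default_args=default_args,
    schedule_interval="@daily",
)

# {{ ds }} is an Airflow macro that expands to the run's execution date.
print_date = BashOperator(
    task_id="print_execution_date",
    bash_command="echo 'running for {{ ds }}'",
    dag=dag,
)
```

With this definition the scheduler triggers one run per day from the start date onward, and the templated {{ ds }} macro gives each run the date it is processing.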

Apache Airflow has become a very popular tool for running ETL, machine learning and data processing pipelines. Embedded in the implementation are the insights and learnings from years of experience in data engineering.

The workshop explains what these principles are and how they can be achieved rather effortlessly by putting the components of Apache Airflow together in a data processing workflow.

Agenda

Installing Apache Airflow – 45 mins

  • Introduction
  • Run Apache Airflow on Docker

Exploring the UI – 45 mins

  • Monitoring DAG statuses
  • Administrative tasks
  • DAG detail screens

Your first DAG – 45 mins

  • Setting a DAG schedule
  • Start date and execution date
  • Understanding macros

Failure emails, SLAs – 45 mins

  • When tasks fail
  • Sending custom emails
  • SLAs and their uses (see the sketch after this list)
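
A minimal sketch of how failure emails and SLAs can be configured through a DAG's default arguments. The address and timings are illustrative, and an SMTP server must also be configured in airflow.cfg for the mails to actually go out:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Email and SLA settings applied to every task unless overridden per task.
default_args = {
    "owner": "airflow",
    "start_date": datetime(2018, 11, 1),
    "email": ["data-team@example.com"],   # illustrative address
    "email_on_failure": True,             # send a mail when a task fails
    "email_on_retry": False,
    "sla": timedelta(hours=1),            # alert if a task misses this deadline
}

dag = DAG(
    dag_id="sla_example",
    default_args=default_args,
    schedule_interval="@daily",
)

load = BashOperator(
    task_id="load_data",
    bash_command="echo loading",
    dag=dag,
)
```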

Applying best practices – 2 hours

  • Explaining best practices
  • Implementing them in Airflow

Extending Airflow – 45 mins

  • How to build your own hook
  • How to build your own operator (see the sketch after this list)
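
A minimal sketch of the shape a custom hook and operator take in Airflow 1.x. The class names, connection id and endpoint are hypothetical placeholders, not the workshop's actual exercises:

```python
from airflow.hooks.base_hook import BaseHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class MyServiceHook(BaseHook):
    """Hypothetical hook wrapping a connection to an external service."""

    def __init__(self, conn_id="my_service_default"):
        self.conn_id = conn_id

    def get_records(self, endpoint):
        # A real hook would use the stored Airflow connection
        # (self.get_connection(self.conn_id)) to call the service.
        return []


class MyServiceToLogOperator(BaseOperator):
    """Hypothetical operator that fetches records and logs them."""

    @apply_defaults
    def __init__(self, endpoint, conn_id="my_service_default", *args, **kwargs):
        super(MyServiceToLogOperator, self).__init__(*args, **kwargs)
        self.endpoint = endpoint
        self.conn_id = conn_id

    def execute(self, context):
        # execute() is what the scheduler calls when the task runs.
        hook = MyServiceHook(conn_id=self.conn_id)
        for record in hook.get_records(self.endpoint):
            self.log.info("record: %s", record)
```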

Deploying Airflow – 45 mins

  • System components
  • Important things to keep in mind
  • PaaS solutions

Round off – 45 mins

  • Room for questions and exploration

Course objectives

The workshop allows you to get your feet wet with the Apache Airflow platform. No prior knowledge is assumed. The workshop focuses on getting familiar with the user interface, building and configuring a data processing workflow, and building pipelines that adhere to best practices. The objective is to have you walk away with a rough understanding of what Apache Airflow can do for your company and of the challenges you will face that are specific to your organization.

Target audience

This workshop is hands-on; the intended audience is people with basic code-reading abilities. All sessions rely on pre-existing code, so no code will be developed from scratch.

Course prerequisites

A laptop, notebook or MacBook with an internet connection, ideally with Docker preinstalled.

DATE:
27 November, 2018

TIME:
10:00-17:30

Venue to be confirmed

RESERVE YOUR SEAT
