27-29 November, Vilnius

Conference about Big Data, High Load, Data Science, Machine Learning & AI

GERARD TOONSTRA

BigData Republic, The Netherlands

Biography

Gerard Toonstra is an Apache Airflow enthusiast and has been excited about it ever since it was announced as open source. He was the initial contributor of the HttpHook and HttpOperator and set up the site “ETL with airflow”, which is one of the richest practical sources of information about Apache Airflow. Gerard has a background in nautical engineering, but has worked in information technology since 1998, holding engineering positions in the UK, The Netherlands and Brazil.
He now works at BigData Republic in The Netherlands as BigData Architect / Engineer. BigData Republic is a multidisciplinary team of experienced and business-oriented Data Scientists, Data Engineers, and Architects. Irrespective of an organization’s data maturity level, the team helps to translate business goals into the design, implementation and utilization of innovative solutions. In his spare time Gerard likes oil painting, and on his holidays he visits a beautiful beach in Brazil to read spy novels or psychology books.

Talk

Design philosophy of Apache Airflow ETL Pipelines

Apache Airflow has attracted a lot of attention over the past couple of years. This session explains important principles that your ETL pipelines should follow to stay scalable and restartable; many of these principles have been known for years in functional programming communities. Apache Airflow is designed around that philosophy and naturally guides the developer towards better and more scalable pipelines.
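The abstract does not list the principles themselves. One that is commonly associated with restartable pipelines and functional programming is idempotency: a task that is rerun for the same date should leave the same result behind. The sketch below illustrates that idea in plain Python, without Airflow; the function names `transform` and `run_task`, the file naming scheme, and the sample rows are all hypothetical and not taken from the talk.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def transform(rows):
    """Pure function: the result depends only on the input rows."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0) + row["amount"]
    return totals


def run_task(rows, out_dir, run_date):
    """Write the output for one run date, replacing any earlier output.

    Overwriting a date-keyed target (instead of appending to a shared
    file) is what makes a rerun safe in this sketch.
    """
    target = Path(out_dir) / f"totals_{run_date.isoformat()}.json"
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(transform(rows), sort_keys=True))
    tmp.replace(target)  # rename over the old file so readers never see a partial write
    return target


if __name__ == "__main__":
    # Hypothetical sample data; the run date is arbitrary.
    sample = [
        {"user": "a", "amount": 1},
        {"user": "a", "amount": 2},
        {"user": "b", "amount": 5},
    ]
    with tempfile.TemporaryDirectory() as out_dir:
        first = run_task(sample, out_dir, date(2020, 1, 1)).read_text()
        second = run_task(sample, out_dir, date(2020, 1, 1)).read_text()
        print(first == second)
```

Because the output location is derived from the run date and is overwritten on each run, a failed or repeated run can simply be started again without cleaning up first.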

Workshop

Apache Airflow hands on

Apache Airflow is attracting more attention worldwide as a de facto ETL platform. As the author of the site “ETL with airflow”, I’d like to share this knowledge and get novices up to speed with Apache Airflow as their ETL platform. Learn how to write your first DAG in Python, set up email notifications, configure the scheduler, and write your own hooks and operators, and pick up the important principles to maintain when composing your DAGs.

Apache Airflow has become a very popular tool for running ETL, machine learning and data processing pipelines. Embedded in the implementation are the insights and learnings from years of experience in data engineering.

The workshop explains what these principles are and how they can be achieved rather effortlessly by putting the components of Apache Airflow together in a data processing workflow.