27-29 November, Vilnius
Conference about Big Data, High Load, Data Science, Machine Learning & AI
BigData Republic, The Netherlands
Gerard Toonstra is an Apache Airflow enthusiast and has been excited about it ever since it was announced as open source. He was the initial contributor of the HttpHook and HttpOperator and set up the site “ETL with airflow”, which is one of the richest practical sources of information about Apache Airflow. Gerard has a background in nautical engineering, but has worked in information technology since 1998, holding different engineering positions in the UK, The Netherlands and Brazil.
He now works at BigData Republic in The Netherlands as BigData Architect / Engineer. BigData Republic is a multidisciplinary team of experienced and business-oriented Data Scientists, Data Engineers, and Architects. Irrespective of an organization’s data maturity level, we help to translate business goals into the design, implementation and utilization of innovative solutions. In his spare time Gerard likes oil painting, and on his holidays he visits a beautiful beach in Brazil to read spy novels or psychology books.
Apache Airflow hands on
Apache Airflow is attracting more attention worldwide as a de facto ETL platform. As the author of the site “ETL with airflow”, I’d like to share this knowledge and get novices up to speed with Apache Airflow as their ETL platform. Learn how to write your first DAG in Python, set up email notifications, configure the scheduler, and write your own hooks and operators, along with the important principles to maintain when composing your DAGs.
Apache Airflow has become a very popular tool for running ETL, machine learning and data processing pipelines. Embedded in the implementation are the insights and learnings from years of experience in data engineering.
The workshop explains what these principles are and how they can be achieved rather effortlessly by putting the components of Apache Airflow together in a data processing workflow.
- Run Apache Airflow on Docker
Exploring the UI – 45 mins
- Monitoring DAG statuses
- Administrative tasks
- DAG detail screens
Your first DAG – 45 mins
- Setting a DAG schedule
- Start date and execution date
- Understanding macros
Failure emails, SLAs – 45 mins
- When tasks fail
- Sending custom emails
- SLAs and their uses
Applying best practices – 2 hours
- Explaining best practices
- Implementing them in Airflow
Extending Airflow – 45 mins
- How to build your own hook
- How to build your own operator
Deploying Airflow – 45 mins
- System components
- Important things to keep in mind
- PaaS solutions
Round off – 45 mins
- Room for questions and exploration
The workshop allows you to get your feet wet with the Apache Airflow platform. No prior knowledge is assumed. The workshop focuses on getting familiar with the user interface, building and configuring a data processing workflow, and building pipelines that adhere to best practices. The objective is for you to walk away with a rough understanding of what Apache Airflow can do for your company and of the challenges you will face that are specific to your organization.
This workshop is hands-on. The intended audience is people who have basic code reading abilities. All sessions rely on pre-existing code, so no code will be developed from scratch.
A laptop, notebook or MacBook with an internet connection, ideally with Docker preinstalled.