26-28 November, 2019, Vilnius

Michał Dyrda

Philip Morris International, Poland

Biography

Senior Enterprise Data Scientist working with data sets of every scale, from small data to big data. CI/CD and data science best practices evangelist and trainer at PMI. Addicted to long-distance runs.

Talk

Data Science at PMI - The Tools of The Trade

Data science is not a one-man show. It is a team effort that requires every team member to master the tools of the trade, which is essential for putting data science to work effectively in a global organization. In this talk Michał would like to share the best practices and tools for starting, developing and shipping data science products inside PMI. They are currently in use by 40+ data scientists across four different locations, where PMI's data science labs were established in 2017.
He would like to show how technologies (Docker, Artifactory, Jenkins) and methods (templates in Cookiecutter, CI/CD with GitFlow) well known from software engineering help us create a data science workflow that adapts to the specific needs of each use case, provides transparency at all times, and is reproducible not only in the data science dimension but also in the data engineering and DevOps dimensions. At the same time, this workflow allows frictionless development of data products and gives us the freedom to experiment.
Michał will also discuss the “transition challenges” and share some practical hints for moving from pure exploration in Jupyter to building pip packages that go into production, as well as for moving from data-to-code to code-to-data approaches in data science challenges.

If you’re interested in how Python, Jupyter notebooks, Docker, AWS, the Hadoop ecosystem, Spark, Artifactory, Jenkins, the Atlassian suite, etc. are set up to support our collaborative work on building predictive models, this talk is for you.

Session Keywords

Best Practices for Data Science
CI/CD