26-28 November, 2019, Vilnius

Maciej Marek

Philip Morris International, Poland

Biography

Enterprise Data Scientist and CI/CD Best Practices Ambassador at Philip Morris International. Passionate about Big Data building blocks, Spark lover and Machine Learning enthusiast. Sea, air and roads change Maciej’s perspective and give him energy for new challenges.

Talk

Data Science at PMI - The Tools of The Trade

Data Science is not a one-man show. It is a team effort that requires every team member to master the tools of the trade, which is especially important for putting data science to work effectively in a global organization. In this talk Maciej will share the best practices and tools for starting, developing and shipping data science products inside PMI. They are currently in use by 40+ data scientists across four locations, where PMI established its data science labs in 2017.
He will show how technologies (Docker, Artifactory, Jenkins) and methods (Cookiecutter templates, CI/CD with GitFlow) well known from software engineering are helping us create a data science workflow that adapts to the specific needs of each use case, provides transparency at all times, and is reproducible not only in the data science dimension but also in the data engineering and DevOps dimensions. At the same time, this workflow allows frictionless development of data products and gives us the freedom to experiment.
Maciej will also discuss the “transition challenges” and share practical hints for moving from pure exploration in Jupyter to building pip packages that go into production, as well as for moving from data-to-code to code-to-data approaches in data science challenges.

If you’re interested in how Python, Jupyter notebooks, Docker, AWS, the Hadoop ecosystem, Spark, Artifactory, Jenkins, the Atlassian suite, etc. are set up to support our collaborative work on building predictive models, this talk is for you.

Session Keywords

CI/CD
Data Product
Reproducible research
Best Practices for Data Science