Theofilos Kakantousis

Logical Clocks AB, Sweden

Theofilos Kakantousis is a co-founder of Logical Clocks AB, the main developer of Hops Hadoop (www.hops.io). He received his MSc in Distributed Systems from KTH in 2014. He has previously worked as a middleware consultant at Oracle, Greece, as well as a research engineer at SICS Swedish ICT, Stockholm. He frequently gives talks on Hops Hadoop, and has presented Hops at venues such as Strata San Jose/New York and Big Data Tech Warsaw.

Topic: Multi-tenant Streaming and TensorFlow as a Service with Hops

Hops is a new European version of Apache Hadoop that introduces new concepts to enable multi-tenant Streaming-as-a-Service and TensorFlow-as-a-Service. In particular, Hops introduces three abstractions: projects, datasets, and users. Projects are containers for datasets and users, and they aim to remove the need for users to launch and manage their own clusters, since clusters are currently the only strong mechanism for isolating users and their data from one another. Our platform for managing datasets and running jobs, called Hopsworks, builds on these Hops concepts and is an entirely UI-driven environment implemented with only open-source software. In this talk we will discuss the challenges and experiences in building secure streaming applications on both Spark and Flink with Kafka over YARN using Hopsworks. We will also show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) for logging and debugging running Spark applications, how we use Grafana and Vizops (a monitoring tool developed in-house) with InfluxDB to monitor Spark applications, and finally how Apache Zeppelin and Jupyter can provide interactive visualizations and charts to end users. We will also discuss how Hopsworks provides TensorFlow-as-a-Service with Distributed TensorFlow and Yahoo’s TensorFlowOnSpark. Users can debug applications using TensorBoard and the Spark UI, examine logs, and monitor training. Moreover, we will show how Hopsworks simplifies discovering huge datasets and downloading them between Hopsworks clusters using a custom peer-to-peer sharing tool. Users can, within minutes, install Hopsworks, discover important curated datasets, and download them to either apply their business logic with a streaming application or train deep neural networks using TensorFlow. Finally, we will discuss our experiences running Streaming-as-a-Service and TensorFlow-as-a-Service on a cluster in Sweden with over 200 users (as of mid-2017).