Migrating Etsy infrastructure from On-premises to Google Cloud Platform

Data Science

Etsy is one of the largest and best-known specialty online marketplaces worldwide, with gross sales in 2017 exceeding $3 billion. Etsy was founded in 2005, before the emergence of viable cloud platforms. Until recently, all of Etsy's critical systems, including its production and analytics data stacks, were hosted and managed on premises. In 2017, the decision was made to migrate all infrastructure to Google Cloud Platform (GCP), to become operational in 2018. This talk describes the migration, with a focus on moving Etsy's analytics data systems. The Etsy Analytics Data Stack consists of Hadoop for large batch jobs, Vertica for data analysis, and Kafka for clickstream and production data distribution, as well as custom tools for Data Science projects and ETL processes. In addition to migrating legacy technologies to GCP, Etsy has also integrated native GCP data products such as BigQuery (big data processing) and Airflow (workflow management, replacing Oozie).

The talk covers the technical challenges and cloud economics of the migration. This has been a very large project, and it has gone well thanks to good planning and building the right teams. Anyone considering migrating infrastructure to the cloud, especially to GCP, will benefit from hearing about Etsy's challenges and solutions.

Chris Bohn
Etsy.com
  • Chris Bohn ("CB") is a Senior Database Engineer at Etsy.com.
  • He has worked at Etsy since 2007 and has been involved in the architecture and implementation of its production OLTP data systems (PostgreSQL and MySQL) and analytics databases (Vertica, Hadoop, BigQuery).
  • He is a graduate of the University of California, Berkeley.