Migrating Etsy infrastructure from on-premises to Google Cloud Platform
Etsy is one of the largest and best-known specialty online marketplaces worldwide, with gross sales in 2017 exceeding $3 billion. Etsy was founded in 2005, before the emergence of viable cloud platforms. Until recently, all of Etsy's critical systems, including the production and analytics data stacks, were hosted and managed on premises. In 2017, Etsy decided to migrate all of its infrastructure to Google Cloud Platform (GCP), with the aim of being operational there in 2018. This talk describes the migration, with a focus on moving Etsy's analytics data systems. The Etsy analytics data stack consists of Hadoop for large batch jobs, Vertica for data analysis, and Kafka for clickstream and production data distribution, as well as custom tools for data science projects and ETL processes. In addition to migrating these legacy technologies to GCP, Etsy has also integrated newer data products: BigQuery, a native GCP service for big data processing, and Airflow for workflow management, replacing Oozie.
The talk also covers the technical challenges and cloud economics of the migration. This very large project has gone well, thanks to careful planning and building the right teams. Anyone considering migrating infrastructure to the cloud, especially to GCP, will benefit from hearing about Etsy's challenges and solutions.
- Chris Bohn ("CB") is a Senior Database Engineer at Etsy.com.
- He has worked at Etsy since 2007 and has been involved in the architecture and implementation of its production OLTP data systems (PostgreSQL and MySQL) and analytics databases (Vertica, Hadoop, BigQuery).
- He is a graduate of the University of California, Berkeley.