Applications of Multimodal Learning in Media Search Engines

Talk presentation

Building a high-quality and robust search engine for visual content is a complex problem that can be addressed in many different ways.

In this talk, I will show how we leveraged anonymized user engagement data and various metadata sources from a media platform to build a multimodal vector space of search queries, tags, and GIFs. We consider this space a compact representation of the environment at hand, one that allows us to model user behavior and preferences. We will discuss the approaches we used at different stages of the project and the application of these embeddings in several services, including search and recommender systems.
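To make the idea of a shared vector space concrete, here is a minimal sketch of how queries, tags, and GIFs embedded in one space could be compared with cosine similarity. It is an illustration only, not the system described in the talk: the vectors are made-up toy values, and the names `EMBEDDINGS`, `cosine`, and `nearest` are hypothetical.

```python
import math

# Toy, hand-written vectors (assumption): in a real system these would be
# learned from engagement data and metadata, not typed in by hand.
EMBEDDINGS = {
    ("query", "happy birthday"): [0.9, 0.1, 0.0],
    ("tag", "celebration"): [0.8, 0.2, 0.1],
    ("tag", "cats"): [0.0, 1.0, 0.1],
    ("gif", "gif_cake"): [0.85, 0.15, 0.05],
    ("gif", "gif_cat"): [0.1, 0.9, 0.2],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(key, kind, k=1):
    """Return the k items of the given kind closest to `key`, best first.

    `key` is a (kind, name) pair present in EMBEDDINGS; the item itself
    is excluded from the results.
    """
    source = EMBEDDINGS[key]
    scored = [
        (name, cosine(source, vec))
        for (item_kind, name), vec in EMBEDDINGS.items()
        if item_kind == kind and (item_kind, name) != key
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Search: which GIF is closest to a query?
best_gif = nearest(("query", "happy birthday"), "gif")[0][0]
# Recommendation-style lookup: which tag is closest to a GIF?
best_tag = nearest(("gif", "gif_cat"), "tag")[0][0]
```

Because all three kinds of items live in the same space, the same lookup serves both a search use case (query to GIF) and a recommendation-style use case (GIF to tag or GIF to GIF).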

Dmitry Voitekh

Proxet

Machine Learning Engineer at Proxet
Develops end-to-end ML products
Fond of music and cars
LinkedIn, Twitter, GitHub