Applications of Multimodal Learning in Media Search Engines
Talk presentation
Building a high-quality and robust search engine for visual content is a complex problem that can be addressed in many different ways.
In this talk, I will show how we leveraged anonymized user engagement data, together with various metadata sources from a media platform, to build a multimodal vector space of search queries, tags, and GIFs. We treat this space as a compact representation of the platform's environment, which lets us model user behavior and preferences. I will discuss the approaches we used at different stages of the project and the application of these embeddings in several services, including search and recommender systems.
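
To make the idea of a shared vector space concrete, here is a minimal sketch in Python, not the implementation presented in the talk. It assumes queries, tags, and GIFs already have embeddings in one common space and uses cosine similarity for lookup. The vectors, the item names, and the helper functions `cosine` and `nearest` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy embeddings in one shared space. In a real system these
# would be learned (for example from engagement data); the numbers below
# are made up purely for illustration.
EMBEDDINGS = {
    ("query", "happy birthday"): [0.90, 0.10, 0.00],
    ("tag", "celebration"): [0.80, 0.20, 0.10],
    ("gif", "gif_cake"): [0.85, 0.15, 0.05],
    ("gif", "gif_cat"): [0.10, 0.90, 0.20],
    ("gif", "gif_rain"): [0.00, 0.20, 0.90],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(key, kind, k=2):
    """Return the names of the k items of type `kind` closest to `key`."""
    anchor = EMBEDDINGS[key]
    scored = [
        (cosine(anchor, vector), name)
        for (item_kind, name), vector in EMBEDDINGS.items()
        if item_kind == kind and (item_kind, name) != key
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Search: rank GIFs for a text query.
search_results = nearest(("query", "happy birthday"), "gif", k=2)

# Recommendation: find GIFs similar to one the user engaged with.
similar_gifs = nearest(("gif", "gif_cake"), "gif", k=1)
```

Because every item type lives in the same space, the same lookup can serve both search (query to GIF) and recommendation (GIF to GIF). A production system would likely replace the linear scan with an approximate nearest-neighbor index.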