Programmatically creating and managing training data with Snorkel
Today's state-of-the-art machine learning models are more powerful and easier to use than ever before, but they require massive amounts of training data. Traditionally, building these training datasets has meant slow and often prohibitively expensive manual labeling by domain experts.
In Snorkel, users instead write "labeling functions" that label data heuristically; Snorkel then uses modern, theoretically grounded modeling techniques to clean and integrate the resulting training data, without requiring any manual labeling. In a wide range of applications, from medical image monitoring to text information extraction to industrial deployments over web data, Snorkel provides a radically faster and more flexible way to build machine learning applications by letting users programmatically build and manipulate training data rather than label it by hand.
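To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library: the function names, the spam/ham task, and the example messages are all hypothetical, and a simple majority vote stands in for the modeling step that Snorkel uses to clean and integrate noisy labels.

```python
from collections import Counter

# Label values for a hypothetical spam-detection task (assumed for illustration).
ABSTAIN, HAM, SPAM = -1, 0, 1


# Each labeling function encodes one heuristic and may abstain.
def lf_contains_link(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def majority_vote(votes):
    """Combine noisy votes; a simplified stand-in for a learned label model."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN  # no heuristic fired
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels, so leave unlabeled
    return ranked[0][0]


def label_dataset(texts):
    """Apply every labeling function to every example and combine the votes."""
    return [majority_vote([lf(t) for lf in LABELING_FUNCTIONS]) for t in texts]


examples = [
    "Claim your prize at http://example.com now, limited time only",
    "see you at lunch",
    "The quarterly report is attached for your review today",
]
labels = label_dataset(examples)
print(labels)  # [1, 0, -1]
```

The third example receives no label because every heuristic abstains; in practice, such examples would simply be left out of the generated training set or covered by writing additional labeling functions.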
- PhD in Computer Science from Stanford University, with prior work at Google, Facebook, and MIT.
- Researches machine learning systems, focusing on how to get supervision signal from a human into a model as quickly, easily, and efficiently as possible.
- Core developer of the Snorkel open-source project, recently featured on the Google AI blog.