Machine learning researchers and practitioners develop computer
algorithms that "improve performance automatically through
experience". At Google, machine learning is applied to solve many
problems, such as prioritizing emails in Gmail, recommending tags for
YouTube videos, and identifying different aspects in online user
reviews. Machine learning on big data, however, is challenging. Some
"simple" machine learning algorithms with quadratic time complexity
run fine on hundreds of records but become impractical on billions
of records.
In this talk, I will describe lessons drawn from various Google
projects on developing large-scale machine learning systems. These
systems build on Google's computing infrastructure, such as GFS
and MapReduce, and attack the scalability problem through massively
parallel algorithms. I will present the design decisions made in
these systems, along with strategies for scaling and speeding up
machine learning systems on web-scale data.
Max Lin is a software engineer with Google Research in the New York
City office. He is the tech lead of the Google Prediction API, a machine
learning web service in the cloud. Prior to Google, he published
research work on video content analysis, sentiment analysis, machine
learning, and cross-lingual information retrieval. He holds a PhD in
Computer Science from Carnegie Mellon University.