Personal Information
Organization/Workplace
San Francisco Bay Area, United States
Position
Senior Research Engineer at Netflix
Industry
Education
Site
www.dbtsai.com
About
Big Data Machine Learning Engineer with a strong background in computer science, theoretical physics, and mathematics. I have a deep understanding of implementing data mining algorithms in scalable ways, not just using them as a consumer.
I'm a big fan of Scala and have been using it to develop scalable, distributed data mining algorithms with Apache Spark. I've been involved with open-source Apache Spark development as a contributor. Apache Spark is a fast and general engine for large-scale data processing, and it fits into the Hadoop open-source ecosystem.
Specialties:
• Machine Learning and Data Mining.
• Distributed/Parallel Computing and Big Data Processing.
• Expert in Apache Hadoop.
Tags
machine learning
spark
mapreduce
hadoop
mllib
alpine data labs
big data
logistic regression
netflix
data mining
apache spark
multinomial
l-bfgs
recommendation
pipeline
kernel methods
linear models
polynomial mapping
feature engineering
linear regression
ml
spark summit
elastic-net
batch layer
serving layer
speed layer
spark streaming
pig
lambda architecture
real time
storm
stream
large scale
iot
internet of things
svd
k-means
unsupervised learning
See more
Presentations (9)
Liked (4)
Distributed Time Travel for Feature Generation at Netflix
sfbiganalytics • 8 years ago
Introducing Windowing Functions (pgCon 2009)
PostgreSQL Experts, Inc. • 11 years ago
Multinomial Logistic Regression with Apache Spark
DB Tsai • 10 years ago