SlideShare a Scribd company logo
1 of 20
SRA-SV | Cloud Research LabSRA-SV | Cloud Research Lab
Guangdeng Liao
Zhan Zhang
Samsung Cloud Research Lab
Data Platform at Samsung
SRA-SV | Cloud Research Lab Slide 2
Our Mission: provide scalable, reliable, and secure storage and
computation for Samsung R&D
Samsung Data Platform
Resources:
• Hundreds of machines
• Petabytes of storage
• keep increasing..
SRA-SV | Cloud Research Lab Slide 3
What we have in our platform
Distributed MR processing
Data warehousing with
Hive/Pig
In-house web-based ETL
portal
Many more..
Offline
K-V store HBase
In-house Blob store
Online Storm
Many more..
Online
Apache Mahout
ElasticSearch
In house unified web portal
In house Single Sign On
Visualization
Many more..
Dev. & management tools
By using platform, we already significantly improve ETL process, data
management and processing for other teams!!
SRA-SV | Cloud Research Lab Slide 4
So, are we done?
No. Many more complex challenges.
SRA-SV | Cloud Research Lab Slide 5
Challenge #1: How to build scalable and efficient machine
learning over Big Data?
SRA-SV | Cloud Research Lab Slide 6
MR-based Mahout is good but...
Not good at expressing data dependency and iterative algorithms like PageRank
Map: distribute rank to link targets
Reduce: collect ranks from multiple sources
Iterate








n
i i
i
tC
tPR
N
xPR
1 )(
)(
)1(
1
)( 
One job/iteration
Startup penaltyI/O Penalty
Unfortunately, a lot of MLDM are iterative jobs
SRA-SV | Cloud Research Lab Slide 7
Graph naturally represents data dependency
SRA-SV | Cloud Research Lab Slide 8
Graph-based Processing: Think like a Vertex
Scheduling
p p
p
p
p
p
p
In-memory data graph over a cluster
Communication
– Message-based
– Shared memory-
based
Vertex abstraction
– Think like a vertex’s
– In-memory processing
Execution engine
– Bulk synchronous
parallel
– Asynchronous parallel
Popular frameworks:
– Giraph
– GraphLab
SRA-SV | Cloud Research Lab Slide 9
Graph-based Machine Learning
We used Apache Giraph 1.0 and developed machine learning library over it:
Alternative Least Square
(ALS)
Weight ALS
SGD ( Matrix Factorization)
Bias SGD
Belief Propagation
Recommendation Graphical Model
KMeans
KMeans++
Fuzzy-Clustering
Clustering
We see one magnitude order of speedups compared to MR-based approach
in our cluster
SRA-SV | Cloud Research Lab Slide 10
Challenge #2: How to make Big Model + Big Data like Deep
Learning scalable and efficient?
SRA-SV | Cloud Research Lab Slide 11
One example: Deep Learning1
Many more examples (millions to billions parameters ) in Speech
Recognition, Image Processing and NLP
1Imagenet classification with deep convolutional neural networks, in NIPS 2012
SRA-SV | Cloud Research Lab Slide 12
Model-Parallel Framework
User
defined
model
Auto-generation
of model topology
Auto-partition of
topology over
cluster
c1
c2
Auto-deployment
of topology (in-
memory)
c3
Neuron-like
programming
Message-based
communication
Message-driven
computation
Parallelize a big machine learning model over a cluster
SRA-SV | Cloud Research Lab Slide 13
Architecture over Yarn
Node Manager
Node manager
Controller
Partition and
deploy topology
Node manager
Application Master
Container
Container
Container
Data Communication:
• node-level
• group-level
Control comm. based on
Thrift
Data comm. based on Netty
SRA-SV | Cloud Research Lab Slide 14
Execution Engine
• Execution Engine (Deep Neural Net)
– Training layer by layer controlled by
Execution Engine..
– Progress reporting
– Process control: end user can control the
training process, and even restart the
process from a certain point
– System snapshot for fault tolerance
Input
RBM
RBMSoftmax
Fully connected
• Generic Execution Engine
– Abstract the common design pattern from our development
experiences of deep neural net algorithm.
– Generalized to support various other algorithms
SRA-SV | Cloud Research Lab Slide 15
Model-parallel is still not scalable enough over Big Data
SRA-SV | Cloud Research Lab Slide 16
Deep Learning Platform: Hybrid of Data-parallelism and Model-
parallelism
……..Data Chunk
Model-parallel Model-parallel
Data Chunk
……..
Parameter
Server 1
Parameter
Server n
……..
Parameters coordination
Data-parallelism
Lots of model
instances
Parameter servers
help models learn
each other
SRA-SV | Cloud Research Lab Slide 17
Distributed Parameter Servers
Client Client Client
HBase/HDFS
In-memory
cache/storage
In-memory
cache/storage
In-memory
cache/storage
Server 1 Server 2 Server 3
Netty communication layer
Currently we support asynchronous parameter pulls and push
Synchronized version is also supported
Pull/Push/Sync
SRA-SV | Cloud Research Lab Slide 18
Deep Learning Algorithms
Aim at three major application fields: speech recognition, image
processing and NLP
What we have developed Our Roadmap
Feed Forward Neural Network
Restricted Boltzmann Machine
Deep Belief Network
Sparse Auto-encoder
Convolutional Neural Network
Recurrent Neural Network
SRA-SV | Cloud Research Lab Slide 19
Summary
• We are providing our Hadoop-based data platform
– hundreds machines, petabytes of storages
– Hadoop ecosystem (MapReduce, HBase, Yarn, HDFS, Zookeeper, Oozie, Lipstick, Mahout etc.)
– In-house ETL pipeline
– In-house unified web portal with SSO
• We are working hard on big learning to make our platform intelligent
– Large-scale graph-based machine learning
– Large-scale deep learning
– And many more under progress
Q&A

More Related Content

What's hot

Apache SystemML - Declarative Large-Scale Machine Learning
Apache SystemML - Declarative Large-Scale Machine LearningApache SystemML - Declarative Large-Scale Machine Learning
Apache SystemML - Declarative Large-Scale Machine LearningRomeo Kienzler
 
MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
  MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...  MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...Spark Summit
 
Leveraging Apache Spark for Scalable Data Prep and Inference in Deep Learning
Leveraging Apache Spark for Scalable Data Prep and Inference in Deep LearningLeveraging Apache Spark for Scalable Data Prep and Inference in Deep Learning
Leveraging Apache Spark for Scalable Data Prep and Inference in Deep LearningDatabricks
 
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Spark Summit
 
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Spark Summit
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterJen Aman
 
Real-Time Image Recognition with Apache Spark with Nikita Shamgunov
Real-Time Image Recognition with Apache Spark with Nikita ShamgunovReal-Time Image Recognition with Apache Spark with Nikita Shamgunov
Real-Time Image Recognition with Apache Spark with Nikita ShamgunovDatabricks
 
Data science on big data. Pragmatic approach
Data science on big data. Pragmatic approachData science on big data. Pragmatic approach
Data science on big data. Pragmatic approachPavel Mezentsev
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkDatabricks
 
Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
 Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ... Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...Databricks
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On SparkSpark Summit
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...Adam Gibson
 
Geospatial Analytics at Scale with Deep Learning and Apache Spark
Geospatial Analytics at Scale with Deep Learning and Apache SparkGeospatial Analytics at Scale with Deep Learning and Apache Spark
Geospatial Analytics at Scale with Deep Learning and Apache SparkDatabricks
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
How to use Apache TVM to optimize your ML models
How to use Apache TVM to optimize your ML modelsHow to use Apache TVM to optimize your ML models
How to use Apache TVM to optimize your ML modelsDatabricks
 
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersNode Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersAhsan Javed Awan
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileDatabricks
 

What's hot (20)

Apache SystemML - Declarative Large-Scale Machine Learning
Apache SystemML - Declarative Large-Scale Machine LearningApache SystemML - Declarative Large-Scale Machine Learning
Apache SystemML - Declarative Large-Scale Machine Learning
 
MLeap: Release Spark ML Pipelines
MLeap: Release Spark ML PipelinesMLeap: Release Spark ML Pipelines
MLeap: Release Spark ML Pipelines
 
MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
  MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...  MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
MLLeap, or How to Productionize Data Science Workflows Using Spark by Mikha...
 
Leveraging Apache Spark for Scalable Data Prep and Inference in Deep Learning
Leveraging Apache Spark for Scalable Data Prep and Inference in Deep LearningLeveraging Apache Spark for Scalable Data Prep and Inference in Deep Learning
Leveraging Apache Spark for Scalable Data Prep and Inference in Deep Learning
 
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
 
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark Cluster
 
Real-Time Image Recognition with Apache Spark with Nikita Shamgunov
Real-Time Image Recognition with Apache Spark with Nikita ShamgunovReal-Time Image Recognition with Apache Spark with Nikita Shamgunov
Real-Time Image Recognition with Apache Spark with Nikita Shamgunov
 
Data science on big data. Pragmatic approach
Data science on big data. Pragmatic approachData science on big data. Pragmatic approach
Data science on big data. Pragmatic approach
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
 
Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
 Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ... Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
Distributed Inference on Large Datasets Using Apache MXNet and Apache Spark ...
 
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
Which Is Deeper - Comparison Of Deep Learning Frameworks On Spark
 
Self driving computers active learning workflows with human interpretable ve...
Self driving computers  active learning workflows with human interpretable ve...Self driving computers  active learning workflows with human interpretable ve...
Self driving computers active learning workflows with human interpretable ve...
 
Paddle_Spark_Summit
Paddle_Spark_SummitPaddle_Spark_Summit
Paddle_Spark_Summit
 
Geospatial Analytics at Scale with Deep Learning and Apache Spark
Geospatial Analytics at Scale with Deep Learning and Apache SparkGeospatial Analytics at Scale with Deep Learning and Apache Spark
Geospatial Analytics at Scale with Deep Learning and Apache Spark
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
How to use Apache TVM to optimize your ML models
How to use Apache TVM to optimize your ML modelsHow to use Apache TVM to optimize your ML models
How to use Apache TVM to optimize your ML models
 
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in ClustersNode Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
Node Architecture Implications for In-Memory Data Analytics on Scale-in Clusters
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-Mobile
 

Similar to Data platform at Samsung (Big Learning)

Building distributed deep learning engine
Building distributed deep learning engineBuilding distributed deep learning engine
Building distributed deep learning engineGuangdeng Liao
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataDatabricks
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLNordic APIs
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWSJonathan Holloway
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsDataWorks Summit
 
Insight on "From Hadoop to Spark" by Mark Kerzner
Insight on "From Hadoop to Spark" by Mark KerznerInsight on "From Hadoop to Spark" by Mark Kerzner
Insight on "From Hadoop to Spark" by Mark KerznerSynerzip
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...Pradeeban Kathiravelu, Ph.D.
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create PyData
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowKaxil Naik
 
Why is dev ops for machine learning so different
Why is dev ops for machine learning so differentWhy is dev ops for machine learning so different
Why is dev ops for machine learning so differentRyan Dawson
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1Milind gunjan
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkC4Media
 
Thing you didn't know you could do in Spark
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in SparkSnappyData
 
Metail and Elastic MapReduce
Metail and Elastic MapReduceMetail and Elastic MapReduce
Metail and Elastic MapReduceGareth Rogers
 

Similar to Data platform at Samsung (Big Learning) (20)

Building distributed deep learning engine
Building distributed deep learning engineBuilding distributed deep learning engine
Building distributed deep learning engine
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
Tutorial4
Tutorial4Tutorial4
Tutorial4
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWS
 
Big Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source ToolkitsBig Data Analytics-Open Source Toolkits
Big Data Analytics-Open Source Toolkits
 
Insight on "From Hadoop to Spark" by Mark Kerzner
Insight on "From Hadoop to Spark" by Mark KerznerInsight on "From Hadoop to Spark" by Mark Kerzner
Insight on "From Hadoop to Spark" by Mark Kerzner
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
Why is dev ops for machine learning so different
Why is dev ops for machine learning so differentWhy is dev ops for machine learning so different
Why is dev ops for machine learning so different
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
Thing you didn't know you could do in Spark
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in Spark
 
Metail and Elastic MapReduce
Metail and Elastic MapReduceMetail and Elastic MapReduce
Metail and Elastic MapReduce
 

Recently uploaded

Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEselvakumar948
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"mphochane1998
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxchumtiyababu
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilVinayVitekari
 

Recently uploaded (20)

Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 

Data platform at Samsung (Big Learning)

  • 1. SRA-SV | Cloud Research LabSRA-SV | Cloud Research Lab Guangdeng Liao Zhan Zhang Samsung Cloud Research Lab Data Platform at Samsung
  • 2. SRA-SV | Cloud Research Lab Slide 2 Our Mission: provide scalable, reliable, and secure storage and computation for Samsung R&D Samsung Data Platform Resources: • Hundreds of machines • Petabytes of storage • keep increasing..
  • 3. SRA-SV | Cloud Research Lab Slide 3 What we have in our platform Distributed MR processing Data warehousing with Hive/Pig In-house web-based ETL portal Many more.. Offline K-V store HBase In-house Blob store Online Storm Many more.. Online Apache Mahout ElasticSearch In house unified web portal In house Single Sign On Visualization Many more.. Dev. & management tools By using platform, we already significantly improve ETL process, data management and processing for other teams!!
  • 4. SRA-SV | Cloud Research Lab Slide 4 So, are we done? No. Many more complex challenges.
  • 5. SRA-SV | Cloud Research Lab Slide 5 Challenge #1: How to build scalable and efficient machine learning over Big Data?
  • 6. SRA-SV | Cloud Research Lab Slide 6 MR-based Mahout is good but... Not good at expressing data dependency and iterative algorithms like PageRank Map: distribute rank to link targets Reduce: collect ranks from multiple sources Iterate         n i i i tC tPR N xPR 1 )( )( )1( 1 )(  One job/iteration Startup penaltyI/O Penalty Unfortunately, a lot of MLDM are iterative jobs
  • 7. SRA-SV | Cloud Research Lab Slide 7 Graph naturally represents data dependency
  • 8. SRA-SV | Cloud Research Lab Slide 8 Graph-based Processing: Think like a Vertex Scheduling p p p p p p p In-memory data graph over a cluster Communication – Message-based – Shared memory- based Vertex abstraction – Think like a vertex’s – In-memory processing Execution engine – Bulk synchronous parallel – Asynchronous parallel Popular frameworks: – Giraph – GraphLab
  • 9. SRA-SV | Cloud Research Lab Slide 9 Graph-based Machine Learning We used Apache Giraph 1.0 and developed machine learning library over it: Alternative Least Square (ALS) Weight ALS SGD ( Matrix Factorization) Bias SGD Belief Propagation Recommendation Graphical Model KMeans KMeans++ Fuzzy-Clustering Clustering We see one magnitude order of speedups compared to MR-based approach in our cluster
  • 10. SRA-SV | Cloud Research Lab Slide 10 Challenge #2: How to make Big Model + Big Data like Deep Learning scalable and efficient?
  • 11. SRA-SV | Cloud Research Lab Slide 11 One example: Deep Learning1 Many more examples (millions to billions parameters ) in Speech Recognition, Image Processing and NLP 1Imagenet classification with deep convolutional neural networks, in NIPS 2012
  • 12. SRA-SV | Cloud Research Lab Slide 12 Model-Parallel Framework User defined model Auto-generation of model topology Auto-partition of topology over cluster c1 c2 Auto-deployment of topology (in- memory) c3 Neuron-like programming Message-based communication Message-driven computation Parallelize a big machine learning model over a cluster
  • 13. SRA-SV | Cloud Research Lab Slide 13 Architecture over Yarn Node Manager Node manager Controller Partition and deploy topology Node manager Application Master Container Container Container Data Communication: • node-level • group-level Control comm. based on Thrift Data comm. based on Netty
  • 14. SRA-SV | Cloud Research Lab Slide 14 Execution Engine • Execution Engine (Deep Neural Net) – Training layer by layer controlled by Execution Engine.. – Progress reporting – Process control: end user can control the training process, and even restart the process from a certain point – System snapshot for fault tolerance Input RBM RBMSoftmax Fully connected • Generic Execution Engine – Abstract the common design pattern from our development experiences of deep neural net algorithm. – Generalized to support various other algorithms
  • 15. SRA-SV | Cloud Research Lab Slide 15 Model-parallel is still not scalable enough over Big Data
  • 16. SRA-SV | Cloud Research Lab Slide 16 Deep Learning Platform: Hybrid of Data-parallelism and Model- parallelism ……..Data Chunk Model-parallel Model-parallel Data Chunk …….. Parameter Server 1 Parameter Server n …….. Parameters coordination Data-parallelism Lots of model instances Parameter servers help models learn each other
  • 17. SRA-SV | Cloud Research Lab Slide 17 Distributed Parameter Servers Client Client Client HBase/HDFS In-memory cache/storage In-memory cache/storage In-memory cache/storage Server 1 Server 2 Server 3 Netty communication layer Currently we support asynchronous parameter pulls and push Synchronized version is also supported Pull/Push/Sync
  • 18. SRA-SV | Cloud Research Lab Slide 18 Deep Learning Algorithms Aim at three major application fields: speech recognition, image processing and NLP What we have developed Our Roadmap Feed Forward Neural Network Restricted Boltzmann Machine Deep Belief Network Sparse Auto-encoder Convolutional Neural Network Recurrent Neural Network
  • 19. SRA-SV | Cloud Research Lab Slide 19 Summary • We are providing our Hadoop-based data platform – hundreds machines, petabytes of storages – Hadoop ecosystem (MapReduce, HBase, Yarn, HDFS, Zookeeper, Oozie, Lipstick, Mahout etc.) – In-house ETL pipeline – In-house unified web portal with SSO • We are working hard on big learning to make our platform intelligent – Large-scale graph-based machine learning – Large-scale deep learning – And many more under progress
  • 20. Q&A