1. Jongwook Woo
HiPIC
CalStateLA
2019 i-CON (Open Innovation Network) Conference
Dec 3, 2019
Jongwook Woo, PhD, jwoo5@calstatela.edu
Big Data AI Center (BigDAI)
California State University Los Angeles
The Importance of Open Innovation in the AI Era
2. Big Data Artificial Intelligence Center (BigDAI)
Jongwook Woo
CalStateLA
Contents
Myself
Introduction to the Open Innovation Network
New Technology: Big Data and Deep Learning
Open Innovation Network
3. Myself
Experience:
Since 2002, Professor at California State University Los Angeles
– PhD in 2001: Computer Science and Engineering at USC
4. Myself: S/W Development Lead
http://www.mobygames.com/game/windows/matrix-online/credits
5. Collaboration with HDP, CDH, Oracle, Amazon
using Hadoop Big Data
https://www.cloudera.com/more/customers/csula.html
6. Contents
Myself
Introduction to the Open Innovation Network
New Technology: Big Data and Deep Learning
Open Innovation Network
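7. Open Innovation Network
i-CON (innovation - Communication Open Network):
Free communication among diverse innovation actors such as industry, academia, and research institutes
– Refers to an open network that achieves innovation through that communication
– Through connection and cooperation among large enterprises, small and medium-sized enterprises (SMEs), researchers, and financial institutions
• Launch of the "Open Innovation Network i-CON", which serves as a hub for innovation activities such as identifying and planning technology development projects, commercialization, and investment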
8. Contents
Myself
Introduction to the Open Innovation Network
New Technology: Big Data and Deep Learning
Open Innovation Network
9. New Technology: Big Data
What is Big Data? Data or Systems?
Large-Scale Data?
– Many people see it only from the data point of view
– 3 Vs, 5 Vs
Systems?
– YES
Big Data [1]
New computer systems to store and compute large-scale data
– How to store and process large-scale data
• Not only about computing power
• Parallel distributed computing systems => data-intensive supercomputer
– The data set does not need to be terabytes or petabytes
• Linearly scalable
10. Data Handling: Traditional Way
11. Data Handling: Traditional Way
Becomes too Expensive
12. Data Handling: Another Way
Not Expensive
From the 2017 Korean blockbuster movie “The Fortress” (남한산성)
13. Data Handling: Another Way
Not Expensive
http://blog.naver.com/PostView.nhn?blogId=dosims&logNo=221127053677
1409 (9th year of King Taejong): Choe Hae-san (崔海山), whose father was Choe Mu-seon (崔茂宣)
[Source] Joseon's Secret Weapon: Chongtonggi Hwacha (銃筒機火車) | Author: Dosim
14. Supercomputer vs Big Data vs Cloud
Traditional Supercomputer
(Parallel File Systems: Lustre, PVFS, GPFS)
– Cluster for Compute
– Cluster for Store
Big Data (Hadoop, Spark, Distributed Deep Learning)
(Distributed File Systems: HDFS, GFS)
– Cluster for Compute and Store
However, Cloud Computing adopts the separated architecture, with a high-speed network and object storage
15. Big Data: Linearly Scalable
Some people object that a system handling a 1 to 3 GB data set is not Big Data
Well, on a Big Data platform you add more servers as the data grows
– It is linearly scalable once built
– Ideally, n times more computing power
Data Size: < 3 GB => add n servers => Data Size: > 200 TB
16. Big Data Is Great for Small Business
Your business data is the value, and it is Big Data
Customer data
Operational data
You have your own specific data
Big companies do not have the specific data that you have
Potential
Your customer data
– Smart marketing and sales
– Advertisement
Your operational data
– Efficient operation, for example, Smart*:
• Smart Factory, Smart City
17. Big Data: Data Analysis & Visualization
Sentiment Map of AlphaGo (legend: Positive, Negative)
18. Popular businesses within 5 miles of CalStateLA, USC, and UCLA
19. Big Data Analysis and Prediction Flow
Data Collection
– Batch API: Yelp, Google
– Streaming: Twitter, Apache NiFi, Kafka, StreamSets, Storm
– Open Data: Government
Data Storage
– HDFS, S3, Object Storage, NoSQL DB (Couchbase)…
Data Filtering
– Hive, Pig
Data Analysis and Science
– Hive, Pig, Spark, Deep Learning, BI Tools (Qlik, Tableau, …)
Data Visualization
– Qlik, Excel Power Map, Tableau, Looker, …
- Big Data Engineering
- Big Data Analysis
- Big Data Science, Deep Learning
- Data Visualization
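The stages of this flow can be sketched as a chain of small functions. This is a minimal, single-machine Python illustration and not the tooling named on the slide: the sample records, field names, and function names are all hypothetical, and on a real cluster each stage would be handled by tools such as Kafka, HDFS, Hive, or Spark.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records standing in for data pulled from a batch API.
RAW_REVIEWS = [
    {"business": "Cafe A", "rating": 5},
    {"business": "Cafe A", "rating": 4},
    {"business": "Diner B", "rating": 2},
    {"business": "Diner B", "rating": None},  # incomplete record
]


def collect():
    """Data Collection: return raw records (here, a hard-coded sample)."""
    return list(RAW_REVIEWS)


def store(records):
    """Data Storage: keep records in memory (a stand-in for HDFS or S3)."""
    return {"reviews": records}


def filter_records(storage):
    """Data Filtering: drop records with a missing rating."""
    return [r for r in storage["reviews"] if r["rating"] is not None]


def analyze(records):
    """Data Analysis: compute the average rating per business."""
    by_business = defaultdict(list)
    for r in records:
        by_business[r["business"]].append(r["rating"])
    return {name: mean(ratings) for name, ratings in by_business.items()}


def visualize(averages):
    """Data Visualization: one plain-text bar per business, sorted by name."""
    return [
        f"{name:<8} {'#' * int(avg)} {avg:.1f}"
        for name, avg in sorted(averages.items())
    ]


def run_flow():
    """Run the stages in order: collect, store, filter, analyze, visualize."""
    return visualize(analyze(filter_records(store(collect()))))


if __name__ == "__main__":
    for line in run_flow():
        print(line)
```

Keeping each stage as a separate function mirrors the flow: any one stage can be swapped for a distributed tool without changing the others.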
20. Azure ML Studio and Spark ML Result Comparison
Big Data Science: Ad Click Fraud Prediction, 7 GB data
Model (platform)                                         AUC    Precision  Recall  TP       FP      TN         FN       Run time
Two-Class Decision Jungle (AzureML)                      0.905  1.0        0.001   35       0       52,306     406,605  2 hrs
Two-Class Decision Forest (AzureML)                      0.997  0.992      0.902   47,199   377     406,228    5,142    2-3 hrs
Decision Tree Classifier (Databricks)                    0.815  0.822      0.633   86,683   18,727  7,112,961  50,074   22 mins
Random Forest Classifier (Databricks)                    0.746  0.878      0.495   67,726   9,408   7,122,280  69,031   50 mins
Decision Tree Classifier (Balanced Sample Data, Oracle)  0.896  0.935      0.807   111,187  7,712   545,302    26,604   24 sec
Random Forest Classifier (Balanced Sample Data, Oracle)  0.893  0.934      0.800   110,220  7,791   545,223    27,571   2 mins
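The precision and recall figures follow directly from the TP, FP, and FN counts. A minimal Python sketch, using the counts reported for the Two-Class Decision Forest (AzureML); the function names are illustrative and not part of any library.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Counts reported for the Two-Class Decision Forest (AzureML).
tp, fp, fn = 47_199, 377, 5_142

print(f"precision = {precision(tp, fp):.3f}")  # 0.992, as reported
print(f"recall    = {recall(tp, fn):.3f}")     # 0.902, as reported
```

For fraud detection, recall is usually the figure to watch: a model can reach a precision of 1.0 while missing almost every fraudulent click.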
21. Big Data Science: Transaction Data Fraud Detection
Model                    Area under ROC  Precision  Recall
DecisionTreeClassifier
RandomForestClassifier   0.909573
LogisticRegression
Size: 470 MB (=> 718 MB)
6,362,620 records
Not very large-scale data compared with data sets larger than a GB
https://www.kaggle.com/ntnu-testimon/paysim1
3 models in a Spark cluster with different combinations of the parameters
Time taken: 1 hour with 3 Spark clusters
In theory, with linear scalability: 2 minutes with 30 Spark clusters
22. Experimental Results in AWS
Execution times
3 nodes:
– 40 to 70 mins
11 nodes:
– 10 to 20 mins
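Under ideal linear scalability, run time is inversely proportional to the number of nodes. A small Python sketch of that estimate, applied to the 3-node timings above; the function name is illustrative, and real clusters usually fall somewhat short of the ideal because of coordination and data-shuffling overhead.

```python
def ideal_runtime(base_minutes: float, base_nodes: int, new_nodes: int) -> float:
    """Estimate run time after scaling, assuming perfectly linear scalability."""
    if base_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    return base_minutes * base_nodes / new_nodes


# Measured range on 3 nodes: 40 to 70 minutes.
low = ideal_runtime(40, 3, 11)
high = ideal_runtime(70, 3, 11)
print(f"ideal on 11 nodes: {low:.1f} to {high:.1f} minutes")
```

The ideal estimate of roughly 11 to 19 minutes is close to the 10 to 20 minutes measured on 11 nodes.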
23. Data Scale Driving: Deep Learning Process
Deep Learning and Massive Data [3]
“Machine Learning Yearning” Andrew Ng 2016
24. Another Gap between Deep Learning and Big Data Communities [6]
Deep learning experts
The Chasm
Big Data Engineers, Scientists, Analysts, etc.
25. Leveraging Big Data Cluster
Existing Big Data cluster with a massive data set, without using Big Data
– Single GPU server for Deep Learning?
– Single server for Python and R traditional Machine Learning?
– Too slow in data migration, and the single server fails
Big Data Cluster
26. Leveraging Big Data Cluster
Existing Big Data cluster
Big Data Engineering
Big Data Analysis
Big Data Science
Distributed Deep Learning
– Integrate Deep Learning into the cluster
No data migration is needed, and it can leverage parallel computing and the existing large-scale data
Big Data Cluster
27. Big Data Prediction with DDL
DDL: Distributed Deep Learning
TensorFlow
Distributed training and inference in a Spark cluster
28. Spark ML and DDL [2-5]
Deep Learning in Spark cluster
Distributed Deep Learning (DDL)
Figure: Deep Learning in Spark (labels: DDL, DDL lib)
29. AWS Review Dataset
Predictive Analysis
Prediction of rating
– An important measure for purchasing and selling
Spark ML: ALS (Alternating Least Squares) algorithm
DDL (Distributed Deep Learning): Neural Collaborative Filtering (NCF)
Dataset: https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt
Products reviewed between 2005 and 2015 are analyzed
Total product reviews: 9.57 million
File size: 5.26 GB
30. Summary: Performance
31. Summary: Mean Absolute Error
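Mean Absolute Error is the average absolute difference between predicted and actual ratings; lower is better. A minimal Python sketch with made-up ratings (the numbers are illustrative only and are not results from this study):

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical star ratings and model predictions, for illustration only.
actual_ratings = [5, 3, 4, 1]
predicted_ratings = [4.5, 3.5, 4.0, 2.0]

print(mean_absolute_error(actual_ratings, predicted_ratings))
```

Because the error is in the same unit as the rating, a value of 0.5 would mean predictions are off by half a star on average.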
32. Contents
Myself
Introduction to the Open Innovation Network
New Technology: Big Data and Deep Learning
Open Innovation Network
33. Small Business
How to adopt the new technology
Training for new technology
Collaboration
Government
34. Training
New technology emerges every moment
IT companies, not universities, lead the industry
How to catch up?
– Training
Companies with new technology
Always deliver training
– Big Data
• Cloudera, Hortonworks
– AI Deep Learning
• Traditional Concept
– Stanford, UC Berkeley, edX, IBM, H2O
35. Training (Cont’d)
Training by Company
3-4 days/week
– $2,500 - $3,000
– Practical
• With theory + hands-on exercises
• Instructors are paid well
• Employers send their engineers to learn the new technology in a few weeks
36. Trained but No Experience, with Bad Management in Korea
Sang-Ryung Battle:
From the 2017 Korean blockbuster movie “The Fortress” (남한산성)
37. Trained Well, with Experience and Good Management in Japan
Battle of Nagashino, 1575, Japan
38. Trained but No Experience, with Bad Management in Korea (Cont’d)
Sang-Ryung Battle:
From the 2017 Korean blockbuster movie “The Fortress” (남한산성)
39. Collaboration
Multiple entities
IT company
– Expert in Big Data, Deep Learning
Business
– Domain expert
For example, Big Data in Smart Factory
IT
– What does the manufacturer want to analyze and predict?
– What data does the manufacturer generate, and how?
– What are tags? How are sensors attached to motors? What is SCADA?
OT
– What are Big Data and Big Data Science?
– How to implement Machine Learning/Deep Learning?
40. Collaboration
For example, Los Angeles
Weekly coffee time or informal meetings among people with diverse backgrounds
– Professors, computer scientists, investors, journalists, government officials, and digital artists from universities and companies in any area
– Just chat
• And find opportunities to work together
• And find solutions
41. Government
Open Data in Los Angeles
Open Data: https://data.lacity.org/
Location-based open data: http://geohub.lacity.org/
Data Science Federation, City of Los Angeles
https://dsf.lacity.org/
– A partnership between the City of Los Angeles and Los Angeles area colleges and universities to tackle tough city problems.
– Works on tough city problems that will make a difference in many areas, and expands on early work in data science and data-driven decision making for city government.
Jongwook Woo, BigDAI Center
– Dalyapraz Dauletbak, Jongwook Woo, "Traffic Data Analysis and Prediction using Big Data", APIC-IST 2019, pp. 127-133, ISSN 2093-0542
• Received the Outstanding Paper Award
– LA Zoo Data Analysis and Prediction
42. City of Los Angeles: DSF
43. Jams and other traffic incidents reported by users in Dec 2017 – Jan 2018
(Dalyapraz Dauletbak and Jongwook Woo at BigDAI Center, CalStateLA)
44. Government (Cont’d)
Republic of Korea
Support and project announcements for small and medium-sized enterprises (SMEs)
Low-cost office rental
US SMEs
45. https://www.visualcapitalist.com/ranked-the-20-easiest-countries-for-doing-business
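46. Government (Cont’d)
Intellectual property rights of small and medium-sized enterprises (SMEs):
Technology development and revenue models in relationships with large enterprises
– Free POC
– Free R&D solution proposals
• Long-running lawsuits over misappropriation of developed technology
SMEs
Small domestic market
Lack of private-sector-led investment
– Private investment other than government support ranks lowest?
– Insufficient use of technology evaluation experts
– Labor costs (labor costs in Korea are higher than expected)
Risk of entrepreneurs falling into credit delinquency
Excessive quasi-tax burden placed on companies (mandatory employer contributions to the four major social insurance programs)
Lack of employment flexibility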
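47. Government (Cont’d)
US Federal Trade Commission (FTC)
Large companies are broken up as soon as an FTC investigation begins
– Microsoft: breakup of the monopoly from bundling MS Office with Windows
– AT&T:
Korea's Ministry of SMEs and Startups and the Fair Trade Commission?
Small and medium-sized enterprises (SMEs) need intellectual property protection
– A shield against interference or predation by large enterprises is needed
• Support SME growth through fair competition and coexistence with large enterprises
SME difficulties need to be resolved
– Create an environment where market entry and exit are free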
48. Summary
Definition of Open Innovation
Need for training in technologies such as Big Data and AI
Open Innovation
The way forward for collaboration
A big-picture government role is needed for the survival of small and medium-sized enterprises (SMEs)
49. Questions?
50. References
1. “Rating Prediction using Deep Learning and Spark”, Monika Mishra, Mingoo Kang, Jongwook Woo, The 11th International Conference on Internet (ICONI 2019), Dec 15-18, 2019, Hanoi, Vietnam
2. “BigDL: Bringing Ease of Use of Deep Learning for Apache Spark”, Jason Dai, Radhika Rangarajan, Databricks, Spark Summit 2017
3. “Building Deep Learning Applications for Big Data: An Introduction to Analytics Zoo: Distributed TensorFlow, Keras and BigDL on Apache Spark”, Jennie Wang, Guoqiong Song, CVPR 2018, June 18-22, 2018, Salt Lake City, Utah
4. “Building Deep Learning Applications for Big Data: An Introduction to Analytics Zoo: Distributed TensorFlow, Keras and BigDL on Apache Spark”, Jason Dai, AAAI 2019 Tutorial Forum, Thirty-Third Conference on Artificial Intelligence, January 27-28, 2019, Honolulu, Hawaii, USA
5. “User-based real-time product recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL”, Luyang Wang, Guoqiong Song, Jing (Nicole) Kong, Maneesha Bhalla, Strata Data Conference 2019, March 25-28, 2019, San Francisco, CA
6. “Leveraging NLP and Deep Learning for Document Recommendation in the Cloud”, Guoqiong Song, Spark + AI Summit 2019