I discuss the failure of LOD and its causes. Drawing on the lessons learned: LOD2 was launched more than four years ago and is about to be completed. What can you say about the future trend of Big Data from these lessons?
Lessons Learned from LOD Failure and Big Data: The Future Trend
1. Lessons Learned from LOD (Linked Open Data) Failure and Big Data: The Future Trend
Youngwhan Lee, Ph.D.
Phone: 010-7997-0345
Email: nicklee@konkuk.ac.kr
Facebook: Youngwhan Nick Lee
Twitter: nicklee002
3. Internet Today
2010:
• Estimated 10^11 Web pages in the world
2012:
• Social media: Facebook (1 billion monthly active users)
• From the invention of writing until 2003, 5 exabytes of data were created; as of 2012, 7 exabytes are being created every day
Is “big data” a big pile of garbage?
4. Web Explosion and Big Data
• Number of Web users (Mar. 2012): 2.3 billion
• 10^11 Web pages in the world (est. 2010)
  – About 7,000 days (roughly 20 years) have passed since the inception of the Web, so humans have created over 10 million pages a day on average.
• Digital information created in 2010: 1 zettabyte (10^21 bytes)
  – "There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing." (Eric Schmidt, 2010)
  – In 2012, almost 7 exabytes are created every day.
We call it "Big Data."
What does this mean?
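The arithmetic behind these figures can be checked quickly. The Python sketch below uses only the numbers on this slide (10^11 pages, 7,000 days, 1 ZB in 2010, Schmidt's 5 EB every 2 days, about 7 EB per day in 2012); the variable names are illustrative, and the daily averages assume output was spread evenly over each period, which is a simplification.

```python
# Back-of-the-envelope check of the growth figures quoted on this slide.
# Inputs are the slide's own estimates; the averages assume output was
# spread evenly over each period, which real Web growth was not.

EXABYTE = 10**18    # bytes
ZETTABYTE = 10**21  # bytes

WEB_PAGES_2010 = 10**11  # estimated Web pages in the world (2010)
DAYS_SINCE_WEB = 7_000   # days since the inception of the Web (~19 years)


def average_per_day(total: float, days: float) -> float:
    """Average amount produced per day over a period."""
    return total / days


pages_per_day = average_per_day(WEB_PAGES_2010, DAYS_SINCE_WEB)

# Schmidt (2010): 5 EB every 2 days; the slide: about 7 EB per day in 2012.
eb_per_day_2010 = average_per_day(5, 2)
eb_per_day_2012 = average_per_day(7, 1)

# 1 ZB created over 2010, expressed as exabytes per day.
eb_per_day_from_zb = average_per_day(ZETTABYTE / EXABYTE, 365)

print(f"Pages per day:      {pages_per_day:,.0f}")
print(f"EB/day, 2010 quote: {eb_per_day_2010:.2f}")
print(f"EB/day, 1 ZB/year:  {eb_per_day_from_zb:.2f}")
print(f"EB/day, 2012:       {eb_per_day_2012:.2f}")
print(f"Growth 2010->2012:  {eb_per_day_2012 / eb_per_day_2010:.1f}x")
```

With these inputs the page rate comes out at roughly 14 million pages a day, consistent with "over 10 million", and the "1 ZB per year" and "5 EB every 2 days" estimates both land between 2.5 and 2.7 EB per day, so they agree with each other.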
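7. Extracting Information/Knowledge from Big Data and the Web
• Information retrieval
  – SEO (Search Engine Optimization), PageRank, EdgeRank
• Data Mining: programs can extract information (knowledge)
  – Statistical analysis, rule-based analysis, neural network analysis
  – Visualization
[Data Science]
• Using knowledge engineering
  – Cumulatively linking ontologies built with RDF/OWL
  – Linking raw data and opening it for analysis (Linked Open Data; LOD)
  – Programs can extract knowledge that supports logical analysis
    • SPARQL
    • RIF (Rule Interchange Format)
[Knowledge Engineering]
• Using human effort: curation
  – Filtering and synthesizing information with human eyes and knowledge
    • e.g., pinterest.com, videocooki.com, storify.com, scoop.it, curated.by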
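PageRank, listed above under information retrieval, ranks a page by the rank of the pages that link to it. As a minimal sketch (not Google's production system), the power iteration below assumes a damping factor of 0.85, the value suggested in the original PageRank paper, lets pages with no outgoing links spread their rank evenly over all pages, and runs on a made-up four-page graph.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank over a dict of page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the ranks have converged
            break
    return rank


# Hypothetical four-page Web: three of the four pages link to C.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Because no page links to D in this toy graph, D keeps only the baseline (1 - 0.85) / 4 = 0.0375, while C, which three pages link to, ranks highest.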
9. Longtail Phenomena
The Long Tail by Chris Anderson (Wired, Oct. 2004), adapted to information domains
[Figure: popularity vs. applications. Bighead applications form the head of the curve; the long tail holds longtail applications such as mobile apps (iPhone, Android), SNS apps (Facebook, Twitter), LOD and others, medical apps, public-information apps, …]
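The long-tail picture can be made concrete with a simple model. The sketch below assumes, purely for illustration, that popularity follows a Zipf law (the share of the item at rank k is proportional to 1/k) over a hypothetical catalog of 10,000 apps, and measures how much demand falls outside the 100 "bighead" items. Real app-store distributions need not follow this law or this exponent.

```python
from typing import List, Tuple


def zipf_shares(n_items: int, s: float = 1.0) -> List[float]:
    """Share of total demand for items ranked 1..n under a Zipf law:
    popularity of rank k is proportional to 1 / k**s."""
    weights = [1.0 / k ** s for k in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def head_tail_split(shares: List[float], head_size: int) -> Tuple[float, float]:
    """Demand captured by the top `head_size` items versus everything else."""
    head = sum(shares[:head_size])
    return head, 1.0 - head


# Hypothetical catalog: 10,000 apps, "bighead" = the 100 most popular.
shares = zipf_shares(10_000)
head, tail = head_tail_split(shares, 100)
print(f"head (top 1%): {head:.1%}, long tail (other 99%): {tail:.1%}")
```

Under these assumptions the 9,900 tail items together draw close to half of all demand (about 47%), which is the long-tail argument in miniature: many niche applications can add up to a market comparable to the few hits.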
10. Approaches from Knowledge Engineering
• Building ontologies
  – Cyc
  – WolframAlpha
  – Siri
• Web of Data
  – LOD, LOD2
15. Linked Open Data (LOD) Principles
Linking Open Data (LOD) aims to connect data and open it to the public.
A Little History of the LOD Project
• Tim Berners-Lee proposed the LOD (Linking Open Data) project (2006)
• Since the proposal, numerous countries and organizations have participated, causing the amount of data in LOD to explode
  – Wikipedia data published as DBpedia (www.dbpedia.org)
  – The Bio2RDF project opened data in 27 fields related to biology, genetics, and medicine, with data sets totaling about 2.3 billion (Bio2RDF.org) (Oct. 2008)
  – The BBC announced its participation in the LOD project (www.bbc.org) and is now one of the organizations actively using the data
  – US Data.gov released 5 billion data triples
  – The US Library of Congress announced that it would join the LOD project (http://id.loc.gov/authorities/sh85042531#concept)
  – The NY Times (data.nytimes.com) released data from 150 years of publication (Oct. 2009)
  – The US White House released a plan to open data in RDF (Nov. 2009)
4 Principles of LOD
1. Use URIs as names for things
2. Use HTTP URIs
3. When someone looks up a URI, provide useful information
4. Include links to other URIs
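The four principles can be illustrated without any RDF library. The toy sketch below stands in for the Web with an in-memory dictionary: every resource is named by an HTTP URI (principles 1 and 2), a hypothetical `dereference` function plays the role of an HTTP lookup that returns triples about a URI (principle 3), and `owl:sameAs` links let a client hop from one dataset to another (principle 4). The example.org and example.net URIs and the helper names are made up; a real deployment would serve RDF over HTTP and typically offer SPARQL access.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Principles 1 and 2: things are named with HTTP URIs (hypothetical ones here).
# Each key is the base URI of a dataset "hosted" at that address.
DATASETS: Dict[str, List[Triple]] = {
    "http://example.org/": [
        ("http://example.org/city/Seoul", LABEL, "Seoul"),
        # Principle 4: link to a URI in someone else's dataset.
        ("http://example.org/city/Seoul", SAME_AS, "http://example.net/place/seoul"),
    ],
    "http://example.net/": [
        ("http://example.net/place/seoul", "http://example.net/prop/country",
         "http://example.net/place/korea"),
    ],
}


def dereference(uri: str) -> List[Triple]:
    """Principle 3: looking up a URI returns useful data about it.
    Simulated by asking the dataset whose base URI prefixes `uri`."""
    for base, triples in DATASETS.items():
        if uri.startswith(base):
            return [t for t in triples if t[0] == uri]
    return []  # unknown host: nothing useful comes back


def follow_links(start: str, max_hops: int = 3) -> List[Triple]:
    """Principle 4 in use: follow owl:sameAs links to discover more data."""
    seen, frontier, collected = {start}, [start], []
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            for s, p, o in dereference(uri):
                collected.append((s, p, o))
                if p == SAME_AS and o not in seen:
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return collected


for triple in follow_links("http://example.org/city/Seoul"):
    print(triple)
```

Starting from one URI in the first dataset, the client ends up with a fact that only the second dataset holds, which is the point of principle 4.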
24. Web 3.0: Merging the Two Perspectives
[Diagram: two perspectives converging on the Next Generation Web (Web 3.0)
• Technology Innovation Perspective: WWW Proposal (1989) → Semantic Web → LOD Proposal (2006) → "GGG" (Giant Global Graph) Proposal (2007)
• Market Behavior Perspective: Web 1.0 → Web 2.0 → "Web²" (Web Squared) Proposal (2009)
• Knowledge-based semantics and data-based semantics meet in the Next Generation Web
• Timeline: Technical Proposal Phase → Practical Use Phase]
25. But No Champagne…
• Definition unclear
  – Berners-Lee's 4 principles are ambiguous
    • Difficult to interpret
    • Inconsistent
    • Difficult both to learn and to use
    • Difficult to build browsers and reasoners
• "Free" to use
  – Full of incomplete and inconsistent RDF data, with no way to make it evolve
In short, LOD experienced "garbage in, garbage out."
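The two complaints above, incomplete and inconsistent RDF data, are the kind of problem an automated quality check can flag before the data is reused. The sketch below is illustrative only: the `ex:` property names, the choice of `ex:birthDate` as a single-valued property, and the set of required properties are assumptions invented for the example, not rules defined by LOD.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def find_conflicts(triples: List[Triple], functional: Set[str]) -> Dict[Tuple[str, str], Set[str]]:
    """Inconsistency: a property assumed to allow one value has several."""
    values: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            values[(s, p)].add(o)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


def find_gaps(triples: List[Triple], required: Set[str]) -> Dict[str, Set[str]]:
    """Incompleteness: a subject lacks properties assumed to be required."""
    present: Dict[str, Set[str]] = defaultdict(set)
    for s, p, _ in triples:
        present[s].add(p)
    return {s: required - props for s, props in present.items() if required - props}


# Hypothetical data with the two kinds of "garbage" named on the slide.
data = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:birthDate", "1970-01-01"),
    ("ex:alice", "ex:birthDate", "1971-02-03"),  # conflicting value
    ("ex:bob", "ex:name", "Bob"),                # birth date missing
]
print(find_conflicts(data, functional={"ex:birthDate"}))
print(find_gaps(data, required={"ex:name", "ex:birthDate"}))
```

Checks like these only detect garbage; deciding which conflicting value is correct still needs provenance information or human judgment.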
26. Solution to LOD Problems: LOD2
• LOD2 Stack: A Technical Approach
  – Linked Data Management
  – Enrichment and Quality Improvement
  – Various Tools to Use
    • Storage and querying
    • Revision and authoring
    • Interlinking and fusing
    • Classification and enrichment
    • …
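Interlinking and fusing, listed in the LOD2 stack, can be sketched in a few lines. The code below is not a LOD2 component; it assumes the simplest possible matching rule (two resources are the same if their labels agree after removing accents, case, and punctuation), proposes `owl:sameAs` links on that basis, and then fuses the linked resources' properties while keeping conflicting values side by side. All URIs and properties are hypothetical.

```python
import unicodedata
from typing import Dict, List, Set, Tuple

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
Link = Tuple[str, str, str]


def normalize(label: str) -> str:
    """Loose comparison key: strip accents, case-fold, keep letters/digits."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.casefold() if c.isalnum())


def interlink(left: Dict[str, str], right: Dict[str, str]) -> List[Link]:
    """Propose owl:sameAs links between resources whose labels match."""
    index: Dict[str, List[str]] = {}
    for uri, label in right.items():
        index.setdefault(normalize(label), []).append(uri)
    return [(uri, SAME_AS, other)
            for uri, label in left.items()
            for other in index.get(normalize(label), [])]


def fuse(left_props: Dict[str, Dict[str, str]],
         right_props: Dict[str, Dict[str, str]],
         links: List[Link]) -> Dict[str, Dict[str, Set[str]]]:
    """Merge the properties of linked resources; conflicting values are all kept."""
    fused: Dict[str, Dict[str, Set[str]]] = {}
    for s, _, o in links:
        merged = fused.setdefault(s, {})
        for props in (left_props.get(s, {}), right_props.get(o, {})):
            for prop, value in props.items():
                merged.setdefault(prop, set()).add(value)
    return fused


# Hypothetical datasets: URIs, labels, and properties are made up.
left = {"http://example.org/city/Sao_Paulo": "São Paulo"}
right = {"http://example.net/place/42": "SAO PAULO", "http://example.net/place/7": "Seoul"}
links = interlink(left, right)
print(links)
print(fuse({"http://example.org/city/Sao_Paulo": {"country": "Brazil"}},
           {"http://example.net/place/42": {"country": "Brazil", "type": "City"}},
           links))
```

Label matching alone will produce false links for ambiguous names (many places share a name), so practical link discovery usually combines several properties with a similarity threshold.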
27. Q: Is this technical approach to LOD good enough?
A: A business approach is definitely needed.
28. Big Data
What did we do with big data in 2013?
What will we do with big data in 2014?
30. Implication
• Issue: the haves and the have-nots are separated
  – E.g., in marketing
    • 4Ps
      – Price, product, place, promotion
    • STP
      – Segmentation, targeting, and positioning