Introduction to DISQL. Chen Xiaoming (陈晓鸣), Senior Engineer, Baidu IBASE Dept.
What is DISQL?
DISQL is a distributed programming framework widely used in Baidu.
Contents: Problems, Solution, Examples, Rationales, Adoption
Problems
Problems: statistical analysis of logs (extraction of fields) in order to generate reports
Problems: statistical analysis of features (features of web pages, web sites, ads, user preferences, etc.) in order to provide data for data mining and machine learning
Problems: common operations (selecting, filtering, grouping, sorting, joining, etc.)
Solution
A Platform named Log Statistical Platform, a.k.a. LSP: web-based; convenient for secondary development; convenient for task/data/rights management
A Programming Framework named DIstributed SQL, a.k.a. DISQL: provides SQL-like operators which can be combined arbitrarily; encapsulates distributed algorithms; automatic code generation
Application Programming Interfaces named Distributed Query, a.k.a. DQuery: DSL-style APIs embedded in well-known programming languages (PHP so far; C++, Python, ... in the future); uses the method chaining technique to provide a fluent interface; data flow in the form of a DAG composed of chains of methods
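The idea of method chains composing a data-flow DAG can be sketched as follows. This is not DQuery's API, which the slides do not spell out: the class names Node and Flow, the method names, and the filter expressions are all hypothetical, and the sketch is in Python rather than PHP. Each chained call returns a new object wrapping one more operator node, and a join gives a node two parents, which is what turns the chains into a graph.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One operator in the data-flow graph (hypothetical structure)."""
    op: str
    args: Tuple = ()
    parents: List["Node"] = field(default_factory=list)


class Flow:
    """Hypothetical fluent wrapper: every method returns a new Flow."""

    def __init__(self, node: Node):
        self.node = node

    @classmethod
    def load(cls, path: str) -> "Flow":
        return cls(Node("load", (path,)))

    def _then(self, op: str, *args, others=()) -> "Flow":
        # The new node points back at this chain and at any other chains.
        parents = [self.node] + [o.node for o in others]
        return Flow(Node(op, args, parents))

    def filter(self, predicate: str) -> "Flow":
        return self._then("filter", predicate)

    def group(self, *keys: str) -> "Flow":
        return self._then("group", *keys)

    def join(self, other: "Flow", key: str) -> "Flow":
        return self._then("join", key, others=(other,))

    def plan(self) -> List[str]:
        """Operator names in topological order (parents before children)."""
        seen, order = set(), []

        def visit(node: Node) -> None:
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)
            order.append(node.op)

        visit(self.node)
        return order


# Two chains share one source and meet again in a join, forming a DAG.
logs = Flow.load("logs")
shows = logs.filter("type == 'show'").group("site")
clicks = logs.filter("type == 'click'").group("site")
report = shows.join(clicks, "site")
print(report.plan())
```

Nothing is computed while the chain is being built; the graph is only a description, which is what would let a framework optimize it and split it into distributed jobs afterwards.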
Three Edit Modes: Simple Mode
Three Edit Modes: DQuery Mode
Three Edit Modes: Complex Mode
Hierarchy
DISQL Architecture (diagram): Edit Modes (Simple Mode, DQuery Mode, Complex Mode); APIs (PHP, C++, Python); Translators (Normalizer, Optimizer, Splitter, Planner, Coder); Runtimes (Data-flow, Schema, Storage APIs, Computing APIs)
LSP Architecture (diagram): data presentation & monitoring; third party apps; data access layer; data management layer; computing layer; storage systems; computing systems
Examples
Example 1: word count
Example 2: given a log of query and ad shows, extract the site field from the url field; filter sites with a regex; calculate the number of query and ad shows per site; output in JSON format
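The steps of Example 2 can be sketched on a single machine in plain Python. This is not DQuery code: the record layout (a dict with "url" and "kind" fields), the sample data, and the filter pattern are assumptions made for illustration, since the transcript does not give the real log format.

```python
import json
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical sample records; the real log layout is not given in the slides.
LOG = [
    {"url": "http://a.example.com/x", "kind": "query"},
    {"url": "http://a.example.com/y", "kind": "ad"},
    {"url": "http://b.example.org/z", "kind": "query"},
    {"url": "http://a.example.com/w", "kind": "query"},
]

# Assumed filter: keep only sites under example.com.
SITE_PATTERN = re.compile(r"\.example\.com$")


def site_report(records, pattern):
    """Count query and ad shows per site and return the result as JSON."""
    counts = defaultdict(lambda: {"query": 0, "ad": 0})
    for rec in records:
        # Step 1: extract the site field from the url field.
        site = urlparse(rec["url"]).hostname
        # Step 2: filter sites with a regex.
        if site is None or not pattern.search(site):
            continue
        # Step 3: accumulate shows per site and per kind.
        counts[site][rec["kind"]] += 1
    # Step 4: output in JSON format.
    return json.dumps(counts, sort_keys=True)


report = site_report(LOG, SITE_PATTERN)
print(report)
```

In a distributed setting the per-site accumulation is the step that needs a shuffle by key; the extraction and filtering are record-local.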
Code in DQuery Mode
Rationales
Use Case Driven vs. Completeness (diagram labels: Our Solution; Problem; Problem; Problem; Problem)
Internal DSL vs. External DSL: take advantage of the host languages' parsers, libraries and VMs; users and communities; language features. Different from Pig, Hive, Sawzall, etc.
Open/Closed Principle: “open for extension, closed for modification”; open for single-machine algorithms, closed for distributed algorithms. Also different from Pig, Hive, Sawzall, …
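One way to read "open for single-machine algorithms, closed for distributed algorithms" is sketched below. The function group_reduce and its parameters are hypothetical, not part of DISQL, and the partitions are simulated inside one process. The framework owns the partition-by-key logic (the closed part), while users plug in ordinary single-machine functions (the open part).

```python
from collections import defaultdict
from functools import reduce


def group_reduce(records, key_fn, reduce_fn, partitions=4):
    """Closed part (hypothetical): route records to partitions by key hash,
    then reduce each key's group. Users never modify this function."""
    shards = [defaultdict(list) for _ in range(partitions)]
    for rec in records:
        key = key_fn(rec)
        # All records with the same key land in the same partition.
        shards[hash(key) % partitions][key].append(rec)
    result = {}
    for shard in shards:
        for key, group in shard.items():
            result[key] = reduce(reduce_fn, group)
    return result


# Open part: plain single-machine functions supplied by the user.
# Here they implement a word count.
words = "to be or not to be".split()
pairs = [(word, 1) for word in words]
counts = group_reduce(
    pairs,
    key_fn=lambda pair: pair[0],
    reduce_fn=lambda a, b: (a[0], a[1] + b[1]),
)
totals = {word: pair[1] for word, pair in counts.items()}
print(totals)
```

Swapping in a different key_fn or reduce_fn extends what the pipeline computes without touching the partitioning code.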
Adoption
Users …… ……
Usage: throughput/day: hundreds of TB; tasks/day: thousands; total tasks: > 1 million
Q&A: also welcome to contact me
Email: chenxiaoming@baidu.com

Introduction to DISQL, a distributed programming framework widely used in Baidu