2012-01-07
Who am I



!  Who am I
   – 
   –  pengtao@baidu.com
   – 
       • 
  – 
       • 
       •            "
1.



! 
     –            2010                                            81.9%
          •  CNNIC,                                                  2011


     –  Google effects on memory
          •                             v.s.
          •                                    v.s.
               –  (Sparrow, 2011)

          •  The Internet has become a primary form of external or transactive memory,
             where information is stored collectively outside ourselves.
1.



! 
     – 
          •  Query #$ url
     – 
          • 
! 
     –  MAP
     –  DCG
     –  nDCG
     –  ERR
     –  …

     Example: 1/1 × 2 + 1/2 × 3 + 1/3 × 1 + 1/4 × 2 + 1/5 × 2 + 1/6 × 2 = 5.0667
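The worked sum on this slide (relevance grades 2, 3, 1, 2, 2, 2 weighted by 1/rank, total 5.0667) can be reproduced with a few lines of Python. This is a minimal sketch: the function names `discounted_gain` and `ndcg` are hypothetical, and the 1/rank discount is taken from the slide's arithmetic. Many DCG definitions use a 1/log2(rank + 1) discount instead, so the discount is left as a parameter.

```python
from math import log2


def discounted_gain(grades, discount=lambda rank: 1.0 / rank):
    """Sum of relevance grades weighted by a rank discount (ranks start at 1)."""
    return sum(discount(rank) * grade for rank, grade in enumerate(grades, start=1))


def ndcg(grades, discount=lambda rank: 1.0 / rank):
    """Discounted gain divided by the gain of the ideal (descending) ordering."""
    ideal = discounted_gain(sorted(grades, reverse=True), discount)
    if ideal == 0:
        return 0.0  # no relevant results at all
    return discounted_gain(grades, discount) / ideal


if __name__ == "__main__":
    grades = [2, 3, 1, 2, 2, 2]  # grades read off the slide's worked sum
    print(round(discounted_gain(grades), 4))  # 5.0667, matching the slide
    # Alternative logarithmic discount, an assumption not shown on the slide
    print(discounted_gain(grades, discount=lambda rank: 1.0 / log2(rank + 1)))
    print(ndcg(grades))
```

Normalizing by the ideal ordering, as `ndcg` does, makes scores comparable across queries that have different numbers of relevant results.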
2.



!  Side by side
2.



! 
     – 
          •          E v.s.        C
     –  10000
          •    log      10000 query      E C
     –  1000
          •  10000 query        1000
     –  100
          •  1000 diff    PM     100           review
          •  30 (good) : 50 (same) : 20 (bad)

                                                        PM
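A 30 : 50 : 20 (good : same : bad) split from 100 reviewed queries raises the question of whether the experiment really beats the control. One common way to check is an exact sign test on the non-tied judgments. The sketch below is an illustration and not a method stated on the slide; the function name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided exact sign test. Ties must be dropped before calling."""
    n = wins + losses
    k = max(wins, losses)
    # P(X >= k) for X ~ Binomial(n, 0.5), the null of "no preference"
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    good, same, bad = 30, 50, 20  # counts from the slide
    print(f"win rate excluding ties: {good / (good + bad):.2f}")  # 0.60
    print(f"sign test p-value: {sign_test_p_value(good, bad):.3f}")
```

With only 50 non-tied judgments, a 60% win rate is not conclusive under this test, which is one reason to review a larger sample or to complement manual review with other methods.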
2.



! 
     – 
          •      v.s. pm    query
     – 
          •  “    ”




                      PM
3.



!         (crowdsourcing)
     –                          (evaluator)
     – 
     –    evaluator

!  WSE
   – 
   – 
   – 
   – 
2.


!  WSE   evaluator
3.



!         Lesson1:
     – 
3.



!         Lesson2:
     – 
     – 
3.



!         Lesson3:
     – 

     – 
          •  Economics


     – 
          • 
          •  evaluator
3.



!  WSE
   – 
   –          100,000

!          crowdsourcing
     –    reCaptcha
     –    Amazon Mechanical Turk
     –    ESP Game
     –    Human computation
3.



! 
     – 
     –  AB testing, Bucket testing

     [Figure: traffic split diagram; labels: 100%, 50%, 50%]
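A 50% / 50% bucket split is usually made deterministic, so that the same user always lands in the same bucket. The sketch below hashes a session or cookie id into one of 10,000 slots. The function name `assign_bucket`, the experiment name, and the id format are hypothetical, and this is not a description of Baidu's actual traffic-splitting system.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map an id to 'A' (control) or 'B' (treatment)."""
    # Salting with the experiment name keeps splits of different experiments independent
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 10_000  # slot in 0..9999
    return "B" if slot < treatment_share * 10_000 else "A"


if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for i in range(100_000):
        counts[assign_bucket(f"sid-{i}", "ranking-change")] += 1
    print(counts)  # the two buckets should each hold roughly half of the ids
```

Hashing the id, rather than drawing a random number per request, keeps a user's experience consistent across queries and lets logs be attributed to a bucket after the fact.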
3.



!         AB testing   ?
     – 
3.



!         AB testing   ?
     – 
3.



!  AB testing
    –           +
    – 
    – 
    – 
3.



!  AB testing
    –  1T
      •  cubeproducer, disql    hadoop
   –  1G olap
      •  infobright, mondrian
   –  1M
      •  ABreport
3.



!  AB testing
    – 
      • 
           –  Overall Evaluation Criteria
                »  (Crook, 2009)
           –  Queryrank:
                »                    )
                »                        )
      • 
           – 
           – 
3.



! 
     – 
          • 
     – 
          • 
               –    AA test
               – 
          • 
               – 
               – 
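In common practice, an AA test sends the same treatment to two buckets and checks that the measured difference between them is within noise; a significant difference points to a problem in bucketing or logging. The sketch below assumes click-through counts as the metric and uses a two-proportion z-test. Both choices, and the name `two_proportion_p_value`, are assumptions for illustration and are not taken from the slides.

```python
from math import erfc, sqrt


def two_proportion_p_value(clicks_1: int, n_1: int, clicks_2: int, n_2: int) -> float:
    """Two-sided p-value for the difference between two click-through rates."""
    p_1, p_2 = clicks_1 / n_1, clicks_2 / n_2
    pooled = (clicks_1 + clicks_2) / (n_1 + n_2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if std_err == 0:
        return 1.0  # both rates are exactly 0 or exactly 1
    z = (p_1 - p_2) / std_err
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    # Hypothetical AA buckets: same treatment, so a large p-value is expected
    print(two_proportion_p_value(3010, 10_000, 2985, 10_000))
    # Hypothetical AB buckets with a visible difference
    print(two_proportion_p_value(3200, 10_000, 3000, 10_000))
```

The same function can then be reused for the AB comparison itself once the AA check shows that the buckets are comparable.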
3.



! 
     – 
3.



! 
     –  50% v.s. 50%
     – 
     [Figure: diagram labeled Baidu with nodes B1 to B4, a1, a2, i1 to i4, d1 to d3, u1 to u3]
! 
     – 
     –    DCG

! 
     –    PM review
     –    crowdsourcing
     –    AB testing
! 
     –    v.s. AB testing

     – 

     –        v.s.
1   wse
2

[Figure: log flow diagram; labels: sid, X, X', Cookie (Sid=1001), BWS, M1, M2, N2, M10, User log, internal log]
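Follow us: t.baidu-tech.com

          Downloads and details: infoq.com/cn/zones/baidu-salon
"Imagine, exchange, debate, gather" is the motto of the Baidu Technology Salon. The salon is an offline technical exchange event organized regularly by Baidu and InfoQ China. Its aim is to give mid- to senior-level engineers a relatively open platform for exchanging ideas and networking. Each session has two main parts, speaker talks and OpenSpace, and focuses on a single topic.

Speaker talks and live Q&A introduce attendees to advanced engineering practice at Baidu and other well-known websites. The OpenSpace part extends and deepens the session's theme and offers a forum for free discussion: any participant can propose a topic related to the theme and start a discussion on it.



                  InfoQ: planning, organization, execution
                  Follow us: weibo.com/infoqchina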

Session 22. Peng Tao (Baidu): Search Engine Evaluation and User Behavior Analysis
