Hadoop World 2011: Building a Model of Organic Link Traffic
Brian David Eoff & Mike Dewar, Bitly
 
HOW WE HANDLE DATA
{
  "a": "Mozilla/5.0 (Windows; U; Windows NT 6.0;....",
  "c": "DE",
  "nk": 0,
  "tz": "Europe/Berlin",
  "gr": "07",
  "g": "mwN8js",
  "i": "***.***.*.**",
  "h": "iIIbEk",
  "k": "*******-*****-*****-********",
  "l": "ctctpro",
  "al": "de-de,de;q=0.8,en-us;q=0.5,en;q=0.3",
  "hh": "conta.cc",
  "r": "direct",
  "u": " http://myemail.constantcontact.com....... ",
  "t": 1304207999,
  "hc": 1304089158,
  "cy": "Menden",
  "ll": [51.43330001831055, 7.800000190734863]
}
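A minimal sketch of reading one such record, assuming clicks arrive as one JSON object per line. The slide does not define the keys; treating `t` as the click time in Unix seconds, `c` as a country code and `g` as a link hash is an assumption, and `parse_click` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

# One decoded click, trimmed to a few of the keys shown above.
# ASSUMPTION: "t" = click time (Unix seconds), "c" = country, "g" = link hash.
RAW = '{"c": "DE", "tz": "Europe/Berlin", "g": "mwN8js", "t": 1304207999, "cy": "Menden"}'


def parse_click(line: str) -> dict:
    """Hypothetical helper: decode one JSON click line and attach a UTC datetime."""
    record = json.loads(line)
    record["time_utc"] = datetime.fromtimestamp(record["t"], tz=timezone.utc)
    return record


if __name__ == "__main__":
    click = parse_click(RAW)
    print(click["g"], click["c"], click["time_utc"].isoformat())
```

Under that reading, `t` = 1304207999 corresponds to 2011-04-30 23:59:59 UTC.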
The Scale of the Data
HOW WE USE HADOOP
Hadoop Setup
Primarily AWS
Verisign Datacenter
Sample Jobs
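The slide's list of sample jobs did not survive extraction. Purely as a hedged illustration, a Hadoop Streaming job that counts clicks per link per hour could be written as a mapper and a reducer like the ones below. Using `g` as the link key, hourly buckets, and the function names are all assumptions; the `__main__` block only simulates the map, sort and reduce phases locally.

```python
import json
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit '<link>:<hour>\t1' per click, where hour = epoch seconds // 3600."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            click = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records rather than failing the task
        if "g" in click and "t" in click:
            yield f"{click['g']}:{int(click['t']) // 3600}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; assumes input is sorted by key,
    which Hadoop Streaming's shuffle/sort phase provides."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


if __name__ == "__main__":
    # Local stand-in for map -> shuffle/sort -> reduce on in-memory lines.
    sample = [
        '{"g": "mwN8js", "t": 1304207999}',
        '{"g": "mwN8js", "t": 1304207000}',
        '{"g": "abc123", "t": 1304210000}',
    ]
    for out in reducer(sorted(mapper(sample))):
        print(out)
```

On a cluster, each function would instead read `sys.stdin` and print to stdout in its own streaming script.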
WHAT IS
https://bitly.com/pIyS1O
Treat Each Click as a Draw from a Distribution
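One way to read this slide, sketched under the assumption that the random variable is a click's offset in time from the link's first click: binning those offsets and normalising gives an empirical probability mass function, so every click counts as one draw. `click_time_distribution` and the one-hour bin width are illustrative choices, not taken from the talk.

```python
from collections import Counter
from typing import Dict, List


def click_time_distribution(click_times: List[int], bin_seconds: int = 3600) -> Dict[int, float]:
    """Empirical distribution over time bins since the first click.

    Each click is one draw; the probability of bin k is the fraction of
    clicks whose offset from the first click falls in that bin.
    """
    if not click_times:
        return {}
    start = min(click_times)
    counts = Counter((t - start) // bin_seconds for t in click_times)
    total = len(click_times)
    return {k: n / total for k, n in sorted(counts.items())}
```

For clicks at 0 s, 100 s, 3700 s and 7300 s, this gives probabilities 0.5, 0.25 and 0.25 for hours 0, 1 and 2.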
Half-life of a link
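The slide does not define the half-life precisely. A hedged sketch, assuming it means the time from a link's first click until half of its observed clicks have arrived:

```python
import math
from typing import List


def link_half_life(click_times: List[float]) -> float:
    """Time from the first click until at least half of all observed clicks have occurred.

    ASSUMPTION: half-life is measured on the observed clicks only, so it is
    biased low if the observation window cuts off the link's tail.
    """
    if not click_times:
        raise ValueError("need at least one click")
    ordered = sorted(click_times)
    # Index of the click that brings the cumulative count to at least 50%.
    idx = math.ceil(len(ordered) / 2) - 1
    return ordered[idx] - ordered[0]
```

Using the median click time is a simple choice; fitting an exponential decay to the hourly counts would be an alternative with different sensitivity to outliers.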
Aims of the Simple Model
(Deterministic) Autoregressive (AR) Model
Training: 1000 link time series across a week
A 9th-order model is chosen by cross-validation based on the model predicted output.
Testing: 2000 link time series across a week
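A sketch of the pipeline these slides describe, under stated assumptions: a deterministic AR(p) model x_t ≈ a_1 x_{t-1} + ... + a_p x_{t-p} (no intercept, no noise term) is fitted by pooled least squares over the training series; the "model predicted output" is taken to mean a free-run simulation that starts from the first p observed values and feeds back its own predictions; and the order is chosen on a single held-out split, a stand-in because the talk does not say which cross-validation scheme was used. All function names are hypothetical.

```python
import numpy as np


def lag_matrix(series, order):
    """Rows of lagged values [x_{t-1}, ..., x_{t-p}] and the matching targets x_t."""
    x = np.asarray(series, dtype=float)
    X = np.column_stack([x[order - i - 1 : len(x) - i - 1] for i in range(order)])
    return X, x[order:]


def fit_ar(series_list, order):
    """Pooled least-squares fit of x_t ~ sum_i a_i * x_{t-i} across several series."""
    parts = [lag_matrix(s, order) for s in series_list]
    X = np.vstack([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs


def simulate(coeffs, initial, steps):
    """Model predicted output: iterate the recursion on its own outputs,
    starting from `initial` (the first p observed values)."""
    out = [float(v) for v in initial]
    p = len(coeffs)
    for _ in range(steps):
        recent = out[-1 : -p - 1 : -1]  # most recent first, matching a_1..a_p
        out.append(float(np.dot(coeffs, recent)))
    return np.array(out)


def select_order(train, validation, max_order):
    """Pick p in 1..max_order minimising the mean squared error of the
    model predicted output on held-out series."""
    scores = {}
    for p in range(1, max_order + 1):
        a = fit_ar(train, p)
        errors = []
        for s in validation:
            s = np.asarray(s, dtype=float)
            pred = simulate(a, s[:p], len(s) - p)
            errors.append(np.mean((pred[p:] - s[p:]) ** 2))
        scores[p] = float(np.mean(errors))
    best = min(scores, key=scores.get)
    return best, scores
```

Scoring on the simulated output, rather than on one-step-ahead predictions, penalises models whose errors compound over the week. This matches the slide's emphasis on model predicted output.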
 
Histogram of Model Predicted Output Correlation (r^2)

NORMAL
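The histogram summarises how closely the model predicted output tracks each test link's observed series. A sketch of that statistic follows, taken here as the squared Pearson correlation, along with an equal-width histogram on [0, 1]. Treating a constant series as r^2 = 0 and the default of ten bins are assumptions.

```python
from typing import List, Sequence


def r_squared(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Squared Pearson correlation between model predicted output and observed clicks."""
    n = len(observed)
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    vp = sum((p - mp) ** 2 for p in predicted)
    vo = sum((o - mo) ** 2 for o in observed)
    if vp == 0 or vo == 0:
        return 0.0  # correlation is undefined for a constant series; treated as no fit
    return cov * cov / (vp * vo)


def histogram(values: List[float], bins: int = 10) -> List[int]:
    """Counts of r^2 values in equal-width bins on [0, 1]; 1.0 lands in the last bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts
```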
ROBOT SPOTTING
Types of Robots
SUN CITY PALM DESERT - SUN CITY SHADOW HILLS CA Tours, MLS, Plans, Info, MORE http://bit.ly/nda6QH+ TOP SALE Viagra from USD 0.90 per pill, Cialis from USD 1.75 per pill
 
SUN CITY PALM DESERT - SUN CITY SHADOW HILLS CA Tours, MLS, Plans, Info, MORE http://imageshack.us/clip/my-videos/8/tcib.mp4/ 2011 CMA Nominees | Playlist | VEVO TOP SALE Viagra from USD 0.90 per pill, Cialis from USD 1.75 per pill National Association for College Admission Counselling log-in
 
Free Game For Kids Registration - Pittsburgh Penguins - Fan Zone (2756 clicks)
Video: Carra's top five transfers - Liverpool FC (915 clicks)
Clip of the Week: Toews Nails Tot | NBC Chicago (179 clicks)
Allegro.pl nie działa [Allegro.pl is not working] (919 clicks)
DallasCowboys.com - Official Site of the Dallas Cowboys (683 clicks)
Mao’s Room (2339 clicks)
Manchester United Official Web Site - Ashley Young was long term United target (526 clicks)
Shocking! Lady Gaga Poses Sans Makeup for Harper's Bazaar Cover - UsMagazine.com (12288 clicks)
Runway - Runway TV Collections Fashion Magazine - Nick Carter: From the Backstreet to Taking Off (3662 clicks)
The GQ&A: Drive Director Nicolas Winding Refn (601 clicks)
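The slides do not spell out the robot-spotting rule. As a hypothetical sketch only, a link could be flagged when the organic model explains its traffic poorly (low r^2) and its clicks also come overwhelmingly from one user agent. Reading the record key `a` as the user agent, the two thresholds, and the function names are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def top_share(values: Iterable[str]) -> float:
    """Fraction of items equal to the most common value (0.0 for no items)."""
    counts = Counter(values)
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


def flag_suspect_links(
    clicks_by_link: Dict[str, List[dict]],
    r2_by_link: Dict[str, float],
    r2_threshold: float = 0.5,           # hypothetical cut-off
    agent_share_threshold: float = 0.9,  # hypothetical cut-off
) -> List[str]:
    """Return links with both a poor organic-model fit and a single dominant user agent."""
    flagged = []
    for link, clicks in clicks_by_link.items():
        poor_fit = r2_by_link.get(link, 0.0) < r2_threshold
        one_agent = top_share(c.get("a", "") for c in clicks) >= agent_share_threshold
        if poor_fit and one_agent:
            flagged.append(link)
    return sorted(flagged)
```

Requiring both signals keeps a merely unusual but human-driven link, such as a breaking news story, from being flagged on fit alone.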
 
CONCLUSION


Hadoop World 2011: Building a Model of Organic Link Traffic - Michael Dewar - Bitly

  • 8. DATA HOW WE HANDLE
  • 9. { "a": "Mozilla/5.0 (Windows; U; Windows NT 6.0;....", "c": "DE", "nk": 0, "tz": "Europe/Berlin", "gr": "07", "g": "mwN8js", "i": "***.***.*.**", "h": "iIIbEk", "k": "*******-*****-*****-********", "l": "ctctpro", "al": "de-de,de;q=0.8,en-us;q=0.5,en;q=0.3", "hh": "conta.cc", "r": "direct", "u": " http://myemail.constantcontact.com....... ", "t": 1304207999, "hc": 1304089158, "cy": "Menden", "ll": [51.43330001831055, 7.800000190734863]}
  • 17. Treat Each Click as a Draw from a Distribution
  • 22. Training: 1000 link time series across a week
  • 23. A 9th-order model is chosen by cross-validation on the model-predicted output. Testing: 2000 link time series across a week
  • 25. Histogram of the correlation (r^2) between each link's traffic and the model-predicted output (panel label: NORMAL)
  • 28. SUN CITY PALM DESERT - SUN CITY SHADOW HILLS CA Tours, MLS, Plans, Info, MORE; http://bit.ly/nda6QH+; TOP SALE Viagra from USD 0.90 per pill, Cialis from USD 1.75 per pill
  • 30. SUN CITY PALM DESERT - SUN CITY SHADOW HILLS CA Tours, MLS, Plans, Info, MORE; http://imageshack.us/clip/my-videos/8/tcib.mp4/; 2011 CMA Nominees | Playlist | VEVO; TOP SALE Viagra from USD 0.90 per pill, Cialis from USD 1.75 per pill; National Association for College Admission Counselling log-in
  • 32. Free Game For Kids Registration - Pittsburgh Penguins - Fan Zone (2756 clicks); Video: Carra's top five transfers - Liverpool FC (915 clicks); Clip of the Week: Toews Nails Tot | NBC Chicago (179 clicks); Allegro.pl nie działa [Allegro.pl is not working] (919 clicks); DallasCowboys.com - Official Site of the Dallas Cowboys (683 clicks); Mao’s Room (2339 clicks); Manchester United Official Web Site - Ashley Young was long term United target (526 clicks); Shocking! Lady Gaga Poses Sans Makeup for Harper's Bazaar Cover - UsMagazine.com (12288 clicks); Runway - Runway TV Collections Fashion Magazine - Nick Carter: From the Backstreet to Taking Off (3662 clicks); The GQ&A: Drive Director Nicolas Winding Refn (601 clicks)

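The click record on slide 9 is one JSON object per line (speaker note 8), so each line can be decoded independently. A minimal sketch, assuming the key-to-field mapping follows the order of the field list in note 8 (`c` country, `tz` timezone, `g` global hash, `r` referrer, `t` timestamp, `hc` hash creation, `cy` city, `ll` lat/lon); that mapping is inferred rather than documented, and the `Click` type and `parse_clicks` function are hypothetical. The sample keeps only the non-redacted fields from slide 9.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Click:
    # Hypothetical record type; field meanings are inferred from the speaker notes.
    country: Optional[str]
    timezone: Optional[str]
    global_hash: Optional[str]
    referrer: Optional[str]
    timestamp: int               # "t", assumed to be Unix seconds
    hash_created: Optional[int]  # "hc", assumed to be Unix seconds
    city: Optional[str]
    lat_lon: Optional[Tuple[float, float]]


def parse_clicks(lines: Iterable[str]) -> Iterator[Click]:
    """Decode newline-delimited JSON click records, skipping malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # a production job would probably count these
        if not isinstance(rec, dict) or "t" not in rec:
            continue
        ll = rec.get("ll")
        yield Click(
            country=rec.get("c"),
            timezone=rec.get("tz"),
            global_hash=rec.get("g"),
            referrer=rec.get("r"),
            timestamp=int(rec["t"]),
            hash_created=rec.get("hc"),
            city=rec.get("cy"),
            lat_lon=(ll[0], ll[1]) if ll else None,
        )


sample = ('{"c": "DE", "tz": "Europe/Berlin", "g": "mwN8js", "r": "direct", '
          '"t": 1304207999, "hc": 1304089158, "cy": "Menden", '
          '"ll": [51.43330001831055, 7.800000190734863]}')
click = next(parse_clicks([sample]))
print(click.country, click.city, click.timestamp - click.hash_created)  # prints: DE Menden 118841
```

Under the assumed mapping, the difference between `t` and `hc` is how long after the link was created this click arrived.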
Speaker Notes

  1. Thank you for having me. Who I am: Ph.D. in CS from Caverlee’s Infolab at Texas A&M, now a scientist at bitly. This was joint work with Mike Dewar, a fellow scientist at bitly. Unfortunately, due to some rather lame visa issues, he can’t be here.
  2. Bitly is a URL shortening and analytics service. We have been around for a little over three years, we are located in the Meatpacking District here in New York City, and we have a five-member science team.
  3. What is a URL shortener, and why do you need one?
  4. ...then you share. Bitly powers many custom URL shorteners, such as those for the New York Times, the Washington Post, ESPN and O’Reilly.
  5. Shortening enables analytics: add a ‘+’ to the end of any bitly link to see its traffic patterns, referrer details and location data. This information is free to all. What can we build on top of this data?
  6. Internally, we’ve built a variety of search services; this is a screenshot of the bitly search engine.
  7. We can also use the data to track trends, detecting when phrases occur an abnormal number of times.
  8. Fields: agent / country / timezone / global hash / IP address / cookie / user language / referrer / URL / timestamp / hash creation / city / lat-lon. The data is stored as single-line-delimited JSON, one record per line.
  9. At minimum, over 500 clicks per second.
  10. bitly is primarily a Python shop: we use Hadoop Streaming, and many of us rely heavily on the mrjob framework from Yelp, which is awesome.
  11. The types of questions we use Hadoop/MapReduce to answer.
  12. What is organic traffic?
  13. Time series binned by the minute.
  14. A typical click stream / its cumulative representation / binning (makes baby Shannon cry) / taking derivatives made everything horribly noisy.
  15. A probability density function: what is the likelihood that there will be a click at this time?
  16. A quick factoid before moving on to the model: the half-life of a link is about 3 hours for most links, and about 7 hours for YouTube links.
  17. People use bitly for promotion and decision-making / links go ‘viral’ / what I’ll talk about today / we have to do this very, very quickly.
  18. The first thing I’ve done, and what I’ll talk about today, is to throw away the rise and look only at the decay. AutoRegressive (AR) model: polynomial regression for time series (line fitting, curve fitting).
  19. Using least squares, we fit one model to 1000 different time series.
  20. The only real difficulty is model selection. We are rich in computation and short on time, so we fit a bunch of models with different temporal orders and look at their model-predicted output.
  21. The blue line is the inferred rate based on the PDF; the green line is the model prediction.
  22. r squared / a blunt threshold / many false negatives / but it gets the job done.
  23. We can dig a bit deeper into this data using the AR model, and learn something about the model as well.
  24. Not “spam”, but also not interesting.
  25. The most uncorrelated links, ranked by correlation with the model-predicted output.
  26. (Sort of) overfitting: a 9th-order model is too flexible.
  27. A 3rd-order model does much better.
  28. In the bottom link, we see abnormal traffic to such sites, which are spam.
  29. Let’s just check that the highly correlated links look less suspect.
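Speaker note 5 describes the ‘+’ convention for reaching a link's public stats. As a tiny illustration, the hypothetical helper below appends the ‘+’; it assumes a plain short link with no query string or fragment, and leaves links that already end in ‘+’ unchanged.

```python
def stats_url(short_link: str) -> str:
    """Return the public stats-page URL for a bitly link by appending '+'.

    Hypothetical helper: assumes no query string or fragment on the link.
    """
    link = short_link.strip()
    return link if link.endswith("+") else link + "+"


print(stats_url("http://bit.ly/nda6QH"))  # prints: http://bit.ly/nda6QH+
```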
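For a sense of scale, the floor in speaker note 9 can be turned into a daily volume. This assumes, as a simplification, that 500 clicks per second is sustained around the clock, so it is a rough lower bound rather than a reported figure:

```latex
500\ \tfrac{\text{clicks}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}}
  = 43\,200\,000\ \tfrac{\text{clicks}}{\text{day}},
\qquad
43\,200\,000 \times 30 \approx 1.3 \times 10^{9}\ \tfrac{\text{clicks}}{\text{month}}
```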
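Speaker note 10 mentions Hadoop Streaming. Below is a minimal Streaming job in plain Python, deliberately written without mrjob so it has no dependencies: the mapper emits `country<TAB>1` per click record and the reducer sums counts over its key-sorted input. Counting clicks per country is an illustrative question of our choosing, the `c` key is assumed to hold the country (as in the slide 9 record), and the `map`/`reduce` command-line switch is hypothetical.

```python
import json
import sys
from itertools import groupby
from typing import Iterable, Iterator


def map_clicks(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: emit 'country<TAB>1' for every well-formed JSON click record."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing the task
        if isinstance(rec, dict):
            yield f"{rec.get('c') or 'unknown'}\t1"


def reduce_counts(lines: Iterable[str]) -> Iterator[str]:
    """Reducer: Streaming hands the reducer its input sorted by key,
    so runs of equal keys can be summed with groupby."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: -mapper "python clicks.py map" -reducer "python clicks.py reduce"
    step = map_clicks if sys.argv[1] == "map" else reduce_counts
    for out in step(sys.stdin):
        print(out)
```

The usual local smoke test mimics the framework's shuffle with `sort`: `cat clicks.json | python clicks.py map | sort | python clicks.py reduce` (file names are placeholders).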
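Speaker note 13 bins each link's clicks by the minute. One way to sketch that, assuming the `t` field from slide 9 is a Unix timestamp in seconds, is to floor each timestamp to its minute and fill minutes with no clicks with zeros, so the series is evenly spaced. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bin_by_minute(timestamps: Iterable[int]) -> Tuple[int, List[int]]:
    """Return (start of first minute in epoch seconds, clicks per minute),
    including zero-click minutes between the first and last click."""
    counts = Counter(int(t) // 60 for t in timestamps)  # minute index since the epoch
    if not counts:
        return 0, []
    first, last = min(counts), max(counts)
    return first * 60, [counts[m] for m in range(first, last + 1)]
```

For example, clicks at seconds 0, 10, 59, 60 and 185 give the series `[3, 1, 0, 1]` starting at second 0.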
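Speaker note 15 (and slide 17, "Treat Each Click as a Draw from a Distribution") replaces noisy binned counts with a density over click times. The talk does not say how that density was estimated; the sketch below swaps in a Gaussian kernel density estimate purely for illustration, and the bandwidth is a free parameter we chose, not one from the talk.

```python
import math
from typing import Callable, Sequence


def click_density(click_times: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Gaussian kernel density estimate: treat each click time as a draw
    from an unknown distribution and return the estimated pdf f(t)."""
    if not click_times or bandwidth <= 0:
        raise ValueError("need at least one click and a positive bandwidth")
    n = len(click_times)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(t: float) -> float:
        # Sum of one Gaussian bump per click, scaled so the total area is 1.
        return norm * sum(math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in click_times)

    return pdf
```

Multiplying `f(t)` by the number of clicks gives a smooth expected click rate, which is one plausible reading of the "rate based on the PDF" in note 21. A larger bandwidth smooths more aggressively at the cost of blurring the sharp initial rise.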
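The half-life figures in speaker note 16 can be read off a per-minute click series. The note does not define the measure precisely, so this sketch assumes one simple definition: the number of minutes, counted from the first binned minute, until a link has received half of all the clicks observed for it. The function name is hypothetical.

```python
from typing import Sequence


def half_life_minutes(clicks_per_minute: Sequence[int]) -> int:
    """Minutes until cumulative clicks first reach half of the total
    (assumed definition, counted from the first minute of the series)."""
    total = sum(clicks_per_minute)
    if total <= 0:
        raise ValueError("series has no clicks")
    running = 0
    for minute, count in enumerate(clicks_per_minute, start=1):
        running += count
        if 2 * running >= total:
            return minute
    return len(clicks_per_minute)  # not reached: the full sum always covers half
```

With a different definition, such as measuring from the traffic peak instead of the first click, the numbers would shift, so comparisons only make sense under one fixed convention.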
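Speaker note 18 throws away the rise and keeps only the decay. In code, that amounts to cutting each per-minute series at its peak. A minimal sketch; treating the first maximum as the peak is our assumption, and series with noisy double peaks may need more care.

```python
from typing import List, Sequence


def decay_only(series: Sequence[float]) -> List[float]:
    """Drop everything before the (first) peak; keep the peak and the decay after it."""
    if not series:
        return []
    peak = max(range(len(series)), key=lambda i: series[i])  # max() returns the first maximal index
    return list(series[peak:])
```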
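Speaker note 19 (and slide 22) fits a single least-squares model to 1000 series. Fitting one model to many series means pooling every lagged window from every series into one least-squares problem. The sketch below fits a deterministic AR(p) model, x_t = a_1 x_{t-1} + ... + a_p x_{t-p}, with no intercept or noise term, by solving the normal equations in pure Python. The no-intercept form is our reading of the "(Deterministic) AR Model" slide rather than a confirmed detail, and `fit_pooled_ar` and `synthetic_decay` are hypothetical names.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular; try a lower order")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_pooled_ar(series_list: Sequence[Sequence[float]], order: int) -> List[float]:
    """Least-squares fit of one deterministic AR(order) model,
    x_t = a_1 x_{t-1} + ... + a_p x_{t-p}, shared by every series in series_list."""
    xtx = [[0.0] * order for _ in range(order)]  # accumulates X^T X
    xty = [0.0] * order                          # accumulates X^T y
    for series in series_list:
        for t in range(order, len(series)):
            lags = [series[t - i] for i in range(1, order + 1)]  # x_{t-1} .. x_{t-p}
            for i in range(order):
                xty[i] += lags[i] * series[t]
                for j in range(order):
                    xtx[i][j] += lags[i] * lags[j]
    return _solve(xtx, xty)


def synthetic_decay(x0: float, x1: float, n: int) -> List[float]:
    """Hypothetical test data generated by x_t = 0.6 x_{t-1} + 0.2 x_{t-2}."""
    s = [x0, x1]
    while len(s) < n:
        s.append(0.6 * s[-1] + 0.2 * s[-2])
    return s


print(fit_pooled_ar([synthetic_decay(100.0, 50.0, 30), synthetic_decay(10.0, 80.0, 30)], order=2))
# close to [0.6, 0.2]
```

Accumulating X^T X keeps memory independent of the number of series, which suits pooling many links; for real data, `numpy.linalg.lstsq` would be the more robust solver.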
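Speaker note 20 and slide 23 choose the model order by looking at each candidate's model-predicted output. This sketch reads "model-predicted output" as a free-run simulation: seed the model with a series' first p values, then feed each prediction back in instead of using observed values. Each order is scored by the mean squared correlation (r²) between that output and held-out series, and the best one is kept. The coefficient sets are assumed to be fitted already, and this scoring rule is our assumption about how the cross-validation worked.

```python
import math
from typing import Dict, List, Sequence


def simulate(coeffs: Sequence[float], series: Sequence[float]) -> List[float]:
    """Free-run AR output: seed with the first p observations, then feed predictions back in."""
    p = len(coeffs)
    out = list(series[:p])
    for _ in range(p, len(series)):
        out.append(sum(a * out[-i] for i, a in enumerate(coeffs, start=1)))  # a_i * x_{t-i}
    return out


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def select_order(candidates: Dict[int, Sequence[float]], held_out: Sequence[Sequence[float]]) -> int:
    """Pick the AR order whose model-predicted output best correlates with held-out series."""
    def score(order: int) -> float:
        usable = [s for s in held_out if len(s) > order + 1]
        if not usable:
            return -math.inf
        coeffs = candidates[order]
        return sum(r_squared(s, simulate(coeffs, s)) for s in usable) / len(usable)

    return max(candidates, key=score)
```

Free-run output is a harsher test than one-step-ahead prediction: an over-flexible model (notes 26 and 27 contrast 9th and 3rd order) can track each next point well yet drift badly once it only sees its own predictions.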
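Once every link has an r² against its model-predicted output, speaker note 22's blunt threshold and note 25's "most uncorrelated links" ranking are short operations. The 0.5 default threshold below is a hypothetical placeholder, since the talk does not state the value it used, and `triage` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def triage(scores: Dict[str, float], threshold: float = 0.5,
           k: int = 10) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Split links by a blunt r^2 threshold and list the k least-correlated ones.

    scores maps a link hash to its r^2; threshold=0.5 is a hypothetical default.
    Returns (links below the threshold, k lowest-scoring (link, r^2) pairs).
    """
    flagged = sorted(h for h, r2 in scores.items() if r2 < threshold)
    least_correlated = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return flagged, least_correlated
```

As note 22 warns, a single cutoff produces many false negatives, so the ranked list is the more useful starting point for manual review.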