Mind Control
MANIPULATING BOTS FOR SUCCESS
Advanced SEO SUMMIT
1492450200
Why Should You Care?
What you get out of this talk
- The ability to understand how crawl bots function and view your website
- The ability to decrypt and understand bot behaviour and triggers
- Prioritize pages and content, and identify roadblocks
[~]$ whoami
NAV43 / Tyler McConville
- CEO & Co-Founder of NAV43
- Technical Search Engine Optimizer
- 7-YEAR SERP JOURNEY
- Survivor of 2013
Those Who Have Fallen…
In the SERPs
Agenda
- A very short overview of how Google works
- A background on crawl bot duties
- Tracking those f#@kers!
- The secret data
- Key takeaways
- Questions, anyone?
Disclaimer
This is NOT an exploit resource
- It’s just an understanding from tests ヽ༼ຈل͜ຈ༽ノ
- …and some implementation-specific oddities
Google has done nothing [especially] wrong
- To the contrary, their bots are quite organized
Modifying your server and misdirecting Google bots can be damaging
- …if not done right. You have been forewarned.
Duplicating this is NOT guaranteed to rank you
- I’m looking at you. Understand the concepts before implementing them.
Background
Google, Crawl Bots, Bot Behaviour.
The now infamous Google Crawl
[Diagram: Initial Connection. E.G.: basic web crawl, simplified]
Google Bots and Duties
Google Bots: a brief explanation
- Crawler -> A discovery program for Google!
- A bot that mines “meta-data” and organizes “relationship” mapping
- Technically a robust scraper running within a cluster
- Spoiler: Google crawl bots aren’t intelligent!
List of Google crawl bots and functions
- Desktop: standard website scraper
- Smartphone: standard scraper with mobile rules
- Image: image check & meta-data gatherer
- Video: video render & processing
- News
- App
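The bot list above can be turned into a rough classifier for the user-agent field of each log line. A minimal Python sketch, assuming the product tokens Google publishes for its crawlers (`Googlebot`, `Googlebot-Image`, `Googlebot-Video`, `Googlebot-News`). The `classify_google_bot` name and the "Mobile means smartphone" heuristic are illustrative assumptions, the App crawler is left unmapped, and some tokens may only be honoured in robots.txt rather than sent in request headers, so check Google's current crawler documentation before relying on it.

```python
from typing import Optional

# Assumed mapping from Google's published crawler tokens to the slide's bot types.
# Verify against Google's current crawler list; "App" is deliberately not mapped.
TOKEN_TYPES = [
    ("Googlebot-Image", "Image"),
    ("Googlebot-Video", "Video"),
    ("Googlebot-News", "News"),
]


def classify_google_bot(user_agent: str) -> Optional[str]:
    """Return a coarse Google crawler type for a user-agent string, or None.

    The user agent is trivially spoofed, so this is a label, not proof
    that the request actually came from Google.
    """
    # Check the specific tokens first, since they all contain "Googlebot".
    for token, bot_type in TOKEN_TYPES:
        if token in user_agent:
            return bot_type
    if "Googlebot" in user_agent:
        # Heuristic: the smartphone crawler's user agent includes "Mobile".
        return "Smartphone" if "Mobile" in user_agent else "Desktop"
    return None
```

Ordinary browser traffic such as `Mozilla/4.05 (Macintosh; I; PPC)` returns `None`, which is the "user" side of the split.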
Removing the Distortion
What we want to do
- Isolate Google crawlers from users!
What to actually do
- Mine server logs and compile a repository
- Recommended: keep an upper limit of 2 months of crawl logs
- When shipping logs, send them to the ELK stack on the network in an encrypted state
- Basically, keeping your info... yours. A logical first step.
What the implementation will also do
- Store symmetric crawl schedules (so you can find the otherwise random actions...)
- Give real-time feedback on crawl errors and application issues...
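Because the user agent is easy to fake, isolating real Google crawlers from users normally means checking the client IP as well. Google documents a two-step check: a reverse DNS lookup whose hostname ends in googlebot.com or google.com, then a forward lookup of that hostname that must return the original IP. A minimal sketch using only Python's standard library; `is_verified_googlebot` is a hypothetical name, the resolvers are injectable so it can be tested offline, `gethostbyname_ex` only returns IPv4 addresses, and Google's list of crawler domains may change over time.

```python
import socket
from typing import Callable, List, Tuple

# Result shape shared by socket.gethostbyaddr and socket.gethostbyname_ex.
HostInfo = Tuple[str, List[str], List[str]]

# Domain suffixes Google documents for Googlebot reverse-DNS hostnames
# (assumed current; Google's published list can change).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    try:
        hostname, _, _ = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror: no PTR record or lookup failure
        return False
    if not hostname.lower().endswith(GOOGLE_SUFFIXES):
        return False
    try:
        _, _, addresses = forward_lookup(hostname)
    except OSError:
        return False
    # Anyone can publish a PTR record for their own IP, so the forward lookup
    # must map the claimed Google hostname back to the same address.
    return ip in addresses
```

DNS lookups are slow, so in a log pipeline you would typically cache the result per IP rather than resolving on every line.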
“Find bots, you must.”
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics
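Each sample entry above is in Apache's combined log format: client IP, identity, user, timestamp, request line, status, bytes, referrer and user agent. Before anything reaches Elasticsearch it has to be split into those fields. Logstash normally does this with its grok filter; as a stand-alone illustration, here is a hypothetical Python parser for the same format (the regex and the `parse_line` name are assumptions, not part of the talk).

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Apache "combined" format: ip ident user [time] "method url protocol" status bytes "referrer" "agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one combined-format log line into a dict, or return None if it does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None  # e.g. truncated or differently formatted lines
    entry: Dict[str, object] = dict(match.groupdict())
    entry["status"] = int(match["status"])
    entry["bytes"] = 0 if match["bytes"] == "-" else int(match["bytes"])
    # Apache timestamps look like 26/Apr/2000:00:23:48 -0400 (%b assumes an English locale).
    entry["time"] = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry
```

The truncated last sample line does not match and yields `None`, which is usually what you want: skip and count malformed lines rather than guess at them.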
Shipping Logs... WOOHOO
MUST FIND: GBOT
SERVER: Here’s everything, BABY.
*Command to slice the server log file for shipping (writes 200,000-line chunks named xaa, xab, …): split -l 200000 logfile.log
ELK Stack, by the docs
Diagram based on: https://logz.io/learn/complete-guide-elk-stack/
[Diagram labels: Google Bot, Webserver, Visualization Dashboard]
So what just happened?
- Client connects: “Every time a client (or bot) connects to the server through Apache, a server entry is formed in the Apache access log.”
- “When this is complete, the log shipper will send the sliced log files into Logstash, where they will be processed and then placed into Elasticsearch.”*
- Elasticsearch: “…provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents…”
- Kibana: “This is the final step to visualizing the data within Elasticsearch so that we can sort and compile the data we need. […] This is where it gets interesting; we will live here.” *
*Elasticsearch quoted from here: https://qbox.io/blog/what-is-elasticsearch
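In the stack described above, Logstash turns each parsed line into a JSON document and writes it to Elasticsearch. If you ship documents yourself instead, Elasticsearch's `_bulk` endpoint accepts newline-delimited JSON: an action line followed by the document, with a trailing newline. A minimal sketch of building that body in Python; the index name and function names are placeholders, and the HTTP request itself is left out.

```python
import json
from datetime import datetime
from typing import Dict, Iterable


def _json_default(value: object) -> str:
    # Emit datetimes as ISO 8601, a format Elasticsearch's default date handling accepts.
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"not JSON serialisable: {value!r}")


def to_bulk_ndjson(docs: Iterable[Dict[str, object]], index: str) -> str:
    """Build a request body for the Elasticsearch _bulk API.

    Each document is preceded by an "index" action line, and every line
    (including the last) ends with a newline, as the bulk format requires.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc, default=_json_default))
    return "".join(line + "\n" for line in lines)
```

The resulting string would be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`; batching a few thousand documents per request is a common starting point.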
Background Summary
[Diagram labels: We’re Looking Here · For These · Because of That · Apache.log]
Mission
We want to be able to see data trends and crawl patterns from Google bots navigating the webserver.
We want to gather any contextual information that we can use for forensic purposes, regardless of whether or not we can accomplish the above.
We (as an adversary) want to be able to map Google bots’ direct actions and compile data trends, so that we can predict and guide the data a Google bot digests in a crawl.
- We want to do this without manipulating Google bots... too much. ;)
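Seeing crawl trends mostly comes down to counting verified crawler hits over time. A minimal Python sketch, assuming the input is (timestamp, bot type) pairs taken from log lines that were already parsed and verified; the function name is hypothetical, and in the ELK setup described above a Kibana date-histogram visualization would play the same role.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def daily_crawl_counts(hits: Iterable[Tuple[datetime, str]]) -> Counter:
    """Count crawler hits per (ISO date, bot type) so crawl frequency can be charted.

    `hits` is assumed to be (timestamp, bot_type) pairs from log lines that
    were already parsed and verified as coming from Google.
    """
    counts: Counter = Counter()
    for timestamp, bot_type in hits:
        counts[(timestamp.date().isoformat(), bot_type)] += 1
    return counts
```

Missing keys read as zero, so days with no smartphone-crawler visits show up as gaps rather than errors when you plot the series.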
Secr3ts
The Observed Data
Requests · User Agent · Server Response · IP Address · Source ID* · Exits
+
Organization! We got ’em… all.
The Data Itself
- JSON format, organized and saved
- Required for future processing
- Oh... and it can be real time ;)
Approach Premise:
- Organize server logs into line items
- Requests are relatively small
- Server logs are larger and cluttered
- … they are a “log” after all.
- Create relationships based on JSON requests
- FindAES from: http://jessekornblum.com/tools/
Grouping Data
_New_Search
✔ Agent
✔ Time
✔/✘ Response [“code”]
✔/✘ Request [“URL”]
CSslUserContext
Looks complicated? We’ll go further into this at a later date.
_LOG_ITEM
✔ IP Address
✔ Request
✔ Server Response
✔/✘ Referral
✔ Timestamp
✔/✘ Agent
_Relationship_MAP
✔ Request URL
✔ SERVER Response
✔ Bot Type
… …
✔ Bytes downloaded
? Referral link
? Exit
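The `_LOG_ITEM` and `_Relationship_MAP` groupings above can be read as a record type plus an index keyed by request URL. A minimal Python sketch under that reading; the `LogItem` fields loosely mirror the slide, but the class, field names and per-URL summary (status counts, bot types, total bytes, referrers) are assumptions for illustration. Crawler requests frequently arrive without a Referer header (logged as `-`), so the referrer set may often be empty.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogItem:
    """One crawler hit, loosely mirroring the slide's _LOG_ITEM fields (names are illustrative)."""
    ip: str
    request_url: str
    status: int
    timestamp: str
    agent: str
    bot_type: str
    bytes_downloaded: int
    referrer: Optional[str] = None  # often absent or "-" on crawler requests


def build_relationship_map(items: List[LogItem]) -> Dict[str, dict]:
    """Summarise hits per request URL: response codes, bot types, bytes and referrers."""
    rel: Dict[str, dict] = {}
    for item in items:
        node = rel.setdefault(
            item.request_url,
            {"statuses": Counter(), "bot_types": set(), "bytes": 0, "referrers": set()},
        )
        node["statuses"][item.status] += 1
        node["bot_types"].add(item.bot_type)
        node["bytes"] += item.bytes_downloaded
        if item.referrer and item.referrer != "-":
            node["referrers"].add(item.referrer)
    return rel
```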
The Results
These functions do three things:
- Isolate drop-off or dead spots
- Return deep error reporting
- Check the natural flow of each crawl bot type
A Complete Website Google Crawl Map
*There are many ways to visualize this.
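Dead spots and error reporting can be approximated by comparing an inventory of URLs you expect to be crawled (for example, the URLs in your XML sitemap) with the URLs verified crawlers actually requested. A minimal Python sketch; `find_dead_spots` and its input shapes are hypothetical, and a URL missing from the logs only means it was not crawled within the retained log window.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_dead_spots(
    known_urls: Iterable[str], hits: Iterable[Tuple[str, int]]
) -> Dict[str, List[str]]:
    """Compare the URLs you expect to be crawled with what the bot actually fetched.

    known_urls: any URL inventory, e.g. from a sitemap (an assumption).
    hits: (url, status) pairs taken from verified crawler log lines.
    """
    crawled: Set[str] = set()
    errors: Set[str] = set()
    for url, status in hits:
        crawled.add(url)
        if status >= 400:
            errors.add(url)
    known = set(known_urls)
    return {
        "never_crawled": sorted(known - crawled),  # drop-off / dead spots
        "error_responses": sorted(errors),         # 4xx / 5xx served to the bot
        "unexpected": sorted(crawled - known),     # crawled but not in the inventory
    }
```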
So how do we manipulate?
Scrape.. scrape
Pre-Planned Crawl Routes
- Bots are not smart
- Because bots will always follow a path that is laid out for them, we are always in control.
- Silos are not dead; instead, they are topically focused.
- Provide a crawl path that makes sense for a user navigating your funnel. Outline relationships and target dead areas.
Bottom line:
Create absolute relationship links between pages for bots.
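One way to check a pre-planned crawl route is to compute how deep each page sits in your internal link graph and which pages no link path reaches. A minimal Python sketch, assuming you already have a page to outgoing-links mapping (for example from your own site crawl); the function names are hypothetical, and link depth is used here as a proxy for how easily a link-following bot reaches a page, not as a documented ranking factor.

```python
from collections import deque
from typing import Dict, List


def click_depths(links: Dict[str, List[str]], start: str = "/") -> Dict[str, int]:
    """Breadth-first search over internal links: minimum clicks from `start` to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS order is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(links: Dict[str, List[str]], start: str = "/") -> List[str]:
    """Pages that appear in the link graph but cannot be reached from `start`."""
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(all_pages - set(click_depths(links, start)))
```

Pages that show up as orphans here, or sit many clicks deep, are natural candidates for the "dead areas" the slide says to target with new relationship links.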
Reactive Server Actions
- Preprogrammed response actions
- You can program your server to react to requests and craft responses accordingly, e.g. by using 304 (Not Modified) response codes.
- Address incorrect internal referral sources.
- Provide the crawl bot with unique meta-data within headers and avoid silly server errors.
Bottom line:
You control your server and how it behaves. Google bots are just here for the ride.
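The 304 suggestion relies on conditional requests: a crawler that sends `If-Modified-Since` can be answered with 304 Not Modified and no body when the page has not changed since that date. A minimal, framework-agnostic sketch of that decision in Python; the function name is hypothetical, and where `last_modified` comes from is up to your application.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_status(
    if_modified_since: Optional[str], last_modified: datetime
) -> Tuple[int, Dict[str, str]]:
    """Decide between 304 Not Modified and 200 OK for a conditional GET.

    `last_modified` must be a datetime whose tzinfo is timezone.utc.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):  # unparseable header: send the full page
            return 200, headers
        if since.tzinfo is None:  # "-0000" offsets parse as naive; treat as UTC
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so ignore microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304, headers
    return 200, headers
```

A real server would usually check `ETag`/`If-None-Match` as well; HTTP gives `If-None-Match` precedence when both headers are present.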
The Takeaways
You CAN track crawl bots and form trends!
Crawl bots are just scrapers!
Fin.
Questions?
tyler@nav43.com


Tyler McConville: Mind Controlling Crawl Bots for Success
