SlideShare uma empresa Scribd logo
1 de 14
Simple web-scraping with Mechanize and Nokogiri Nov 8 th  2011 Muriel Salvan Open Source Lead developer and architect X-Aeon Solutions http://x-aeon.com
The need ,[object Object],[object Object]
Parse  HTML pages (DOM) =>  Nokogiri
Installation ,[object Object],[object Object]
! Version 1.0.0 can be more stable than 2.x.x for some complex queries (" gem install mechanize -v 1.0.0 " to enforce it).
Basic example require   'mechanize' agent = Mechanize. new page = agent. get ( 'http://rivierarb.fr' ) element = page. root . css ( 'h1.logo' ) . first element . content =>   "Riviera.rb"
Main usage ,[object Object]
Use the agent to  perform HTTP(S) requests  (get, post). Each request gives a Nokogiri page.
Parse the page  using CSS selectors, XPath, DOM iterators.
Fill and post forms  using intuitive helpers.
Common requests page = agent. get ( 'http://rivierarb.fr' ) page2 = page. links_with ( :text   =>   'Green King' ) . first . click page3 = agent. back agent. user_agent  =  'My user agent'
Common parsing Selectors page. root . css ( 'body div.myclass' ) . each   {   | element |  …  } page. root . xpath ( '//h3/a[@class="l"]' ) . eac h   {   | element |  …  }
Common parsing Elements < div > < a   href = &quot; http://www.google.com &quot; > Click here < img   src = &quot; http://www.google.com/favicon.ico &quot; / > < / a > < / div > element [ 'href' ] =>   &quot;http: // www.google.com&quot; element. content =>   &quot;    Click here       &quot; element. children . second . name =>   &quot;img&quot; element. parent . name =>   &quot;div&quot; element
Filling and submitting forms Basic example Google search form =  agent. get ( 'http://www.google.com' ) . forms . first form. q  =  'Rivierarb' results_page = form. submit

Mais conteúdo relacionado

Mais procurados

A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...Ho Chi Minh City Software Testing Club
 
Haml in 5 minutes
Haml in 5 minutesHaml in 5 minutes
Haml in 5 minutescameronbot
 
Evolution of API With Blogging
Evolution of API With BloggingEvolution of API With Blogging
Evolution of API With BloggingTakatsugu Shigeta
 
Enabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsEnabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsKonstantin Kudryashov
 
WordPress as a Content Management System
WordPress as a Content Management SystemWordPress as a Content Management System
WordPress as a Content Management SystemValent Mustamin
 
Add row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netAdd row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netVijay Saklani
 
Dicas de palestra
Dicas de palestraDicas de palestra
Dicas de palestraFabio Akita
 
How to Create simple One Page site
How to Create simple One Page siteHow to Create simple One Page site
How to Create simple One Page siteMoneer kamal
 
Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Pamela Fox
 
Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rob
 
シックス・アパート・フレームワーク
シックス・アパート・フレームワークシックス・アパート・フレームワーク
シックス・アパート・フレームワークTakatsugu Shigeta
 
Components are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayComponents are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayFITC
 
Building high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryBuilding high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryDavid Park
 
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Richard Rabins
 
HTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersHTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersRobert Nyman
 

Mais procurados (20)

Changing Template Engine
Changing Template EngineChanging Template Engine
Changing Template Engine
 
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
 
Haml in 5 minutes
Haml in 5 minutesHaml in 5 minutes
Haml in 5 minutes
 
Evolution of API With Blogging
Evolution of API With BloggingEvolution of API With Blogging
Evolution of API With Blogging
 
Enabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsEnabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projects
 
WordPress as a Content Management System
WordPress as a Content Management SystemWordPress as a Content Management System
WordPress as a Content Management System
 
Capybara
CapybaraCapybara
Capybara
 
Add row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netAdd row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.net
 
Ajax
AjaxAjax
Ajax
 
Dicas de palestra
Dicas de palestraDicas de palestra
Dicas de palestra
 
How to Create simple One Page site
How to Create simple One Page siteHow to Create simple One Page site
How to Create simple One Page site
 
Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)
 
Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008
 
BDD with cucumber
BDD with cucumberBDD with cucumber
BDD with cucumber
 
シックス・アパート・フレームワーク
シックス・アパート・フレームワークシックス・アパート・フレームワーク
シックス・アパート・フレームワーク
 
Components are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayComponents are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be Okay
 
Building high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryBuilding high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQuery
 
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
 
Symfony2
Symfony2Symfony2
Symfony2
 
HTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersHTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - Fronteers
 

Semelhante a Mechanize at the Ruby Drink-up of Sophia, November 2011

jQuery for Sharepoint Dev
jQuery for Sharepoint DevjQuery for Sharepoint Dev
jQuery for Sharepoint DevZeddy Iskandar
 
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]Chris Toohey
 
IBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileIBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileChris Toohey
 
Getting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsGetting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsAtlassian
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPStephan Schmidt
 
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Sergey Ilinsky
 
Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2borkweb
 
Yahoo Mobile Widgets
Yahoo Mobile WidgetsYahoo Mobile Widgets
Yahoo Mobile WidgetsJose Palazon
 
Flex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 FinalFlex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 Finalematrix
 
Flash templates for Joomla!
Flash templates for Joomla!Flash templates for Joomla!
Flash templates for Joomla!Herman Peeren
 
Flash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlFlash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlJoomla!Days Netherlands
 
Mashups as Collection of Widgets
Mashups as Collection of WidgetsMashups as Collection of Widgets
Mashups as Collection of Widgetsgiurca
 
Lecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPLecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPyucefmerhi
 
Master pages ppt
Master pages pptMaster pages ppt
Master pages pptIblesoft
 
HTML5 Overview
HTML5 OverviewHTML5 Overview
HTML5 Overviewreybango
 

Semelhante a Mechanize at the Ruby Drink-up of Sophia, November 2011 (20)

jQuery for Sharepoint Dev
jQuery for Sharepoint DevjQuery for Sharepoint Dev
jQuery for Sharepoint Dev
 
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
 
IBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileIBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for Mobile
 
ASP_NET Features
ASP_NET FeaturesASP_NET Features
ASP_NET Features
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
 
Getting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsGetting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial Gadgets
 
Vb.Net Web Forms
Vb.Net  Web FormsVb.Net  Web Forms
Vb.Net Web Forms
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHP
 
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
 
Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2
 
BluePrint Mobile Framework
BluePrint Mobile FrameworkBluePrint Mobile Framework
BluePrint Mobile Framework
 
Yahoo Mobile Widgets
Yahoo Mobile WidgetsYahoo Mobile Widgets
Yahoo Mobile Widgets
 
Flex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 FinalFlex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 Final
 
Flash templates for Joomla!
Flash templates for Joomla!Flash templates for Joomla!
Flash templates for Joomla!
 
Flash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlFlash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nl
 
Mashups as Collection of Widgets
Mashups as Collection of WidgetsMashups as Collection of Widgets
Mashups as Collection of Widgets
 
Lecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPLecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITP
 
Master pages ppt
Master pages pptMaster pages ppt
Master pages ppt
 
HTML5 Overview
HTML5 OverviewHTML5 Overview
HTML5 Overview
 

Mais de rivierarb

Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013rivierarb
 
Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013rivierarb
 
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013rivierarb
 
PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012rivierarb
 
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012rivierarb
 
Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012rivierarb
 
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012rivierarb
 
DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011rivierarb
 
FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011rivierarb
 

Mais de rivierarb (9)

Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
 
Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013
 
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
 
PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012
 
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
 
Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012
 
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
 
DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011
 
FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011
 

Último

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Último (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

Mechanize at the Ruby Drink-up of Sophia, November 2011

  • 1. Simple web-scraping with Mechanize and Nokogiri Nov 8 th 2011 Muriel Salvan Open Source Lead developer and architect X-Aeon Solutions http://x-aeon.com
  • 2.
  • 3. Parse HTML pages (DOM) => Nokogiri
  • 4.
  • 5. ! Version 1.0.0 can be more stable than 2.x.x for some complex queries (&quot; gem install mechanize -v 1.0.0 &quot; to enforce it).
  • 6. Basic example require 'mechanize' agent = Mechanize. new page = agent. get ( 'http://rivierarb.fr' ) element = page. root . css ( 'h1.logo' ) . first element . content => &quot;Riviera.rb&quot;
  • 7.
  • 8. Use the agent to perform HTTP(S) requests (get, post). Each request gives a Nokogiri page.
  • 9. Parse the page using CSS selectors, XPath, DOM iterators.
  • 10. Fill and post forms using intuitive helpers.
  • 11. Common requests page = agent. get ( 'http://rivierarb.fr' ) page2 = page. links_with ( :text => 'Green King' ) . first . click page3 = agent. back agent. user_agent = 'My user agent'
  • 12. Common parsing Selectors page. root . css ( 'body div.myclass' ) . each { | element | … } page. root . xpath ( '//h3/a[@class=&quot;l&quot;]' ) . eac h { | element | … }
  • 13. Common parsing Elements < div > < a href = &quot; http://www.google.com &quot; > Click here < img src = &quot; http://www.google.com/favicon.ico &quot; / > < / a > < / div > element [ 'href' ] => &quot;http: // www.google.com&quot; element. content => &quot; Click here &quot; element. children . second . name => &quot;img&quot; element. parent . name => &quot;div&quot; element
  • 14. Filling and submitting forms Basic example Google search form = agent. get ( 'http://www.google.com' ) . forms . first form. q = 'Rivierarb' results_page = form. submit
  • 15. Filling and submitting forms Fields When your HTML form has < input … name = &quot;myfield&quot; >...< / input > you can write form. myfield = 'The field value' form. field_with ( :name => 'myfield' ) . value = 'The field value' form. checkboxfield = '1' form. selectfield = '5'
  • 16. Filling and submitting forms Buttons ! Mechanize does not add the value of the button being clicked ! If the web server cares for buttons values in POST data, add them manually. < input type = &quot;submit&quot; name = &quot;btn1&quot; value = &quot;Clicked&quot;>...< / input > form. add_field ! ( 'btn1' , 'Clicked' ) b utton = form. button_with ( :name => 'btn1' ) page = form. click_button ( button )
  • 17.
  • 21. Use HTML parsers other than Nokogiri
  • 22. Does not have JavaScript engine (therefore no Ajax)
  • 23.
  • 26. Nokogiri element API This presentation is available under CC-BY license by Muriel Salvan
  • 27. Q/A