DevTools to crawl Webpages.
DevTools




09.05.12   @chrschneider   2
DevTools

                                    "… Apache … toolset of low level Java components
                                    focused on HTTP and associated protocols."



  ●   HttpComponents Core
          … is a set of low level HTTP transport components

  ●   HttpComponents Client
          … provides reusable components for client-side ... HTTP connection
          management.

  ●   HttpComponents AsyncClient (DEV)
          … ability to handle a great number of concurrent connections ... more ...
          performance in terms of a raw data throughput.

  ●   Commons HttpClient (Legacy)
         … All users of Commons HttpClient 3.x are strongly encouraged to upgrade to
         HttpClient 4.1.


DevTools

                                                      HttpComponents Client




       Example Components

           ●   Get, Post, Delete, … Request Objects

           ●   Cookie Manager

           ●   SSL

           ●   Content Encoding Aware

           ●   HTTP Authentication (Basic, Digest, ...)




DevTools

                                                      HttpComponents Client Example




           public final static void main(final String[] args) throws Exception
           {

                final HttpClient httpclient = new DefaultHttpClient();
                try
                {
                      final HttpGet httpget = new HttpGet("http://www.google.com/");

                      System.out.println("executing request " + httpget.getURI());

                      // Create a response handler
                      final ResponseHandler<String> responseHandler = new BasicResponseHandler();
                      final String responseBody = httpclient.execute(httpget, responseHandler);
                      System.out.println("----------------------------------------");
                      System.out.println(responseBody);
                      System.out.println("----------------------------------------");

                }
                finally
                {
                      httpclient.getConnectionManager().shutdown();
                }
           }


                                                              http://hc.apache.org/httpcomponents-client-ga/examples.html
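
For comparison, the same GET can be sketched with the HTTP client that ships with the JDK, assuming Java 11 or newer. This is a swapped-in technique, not part of HttpComponents. The class name `JdkHttpGet`, the method names, the timeout, and the User-Agent value are made up for illustration. The status check roughly mirrors `BasicResponseHandler`, which rejects responses with a status of 300 or higher.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical demo class: a JDK-only counterpart to the HttpComponents example.
final class JdkHttpGet {

    // Builds a GET request; timeout and User-Agent are illustrative assumptions.
    static HttpRequest buildGet(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .header("User-Agent", "devtools-crawler-sketch")
                .GET()
                .build();
    }

    // Sends the request and returns the body as a String.
    static String fetchBody(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildGet(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 300) {
            throw new IllegalStateException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "http://www.google.com/";
        System.out.println("executing request " + url);
        System.out.println(fetchBody(url));
    }
}
```

HttpComponents Client remains the richer choice when a crawler needs fine-grained connection management; the JDK client is only a dependency-free starting point.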


DevTools

                      HttpComponents Client




               Demo




DevTools




           … is an asynchronous event-driven network application framework for rapid
           development of maintainable high performance protocol servers & clients.




                          See: http://netty.io/




DevTools

                                      … is a "GUI-Less browser for Java programs"


 Features (excerpt):
  ● Support for the HTTP and HTTPS protocols
  ● Support for cookies
  ● Ability to specify whether failing responses from the server should throw exceptions
    or should be returned as pages of the appropriate type (based on content type)
  ● Ability to customize the request headers being sent to the server
  ● Support for HTML responses
  ● Support for submitting forms
  ● Support for clicking links
  ● Support for walking the DOM model of the HTML document
  ● JavaScript support




DevTools

                                                  … is a "GUI-Less browser for Java programs"



      @Test
      public void homePage() throws Exception
      {
            final WebClient webClient = new WebClient();
            final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");

            System.out.println(page.getTitleText());

            assertEquals("Welcome to HtmlUnit", page.getTitleText());

            final String pageAsXml = page.asXml();
            assertTrue(pageAsXml.contains("<body class=\"composite\">"));

            final String pageAsText = page.asText();
            assertTrue(pageAsText.contains("Support for the HTTP and HTTPS protocols"));

            webClient.closeAllWindows();
      }




                                                                               http://htmlunit.sourceforge.net/gettingStarted.html


DevTools

                                                                 … is a "GUI-Less browser for Java programs"




           @Test
           public void getElements() throws Exception
           {
                 final WebClient webClient = new WebClient();
                 final HtmlPage page = webClient.getPage("http://some_url");
                 final HtmlDivision div = page.getHtmlElementById("some_div_id");
                 final HtmlAnchor anchor = page.getAnchorByName("anchor_name");

                 webClient.closeAllWindows();
           }


                                                                                                       Luxury :)



     Note: HTML tables are supported as well. HtmlUnit provides simple wrapper classes for walking through them. Handy!
     http://htmlunit.sourceforge.net/table-howto.html




                                                                                                       http://htmlunit.sourceforge.net/gettingStarted.html


DevTools

                                             … automates browsers. That's it.




    Selenium-WebDriver supports the following browsers along with the
    operating systems these browsers are compatible with.
      ●    Google Chrome 12.0.712.0+
      ●    Internet Explorer 6, 7, 8, 9 - 32 and 64-bit where applicable
      ●    Firefox 3.0, 3.5, 3.6, 4.0, 5.0, 6, 7
      ●    Opera 11.5+
      ●    HtmlUnit 2.9
      ●    Android – 2.3+ for phones and tablets (devices & emulators)
      ●    iOS 3+ for phones (devices & emulators) and 3.2+ for tablets (devices
           & emulators)


DevTools

                                … automates browsers. That's it.




                           The Selenium Family

           Selenium IDE



                                               Also C#, Python, Ruby, ...
           Selenium WebDriver

                                                               Also on Windows and Mac



           Selenium Grid



DevTools

                                … automates browsers. That's it.




                           The Selenium Family

           Selenium IDE             … create quick bug reproduction scripts
                                    … create scripts to aid in automation-aided
                                    exploratory testing

           Selenium WebDriver       … create robust, browser-based regression
                                    automation

           Selenium Grid            … scale and distribute scripts across many
                                    environments

                                                                     http://seleniumhq.org/


DevTools

                                            Requirements for Selenium WebDriver with Firefox
                                                             (and HtmlUnit)




              Dependencies                                        Browser Binaries
   <dependency>
         <groupId>org.seleniumhq.selenium</groupId>
         <artifactId>selenium-java</artifactId>
         <version>2.21.0</version>
   </dependency>

   <dependency>
         <groupId>org.seleniumhq.selenium</groupId>
         <artifactId>selenium-htmlunit-driver</artifactId>
         <version>2.21.0</version>
   </dependency>

   <dependency>
         <groupId>org.seleniumhq.selenium</groupId>
         <artifactId>selenium-firefox-driver</artifactId>
         <version>2.21.0</version>
   </dependency>

                                                                      That's it.




DevTools

                                                               Basic Selenium example




    @Test
    public void testSeleniumWithFirefox() throws InterruptedException
    {
          final WebDriver webDriver = new FirefoxDriver();

           webDriver.get("http://www.majug.de");

           final WebElement veranstaltungenLink = webDriver.findElement(By.linkText("Veranstaltungen"));

           veranstaltungenLink.click();

           // Keep the browser open for five seconds, then close it
           Thread.sleep(5000);
           webDriver.quit();
    }
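
The fixed Thread.sleep(5000) in this example only keeps the browser visible for a moment. When a test has to wait for a page or element to show up, polling for a condition with a timeout is usually more reliable than a fixed pause. Selenium has its own mechanisms for this (implicit waits and WebDriverWait); the following is only a plain-Java sketch of the polling idea, not Selenium's implementation. The names `PollingWait` and `until` are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating what a polling wait does; not part of Selenium.
final class PollingWait {

    // Calls `probe` repeatedly until it returns a non-null value,
    // or throws once `timeoutMillis` has elapsed without success.
    static <T> T until(Supplier<T> probe, long timeoutMillis, long intervalMillis) {
        final long start = System.nanoTime();
        final long timeoutNanos = timeoutMillis * 1_000_000L;
        while (true) {
            T value = probe.get();
            if (value != null) {
                return value;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                throw new IllegalStateException(
                        "Condition not met within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        // Pretend the "element" only becomes available after about 300 ms.
        final long readyAt = System.currentTimeMillis() + 300;
        String element = until(
                () -> System.currentTimeMillis() >= readyAt ? "link" : null,
                2000, 50);
        System.out.println("found: " + element);
    }
}
```

In a WebDriver test, the probe could for instance call webDriver.findElements(...) and return the first hit, or null while the list is still empty.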




DevTools

                                        Selenium WebDriver Locator Strategies




 It is also possible to call findElements(...) to get a List<WebElement> of all matching elements:

               List<WebElement> hits = webDriver.findElements(By.tagName("a"));




DevTools

                                      Selenium WebDriver Interactions




  Once you have a WebElement, you can...

     ●   webElement.click() it

     ●   webElement.sendKeys(...) to it

     ●   webElement.submit() it


  More complex interactions such as double-click, drag-and-drop, or click-and-hold
  can be performed with the "Actions" class.




DevTools

                           Selenium WebDriver




              Demo




DevTools

                                                        Selenium WebDriver Pitfalls




    Newbie Pitfalls:

    ●   Selenium doesn't wait until the whole page is loaded (keyword: implicit wait)
    ●   webElement.findElement(By.xpath("//...")) searches from the root of the DOM (use ".//..." instead)
    ●   Google brings up "Selenium RC" solutions. This is the old Selenium project.
    ●   A reference to a WebElement becomes invalid if the driver "moves" to
        another page.
    ●   Firefox doesn't run on our CI because it is a headless system (try Xvfb)
    ●   New XPath 2.0 functions (like ends-with(...)) fail. This is because Selenium
        uses the driver's native XPath engine. For Firefox, that is currently XPath 1.0.
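
Both XPath pitfalls can be reproduced without a browser, because the XPath engine bundled with the JDK (javax.xml.xpath) also implements XPath 1.0. The sketch below is an illustration under that assumption: the markup, the class name `XPathPitfalls`, and its helpers are made up, and it uses the JDK's DOM rather than Selenium. It shows that "//a" evaluated on an element still searches the whole document while ".//a" stays inside the element, and it uses a substring-based expression as an XPath 1.0 stand-in for ends-with(...).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; markup and names are illustrative only.
final class XPathPitfalls {

    static final String PAGE =
            "<html><body>"
            + "<div id='menu'><a href='/home.html'>Home</a><a href='/feed.xml'>Feed</a></div>"
            + "<div id='footer'><a href='/imprint.html'>Imprint</a></div>"
            + "</body></html>";

    // XPath 1.0 stand-in for: //a[ends-with(@href, '.html')]
    static final String ENDS_WITH_HTML =
            "//a[substring(@href, string-length(@href) - string-length('.html') + 1) = '.html']";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the nodes selected by `expression`, using `context` as the context node.
    static int count(String expression, Object context) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns false if the engine rejects the expression, e.g. an unknown function.
    static boolean isSupported(String expression, Object context) {
        try {
            count(expression, context);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        Node menu = doc.getElementsByTagName("div").item(0);

        System.out.println("//a from menu:  " + count("//a", menu));   // whole document
        System.out.println(".//a from menu: " + count(".//a", menu));  // menu subtree only
        System.out.println("ends-with supported: "
                + isSupported("//a[ends-with(@href, '.html')]", doc));
        System.out.println("substring workaround hits: " + count(ENDS_WITH_HTML, doc));
    }
}
```

The same substring expression can be passed to By.xpath(...) when the driver's native engine only supports XPath 1.0.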




Any questions?
Thank you for your attention!
