For years there have been competing frameworks for driving the browser. First there was Watir, released by Bret Pettichord. It is a really good framework for driving the browser, but it was written for Ruby people. Second there was Selenium, created by Jason Huggins and written in JavaScript. It is a good framework and works in many more browsers than Watir. Then along came WebDriver, created by Simon Stewart, which was originally built to work against 2 browsers, IE and HTMLUnit. So what is happening now? Well, the future is happening now and it is Selenium 2!
Let's start by making sure that we know what we have in Selenium today. We have Selenium IDE, which is a Firefox addon that allows people to record, tweak and then replay their tests. When they are happy, they export them into their favourite language. Once testers have a test in their favourite language they can expand it and add the necessary bits to it. Then we can carry on using RC directly, or we can have a look at Grid to distribute the execution of our tests. Selenium is a wonderful tool for all of this because it is written in JavaScript and so runs in multiple browsers. The downside is that, being written in JavaScript, it is limited by the JavaScript sandbox in the browser.
WebDriver is a way of driving the browser, and it only does so for a handful of browsers. Why do we do it like this? WebDriver couples tightly to the browser using the technology that best fits. WebDriver is a developer-focused API, so it is extremely object-orientated. If we look at Internet Explorer, the driver accesses the browser via the COM layer, using automation hooks that Microsoft have put there and maintain. The core code is written in C++. FirefoxDriver is a Firefox addon that accesses items at the Chrome layer. ChromeDriver is a Chrome Extension that allows us to drive Chrome. The Android driver is an APK that allows us to drive the Web View, and it is the same with iPhone. Since we are tightly bound to the browser, we don't have to use a server in the middle. WebDriver scales up and down as we see fit! The other benefit is that on browsers or devices we can send more native keystrokes. WebDriver also tries to limit what the test can do by only allowing it to interact with elements that a user could. An example of this is that if you try to click on an element that has display:none in its style, WebDriver will throw an exception saying the element isn't visible.
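As a rough illustration of that visibility rule, here is a minimal sketch that checks whether an element is displayed before clicking it. The is_displayed() and click() calls mirror methods on the Python WebElement; click_if_visible and StubElement are hypothetical names invented for this sketch so that it runs without a browser.

```python
def click_if_visible(element):
    """Click only when the element is displayed; report whether we clicked.

    `element` is expected to look like a WebDriver WebElement, i.e. to offer
    is_displayed() and click(). This helper is hypothetical, not part of
    the Selenium API.
    """
    if not element.is_displayed():
        # WebDriver itself would refuse this click with an exception.
        return False
    element.click()
    return True


class StubElement:
    """Hypothetical stand-in for a WebElement so the sketch runs anywhere."""

    def __init__(self, displayed):
        self.displayed = displayed
        self.clicks = 0

    def is_displayed(self):
        return self.displayed

    def click(self):
        self.clicks += 1


hidden = StubElement(displayed=False)   # think style="display:none"
visible = StubElement(displayed=True)
print(click_if_visible(hidden), click_if_visible(visible))
```

In a real test you would pass in an element returned by the driver rather than the stub.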
So I am sure that we have all tried to use a site that has a content editable element on the page. Selenium makes this extremely easy, doesn't it? Well no, it doesn't; it fails horribly. So if we run this code
**REPL selenium.type
If we do this in WebDriver we have text appearing... why? Well, in this case the synthetic nature of Selenium has let us down. JavaScript doesn't know how to set the element's value. By sending through more native keystrokes, WebDriver can get at it. But with all the new, tightly bound code, how does each of the languages work?
All of the drivers have a common interface that all languages can speak to. The WebDriver project uses a wire protocol that allows you to speak to it via REST. So when we start each browser that we support, there is a webserver within it that we then speak to. This means that if we want to try to get support for new browsers, they just need to implement the API and we can then use them from the RemoteWebDriver. This simplifies the way that the drivers communicate, which means that anyone looking at the code can try to help out where need be.
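To make the wire protocol a little more concrete, here is a hedged sketch of the requests a client might send to find a textbox and type into it. The paths and JSON bodies follow my reading of the JSON wire protocol; the helper names wire_request and typing_session are hypothetical, and the session and element ids are passed in by hand (normally the server returns them) so that nothing goes over the network.

```python
import json


def wire_request(method, path, body=None):
    """Describe one wire-protocol call as (method, path, JSON body)."""
    return (method, path, json.dumps(body) if body is not None else None)


def typing_session(session_id, element_id, locator_id, text):
    """Requests a client would send to find a textbox by id and type into it.

    session_id and element_id are normally returned by the server; they are
    passed in here (assumption) so that the sketch needs no browser.
    """
    base = "/session/" + session_id
    return [
        # Ask the driver to locate the element with the given id attribute.
        wire_request("POST", base + "/element",
                     {"using": "id", "value": locator_id}),
        # Send the keystrokes to the element the server handed back.
        wire_request("POST", base + "/element/" + element_id + "/value",
                     {"value": list(text)}),
    ]


for method, path, body in typing_session("abc", "0", "q", "hi"):
    print(method, path, body)
```

Each language binding only has to produce requests like these, which is why a new browser can be supported by implementing the server side of the API.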
So why should they merge? Each project has its strengths and weaknesses. For a number of things Selenium is a lot better than WebDriver. As I mentioned earlier, WebDriver only works for browsers that it has official bindings for, because we are so tightly coupled to the browser. Selenium, on the other hand, works in all browsers that support JavaScript. But Selenium, by being JavaScript, is bound by the JavaScript sandbox in the browser.
WebDriver is not bound by the JavaScript sandbox, so now we can do things a lot better and more easily. Want to do a file upload? Not a problem! When you need to type something and then press the down arrow, you no longer need to know what the ASCII code is for the down arrow. We have simplified that with a Keys object, and when you want that key pressed, we try to fire all the events that are needed in the right order. Both projects have their strengths and weaknesses, and combining them makes the automation framework a lot stronger.
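Here is a small sketch of the Keys idea in Python. It uses Keys.ARROW_DOWN from the Python bindings when they are installed and otherwise falls back to the code point that constant maps to; type_and_arrow_down and RecordingElement are hypothetical names for this sketch, and the recording stand-in replaces a real element so that no browser is needed.

```python
try:
    # The real constant, when the Selenium Python bindings are installed.
    from selenium.webdriver.common.keys import Keys
    ARROW_DOWN = Keys.ARROW_DOWN
except ImportError:
    # Fallback so the sketch still runs: the code point Keys.ARROW_DOWN maps to.
    ARROW_DOWN = "\ue015"


def type_and_arrow_down(element, text):
    """Type text, then press the down arrow (e.g. to pick a suggestion)."""
    element.send_keys(text, ARROW_DOWN)


class RecordingElement:
    """Hypothetical stand-in for a WebElement that records what it is sent."""

    def __init__(self):
        self.sent = []

    def send_keys(self, *keys):
        self.sent.append(keys)


box = RecordingElement()
type_and_arrow_down(box, "sel")
print(box.sent)
```

With a real driver, the element would come from a find call and the driver would take care of firing the key events in the browser.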
WebDriver is designed by developers for developers. In Selenium you would get the Selenium object and then call whatever you need on that one object. What you need could be anything from clicking to typing, and it was always against 1 object. WebDriver, on the other hand, follows a number of good object-orientated design principles. We have a driver that starts the browser and gives us ways to find what we need on the page, and it returns another object representing the item in the DOM. We then use that object to do what we need. So if we take a simple form, we just tell the driver to find the textbox. We then tell the object representing the textbox that we want to send it some keystrokes, and there we have it.
Selenium 2 is a lot faster than Selenium 1. I have seen a speed-up of at least 3 times on a number of projects. Removing the need for a man-in-the-middle helps with the speed improvements. Since we are more closely bound to the browser, we can take better advantage of how it does things. For example, if we know that the browser can search for elements quicker than a library can, we try to do that. This means that if you supply a CSS selector and we know the browser can use it, we let the browser return the element instead of going through Sizzle. The speed-up grows further when we don't have to rely on the Selenium server to act as a middle man for your tests. All languages supported by the project also have access to HTMLUnit, which is a headless browser, by using the RemoteWebDriver.
Selenium 2 can bind to the browser without the need for a server, and we can add one when we see fit, which allows the new Selenium implementation to scale up and scale down. One of the Selenium committers has been experimenting with trying to get 36 different sessions running on a server. That would mean being able to control that number of browsers without needing to manage a full-blown grid.
So we kind of saw this earlier, but let us have a better look at it. I just said that we use the driver to find an object on the page. We then interact with that object.
**REPL
from selenium import webdriver
driver = webdriver.Chrome()
textbox = driver.find_element_by_id("id")
textbox.send_keys("let's type something")
**REPL
Having the code work this way is closer to how we think about code when we do normal OO development.
All the Selenium 1 code will work as you have always used it! The new code is being placed next to the old Se1 code. Selenium RC is becoming Selenium Server because it can now handle both the Selenium 1 API and the Selenium 2 RemoteDriver. There is a lot of code that is duplicated between Selenium and WebDriver, especially the JavaScript. The team is breaking this up into small JavaScript modules that can then be shared between the two projects. We call these JavaScript modules Atoms, since each one of these has the smallest amount of code that it needs to fulfil its task. They can then be compiled into the C++ code for use within IEDriver, or packaged into the Firefox Addon for driving Mozilla Firefox or the ChromeExtension for driving Google Chrome.
Moving from Selenium 1 to 2 is as simple as creating a WebDriver object, injecting that into Selenium and using the exact same Selenium API that we have grown to love. With a 2 line change to your tests you can suddenly be using the new Se2 code. We have taken a lot of care in trying to make sure that we are 100% backwards compatible. This will hopefully give people a really easy upgrade path.
What about Selenium Grid? A number of companies have invested heavily in running their tests in parallel using Selenium Grid. What is going to happen to Selenium Grid with the changes? The open source project has been fortunate to have had a donation from eBay. They have donated a new version of the Grid, which we are calling Grid 2. It can cope with both the Selenium 1 and Selenium 2 APIs, making it fully compatible with all that we need.
We all know that having working mobile sites is what is needed as more people start using Android phones or tablets, or iPhones, iPods or even iPads, to view your site. Mobile versions of sites are becoming extremely important. Selenium 2 has good support for Android and iOS devices. The Selenium project has servers that are installable onto these devices and allow us to test web applications. Let us see an example of this.
Selenium 1 would never really have had a good chance at doing mobile. It would have required going through a number of different proxies to work, and even then there is no guarantee that the JavaScript would have worked. Since Selenium 2 tries to bind as closely to the browser as it can, we can now start executing our commands. So let's start up the iPhone Simulator, since this is a Mac. Let's create a Selenium Driver for the iPhone
iPhoneDriver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.IPHONE)
Let's tell it to get a page.
iPhoneDriver.get("http
and get it to type into an element
el = iPhoneDriver.find_element_by_id("id")
el.send_keys("typing on an iPhone")
**REPL
Looking ahead, it is starting to look like the browsers themselves will give you access to Selenium.