2. What is a (ro)bot?
• A piece of software (command-line, UI, service)
• Involved in automating some (manual) process
• Usually implemented in order to
  • Make money
  • Save money or some other resource
  • Gather useful information
3. Some Examples
• Amazon book bots
• Spam bots
• Chat bots
• Trading bots
• Web crawlers/miners
4. Levels of automation
• Levels of activation
  • Manual = you run a console/UI executable or script
    • One-off tasks
  • Fully automatic = service or Windows Task Scheduler
    • Regular tasks
• Interaction modes
  • Fully automatic = no interaction, only program updates
  • Coarse tuning = a small degree of control, possibly just defining starting parameters
  • Fine tuning = bot with a human front-end
5. Polling Service
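// Requires: using System.ServiceProcess; using System.Threading;
public partial class PollingService : ServiceBase
{
    // 'log' is assumed to be a static logger (e.g. a log4net ILog) declared
    // elsewhere in this partial class.
    private readonly Thread workerThread;
    // Signaled by OnStop so the loop can exit cleanly instead of being aborted.
    private readonly ManualResetEvent stopRequested = new ManualResetEvent(false);

    public PollingService()
    {
        InitializeComponent();
        workerThread = new Thread(DoWork);
        // STA is needed when the worker drives COM-based UI such as Internet Explorer.
        workerThread.SetApartmentState(ApartmentState.STA);
    }

    // Public wrapper so a console host can start the service without the SCM.
    public void Start() { OnStart(null); }

    protected override void OnStart(string[] args)
    {
        workerThread.Start();
    }

    protected override void OnStop()
    {
        stopRequested.Set();
        workerThread.Join();
    }

    private void DoWork()
    {
        do
        {
            log.Info("Doing work...");
            // do some work, then wait up to a second; WaitOne returns true
            // as soon as a stop has been requested
        } while (!stopRequested.WaitOne(1000));
    }
}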
6. Polling Service Usage
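var service = new PollingService();
ServiceBase[] servicesToRun = new ServiceBase[] { service };
if (Environment.UserInteractive)
{
    // Console mode: run the service in-process so it can be tried out
    // without installing it.
    // Start() is assumed to be a public wrapper around the protected
    // OnStart; ServiceBase itself does not provide one.
    Console.CancelKeyPress += (x, y) => service.Stop();
    service.Start();
    Console.WriteLine("Running service, press a key to stop");
    Console.ReadKey();
    service.Stop();
    Console.WriteLine("Service stopped. Goodbye.");
}
else
{
    // Launched by the Service Control Manager.
    ServiceBase.Run(servicesToRun);
}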
8. Processing Cmd Line Args
// Accept exactly one switch, e.g. -install, /i, -uninstall or /u.
if (args != null && args.Length == 1 && args[0].Length > 1
    && (args[0][0] == '-' || args[0][0] == '/'))
{
    // ToLowerInvariant keeps the comparison independent of the current culture.
    switch (args[0].Substring(1).ToLowerInvariant())
    {
        case "install":
        case "i":
            // ServiceInstallerUtility is a helper class not shown on the slides.
            if (!ServiceInstallerUtility.InstallMe())
                Console.WriteLine("Failed to install service");
            break;
        case "uninstall":
        case "u":
            if (!ServiceInstallerUtility.UninstallMe())
                Console.WriteLine("Failed to uninstall service");
            break;
        default:
            Console.WriteLine("Unrecognized parameters.");
            break;
    }
}
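The slides do not show ServiceInstallerUtility itself. A minimal sketch of what such a helper might look like, assuming the .NET Framework, a reference to System.Configuration.Install, an installer class marked [RunInstaller(true)] in the service executable, and that the helper is compiled into that same executable. The method bodies are an assumption, not the original implementation:

```csharp
using System;
using System.Configuration.Install;
using System.Reflection;

// Hypothetical helper: the real implementation is not shown in the slides.
public static class ServiceInstallerUtility
{
    // Path of the assembly containing this class (assumed to be the service exe).
    private static readonly string ExePath =
        Assembly.GetExecutingAssembly().Location;

    public static bool InstallMe()
    {
        return Run(new[] { ExePath });
    }

    public static bool UninstallMe()
    {
        // "/u" asks the installer machinery to uninstall instead of install.
        return Run(new[] { "/u", ExePath });
    }

    private static bool Run(string[] installerArgs)
    {
        try
        {
            ManagedInstallerClass.InstallHelper(installerArgs);
            return true;
        }
        catch (Exception ex)
        {
            // Report the reason and signal failure to the caller.
            Console.WriteLine(ex.Message);
            return false;
        }
    }
}
```

Installing or removing a service normally needs an elevated (administrator) prompt, so both calls are likely to return false otherwise.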
9. Gauging the Environment
• Available API (hooray!)
• Web scraping
  • HTML parsing is easy, but…
  • CAPTCHA solving – not so much
• Screen scraping
  • Taking screenshots and analysing pixels
• Manual input & subsequent analysis
  • E.g., Monte Carlo methods for games
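The screen-scraping bullet (take a screenshot, analyse pixels) could be sketched roughly as follows. This assumes Windows and System.Drawing; the class name ScreenProbe and both method names are made up for illustration:

```csharp
using System;
using System.Drawing;

// Hypothetical helper for illustration only.
public static class ScreenProbe
{
    // Copies a rectangle of the screen into a new bitmap (Windows only).
    public static Bitmap Capture(Rectangle region)
    {
        var bitmap = new Bitmap(region.Width, region.Height);
        using (var g = Graphics.FromImage(bitmap))
        {
            g.CopyFromScreen(region.Location, Point.Empty, region.Size);
        }
        return bitmap;
    }

    // Fraction of pixels whose R, G and B values are each within
    // 'tolerance' of the target colour.
    public static double MatchRatio(Bitmap bitmap, Color target, int tolerance)
    {
        int matches = 0;
        for (int y = 0; y < bitmap.Height; y++)
        {
            for (int x = 0; x < bitmap.Width; x++)
            {
                Color c = bitmap.GetPixel(x, y);
                if (Math.Abs(c.R - target.R) <= tolerance &&
                    Math.Abs(c.G - target.G) <= tolerance &&
                    Math.Abs(c.B - target.B) <= tolerance)
                {
                    matches++;
                }
            }
        }
        return (double)matches / (bitmap.Width * bitmap.Height);
    }
}
```

A bot might call Capture on the area where a button or status light is expected and treat a high MatchRatio as "present". GetPixel is slow, so for large regions Bitmap.LockBits is the usual alternative.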
10. Web Scraping
• Getting data from the web
• Parsing the data
• Caching/storing data as necessary
• Different approaches to grabbing the HTML
  • Grab it raw
  • Spin up a ‘headless’ or proper browser
• Parsing HTML
  • HTML is not always well-formed
  • HtmlAgilityPack (tolerates malformed markup)
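The "grab it raw" and "cache as necessary" steps might be combined along these lines. This is a sketch under assumptions: CachingFetcher, ForWeb and the maxAge parameter are invented names, and the download step is injected as a delegate so it can be swapped for a browser-based grab:

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical in-memory cache in front of a page download.
public class CachingFetcher
{
    private readonly Func<string, string> download;
    private readonly TimeSpan maxAge;
    // url -> (time fetched, page content)
    private readonly Dictionary<string, KeyValuePair<DateTime, string>> cache =
        new Dictionary<string, KeyValuePair<DateTime, string>>();

    public CachingFetcher(Func<string, string> download, TimeSpan maxAge)
    {
        this.download = download;
        this.maxAge = maxAge;
    }

    // Convenience factory: grabs the raw HTML with WebClient.
    public static CachingFetcher ForWeb(TimeSpan maxAge)
    {
        return new CachingFetcher(url =>
        {
            using (var client = new WebClient())
            {
                return client.DownloadString(url);
            }
        }, maxAge);
    }

    // Returns the cached copy while it is younger than maxAge,
    // otherwise downloads the page again and stores it.
    public string Get(string url)
    {
        KeyValuePair<DateTime, string> entry;
        if (cache.TryGetValue(url, out entry) &&
            DateTime.UtcNow - entry.Key < maxAge)
        {
            return entry.Value;
        }
        string html = download(url);
        cache[url] = new KeyValuePair<DateTime, string>(DateTime.UtcNow, html);
        return html;
    }
}
```

The class is not thread-safe; a bot that polls from several threads would need a lock around Get or a ConcurrentDictionary.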
11. WatiN
• Web testing framework
• Opens up an actual, physical browser
• You might not need it!
  • If you mine static content, just use WebClient etc.
• Real UI! Might want to run it under a separate account
  • UI popping up on your desktop is nasty
• Some browser choice
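For static content, the WebClient route mentioned above might look roughly like this. The sketch assumes the HtmlAgilityPack NuGet package; StaticPageMiner and its method names are made up for illustration:

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper: mines static pages without opening a browser.
public static class StaticPageMiner
{
    // Parses the HTML and returns the trimmed text of the first node
    // matching the XPath expression, or null when nothing matches.
    public static string FirstText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    // Downloads the page as raw text, then parses it as above.
    public static string FirstTextFromUrl(string url, string xpath)
    {
        using (var client = new WebClient())
        {
            return FirstText(client.DownloadString(url), xpath);
        }
    }
}
```

This only sees the markup the server sends; content built by JavaScript after page load is where a real browser such as WatiN's becomes necessary.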
12. WatiN Example
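// Requires: using System; using HtmlAgilityPack; using WatiN.Core;
using (var browser = new IE("http://www.pokemon.com"))
{
    // Hand the rendered page over to HtmlAgilityPack for parsing.
    var doc = new HtmlDocument();
    doc.LoadHtml(browser.Body.OuterHtml);
    // SelectSingleNode returns null when the page has no <h3>.
    var heading = doc.DocumentNode.SelectSingleNode("//h3");
    if (heading != null)
        Console.WriteLine(heading.InnerText);
}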
13. Some Random Notes
• Launching IE requires an STA thread
• WatiN should be run as a 32-bit process
• Polling is not ideal: WatiN and HTML parsing are not fast!
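One way to satisfy the STA note when the calling thread is not STA (a service thread, a thread-pool thread) is to run the browser code on a dedicated thread. A minimal sketch, with StaRunner as a made-up name:

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only.
public static class StaRunner
{
    // Runs 'action' on a dedicated STA thread and waits for it to finish.
    // Any exception is wrapped and rethrown on the calling thread.
    public static void Run(Action action)
    {
        Exception failure = null;
        var thread = new Thread(() =>
        {
            try { action(); }
            catch (Exception ex) { failure = ex; }
        });
        thread.SetApartmentState(ApartmentState.STA); // Windows only
        thread.Start();
        thread.Join();
        if (failure != null)
            throw new InvalidOperationException("STA action failed", failure);
    }
}
```

The WatiN code would then go inside the delegate, e.g. StaRunner.Run(() => { /* open IE, scrape */ }). For the 32-bit note, setting the project's platform target to x86 is one way to force a 32-bit process.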