It's not the bugs you know that kill a website. It's the ones you can't see, lurking just out of sight, that get you. Learn how Lafayette College identified the Lovecraftian code horrors beneath its feet with tools like Splunk (server log analysis), OSSEC (server-side bad behavior monitor), and Siteimprove (web page auditing tool), and then surgically eliminated the problems. Examples include PHP scripts spewing error notices into logs, undiscovered CAS authentication failures, and thumbnail generation scripts that choke on large files.
Just another bughunt
1. Just another bughunt?
Tools to improve your site without nuking it from orbit
Ken Newquist (@knewquist) | Charles Fulton (@mackensen) #DPA11
2. Who we are
Ken Newquist
Director, Web Applications Development
Lafayette College
Charles Fulton
Senior Web Applications Developer
Lafayette College
3. Rebuild or Fix?
● Your website’s problems may seem intractable
● The temptation to nuke the bugs and start fresh is strong
● We’ve found tools that identify the problems so we can surgically eliminate them
○ (and find a few issues we didn’t know about in the process)
13. Discovering your web presence
● Define expected behavior with OSSEC & Nagios
● Test expectations with Siteimprove & Splunk
● Here be monsters
15. The Lost Thumbnails
● Site: Moodle
● Tools: Splunk, OSSEC
● Outcome: Improved Apache configuration
16. Sky falling!
● Splunk reported roughly 400 HTTP 500 (Internal Server Error) responses within a few minutes
● Also showed concentrated bursts of 404 errors when viewing resources
● Concern within the department that the sky was falling
17. Sky not falling!
● System ran out of memory generating thumbnails from massive images; threw 500s
● Preview of missing images generated the 404s
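The kind of burst described on the two slides above can also be spotted without Splunk. The following is a minimal sketch, assuming Apache's common or combined access log format; the function name `status_bursts`, the threshold, and the sample log lines are illustrative assumptions, not taken from the talk.

```python
import re
from collections import Counter

# Pulls the timestamp (truncated to the minute) and the status code out of
# an Apache common/combined log line. The log format is an assumption.
LOG_LINE = re.compile(
    r'\[(?P<minute>[^\]]{17})[^\]]*\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def status_bursts(lines, status="500", threshold=3):
    """Return {minute: count} for minutes with at least `threshold` hits of `status`."""
    per_minute = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == status:
            per_minute[match.group("minute")] += 1
    return {minute: n for minute, n in per_minute.items() if n >= threshold}


# Hypothetical sample data: three 500s in one minute, one 404 in the next.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2015:13:55:01 -0400] "GET /thumb.php?id=1 HTTP/1.1" 500 312',
    '203.0.113.5 - - [10/Oct/2015:13:55:09 -0400] "GET /thumb.php?id=2 HTTP/1.1" 500 312',
    '203.0.113.9 - - [10/Oct/2015:13:55:40 -0400] "GET /thumb.php?id=3 HTTP/1.1" 500 312',
    '203.0.113.9 - - [10/Oct/2015:13:56:02 -0400] "GET /preview/3.png HTTP/1.1" 404 208',
]

if __name__ == "__main__":
    print(status_bursts(SAMPLE))
```

Calling `status_bursts(lines, status="404")` would look for the accompanying bursts of missing previews in the same way.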
18. Outcomes
● Memory limits were not reasonable
● Users do not report catastrophic errors
20. What Lies Beneath
● 500 errors are reserved for server issues
● WordPress has notions of its own
○ Double-submitted comment? 500 error
○ Missing a required field? 500 error
○ Blank comment? 500 error
● OSSEC would ban all of these for bad behavior
22. Outcomes
● Learned reasonable mistakes can yield unreasonable error codes
● Hacked core to return 200s and 400s instead
● Core is discussing what to do
○ https://core.trac.wordpress.org/ticket/11286
23. Revenge of the Base Theme
● Site: WordPress
● Tools: Siteimprove
● Outcome: WordPress theme fix; Apache configuration change
25. Nothing to see here … oh wait--
● Developer dismissed initial reports of login issues as user error
● Then Siteimprove said we had 1,800 new broken links
● A two-character change in RHEL defaults for httpd.conf broke WordPress
26. Lessons
● Small changes have vast consequences
● Documentation is doubleplusgood
27. The Incredible Shrinking Provost
● Site: Drupal
● Tools: Splunk
● Outcome: Cleaned data in ERP system
28. Who’s the fairest of them all?
● The directory passes the search query via a GET parameter
● Splunk told us our associate provost, “Jane Doe”, was most-searched by an order of magnitude
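Because the search term travels in the URL, it ends up in the access log, and ranking the terms is a small counting exercise. Here is a rough sketch of that idea in Python; the `/directory` path and the `q` parameter name are hypothetical, since the slides do not name the real ones.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def top_searches(request_paths, param="q", limit=5):
    """Count the values of one GET parameter across logged request paths."""
    counts = Counter()
    for path in request_paths:
        query = parse_qs(urlsplit(path).query)
        for term in query.get(param, []):
            # Normalize case and whitespace so "Jane Doe" and "jane doe" match.
            counts[term.strip().lower()] += 1
    return counts.most_common(limit)


# Hypothetical request paths as they might appear in an access log.
SAMPLE_PATHS = [
    "/directory?q=Jane+Doe",
    "/directory?q=jane%20doe",
    "/directory?q=John+Smith",
    "/directory?q=Jane+Doe",
]

if __name__ == "__main__":
    print(top_searches(SAMPLE_PATHS))
```

A term that dominates the ranking but returns no results, as on the next slide, is a strong hint that the underlying data is wrong rather than the users.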
29. ...we searched for “Jane Doe”...
...and the search returned...
...NOTHING!
35. Outcomes
● No one cares that we fixed the Virtual Tour
○ (we feel better though)
36. Mr. Foo and Mr. Bar
● Site: WordPress
● Tools: Splunk
● Outcome: Disproved long-standing alleged bug
37. I swear I wasn’t there!
● Various reports over the years alleging that WordPress improperly reported another user was editing a post
● Much speculation and theorizing in the absence of facts
39. The Cache That Wouldn’t Die
● Site: WordPress
● Tools: Nagios
● Outcome: Database size reduced by two-thirds
40. Doom at 11….
● Nagios had concerns
● MySQL ran out of disk space
● Size of WordPress DB tripled in two weeks
41. SELECT option_name FROM wp_190_options WHERE option_name LIKE "displayed_gallery%";
...
| displayed_gallery_rendering_ffffb5e48845fbb7b3347244f8aa06d4 |
| displayed_gallery_rendering_ffffd6d9f2ab40195295c70f775b0ee8 |
| displayed_gallery_rendering_ffffe1416b8d969e25ec7a6094282bbe |
| displayed_gallery_rendering_ffffe8e4a0c399605f434bd51be2d9d7 |
+--------------------------------------------------------------+
722141 rows in set (2.28 sec)
Pretty terminal dumps?
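When an options table balloons like this, grouping the option names by prefix shows quickly which family of entries is responsible. The sketch below is one way to do that in Python, assuming the names end in a 32-character hexadecimal hash as in the dump above; `count_by_prefix` and the two ordinary option names in the sample are illustrative additions.

```python
import re
from collections import Counter

# Assumption: runaway cache entries end in "_" plus a 32-character hex hash.
HASH_SUFFIX = re.compile(r"_[0-9a-f]{32}$")


def count_by_prefix(option_names):
    """Strip the trailing hash from each option name and count what is left."""
    return Counter(HASH_SUFFIX.sub("", name) for name in option_names)


# The four names shown on the slide, plus two ordinary options for contrast.
SAMPLE_OPTIONS = [
    "displayed_gallery_rendering_ffffb5e48845fbb7b3347244f8aa06d4",
    "displayed_gallery_rendering_ffffd6d9f2ab40195295c70f775b0ee8",
    "displayed_gallery_rendering_ffffe1416b8d969e25ec7a6094282bbe",
    "displayed_gallery_rendering_ffffe8e4a0c399605f434bd51be2d9d7",
    "siteurl",
    "blogname",
]

if __name__ == "__main__":
    for prefix, count in count_by_prefix(SAMPLE_OPTIONS).most_common():
        print(f"{count:>6}  {prefix}")
```

Run against the full list of option names, a single prefix with hundreds of thousands of entries stands out immediately.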
42. …Salvation at Noon
● The Google Mini found something terrible lurking in club websites
● NextGEN Gallery bug caused near-endless crawl by the Mini
● Code bug meant the cache never expired
43. Outcomes
● NextGEN Gallery has stability issues
● Listen to Nagios
● It’s turtles all the way down
44. Attack of the Python Script
● Site: WordPress
● Tools: Nagios, Splunk
● Outcome: Quickly identified source of massive load event
45. Traffic Jam!
● Load on a server spiked at 800%
● Seemed bad
● Nagios had more concerns
46. Hello there!
● Splunk real-time monitoring revealed top client IPs
● We’re very popular with a misconfigured IIS Server in Oregon and its “Python-urllib/3.4” script
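The same "who is hitting us?" question can be answered from a raw log file. This is a minimal sketch, assuming Apache's combined log format (which includes the referer and user agent); the function name `top_clients`, the IP addresses, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache's combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_clients(lines, limit=3):
    """Rank (client IP, user agent) pairs by number of requests."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(limit)


# Hypothetical log lines; the addresses come from documentation ranges.
SAMPLE_LOG = [
    '198.51.100.7 - - [12/Mar/2015:11:00:01 -0400] "GET / HTTP/1.1" 200 5120 "-" "Python-urllib/3.4"',
    '198.51.100.7 - - [12/Mar/2015:11:00:01 -0400] "GET /news HTTP/1.1" 200 7301 "-" "Python-urllib/3.4"',
    '203.0.113.20 - - [12/Mar/2015:11:00:02 -0400] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [12/Mar/2015:11:00:02 -0400] "GET /about HTTP/1.1" 200 4410 "-" "Python-urllib/3.4"',
]

if __name__ == "__main__":
    for (ip, agent), hits in top_clients(SAMPLE_LOG):
        print(f"{hits:>5}  {ip}  {agent}")
```

Pairing the IP with the user agent makes a scripted client easy to tell apart from a busy campus proxy or a real browser.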
47. Outcomes
● Banned the IP on the proxy
● Began developing rate-limiting rules for OSSEC
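The slides do not show the OSSEC rules themselves. As an illustration of the logic a rate-limiting rule encodes (no more than N requests per client within a time window), here is a small sliding-window sketch in Python. It is not OSSEC syntax, and the class name and limits are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # client IP -> timestamps of allowed requests

    def allow(self, client_ip, now):
        recent = self._seen[client_ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: a real rule would block or alert here
        recent.append(now)
        return True


if __name__ == "__main__":
    limiter = RateLimiter(max_requests=3, window_seconds=10)
    for second in (0, 1, 2, 3, 11):
        print(second, limiter.allow("198.51.100.7", now=second))
```

Denied requests are not recorded in this sketch, so a client regains access as soon as its earlier requests age out of the window; a stricter policy could count denied attempts as well.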
49. Bughunting on the cheap
W3C Link Checker
● Reports on broken links to a specified depth
● http://validator.w3.org/checklink
Google Webmaster Tools
● Details on broken links and server errors
● https://www.google.com/webmasters/tools/
50. More options
● Bureau of Internet Accessibility
○ Cheaper than Siteimprove
○ Broken link and accessibility reports
○ http://www.boia.org
● Google Analytics
○ Identify high-traffic broken pages
○ http://google.com/analytics
● vim | grep
○ Eyeballing your logs can’t hurt
52. Did we really fix all those errors?
Or is logging broken?
53. Takeaways
● Data are free
● Bugs are hard to find
● Reports are expensive
● Good reports make finding bugs easy
● You can improve your site without rebuilding it from scratch
● You will find more bugs than you can fix