SlideShare uma empresa Scribd logo
1 de 65
Baixar para ler offline
BP108 Worst Practices…. Orlando we have a
problem
Paul Mooney | Senior Architect, Bluewave
Bill Buchan | Directory, HADSL




                           1
Stand for the pledge....




2
I

    state your name




3
Pledge solemnly




4
To fill out the evaluation for this session




      5
And give excellent evaluations




 6
No matter how hungover Bill is




 7
And further more....




8
I shall wear a yellow banana costume to the
   closing session and memorise the legal
   disclaimer at the end of every session.




       9
So say we all :)




10
BP108 Worst Practices…. Orlando we have a
problem
Paul Mooney | Senior Architect, Bluewave
Bill Buchan | Directory, HADSL




                          11
Whoarewe?
●   Bill Buchan
     ▬   Supreme commander
     ▬   HADSL
     ▬   We make a very good product


●   Paul Mooney
     ▬   Lowly admin guy
     ▬   Bluewave Technology
     ▬   Does a lot of Notes stuff




                  12
What did we do this weekend?




         13
What did we do this weekend?




         14
What did we do this weekend?




         15
What did we do this weekend?




         16
What is this session about?
●   Mistakes are made by everyone
     ▬   How do you deal with them?
     ▬   Blame?
     ▬   Ignore?
     ▬   Denial?


●   Large or small
     ▬   Enterprise or SMB


●   Know your environment
     ▬   Especially if you inherit it


●   Prevention beats the hell out of the cure
     ▬   As you will see

                   17
Other titles for this session


“    Worst Practices. You can’t fix stupid”
Paul Mooney


               “    Worst Practices. Because you’re worth it”
               Duffbert



                          “    Worst Practices. It’s back and its pi$$ed”
                          Bill




                18
Other titles for this session

“    Worst Practices. The investigation into non IBM caused software
related situations that may or may not of caused momentary challenges to
the strategic paradigm in the collaborated community of the smarter live e-
planet which is not affiliated, related or liable to IBM, any subsidiary, partner,
software, product, vendor, technology, hardware, gadget, race, colour,
creed, gender, life form, non life-form, past, present or future...




Express”




                    19
Agenda
●   For each case study, we shall
     ▬   Look at the errors
     ▬   Diagnose the problem
     ▬   Determine the problem
     ▬   How was it resolved
     ▬   What lessons can be learned
●   We have 10 case studies
●   We cover both infrastructure and development…
●   All new
●   All true
     ▬   Seriously, you can’t make these up....




                  20
Case Studies
●   In case of emergency        ●   Readers and Authors
●   Out of employment               Unpeeled
●   “Sure, thats an easy        ●   Workspace Blues
    change”                     ●   Web Views from Hell
●   Mobile madness              ●   Discovered Check
●   DAOS done dumbly            ●   Lotusknows lotusnotes




                   Shorties..




             21
1 In case of emergency
●   The Story…
     ▬   A large corporate environment
     ▬   Moved to managed service
     ▬   Single data centre
     ▬   Mother of all SANs
            ▬  Domino/Citrix/Notes
     ▬   Excellent contingency documentation
           ▬  Full procedures documented
     ▬   Data centre shuts down over weekend
           ▬   Everything is down




                  22
1 In case of emergency
●   The Cause
     ▬   New data centre
     ▬   Full data centre air conditioning
     ▬   Power for DC air conditioning
           ▬   Linked to power system for building air conditioning
     ▬   Weekend
           ▬   Fire alarm tripped
           ▬   False alarm
           ▬  Automated system
           ▬  Shuts off air conditioning
           ▬  INCLUDING DATA CENTRE
     ▬   Things get warm in there
           ▬  SAN did not shut down from heat
           ▬  Hard crash of SAN




                  23
1 In case of emergency
●   The Investigation and Resolution
     ▬   All SME’s called to site
     ▬   Gentle power up of data centre
     ▬   Documentation for restart of SAN and primary applications
     ▬   Bring everything back online.. slowly
            ▬  Damage to backup system and drives
     ▬   Full test of systems/applications/user access
     ▬   Restart everything again
     ▬   Check logs
●   This took two days
     ▬   Why?




                 24
1 In case of emergency
●   Lessons learned
     ▬   You get what you pay for
           ▬  Full documentation on restore process is excellent
           ▬  Everything proceduralised for this situation in documentation
           ▬  Presented to customer before implementation of Data Centre
           ▬  Called the “redbook”
     ▬   Customer kept the redbook
           ▬  Saved as a PDF on the SAN
           ▬  Backup not available




                  25
2. Readers and Authors unpeeled
●   The Story…
     ▬   Very large customer
     ▬   Mission critical evaluation system
     ▬   Lots of reader fields
     ▬   And one day, lots of documents disappeared
●   The Investigation
     ▬   Analysed replication history, logs, etc.
●   Gotcha!
     ▬   Only one user had a large number of deletions..
     ▬   Did you delete 1,565 documents ?
           ▬   No....




                   26
2. Readers and Authors unpeeled
●   Cause
     ▬   He was impatient and didn't want to wait for the hub to replicate with his server...
           ▬ He had manually replicated between a hub based server, and his normal server
           ▬ It had used his credentials
           ▬ It had removed all the documents he didn’t have access to
●   Resolution
     ▬   Clear replica history and re-replicate
           ▬  Thankfully it was caught quickly
     ▬   Shoot the user
     ▬   Fix the access control on the servers




                  27
2. Readers and Author fields unpeeled
●   Lessons learned
     ▬   Reader/Author fields are dangerous. Like scissors. Don’t run with them
     ▬   Don’t let users replicate between servers
●   And while we’re at it (and you’ve heard all this before)
     ▬   Always use canonicalised names in a reader, author or names field.
           ▬   Don’t use abbreviated or common names for hierarchical users
     ▬   Multiple entries in a field require use of a multi-value field!
           ▬  Well, d’oh!
     ▬   Always create a database role or group entry for the servers
     ▬   Anytime you want to change them
           ▬  Stand well back
           ▬   Test, test and test again
           ▬   Don’t run it on live..
     ▬   Full Access Administrator is your friend




                  28
3 Out of employment
●   The Story…
     ▬   18,000 user site
     ▬   Limited administrators / developers
     ▬   Developer goes on leave Friday evening
     ▬   Monday morning
            ▬  The developer has received 6000 emails from different people
            ▬  Help desk going nuts with calls about vacations
●   The Investigation
     ▬   Hands off everything
     ▬   Review support calls
     ▬   Review log file
     ▬   Numerous agents executing
     ▬   Opened a mail file




                 29
3 Out of employment
●   Gotcha!
     ▬   Out of office agent enabled
     ▬   In everyone’s mail file
     ▬   By one developer
●   The Cause
     ▬   The developer was working
     ▬   Call from Manager
           ▬   He forgot to enable OOO
           ▬   Asked dev to enable for him
     ▬   Dev enabled OOO
           ▬   In template (person was working there)
     ▬   Design task ran
           ▬   Everyone’s OOO enabled




                  30
3 Out of employment
●   Resolution
     ▬   Disable OOO in template
     ▬   Run the design task
     ▬   Inform users
     ▬   Address emails


●   Lessons learned
     ▬   “Quick fixes for users are bad ideas”
           ▬  Call logging system
     ▬   Template modification in production domain is dangerous
           ▬  Release management systems?
           ▬  Application template management?
     ▬   Just because you can, doesn’t mean you should
           ▬   The Boss should of logged a support call




                 31
4. Workspace blues
●   The Story…
     ▬   A multinational, multibillion pound approval system
     ▬   About a heartbeat before a huge deal closes...
           ▬  All the users are locked out of the production web application
●   The Investigation
     ▬   Check logs
     ▬   Check http stack
     ▬   Check change control
     ▬   Check security


●   Gotcha!
     ▬   The Database Access Control ‘Advanced’ setting ‘Maximum Internet Access’ had been
         set to ‘none’




                  32
4. Workspace blues
●   Cause
     ▬   Release team were deploying a new test application into the test environment
     ▬   The developer’s workspace had became corrupted
     ▬   The icon for the ‘test’ environment was mistakenly pointing at Production
     ▬   The same User ID had ‘god’ access in Test and Production
     ▬   Changes being made to “test” were actually being made to production
●   Resolution
     ▬   DONT use the same ID for all environments.
     ▬   For release, use ID’s that only work in the target environment
●   Lessons learned
     ▬   Shoot the Admin / Developer (we can’t agree on this)
     ▬   Shoot both?
     ▬   Don't rely on ‘Workspace’ icons. Workspace, even in 8.5.1, becomes corrupt surprisingly
         easily




                  33
5 That’s an easy change
●   The Story…
     ▬   24,000 global user site
           ▬  Centralised management of domino domain
           ▬  Dynamic, cutting edge site
     ▬   Acquire new companies frequently
           ▬  Mandatory migration to domino
     ▬   One day
           ▬  VAST majority of inbound internet email fails
           ▬  SMTP Domain errors




                  34
5 That’s an easy change
●   The Investigation
     ▬   Hands off everything
     ▬   Check DNS
     ▬   Check MX records
     ▬   Review support calls
           ▬  Any commonalities in the domain suffix?
     ▬   Check delivery failures
           ▬  Delivery failures in Domino excellent
     ▬   Ask for change control logs
     ▬   Review router logs
     ▬   Check SMTP servers
           ▬   Enhance SMTP logging
●   Gotcha
     ▬   Checked the global domain document




                  35
36
37
38
39
40
41
42
43
It replicates to the SMTP server.......




          44
45
5 That’s an easy change
●   The Cause
     ▬   The administrator was asked to add a new smtp domain to the global domain document
           ▬  This means the domino server will accept messages addressed to that domain
     ▬   He did
           ▬  Accidentally selected all the other alternate domains
           ▬  added “new” domain
           ▬  Saved
           ▬  Router reloads configuration
           ▬  Stops accepting inbound email for other domains




                 46
5 That’s an easy change
●   Resolution
     ▬   Add in the alternate domains again
           ▬  Do this on SMTP inbound server if under pressure
           ▬  Tell router update configuration
           ▬  Mails start to come inbound again
           ▬  Shoot administrator


●   Lessons learned
           ▬   Limit access to names.nsf
           ▬   CHANGE CONTROL
           ▬   Procedure your changes
           ▬   Test processes




                  47
6. Web views from Hell
●   The Story…
     ▬   A large, mandatory tracking system, web clients
     ▬   Users would wait 10-20 minutes to open a view,
            ▬  10-20 minutes to page forward, etc
     ▬   Servers were on their knees
●   The Investigation
     ▬   The web application used a notes agent to construct the client view
     ▬   It generated XML for the entire view, and passed this back




                  48
6. Web views from Hell
●   Gotcha!
     ▬   Each page being passed back was over 4mb in size!
●   Cause
     ▬   The view contained every view entry, despite the client only being able to display the
         first 100 or so
     ▬   The developer, in testing, only had 100 or so documents. The target system had
         100,000+ documents
●   Resolution
     ▬   Fix the views
            ▬  Use AJAX from the client to call ‘ReadViewEntries’ and render properly
●   Lessons learned
     ▬   XML is great - when used correctly.
           ▬   When used badly, it can perform as badly as ‘old’ technology




                  49
7 Mobile madness
●   The Story…
     ▬   Small customer site
     ▬   Implemented Lotus Traveler (1st release)
     ▬   All is well
     ▬   Person loses their windows mobile device
     ▬   Reports it to IT
     ▬   IT said they will disable account
●   Next day
     ▬   Person’s address book “empties”
     ▬   New contacts start appearing
           ▬   People the customer does not know




                  50
7 Mobile madness
●   The Investigation
     ▬   Check Log.nsf
     ▬   Check traveler configuration
●   Gotcha
     ▬   Account still active on traveler
     ▬   Mobile device in use!
     ▬   BUT - The admin guy reset the password?!




                  51
7 Mobile madness
●   The Cause
     ▬   The administrator was asked to “deal with losing mobile device”
     ▬   No device wipe at the time on WMD
     ▬   Reset the internet password to “password” to lock out account
           ▬  Problem - the last password was “password”
           ▬  Account remained active
     ▬   Meanwhile.....
           ▬  Thief ignores traveler
           ▬   Deletes all the contacts on the device
           ▬   This syncs with the contacts in the mail file
           ▬   This syncs with the pnab
           ▬   Using “Synchronise contacts feature”




                  52
7 Mobile madness
●   Unusual resolution
     ▬   One of the “new” entries is DAD.. with mobile number
     ▬   The customer KNEW the dad
     ▬   Customer calls the mobile number
     ▬   A conversation is had
     ▬   Mobile device returned... with apology
●   Lessons learned
     ▬   Incident lead password resets need to be more complex
     ▬   Password reset policies need to be implemented in your domain
     ▬   Complexity levels need to ben implemented in your domain




                 53
8. Discovered Check
●   The Story…
     ▬   A huge organisation tracking client confidential information
     ▬   Being moved from one part of the organisation to another
     ▬   One side outsourced to a service provider
     ▬   System was migrated, but some agents consistently failed
●   The Investigation
     ▬   The agents’ code was examined in some depth - no bugs found
     ▬   Systems were examined in detail (where available)
            ▬  No obvious misconfigurations found
     ▬   Finally, a copy of the system was ran in isolation...




                  54
8. Discovered Check
●   Gotcha!
     ▬   The agents were timing out
●   Cause
     ▬   The agents were designed for a run-time of 1 hour, not the 10 or 15 minutes default
     ▬   The old system was set to a 1 hour timeout
     ▬   The old system didn’t kill agents
●   Resolution
     ▬   Recoded agents to run in 10 or 15 minute time windows




                  55
8. Discovered Check
●   Lessons learned
     ▬   Every time a system changes ownership, information is lost
     ▬   In this case, an assumption about server setup - each server had to set agent timeout to
         an hour
     ▬   Institute proper logging for all agents
           ▬ Start time, end time, all run-time errors, terminations event
           ▬ In this case, no end-time comments led to the assumption that the terminated
             agent completed successfully
     ▬   Document - in the agents - that they require a non-standard environment
     ▬   Test, test, and test again, in a representative test environment
           ▬   Same hardware, same documents, same views, same versions




                  56
9 DAOS done dumbly
●   The Story…
     ▬   Mid size customer site in Europe
     ▬   Desperately low on free disk space
     ▬   Eagerly awaiting DAOS to be released
     ▬   Implemented in in January 2009
           ▬  It was out only a few days
     ▬   Few months later OS failure
     ▬   Box to be rebuilt
     ▬   Customers start to complain opening attachments




                 57
9 DAOS done dumbly
●   The Investigation
     ▬   Review the error message
     ▬   DAOS .nlo related
     ▬   Review daoscatalog database
     ▬   Query for missing .NLO files
●   Gotcha
     ▬   ALL the .NLO’s missing




                 58
9 DAOS done dumbly
●   The cause
     ▬   When doing rebuild
     ▬   Administrator restored all .nsf files from the previous night’s backup
     ▬   Nightly backups did not include the DAOS directory
     ▬   Monthly backups did not include DAOS directory
           ▬  No copy of attachments
●   Resolution
     ▬   Replica copies of mail files from DR server
     ▬   If there were no replicas of these .nsf files?
             ▬  Prayer?




                  59
9 DAOS done dumbly
●   Lessons learned
     ▬   Be aware of what DAOS is
     ▬   Fantastic tool
           ▬  Very new
           ▬  Different container model
     ▬   Understand DAOS and backups
     ▬   Understand any new technology before you implement it
     ▬   COMMUNICATE
           ▬  Did you tell your backup team about DAOS
     ▬   Disaster recovery tests?
            ▬ Complete after every major system change/upgrade
            ▬ AND, on schedule too!




                 60
10. LotusKnows lotusnotes
●   The Story…
     ▬   Very Large financial organisation using Lotus Notes for everything
     ▬   Request for a new expense approval application
     ▬   Remote developer sends in the application for deployment
     ▬   1 year later... SERIOUS security breaches noticed.
            ▬ SERIOUS INVESTIGATION
            ▬ SERIOUS FINGER-POINTING
            ▬ Who approved what?
●   The Investigation
     ▬   Check database access
     ▬   Check reader fields/author signatures
     ▬   Check ACL
     ▬   Some expense claims appear to be forged. The client is blaming Notes Security.
●   Gotcha
     ▬   We reviewed some expense reports...
     ▬   Expenses being submitted, approved, closed in sub - 5 minutes!

                  61
10. Lotusknows lotusnotes
●   The cause
     ▬   We reviewed the customer site
     ▬   We found a H: drive directory called H:notesid files
     ▬   We found a text file which stated all passwords were ‘lotusnotes’
     ▬   People were switching IDs and approving their own expenses
     ▬   People were also deleting the submission/approval application emails
●   The Resolution
     ▬   Stop laughing
     ▬   Shoot the admin
     ▬   Enforce password changes
●   Lessons Learnt
     ▬   Security is holistic. It all is connected. There is no point in creating the worlds most
         secure safe, and then writing the combination on the door.




                  62
Shorties...
●   Don’t have two domino domains with the same name in your
    organisation
●   Check the code in you directory
     ▬   Is there anything that shouldn’t be there
●   Using shared mail in 2010 should be illegal
●   Dont keep “encrypted” text in clear text
●   Remember to renew domains
     ▬   We didn’t




                  63
Summary
●   Everyone makes mistakes
●   Put systems in place to prevent the obvious ones
●   Its how you deal with them that makes you professional
     ▬   No Blame Culture
     ▬   Admit soon, Admit well…
●   Major system disasters
     ▬   Sometimes cant be prevented
     ▬   Are usually the combination of many small errors


●   Learn from these mistakes!




                 64
Thank you


●   Paul Mooney (pmooney@pmooney.net)
●   Bluewave
●   pmooney.net / bluewavegroup.eu


●   Bill Buchan   (bill@billbuchan.com)
●   hadsl
●   billbuchan.com / hadsl.com




             65

Mais conteúdo relacionado

Destaque

Entwicker camp2007 calling-the-c-api-from-lotusscript
Entwicker camp2007 calling-the-c-api-from-lotusscriptEntwicker camp2007 calling-the-c-api-from-lotusscript
Entwicker camp2007 calling-the-c-api-from-lotusscriptBill Buchan
 
Dev buchan 30 proven tips
Dev buchan 30 proven tipsDev buchan 30 proven tips
Dev buchan 30 proven tipsBill Buchan
 
Speed geeking-lotusscript
Speed geeking-lotusscriptSpeed geeking-lotusscript
Speed geeking-lotusscriptBill Buchan
 
Webinar developing contacts
Webinar developing contactsWebinar developing contacts
Webinar developing contactsBob Kaplitz
 
Lotusphere 2008 - Jumpstart 206 - Web Services Bootcamp
Lotusphere 2008 - Jumpstart 206 - Web Services BootcampLotusphere 2008 - Jumpstart 206 - Web Services Bootcamp
Lotusphere 2008 - Jumpstart 206 - Web Services BootcampBill Buchan
 
Nllug 2010-web-services
Nllug 2010-web-servicesNllug 2010-web-services
Nllug 2010-web-servicesBill Buchan
 
Lotusphere 2009 The 11 Commandments
Lotusphere 2009 The 11 CommandmentsLotusphere 2009 The 11 Commandments
Lotusphere 2009 The 11 CommandmentsBill Buchan
 
Scotlands broadband
Scotlands broadbandScotlands broadband
Scotlands broadbandBill Buchan
 
Marykirk Raft Race Agm
Marykirk Raft Race AgmMarykirk Raft Race Agm
Marykirk Raft Race AgmBill Buchan
 
Dev buchan everything you need to know about agent design
Dev buchan everything you need to know about agent designDev buchan everything you need to know about agent design
Dev buchan everything you need to know about agent designBill Buchan
 
Dev buchan best practices
Dev buchan best practicesDev buchan best practices
Dev buchan best practicesBill Buchan
 
iLug 2008 Web Services
iLug 2008 Web ServicesiLug 2008 Web Services
iLug 2008 Web ServicesBill Buchan
 
Entwicker camp2007 calling-the-c-api-from-lotusscript
Entwicker camp2007 calling-the-c-api-from-lotusscriptEntwicker camp2007 calling-the-c-api-from-lotusscript
Entwicker camp2007 calling-the-c-api-from-lotusscriptBill Buchan
 
Entwickercamp - Development for Administrators
Entwickercamp - Development for AdministratorsEntwickercamp - Development for Administrators
Entwickercamp - Development for AdministratorsBill Buchan
 
Admin2012 buchan web_services-v101
Admin2012 buchan web_services-v101Admin2012 buchan web_services-v101
Admin2012 buchan web_services-v101Bill Buchan
 
How to Survive Multi-Device User Interface Design with UIML
How to Survive Multi-Device User Interface Design with UIMLHow to Survive Multi-Device User Interface Design with UIML
How to Survive Multi-Device User Interface Design with UIMLJo Vermeulen
 
Everything you ever wanted to know about lotus script
Everything you ever wanted to know about lotus scriptEverything you ever wanted to know about lotus script
Everything you ever wanted to know about lotus scriptBill Buchan
 
ALLTHINGSTuscany - Digital Tools & Storytelling per la Toscana
ALLTHINGSTuscany - Digital Tools & Storytelling per la ToscanaALLTHINGSTuscany - Digital Tools & Storytelling per la Toscana
ALLTHINGSTuscany - Digital Tools & Storytelling per la ToscanaCostanza Giovannini
 

Destaque (20)

Entwicker camp2007 calling-the-c-api-from-lotusscript
Entwicker camp2007 calling-the-c-api-from-lotusscriptEntwicker camp2007 calling-the-c-api-from-lotusscript
Entwicker camp2007 calling-the-c-api-from-lotusscript
 
Ad505 dev blast
Ad505 dev blastAd505 dev blast
Ad505 dev blast
 
Dev buchan 30 proven tips
Dev buchan 30 proven tipsDev buchan 30 proven tips
Dev buchan 30 proven tips
 
Speed geeking-lotusscript
Speed geeking-lotusscriptSpeed geeking-lotusscript
Speed geeking-lotusscript
 
Webinar developing contacts
Webinar developing contactsWebinar developing contacts
Webinar developing contacts
 
Lotusphere 2008 - Jumpstart 206 - Web Services Bootcamp
Lotusphere 2008 - Jumpstart 206 - Web Services BootcampLotusphere 2008 - Jumpstart 206 - Web Services Bootcamp
Lotusphere 2008 - Jumpstart 206 - Web Services Bootcamp
 
Nllug 2010-web-services
Nllug 2010-web-servicesNllug 2010-web-services
Nllug 2010-web-services
 
Lotusphere 2009 The 11 Commandments
Lotusphere 2009 The 11 CommandmentsLotusphere 2009 The 11 Commandments
Lotusphere 2009 The 11 Commandments
 
Scotlands broadband
Scotlands broadbandScotlands broadband
Scotlands broadband
 
Marykirk Raft Race Agm
Marykirk Raft Race AgmMarykirk Raft Race Agm
Marykirk Raft Race Agm
 
Dev buchan everything you need to know about agent design
Dev buchan everything you need to know about agent designDev buchan everything you need to know about agent design
Dev buchan everything you need to know about agent design
 
Dev buchan best practices
Dev buchan best practicesDev buchan best practices
Dev buchan best practices
 
iLug 2008 Web Services
iLug 2008 Web ServicesiLug 2008 Web Services
iLug 2008 Web Services
 
Entwicker camp2007 calling-the-c-api-from-lotusscript
Entwicker camp2007 calling-the-c-api-from-lotusscriptEntwicker camp2007 calling-the-c-api-from-lotusscript
Entwicker camp2007 calling-the-c-api-from-lotusscript
 
Entwickercamp - Development for Administrators
Entwickercamp - Development for AdministratorsEntwickercamp - Development for Administrators
Entwickercamp - Development for Administrators
 
Admin2012 buchan web_services-v101
Admin2012 buchan web_services-v101Admin2012 buchan web_services-v101
Admin2012 buchan web_services-v101
 
How to Survive Multi-Device User Interface Design with UIML
How to Survive Multi-Device User Interface Design with UIMLHow to Survive Multi-Device User Interface Design with UIML
How to Survive Multi-Device User Interface Design with UIML
 
Event goes social
Event goes socialEvent goes social
Event goes social
 
Everything you ever wanted to know about lotus script
Everything you ever wanted to know about lotus scriptEverything you ever wanted to know about lotus script
Everything you ever wanted to know about lotus script
 
ALLTHINGSTuscany - Digital Tools & Storytelling per la Toscana
ALLTHINGSTuscany - Digital Tools & Storytelling per la ToscanaALLTHINGSTuscany - Digital Tools & Storytelling per la Toscana
ALLTHINGSTuscany - Digital Tools & Storytelling per la Toscana
 

Semelhante a Lotusphere 2008 Worst practices

Gearman: A Job Server made for Scale
Gearman: A Job Server made for ScaleGearman: A Job Server made for Scale
Gearman: A Job Server made for ScaleMike Willbanks
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Kris Buytaert
 
Jonathon Rochelle @ FOWA Feb 07
Jonathon Rochelle @ FOWA Feb 07Jonathon Rochelle @ FOWA Feb 07
Jonathon Rochelle @ FOWA Feb 07carsonsystems
 
Continuous Deployment Applied at MyHeritage
Continuous Deployment Applied at MyHeritageContinuous Deployment Applied at MyHeritage
Continuous Deployment Applied at MyHeritageRan Levy
 
Devops, Secops, Opsec, DevSec *ops *.* ?
Devops, Secops, Opsec, DevSec *ops *.* ?Devops, Secops, Opsec, DevSec *ops *.* ?
Devops, Secops, Opsec, DevSec *ops *.* ?Kris Buytaert
 
Teaching Open Source In The University
Teaching Open Source In The UniversityTeaching Open Source In The University
Teaching Open Source In The UniversityDominique Cimafranca
 
7 lessons learned building high availability / performance systems - CM2015
7 lessons learned building high availability / performance systems - CM20157 lessons learned building high availability / performance systems - CM2015
7 lessons learned building high availability / performance systems - CM2015Francesco Degrassi
 
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...Demi Ben-Ari
 
Devops Devops Devops, at Froscon
Devops Devops Devops, at FrosconDevops Devops Devops, at Froscon
Devops Devops Devops, at FrosconKris Buytaert
 
Top Tips Every Notes Developer Needs To Know
Top Tips Every Notes Developer Needs To KnowTop Tips Every Notes Developer Needs To Know
Top Tips Every Notes Developer Needs To KnowKathy Brown
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIURohit Jnagal
 
Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17aspyker
 
Elephant grooming: quality with Hadoop
Elephant grooming: quality with HadoopElephant grooming: quality with Hadoop
Elephant grooming: quality with HadoopRoman Nikitchenko
 
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good ServerICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good ServerSerdar Basegmez
 
Taking the plunge: Why you should use new technology on client projects
Taking the plunge: Why you should use new technology on client projectsTaking the plunge: Why you should use new technology on client projects
Taking the plunge: Why you should use new technology on client projectsTommy Ferry
 
Administrivia: Golden Tips for Making JIRA Hum
Administrivia: Golden Tips for Making JIRA HumAdministrivia: Golden Tips for Making JIRA Hum
Administrivia: Golden Tips for Making JIRA HumAtlassian
 
Administrivia: Golden Tips for Making JIRA Hum
Administrivia: Golden Tips for Making JIRA HumAdministrivia: Golden Tips for Making JIRA Hum
Administrivia: Golden Tips for Making JIRA HumAtlassian
 
Devoxx : being productive with JHipster
Devoxx : being productive with JHipsterDevoxx : being productive with JHipster
Devoxx : being productive with JHipsterJulien Dubois
 

Semelhante a Lotusphere 2008 Worst practices (20)

Gearman: A Job Server made for Scale
Gearman: A Job Server made for ScaleGearman: A Job Server made for Scale
Gearman: A Job Server made for Scale
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.
 
Devops For Drupal
Devops  For DrupalDevops  For Drupal
Devops For Drupal
 
Jonathon Rochelle @ FOWA Feb 07
Jonathon Rochelle @ FOWA Feb 07Jonathon Rochelle @ FOWA Feb 07
Jonathon Rochelle @ FOWA Feb 07
 
Continuous Deployment Applied at MyHeritage
Continuous Deployment Applied at MyHeritageContinuous Deployment Applied at MyHeritage
Continuous Deployment Applied at MyHeritage
 
Devops, Secops, Opsec, DevSec *ops *.* ?
Devops, Secops, Opsec, DevSec *ops *.* ?Devops, Secops, Opsec, DevSec *ops *.* ?
Devops, Secops, Opsec, DevSec *ops *.* ?
 
Teaching Open Source In The University
Teaching Open Source In The UniversityTeaching Open Source In The University
Teaching Open Source In The University
 
7 lessons learned building high availability / performance systems - CM2015
7 lessons learned building high availability / performance systems - CM20157 lessons learned building high availability / performance systems - CM2015
7 lessons learned building high availability / performance systems - CM2015
 
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
 
Devops Devops Devops, at Froscon
Devops Devops Devops, at FrosconDevops Devops Devops, at Froscon
Devops Devops Devops, at Froscon
 
Top Tips Every Notes Developer Needs To Know
Top Tips Every Notes Developer Needs To KnowTop Tips Every Notes Developer Needs To Know
Top Tips Every Notes Developer Needs To Know
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIU
 
Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17Series of Unfortunate Netflix Container Events - QConNYC17
Series of Unfortunate Netflix Container Events - QConNYC17
 
Elephant grooming: quality with Hadoop
Elephant grooming: quality with HadoopElephant grooming: quality with Hadoop
Elephant grooming: quality with Hadoop
 
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good ServerICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
ICONUK 2016: Back From the Dead: How Bad Code Kills a Good Server
 
Taking the plunge: Why you should use new technology on client projects
Taking the plunge: Why you should use new technology on client projectsTaking the plunge: Why you should use new technology on client projects
Taking the plunge: Why you should use new technology on client projects
 
Administrivia: Golden Tips for Making JIRA Hum
Administrivia: Golden Tips for Making JIRA HumAdministrivia: Golden Tips for Making JIRA Hum
Administrivia: Golden Tips for Making JIRA Hum
 
Administrivia: Golden Tips for Making JIRA Hum
Administrivia: Golden Tips for Making JIRA HumAdministrivia: Golden Tips for Making JIRA Hum
Administrivia: Golden Tips for Making JIRA Hum
 
Devoxx : being productive with JHipster
Devoxx : being productive with JHipsterDevoxx : being productive with JHipster
Devoxx : being productive with JHipster
 
Headless Android
Headless AndroidHeadless Android
Headless Android
 

Mais de Bill Buchan

Dummies guide to WISPS
Dummies guide to WISPSDummies guide to WISPS
Dummies guide to WISPSBill Buchan
 
WISP for Dummies
WISP for DummiesWISP for Dummies
WISP for DummiesBill Buchan
 
WISP Worst Practices
WISP Worst PracticesWISP Worst Practices
WISP Worst PracticesBill Buchan
 
Marykirk raft race presentation night 2014
Marykirk raft race presentation night 2014Marykirk raft race presentation night 2014
Marykirk raft race presentation night 2014Bill Buchan
 
Dev buchan leveraging
Dev buchan leveragingDev buchan leveraging
Dev buchan leveragingBill Buchan
 
Dev buchan leveraging the notes c api
Dev buchan leveraging the notes c apiDev buchan leveraging the notes c api
Dev buchan leveraging the notes c apiBill Buchan
 
Reporting on your domino environment v1
Reporting on your domino environment v1Reporting on your domino environment v1
Reporting on your domino environment v1Bill Buchan
 
Admin camp 2011-domino-sso-with-ad
Admin camp 2011-domino-sso-with-adAdmin camp 2011-domino-sso-with-ad
Admin camp 2011-domino-sso-with-adBill Buchan
 
Softsphere 08 web services bootcamp
Softsphere 08 web services bootcampSoftsphere 08 web services bootcamp
Softsphere 08 web services bootcampBill Buchan
 
Connections Lotusphere Worst Practices 2013
Connections Lotusphere Worst Practices 2013Connections Lotusphere Worst Practices 2013
Connections Lotusphere Worst Practices 2013Bill Buchan
 
Lotusphere 2007 BP301 Advanced Object Oriented Programming for LotusScript
Lotusphere 2007 BP301 Advanced Object Oriented Programming for LotusScriptLotusphere 2007 BP301 Advanced Object Oriented Programming for LotusScript
Lotusphere 2007 BP301 Advanced Object Oriented Programming for LotusScriptBill Buchan
 
Lotusphere 2007 AD507 Leveraging the Power of Object Oriented Programming in ...
Lotusphere 2007 AD507 Leveraging the Power of Object Oriented Programming in ...Lotusphere 2007 AD507 Leveraging the Power of Object Oriented Programming in ...
Lotusphere 2007 AD507 Leveraging the Power of Object Oriented Programming in ...Bill Buchan
 
Identity management delegation and automation
Identity management delegation and automationIdentity management delegation and automation
Identity management delegation and automationBill Buchan
 
Lotuscript for large systems
Lotuscript for large systemsLotuscript for large systems
Lotuscript for large systemsBill Buchan
 

Mais de Bill Buchan (16)

Dummies guide to WISPS
Dummies guide to WISPSDummies guide to WISPS
Dummies guide to WISPS
 
WISP for Dummies
WISP for DummiesWISP for Dummies
WISP for Dummies
 
WISP Worst Practices
WISP Worst PracticesWISP Worst Practices
WISP Worst Practices
 
Marykirk raft race presentation night 2014
Marykirk raft race presentation night 2014Marykirk raft race presentation night 2014
Marykirk raft race presentation night 2014
 
Dev buchan leveraging
Dev buchan leveragingDev buchan leveraging
Dev buchan leveraging
 
Dev buchan leveraging the notes c api
Dev buchan leveraging the notes c apiDev buchan leveraging the notes c api
Dev buchan leveraging the notes c api
 
Bp301
Bp301Bp301
Bp301
 
Ad507
Ad507Ad507
Ad507
 
Reporting on your domino environment v1
Reporting on your domino environment v1Reporting on your domino environment v1
Reporting on your domino environment v1
 
Admin camp 2011-domino-sso-with-ad
Admin camp 2011-domino-sso-with-adAdmin camp 2011-domino-sso-with-ad
Admin camp 2011-domino-sso-with-ad
 
Softsphere 08 web services bootcamp
Softsphere 08 web services bootcampSoftsphere 08 web services bootcamp
Softsphere 08 web services bootcamp
 
Connections Lotusphere Worst Practices 2013
Connections Lotusphere Worst Practices 2013Connections Lotusphere Worst Practices 2013
Connections Lotusphere Worst Practices 2013
 
Lotusphere 2007 BP301 Advanced Object Oriented Programming for LotusScript
Lotusphere 2007 BP301 Advanced Object Oriented Programming for LotusScriptLotusphere 2007 BP301 Advanced Object Oriented Programming for LotusScript
Lotusphere 2007 BP301 Advanced Object Oriented Programming for LotusScript
 
Lotusphere 2007 AD507 Leveraging the Power of Object Oriented Programming in ...
Lotusphere 2007 AD507 Leveraging the Power of Object Oriented Programming in ...Lotusphere 2007 AD507 Leveraging the Power of Object Oriented Programming in ...
Lotusphere 2007 AD507 Leveraging the Power of Object Oriented Programming in ...
 
Identity management delegation and automation
Identity management delegation and automationIdentity management delegation and automation
Identity management delegation and automation
 
Lotuscript for large systems
Lotuscript for large systemsLotuscript for large systems
Lotuscript for large systems
 

Lotusphere 2008 Worst practices

  • 1. BP108 Worst Practices…. Orlando we have a problem Paul Mooney | Senior Architect, Bluewave Bill Buchan | Directory, HADSL 1
  • 2. Stand for the pledge.... 2
  • 3. I state your name 3
  • 5. To fill out the evaluation for this session 5
  • 6. And give excellent evaluations 6
  • 7. No matter how hungover Bill is 7
  • 9. I shall wear a yellow banana costume to the closing session and memorise the legal disclaimer at the end of every session. 9
  • 10. So say we all :) 10
  • 11. BP108 Worst Practices…. Orlando we have a problem Paul Mooney | Senior Architect, Bluewave Bill Buchan | Directory, HADSL 11
  • 12. Whoarewe? ● Bill Buchan ▬ Supreme commander ▬ HADSL ▬ We make a very good product ● Paul Mooney ▬ Lowly admin guy ▬ Bluewave Technology ▬ Does a lot of Notes stuff 12
  • 13. What did we do this weekend? 13
  • 14. What did we do this weekend? 14
  • 15. What did we do this weekend? 15
  • 16. What did we do this weekend? 16
  • 17. What is this session about? ● Mistakes are made by everyone ▬ How do you deal with them? ▬ Blame? ▬ Ignore? ▬ Denial? ● Large or small ▬ Enterprise or SMB ● Know your environment ▬ Especially if you inherit it ● Prevention beats the hell out of the cure ▬ As you will see 17
  • 18. Other titles for this session “ Worst Practices. You can’t fix stupid” Paul Mooney “ Worst Practices. Because you’re worth it” Duffbert “ Worst Practices. It’s back and its pi$$ed” Bill 18
  • 19. Other titles for this session “ Worst Practices. The investigation into non IBM caused software related situations that may or may not of caused momentary challenges to the strategic paradigm in the collaborated community of the smarter live e- planet which is not affiliated, related or liable to IBM, any subsidiary, partner, software, product, vendor, technology, hardware, gadget, race, colour, creed, gender, life form, non life-form, past, present or future... Express” 19
  • 20. Agenda ● For each case study, we shall ▬ Look at the errors ▬ Diagnose the problem ▬ Determine the problem ▬ How was it resolved ▬ What lessons can be learned ● We have 10 case studies ● We cover both infrastructure and development… ● All new ● All true ▬ Seriously, you can’t make these up.... 20
  • 21. Case Studies ● In case of emergency ● Readers and Authors ● Out of employment Unpeeled ● “Sure, thats an easy ● Workspace Blues change” ● Web Views from Hell ● Mobile madness ● Discovered Check ● DAOS done dumbly ● Lotusknows lotusnotes Shorties.. 21
  • 22. 1 In case of emergency ● The Story… ▬ A large corporate environment ▬ Moved to managed service ▬ Single data centre ▬ Mother of all SANs ▬ Domino/Citrix/Notes ▬ Excellent contingency documentation ▬ Full procedures documented ▬ Data centre shuts down over weekend ▬ Everything is down 22
  • 23. 1 In case of emergency ● The Cause ▬ New data centre ▬ Full data centre air conditioning ▬ Power for DC air conditioning ▬ Linked to power system for building air conditioning ▬ Weekend ▬ Fire alarm tripped ▬ False alarm ▬ Automated system ▬ Shuts off air conditioning ▬ INCLUDING DATA CENTRE ▬ Things get warm in there ▬ SAN did not shut down from heat ▬ Hard crash of SAN 23
  • 24. 1 In case of emergency ● The Investigation and Resolution ▬ All SME’s called to site ▬ Gentle power up of data centre ▬ Documentation for restart of SAN and primary applications ▬ Bring everything back online.. slowly ▬ Damage to backup system and drives ▬ Full test of systems/applications/user access ▬ Restart everything again ▬ Check logs ● This took two days ▬ Why? 24
  • 25. 1 In case of emergency ● Lessons learned ▬ You get what you pay for ▬ Full documentation on restore process is excellent ▬ Everything proceduralised for this situation in documentation ▬ Presented to customer before implementation of Data Centre ▬ Called the “redbook” ▬ Customer kept the redbook ▬ Saved as a PDF on the SAN ▬ Backup not available 25
  • 26. 2. Readers and Authors unpeeled ● The Story… ▬ Very large customer ▬ Mission critical evaluation system ▬ Lots of reader fields ▬ And one day, lots of documents disappeared ● The Investigation ▬ Analysed replication history, logs, etc. ● Gotcha! ▬ Only one user had a large number of deletions.. ▬ Did you delete 1,565 documents ? ▬ No.... 26
  • 27. 2. Readers and Authors unpeeled ● Cause ▬ He was impatient and didn't want to wait for the hub to replicate with his server... ▬ He had manually replicated between a hub based server, and his normal server ▬ It had used his credentials ▬ It had removed all the documents he didn’t have access to ● Resolution ▬ Clear replica history and re-replicate ▬ Thankfully it was caught quickly ▬ Shoot the user ▬ Fix the access control on the servers 27
  • 28. 2. Readers and Author fields unpeeled ● Lessons learned ▬ Reader/Author fields are dangerous. Like scissors. Don’t run with them ▬ Don’t let users replicate between servers ● And while we’re at it (and you’ve heard all this before) ▬ Always use canonicalised names in a reader, author or names field. ▬ Don’t use abbreviated or common names for hierarchical users ▬ Multiple entries in a field require use of a multi-value field! ▬ Well, d’oh! ▬ Always create a database role or group entry for the servers ▬ Anytime you want to change them ▬ Stand well back ▬ Test, test and test again ▬ Don’t run it on live.. ▬ Full Access Administrator is your friend 28
  • 29. 3 Out of employment ● The Story… ▬ 18,000 user site ▬ Limited administrators / developers ▬ Developer goes on leave Friday evening ▬ Monday morning ▬ The developer has received 6000 emails from different people ▬ Help desk going nuts with calls about vacations ● The Investigation ▬ Hands off everything ▬ Review support calls ▬ Review log file ▬ Numerous agents executing ▬ Opened a mail file 29
  • 30. 3 Out of employment ● Gotcha! ▬ Out of office agent enabled ▬ In everyone’s mail file ▬ By one developer ● The Cause ▬ The developer was working ▬ Call from Manager ▬ He forgot to enable OOO ▬ Asked dev to enable for him ▬ Dev enabled OOO ▬ In template (person was working there) ▬ Design task ran ▬ Everyone’s OOO enabled 30
  • 31. 3 Out of employment ● Resolution ▬ Disable OOO in template ▬ Run the design task ▬ Inform users ▬ Address emails ● Lessons learned ▬ “Quick fixes for users are bad ideas” ▬ Call logging system ▬ Template modification in production domain is dangerous ▬ Release management systems? ▬ Application template management? ▬ Just because you can, doesn’t mean you should ▬ The Boss should of logged a support call 31
  • 32. 4. Workspace blues ● The Story… ▬ A multinational, multibillion pound approval system ▬ About a heartbeat before a huge deal closes... ▬ All the users are locked out of the production web application ● The Investigation ▬ Check logs ▬ Check http stack ▬ Check change control ▬ Check security ● Gotcha! ▬ The Database Access Control ‘Advanced’ setting ‘Maximum Internet Access’ had been set to ‘none’ 32
  • 33. 4. Workspace blues ● Cause ▬ Release team were deploying a new test application into the test environment ▬ The developer’s workspace had became corrupted ▬ The icon for the ‘test’ environment was mistakenly pointing at Production ▬ The same User ID had ‘god’ access in Test and Production ▬ Changes being made to “test” were actually being made to production ● Resolution ▬ DONT use the same ID for all environments. ▬ For release, use ID’s that only work in the target environment ● Lessons learned ▬ Shoot the Admin / Developer (we can’t agree on this) ▬ Shoot both? ▬ Don't rely on ‘Workspace’ icons. Workspace, even in 8.5.1, becomes corrupt surprisingly easily 33
  • 34. 5 That’s an easy change ● The Story… ▬ 24,000 global user site ▬ Centralised management of domino domain ▬ Dynamic, cutting edge site ▬ Acquire new companies frequently ▬ Mandatory migration to domino ▬ One day ▬ VAST majority of inbound internet email fails ▬ SMTP Domain errors 34
  • 35. 5 That’s an easy change ● The Investigation ▬ Hands off everything ▬ Check DNS ▬ Check MX records ▬ Review support calls ▬ Any commonalities in the domain suffix? ▬ Check delivery failures ▬ Delivery failures in Domino excellent ▬ Ask for change control logs ▬ Review router logs ▬ Check SMTP servers ▬ Enhance SMTP logging ● Gotcha ▬ Checked the global domain document 35
  • 36. 36
  • 37. 37
  • 38. 38
  • 39. 39
  • 40. 40
  • 41. 41
  • 42. 42
  • 43. 43
  • 44. It replicates to the SMTP server....... 44
  • 45. 45
  • 46. 5 That’s an easy change ● The Cause ▬ The administrator was asked to add a new smtp domain to the global domain document ▬ This means the domino server will accept messages addressed to that domain ▬ He did ▬ Accidentally selected all the other alternate domains ▬ added “new” domain ▬ Saved ▬ Router reloads configuration ▬ Stops accepting inbound email for other domains 46
  • 47. 5 That’s an easy change ● Resolution ▬ Add in the alternate domains again ▬ Do this on SMTP inbound server if under pressure ▬ Tell router update configuration ▬ Mails start to come inbound again ▬ Shoot administrator ● Lessons learned ▬ Limit access to names.nsf ▬ CHANGE CONTROL ▬ Procedure your changes ▬ Test processes 47
  • 48. 6. Web views from Hell ● The Story… ▬ A large, mandatory tracking system, web clients ▬ Users would wait 10-20 minutes to open a view, ▬ 10-20 minutes to page forward, etc ▬ Servers were on their knees ● The Investigation ▬ The web application used a notes agent to construct the client view ▬ It generated XML for the entire view, and passed this back 48
  • 49. 6. Web views from Hell ● Gotcha! ▬ Each page being passed back was over 4mb in size! ● Cause ▬ The view contained every view entry, despite the client only being able to display the first 100 or so ▬ The developer, in testing, only had 100 or so documents. The target system had 100,000+ documents ● Resolution ▬ Fix the views ▬ Use AJAX from the client to call ‘ReadViewEntries’ and render properly ● Lessons learned ▬ XML is great - when used correctly. ▬ When used badly, it can perform as badly as ‘old’ technology 49
  • 50. 7 Mobile madness ● The Story… ▬ Small customer site ▬ Implemented Lotus Traveler (1st release) ▬ All is well ▬ Person loses their windows mobile device ▬ Reports it to IT ▬ IT said they will disable account ● Next day ▬ Person’s address book “empties” ▬ New contacts start appearing ▬ People the customer does not know 50
  • 51. 7 Mobile madness ● The Investigation ▬ Check Log.nsf ▬ Check traveler configuration ● Gotcha ▬ Account still active on traveler ▬ Mobile device in use! ▬ BUT - The admin guy reset the password?! 51
  • 52. 7 Mobile madness ● The Cause ▬ The administrator was asked to “deal with losing mobile device” ▬ No device wipe at the time on WMD ▬ Reset the internet password to “password” to lock out account ▬ Problem - the last password was “password” ▬ Account remained active ▬ Meanwhile..... ▬ Thief ignores traveler ▬ Deletes all the contacts on the device ▬ This syncs with the contacts in the mail file ▬ This syncs with the pnab ▬ Using “Synchronise contacts feature” 52
  • 53. 7 Mobile madness ● Unusual resolution ▬ One of the “new” entries is DAD.. with mobile number ▬ The customer KNEW the dad ▬ Customer calls the mobile number ▬ A conversation is had ▬ Mobile device returned... with apology ● Lessons learned ▬ Incident lead password resets need to be more complex ▬ Password reset policies need to be implemented in your domain ▬ Complexity levels need to ben implemented in your domain 53
  • 54. 8. Discovered Check ● The Story… ▬ A huge organisation tracking client confidential information ▬ Being moved from one part of the organisation to another ▬ One side outsourced to a service provider ▬ System was migrated, but some agents consistently failed ● The Investigation ▬ The agents’ code was examined in some depth - no bugs found ▬ Systems were examined in detail (where available) ▬ No obvious misconfigurations found ▬ Finally, a copy of the system was ran in isolation... 54
  • 55. 8. Discovered Check ● Gotcha! ▬ The agents were timing out ● Cause ▬ The agents were designed for a run-time of 1 hour, not the 10 or 15 minutes default ▬ The old system was set to a 1 hour timeout ▬ The old system didn’t kill agents ● Resolution ▬ Recoded agents to run in 10 or 15 minute time windows 55
  • 56. 8. Discovered Check ● Lessons learned ▬ Every time a system changes ownership, information is lost ▬ In this case, an assumption about server setup - each server had to set agent timeout to an hour ▬ Institute proper logging for all agents ▬ Start time, end time, all run-time errors, terminations event ▬ In this case, no end-time comments led to the assumption that the terminated agent completed successfully ▬ Document - in the agents - that they require a non-standard environment ▬ Test, test, and test again, in a representative test environment ▬ Same hardware, same documents, same views, same versions 56
  • 57. 9 DAOS done dumbly ● The Story… ▬ Mid size customer site in Europe ▬ Desperately low on free disk space ▬ Eagerly awaiting DAOS to be released ▬ Implemented in in January 2009 ▬ It was out only a few days ▬ Few months later OS failure ▬ Box to be rebuilt ▬ Customers start to complain opening attachments 57
  • 58. 9 DAOS done dumbly ● The Investigation ▬ Review the error message ▬ DAOS .nlo related ▬ Review daoscatalog database ▬ Query for missing .NLO files ● Gotcha ▬ ALL the .NLO’s missing 58
  • 59. 9 DAOS done dumbly ● The cause ▬ When doing rebuild ▬ Administrator restored all .nsf files from the previous night’s backup ▬ Nightly backups did not include the DAOS directory ▬ Monthly backups did not include DAOS directory ▬ No copy of attachments ● Resolution ▬ Replica copies of mail files from DR server ▬ If there were no replicas of these .nsf files? ▬ Prayer? 59
  • 60. 9 DAOS done dumbly ● Lessons learned ▬ Be aware of what DAOS is ▬ Fantastic tool ▬ Very new ▬ Different container model ▬ Understand DAOS and backups ▬ Understand any new technology before you implement it ▬ COMMUNICATE ▬ Did you tell your backup team about DAOS ▬ Disaster recovery tests? ▬ Complete after every major system change/upgrade ▬ AND, on schedule too! 60
  • 61. 10. LotusKnows lotusnotes ● The Story… ▬ Very Large financial organisation using Lotus Notes for everything ▬ Request for a new expense approval application ▬ Remote developer sends in the application for deployment ▬ 1 year later... SERIOUS security breaches noticed. ▬ SERIOUS INVESTIGATION ▬ SERIOUS FINGER-POINTING ▬ Who approved what? ● The Investigation ▬ Check database access ▬ Check reader fields/author signatures ▬ Check ACL ▬ Some expense claims appear to be forged. The client is blaming Notes Security. ● Gotcha ▬ We reviewed some expense reports... ▬ Expenses being submitted, approved, closed in sub - 5 minutes! 61
  • 62. 10. Lotusknows lotusnotes ● The cause ▬ We reviewed the customer site ▬ We found a H: drive directory called H:notesid files ▬ We found a text file which stated all passwords were ‘lotusnotes’ ▬ People were switching IDs and approving their own expenses ▬ People were also deleting the submission/approval application emails ● The Resolution ▬ Stop laughing ▬ Shoot the admin ▬ Enforce password changes ● Lessons Learnt ▬ Security is holistic. It all is connected. There is no point in creating the worlds most secure safe, and then writing the combination on the door. 62
  • 63. Shorties... ● Don’t have two domino domains with the same name in your organisation ● Check the code in you directory ▬ Is there anything that shouldn’t be there ● Using shared mail in 2010 should be illegal ● Dont keep “encrypted” text in clear text ● Remember to renew domains ▬ We didn’t 63
  • 64. Summary ● Everyone makes mistakes ● Put systems in place to prevent the obvious ones ● Its how you deal with them that makes you professional ▬ No Blame Culture ▬ Admit soon, Admit well… ● Major system disasters ▬ Sometimes cant be prevented ▬ Are usually the combination of many small errors ● Learn from these mistakes! 64
  • 65. Thank you ● Paul Mooney (pmooney@pmooney.net) ● Bluewave ● pmooney.net / bluewavegroup.eu ● Bill Buchan (bill@billbuchan.com) ● hadsl ● billbuchan.com / hadsl.com 65