Slides that accompanied a three-hour crash training course on sysadmin survival skills useful for sysadmins of Evergreen open source library software. Session led by Don McMorris, Equinox Software.
Evergreen Sysadmin Survival Skills
1. Evergreen Sysadmin
Survival Skills
A presentation for the
Evergreen International Conference 2009
Don McMorris, Equinox Software Inc.
2. Target Audience
• Personnel who will administer the ILS servers or will provide upper-tier
support in an Evergreen installation, including roles in:
• Hardware
• Operating System
• Network
• Monitoring
• Deep analysis/troubleshooting
3. Assumptions
• Experience in administration of a Debian-based system at a console level
• File structure organization
• Configuration file maintenance
• dpkg/apt, CPAN, configure/make/make install, etc.
• Success installing basic Evergreen 1.4 server
• Administration of a medium to large deployment
• A lot of the talk will still be relevant to smaller installs too, though!
4. OpenSRF architecture (handout)
• XMPP Server (via ejabberd)
• Facilitates IPC
• OpenSRF Router
• Tracks what applications are running, and where
• Directs requests to individual service instances
• OpenSRF Service
• Takes requests, performs some type of process, and responds to
the client
• Evergreen consists of several OpenSRF services
• OpenSRF Client
• Sends requests to OpenSRF services
• May be standalone or running as part of service (sub-requests)
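The router's two jobs (tracking where services run, and directing each request to one instance) can be illustrated with a toy in-memory model. This is only a sketch: `ToyRouter`, its methods, and the round-robin selection are assumptions made for illustration. The real router communicates over XMPP through ejabberd, which this model does not attempt.

```python
class ToyRouter:
    """Toy stand-in for the OpenSRF router (illustration only, not its API)."""

    def __init__(self):
        self._instances = {}  # service name -> list of hosts running it
        self._next = {}       # service name -> index of the next host to use

    def register(self, service, host):
        """Record that `host` is running an instance of `service`."""
        self._instances.setdefault(service, []).append(host)
        self._next.setdefault(service, 0)

    def route(self, service):
        """Pick an instance for a request (round-robin is an assumption)."""
        hosts = self._instances.get(service)
        if not hosts:
            raise LookupError(f"no instance registered for {service}")
        index = self._next[service] % len(hosts)
        self._next[service] = index + 1
        return hosts[index]
```

A client would call `route("open-ils.circ")` and send its request to whichever host comes back.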
8. Core OpenSRF Services
• opensrf.settings
• opensrf.math
• Used to test OpenSRF communication chain
• opensrf.dbmath
• Used to test OpenSRF communication chain and application
database connectivity.
11. Typical Hardware Setup
• Database boxes
• Main RW DB
• RO DBs
• Report server w/RO DB
• N+1 Server bricks, each consisting of:
• One "head" (ejabberd, apache, opensrf.settings)
• One or more "drones" (general application processing)
• A "single server brick" refers to a single server with ejabberd,
apache, and all applicable applications
• Utility servers
• N+1 SIP servers (may be part of server bricks)
• "Batch processing" server (fine generator, hold targeter, etc.)
• 2X memcache servers
• 2X firewall/Load balancers
• Logger
• Monitoring/alerts, mail, etc.
12. Typical Hardware Setup
• Redundant Ethernet
• All servers have dual NICs bonded in active/failover mode
• Each NIC wired to independent Ethernet switch
• Redundant power (on critical servers [DB, Memcache, etc.])
• Multiple power supplies (2-4)
• Each power supply independently fed
• Separate electrical panel
• Separate UPS system
• Hardware RAID
• Debian Linux
16. Network Architecture
• Border connection(s) come in to pair of load balancer/firewall boxes
• Active/Hot Standby
• Linux HA/Heartbeat used to perform automatic failover
• LVS/ldirector used to query brick heads (ldirectorping.txt)
• Brick is "removed from rotation" when ldirectorping.txt isn't
found or has invalid contents
• Incoming connections load-balanced round-robin to brick heads
• Connections NAT'd
• LAN consists of 2 separate physical switches
• Each server has one connection to each
• Switches are linked via cross-over cable or similar
• Packet manipulators (such as caching proxies) tend to cause issues
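The "removed from rotation" rule can be written down as a small decision function. This is a sketch of the rule only, not of ldirector itself: the function names are hypothetical, and the expected file contents (`"pong"`) are a placeholder, since the slides do not say what ldirectorping.txt should contain.

```python
def brick_in_rotation(body, expected="pong"):
    """Return True when the health file was found and holds the expected text.

    `body` is the fetched content of ldirectorping.txt, or None when the
    file was not found. The default `expected` value is a placeholder.
    """
    if body is None:
        return False
    return body.strip() == expected


def heads_in_rotation(responses, expected="pong"):
    """Filter a {head_name: body_or_None} mapping down to the healthy heads."""
    return [head for head, body in responses.items()
            if brick_in_rotation(body, expected)]
```

Deleting or emptying the file on a brick head is therefore enough to drain that brick before maintenance.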
18. Open inbound ports (border)
• 80 HTTP
• 443 HTTPS
• 22 SSH (possibly restricted by source IP)
• 210 Z39.50 server
• 6001 SIP2 RAW (NOTE: SIP communication is plain-text)
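One way to confirm from outside that the border exposes these ports is a plain TCP connect test. The sketch below uses only the standard `socket` module. The helper names are hypothetical, and a successful connect only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Ports taken from the slide above; the labels are only used for reporting.
BORDER_PORTS = {80: "HTTP", 443: "HTTPS", 22: "SSH", 210: "Z39.50", 6001: "SIP2"}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(host, ports=BORDER_PORTS, timeout=2.0):
    """List the expected ports that did not accept a connection."""
    return sorted(p for p in ports if not port_open(host, p, timeout))
```

If SSH is restricted by source IP, port 22 will legitimately show up as closed from most test locations.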
19. Logging
• Syslog-ng
• Central logging server
• All logs, including Evergreen, Apache, Postgres, and system
• Typical directory structure:
• Evergreen: /var/log/remote/prod/%Y/%m/%d/file.%H.log
• System: /var/log/remote/sys/$HOSTNAME/%Y/%m/%d/file.%H.log
• gzip nightly
• Off-system archive as necessary
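Because the directory layout is a strftime-style template, the path for any given hour can be computed rather than browsed for. A minimal sketch, assuming the `file` part of the template stands for the log's base name (for example `osrfsys`). The function names are hypothetical.

```python
from datetime import datetime

EVERGREEN_LOG_ROOT = "/var/log/remote/prod"
SYSTEM_LOG_ROOT = "/var/log/remote/sys"


def evergreen_log_path(when, name):
    """Build the hourly Evergreen log path for the datetime `when`."""
    return f"{EVERGREEN_LOG_ROOT}/{when:%Y/%m/%d}/{name}.{when:%H}.log"


def system_log_path(when, hostname, name):
    """Build the hourly per-host system log path for the datetime `when`."""
    return f"{SYSTEM_LOG_ROOT}/{hostname}/{when:%Y/%m/%d}/{name}.{when:%H}.log"
```

For an event at 10:54 on 2009-04-30, `evergreen_log_path(datetime(2009, 4, 30, 10, 54), "osrfsys")` gives `/var/log/remote/prod/2009/04/30/osrfsys.10.log`. Files already compressed by the nightly gzip would carry an extra `.gz` suffix.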
20. Checking the logs
• Identify which log file(s) to look at
• What date and time are you looking for?
• Identify thread trace from logs
DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep 312345000098765 osrfsys.10.log
...
2009-04-30 10:54:02 DEMOSYS-BRICK0-DRONE1 open-ils.circ: [INFO:5119:ScriptRunner.pm:60:1241097309492131]
script_runner: circ_permit_hold : Patron=98725, Patron_Username=21234000054577,
Patron_Profile_Group=Patron, Patron_Fines=, Patron_OverdueCount=, Patron_Items_Out=,
Patron_Barcode=21234000054577, Patron_Library=Summer City Public Library Requestor=circ1, Copy=3870901,
Copy_Barcode=312345000098765, Copy_status=Available, Circ_Mod=DVD, Circ_Lib=SUMMER,
Copy_location=Stacks, Item_Owning_lib=4, Volume=5834287, Record=1984545, Is_Renewal: no, Is_Precat: no,
Hold_request_lib=SUMMER, Hold_Pickup_Lib=4
...
• Search for thread trace to get the complete thread
DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep ':1241097309492131]' osrfsys.10.log
• If logs have been gzip'd, 'zgrep' can be used
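The two grep passes above (find a line, read off its thread trace, then pull every line carrying that trace) can be combined in a short script. This is a sketch, assuming the bracketed tag always has the five colon-separated fields seen in the sample line, with the thread trace last. The function names are hypothetical.

```python
import gzip
import re

# Matches a tag such as [INFO:5119:ScriptRunner.pm:60:1241097309492131]
# and captures the last field, which the slides call the thread trace.
TRACE_TAG = re.compile(r"\[[A-Z]+:\d+:[^:\]]+:\d+:(\d+)\]")


def open_log(path):
    """Open a plain or gzip'd log as text, much as zgrep handles both."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_traces(lines, needle):
    """Return thread traces of lines containing `needle`, in first-seen order."""
    traces = []
    for line in lines:
        if needle in line:
            match = TRACE_TAG.search(line)
            if match and match.group(1) not in traces:
                traces.append(match.group(1))
    return traces


def lines_for_trace(lines, trace):
    """Return every line tagged with `trace` (same idea as grep ':<trace>]')."""
    tag = f":{trace}]"
    return [line for line in lines if tag in line]
```

Typical use is to read the file once with `open_log(...)` into a list, call `find_traces` with the barcode, then call `lines_for_trace` for each trace returned.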
21. Checking the logs (cont.)
• Checking the postgres logs may be worthwhile also
DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep 312345000098765 pg.10.log
...
2009-04-30 10:56:04 DEMOSYS-DB1 postgres[4298]: [607-7] barcode = '312345000098765', call_number =
5834287, circ_as_type = NULL, circ_lib = 4, circ_modifier = 'DVD', circulate = 't'
...
DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep -F 'postgres[4298]: [607-' pg.10.log
...
• In this case, note we use the app name and PID in addition to the
thread trace
postgres[4298]: [607-7]
• postgres = app name
• 4298 = pid
• 607 = thread trace
• Postgres also uses a sequence ID
• “-7” = “this is the 7th line of the query”
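The `postgres[4298]: [607-7]` prefix can also be picked apart programmatically, which makes it easy to reassemble a multi-line query in order. A sketch using the field names from the slide above. The function names are hypothetical, and the regular expression assumes the prefix always has exactly this shape.

```python
import re

# Matches the syslog prefix shown above, e.g. "postgres[4298]: [607-7]".
PG_TAG = re.compile(
    r"(?P<app>\w+)\[(?P<pid>\d+)\]: \[(?P<trace>\d+)-(?P<seq>\d+)\]"
)


def parse_pg_tag(line):
    """Return (app, pid, trace, seq) for a Postgres syslog line, or None."""
    match = PG_TAG.search(line)
    if match is None:
        return None
    return match["app"], int(match["pid"]), int(match["trace"]), int(match["seq"])


def collect_query(lines, pid, trace):
    """Return the lines of one logged query, ordered by sequence number."""
    matched = []
    for line in lines:
        tag = parse_pg_tag(line)
        if tag and tag[1] == pid and tag[2] == trace:
            matched.append((tag[3], line))
    return [line for _, line in sorted(matched)]
```

Matching on PID as well as the trace number matters because different backend processes can reuse the same number.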
22. Monitoring
• ESI uses Nagios, but is looking into additions/alternatives
• mrtg + snmp
• Splunk
• Ping
• Free memory
• System Load Average
• > 1.0 sustained average on an app server often indicates a “hung”
process
• Critical processes (apache, jabber, memcache, clark-kent.pl, etc)
• Disk Free Space
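The load-average rule of thumb can be turned into a simple check. In this sketch "sustained" is interpreted as all three standard windows (1, 5, and 15 minutes) being above the threshold. That interpretation and the function names are assumptions, not something the slides specify.

```python
import os


def load_is_suspicious(load_averages, threshold=1.0):
    """Flag a sustained high load: every window must exceed `threshold`."""
    return all(value > threshold for value in load_averages)


def current_load_is_suspicious(threshold=1.0):
    """Apply the check to this host's 1-, 5- and 15-minute load averages."""
    return load_is_suspicious(os.getloadavg(), threshold)
```

A short spike such as `(3.0, 0.9, 0.4)` is not flagged, while `(1.5, 1.2, 1.1)` is. The threshold may need raising on multi-core app servers.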
23. Monitoring (continued)
• Lock file age
• Fine generator
• Hold targeter
• Backup status
• File age
• WAL file age
• Sync to external backup
• Service timeouts (“Returning NULL” in the gateway logs)
• Bandwidth Usage
• Motherboard sensors
• Third-party ping service
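Lock file age, backup file age, and WAL file age all reduce to the same check: how long ago was this file last modified? A sketch in the style of a Nagios plugin, using the conventional plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The function names, thresholds, and the `missing_status` parameter are hypothetical.

```python
import os
import time

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def check_file_age(path, warn_after, crit_after, now=None, missing_status=OK):
    """Return (status, message) based on seconds since `path` was modified.

    For a lock file, a missing file usually means the job is not running,
    so the default is OK. For a backup or WAL file, pass
    missing_status=CRITICAL instead.
    """
    if not os.path.exists(path):
        return missing_status, f"{path} not present"
    current = time.time() if now is None else now
    age = current - os.path.getmtime(path)
    if age >= crit_after:
        return CRITICAL, f"{path} is {age:.0f}s old"
    if age >= warn_after:
        return WARNING, f"{path} is {age:.0f}s old"
    return OK, f"{path} is {age:.0f}s old"
```

A wrapper script would print the message and call `sys.exit(status)` so that the monitoring system picks up the state. A stale fine generator or hold targeter lock file is a common sign that the job hung.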
24. Database schema (handout)
• Action
• Circulations, holds, surveys, etc
• Actor
• Org Units (libraries, branches, bookmobiles, etc), users, cards,
workstations
• Asset
• Call numbers (volume), copies (items), statcats
• Auditor
• Audit data (usually org units, users, copies, bib records)
• Authority
• Authority Record data
25. Database schema (continued)
• Biblio
• Bibliographic record data (not including metabib entries)
• Config
• Configuration tables (audiences, bib levels, item forms, circ rules,
etc)
• Container
• Buckets (biblio, call number, item, user)
• Metabib
• Indexed bibliographic record fields
• Money
• Fines, payments, etc.
26. Database schema (continued)
• Offline
• Tables for the offline staff client
• Permission
• Profile groups, user group maps, permission allocations, etc.
• Reporter
• Reports, Report Templates, Report Folders, and Report Views
• Vandelay
• Bib/Authority import/export
27. Cron Jobs
• All:
• ntpdate
• Utility:
• Hold targeter
• Hold thawer
• Fine generator
• Reshelving completer
• Notices/collections
• Backup
• General backup scripts
• rsync
28. Backups
• Central file store
• Large systems may use dedicated server
• SCP/rsync/etc. data to central file store
• Many servers (e.g., DB) "push" data to backup store
• Backup server sometimes "pulls" data from servers
• Sync data store to external/removable media
• USB hard drive works great for this
• Encrypting backups (e.g., with cryptfs) is definitely a good idea
• Rotate backups
• Ideally, every morning
• Off-site
• Secure storage (preferably in a safe designed specifically for
backup media)
29. Conify (with demo)
• “System Settings” in staff client
• Moves OU Types, OU Names, permissions, etc. from the backend into the
staff client
• Autogen still required
• New in 1.4
• Improvements in 1.6
30. Circ Policies (with demo)
• Circ matrix defined in /openils/var/circ/
# ls -lh /openils/var/circ/
total 64K
-rwxr-xr-x 1 opensrf opensrf 25K 2009-05-18 21:47 circ_duration.js
-rwxr-xr-x 1 opensrf opensrf 730 2009-05-06 16:46 circ_groups.js
-rwxr-xr-x 1 opensrf opensrf 1.6K 2009-05-18 21:00 circ_item_config.js
-rwxr-xr-x 1 opensrf opensrf 8.1K 2009-05-06 16:46 circ_lib.js
-rwxr-xr-x 1 opensrf opensrf 439 2009-05-06 16:46 circ_permit_copy.js
-rwxr-xr-x 1 opensrf opensrf 840 2009-05-06 16:46 circ_permit_hold.js
-rwxr-xr-x 1 opensrf opensrf 652 2009-05-06 16:46 circ_permit_patron.js
-rwxr-xr-x 1 opensrf opensrf 137 2009-05-06 16:46 circ_permit_renew.js
• Major moves from files to DB in 1.6 (some support in 1.4)
• Will facilitate config via staff client
• Very flexible. Can address several aspects of users, copies, bibs, etc.
• Typically will use combination of item library, library performing action
(circulation, hold, etc), patron group, etc.