Presentation for the course 'Understanding Library Systems and Software Applications', School of Communication and Information, Rutgers University, New Brunswick, New Jersey, Thursday, October 25, 2012
Crushing, Blending, and Stretching Transactional Data
Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations
1. Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations
Ray Schwartz,
Systems Specialist Librarian
Cheng Library, William Paterson University,
Wayne, New Jersey, USA
schwartzr2 @ wpunj.edu
2. Outline
• What is Data Mining and Data Warehousing and Why Do We Do It?
• Our Library and University
• Patron Statistical Categories
• Application Server
• Reporting
3. What is Data Mining and Data Warehousing?
• Extracting data from legacy systems and other resources;
• cleaning, scrubbing and preparing data for decision support;
• maintaining data in appropriate data stores;
• accessing and analysing data using a variety of end user tools;
• and mining data for significant relationships.
• Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/Prentice Hall.
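As a rough illustration of the extract, clean, store and analyse steps listed above, the sketch below runs a tiny pipeline in Python, with the standard-library `sqlite3` module standing in for a data store. The CSV layout, column names and sample rows are hypothetical; a real extract would come from the ILS or the university systems.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: stray whitespace, mixed case, a duplicate patron, a missing major.
RAW_EXTRACT = """patron_id,major,class_level
1001, history ,Senior
1002,COMM,freshman
1001,History,senior
1003,,Junior
"""

def extract(text):
    """Extract: read rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean/scrub: trim, normalize case, drop rows without a major, keep first row per patron."""
    seen = set()
    cleaned = []
    for row in rows:
        patron_id = row["patron_id"].strip()
        major = row["major"].strip().upper()
        level = row["class_level"].strip().title()
        if not major or patron_id in seen:
            continue
        seen.add(patron_id)
        cleaned.append((patron_id, major, level))
    return cleaned

def load(conn, records):
    """Store: load the cleaned records into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patron (patron_id TEXT PRIMARY KEY, major TEXT, class_level TEXT)"
    )
    conn.executemany("INSERT INTO patron VALUES (?, ?, ?)", records)

def analyse(conn):
    """Analyse: count patrons per major."""
    return conn.execute(
        "SELECT major, COUNT(*) FROM patron GROUP BY major ORDER BY major"
    ).fetchall()

conn = sqlite3.connect(":memory:")
load(conn, clean(extract(RAW_EXTRACT)))
print(analyse(conn))
```

Keeping each step in its own function makes it easier to swap the source (another system's export) or the store (a server DBMS) without touching the cleaning rules.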
4. • The primary purpose of these efforts is to provide easy access to specifically prepared data that can be used with decision support applications such as management reports, queries, decision support systems, executive information systems and data mining.
• Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/Prentice Hall.
8. Our Systems
• Voyager ILS
• Clio ILL Software
• EZProxy Server
• Banner – University ERP
• University Networked Drive K:
• University Email Server
• University Web Server
9. Vendor Services
• Serials Solutions
• A to Z list
• MARC Record Service
• Link Resolver
• OCLC – Bibliographic Utility
• WorldCat Collection Analysis
• Coutts (was Blackwell) – Book Jobber
• EBSCO – Subscription Agent
• MARCIVE – Authority Control
• Database Vendors
12. What would we like to see?
• Breakdowns by department and majors.
• Combined usage by department/majors of more than one library service.
• Which categories of patrons are accessing which services?
13. Patron Statistical Categories
• The Voyager patron database allows a maximum of 10 statistical categories per patron record.
• Worked with our University Information Systems Department to extract the needed data from the appropriate sources.
• Weekly extract from SIS and HRS to load into Voyager.
14. Groups and Services
Groups:
• Major
• Status
– Undergrad or Grad
– Faculty, Adjunct Faculty or Staff
• Department
• College
• Degree
• No. of Credits
• Year of Study
• Campus Location
Services:
• Circulation
– Books
– Media
– Reserve
– By Fund Code
– Location
• ILL / Document Delivery
• Databases
• Library Web Pages
– Subject Area Resource Guides
– Reference Requests
• Catalog
• Other Vendor Services
– Serials Solutions
15. From Students
• College and Mercer Identifier
• Class Level (Freshman, Sophomore, Junior, Senior, Graduate)
• Total Hours Registered for Current Semester
• Major
• 2nd Major
• Degree
• CA-Collection Agency
• SOILS
• Student Entrance Level (New Non-Traditional Freshman, New First Time Transfer, etc.)
• Department
16. From Faculty / Staff / Adjuncts
• College
• Full or Part-Time
• Status (Faculty, Adjunct, Staff, Professional Staff, Tenured, Tenure-Track)
• Division
• Departments
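Because a Voyager patron record holds at most 10 statistical categories, the weekly extract has to pick and order the fields it loads. A minimal Python sketch of that selection step, assuming a hypothetical dict-shaped student record whose keys mirror the "From Students" list; it does not use any Voyager API or the real load-file format.

```python
# Voyager allows at most 10 statistical categories per patron record.
MAX_STAT_CATEGORIES = 10

# Hypothetical field keys, in the order of the "From Students" list.
STUDENT_FIELDS = [
    "college", "class_level", "hours_registered", "major", "second_major",
    "degree", "collection_agency", "soils", "entrance_level", "department",
]

def stat_categories(record, fields=STUDENT_FIELDS):
    """Return the non-empty field values, in field order, as statistical categories."""
    values = []
    for field in fields:
        value = record.get(field)
        if value is None:
            continue
        text = str(value).strip()
        if text:
            values.append(text)
    # Guard in case more fields are added to the list later.
    if len(values) > MAX_STAT_CATEGORIES:
        raise ValueError(f"{len(values)} categories exceeds the limit of {MAX_STAT_CATEGORIES}")
    return values

student = {"college": "Humanities", "class_level": "Senior", "hours_registered": 15,
           "major": "History", "department": "History"}
print(stat_categories(student))
```

A faculty/staff record would use its own shorter field list (college, full or part-time, status, division, department) with the same function.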
17. History Department - 12 months - Feb. 2008
PATRON STATUS | BOOK CIRC | MEDIA CIRC | EQUIP CIRC | TOTAL CIRC | MEMBERS | BORROWERS | % BORROWING | CIRC/MEMBER | CIRC/BORROWER
UNDERGRADUATE STUDENTS | 2,715 | 250 | 698 | 3,663 | 238 | 186 | 78% | 15.39 | 19.69
GRADUATE STUDENTS | 419 | 13 | 76 | 508 | 14 | 13 | 93% | 36.29 | 39.08
ADJUNCT FACULTY | 100 | 65 | 20 | 185 | 32 | 20 | 63% | 5.78 | 9.25
FULL-TIME FACULTY | 159 | 115 | 194 | 468 | 24 | 23 | 96% | 19.50 | 20.35
HISTORY TOTALS | 3,393 | 443 | 988 | 4,824 | 308 | 242 | 79% | 15.66 | 19.93
LIBRARY TOTALS | 23,370 | 8,713 | 20,703 | 52,756 | 7,418 | 4,981 | 67% | 7.11 | 10.59
DEFINITIONS:
BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge
MEDIA CIRCULATION = audio & video materials, including media reserves
EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
MEMBER = declared major or department member
BORROWER = any member who borrowed materials
LIBRARY TOTALS = declared undergrad & grad majors, adjuncts & full-time faculty borrowers
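The last three columns are derived from the raw counts: % BORROWING is borrowers divided by members, and the two ratios divide total circulation by members and by borrowers. A small Python sketch that recomputes them; the function names are illustrative only. Rounding half-up with `decimal` reproduces the table's 63% for adjunct faculty (20 of 32 = 62.5%), whereas Python's built-in `round()` rounds half to even and would give 62.

```python
from decimal import Decimal, ROUND_HALF_UP

def ratio(numerator, denominator, places="0.01"):
    """Divide and round half-up to the given number of places."""
    return (Decimal(numerator) / Decimal(denominator)).quantize(
        Decimal(places), rounding=ROUND_HALF_UP)

def derived_columns(total_circ, members, borrowers):
    """Compute % BORROWING, CIRC/MEMBER and CIRC/BORROWER for one table row."""
    return {
        "pct_borrowing": ratio(100 * borrowers, members, "1"),
        "circ_per_member": ratio(total_circ, members),
        "circ_per_borrower": ratio(total_circ, borrowers),
    }

# HISTORY TOTALS row: 4,824 total circulations, 308 members, 242 borrowers.
print(derived_columns(4824, 308, 242))
```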
18. Communications Majors FY08/09
Rows: Item Type / Location / Call No Type / Call No. Columns (statistical categories): Communications Majors, Freshman, Sophomore, Junior, Senior
M- DVD / Media Services / Other / DVD 194 17 31 52 94
M- VideoCass / Media Services / Other / VC 228 11 40 67 110
T- Book / 2nd Floor - Circulating / Library of Congress / B 34 9 8 11 6
T- Book / 2nd Floor - Circulating / Library of Congress / BD 3 1 2
T- Book / 2nd Floor - Circulating / Library of Congress / BF 30 5 5 12 8
...
2nd Floor Circulating 1531 222 310 403 596
T- Juvenile / CMC / 125 14 26 20 35
T- NJDoc / Askew Documents Room / Other / 1 1
New Jersey History 10 0 2 7 1
T- ReserveBk / Reserves Desk / 189 13 46 68 62
T- SpecColl / Special Collection / Library of Congress / LC 3 3
T- Book-McNaughton / Leisure Lounge / Library of Congress / F 2 1 1
T- Book-McNaughton / Leisure Lounge / Library of Congress / HF 1 1
T- Book-McNaughton / Leisure Lounge / Library of Congress / HS 2 2
T- Book-McNaughton / Leisure Lounge / Library of Congress / HV 5 1 2 2
T- Book-McNaughton / Leisure Lounge / Library of Congress / ML 1 1
T- Book-McNaughton / Leisure Lounge / Library of Congress / PN 3 3
T- Book-McNaughton / Leisure Lounge / Library of Congress / PS 29 4 10 15
T- Book-McNaughton / Leisure Lounge / Library of Congress / RC 2 1 1
T- Book-McNaughton / Leisure Lounge / Library of Congress / TL 1 1
Leisure Lounge 49 9 1 19 20
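A breakdown like this is a cross-tabulation of circulation transactions by item category and class level. A minimal sketch with `collections.Counter`, assuming the transactions have already been joined to each borrower's statistical categories; the tuples are made-up sample data, not figures from the table above.

```python
from collections import Counter, defaultdict

CLASS_LEVELS = ["Freshman", "Sophomore", "Junior", "Senior"]

# Hypothetical joined records: (item type / location, borrower's major, class level).
transactions = [
    ("M- DVD / Media Services", "Communications", "Senior"),
    ("M- DVD / Media Services", "Communications", "Junior"),
    ("M- DVD / Media Services", "Communications", "Senior"),
    ("T- ReserveBk / Reserves Desk", "Communications", "Freshman"),
    ("T- ReserveBk / Reserves Desk", "History", "Senior"),
]

def crosstab(rows, major):
    """Count circulations per item category and class level for one major."""
    table = defaultdict(Counter)
    for item, row_major, level in rows:
        if row_major == major:
            table[item][level] += 1
    # Each output row: item, total for the major, then one count per class level (0 if none).
    return [
        [item, sum(counts.values())] + [counts[level] for level in CLASS_LEVELS]
        for item, counts in sorted(table.items())
    ]

for row in crosstab(transactions, "Communications"):
    print(row)
```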
19. Challenges with combining data from various services
• Little to no linkage of data
• Multiple user IDs for authentication
20. Application Server
• A machine or its software that works in conjunction with a web server to deliver application services such as the dynamic creation of a webpage from content stored in a database. From http://www.webtools.ca.gov/help/Glossary.asp
• Web Server Software (Apache or IIS)
• Database Management System – DBMS (MySQL, Oracle, MS SQL Server)
• Scripting Language (Perl, PHP, ColdFusion, ASP)
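The same three layers can be sketched in a few lines of Python, with the standard library's `wsgiref` standing in for the web server, `sqlite3` for the DBMS, and Python itself as the scripting language. This is a swapped-in stack for illustration, not the Apache/MySQL/Perl or PHP combinations named above; the `report` table and its rows are hypothetical.

```python
import sqlite3
from html import escape
from wsgiref.simple_server import make_server

# DBMS layer: an in-memory database with a hypothetical report table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (name TEXT, schedule TEXT)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [("Reserves Overdues", "Daily at 5:59 AM"), ("Media Circulation", "Monthly at 6:35 AM")],
)

def app(environ, start_response):
    """Scripting layer: build an HTML page from database rows on each request."""
    rows = conn.execute("SELECT name, schedule FROM report ORDER BY name").fetchall()
    items = "".join(f"<li>{escape(name)}: {escape(when)}</li>" for name, when in rows)
    body = f"<html><body><h1>Scheduled reports</h1><ul>{items}</ul></body></html>"
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(data)))])
    return [data]

def serve(port=8000):
    """Web server layer: serve the app on localhost until interrupted."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()

# serve()  # uncomment to run, then browse to http://localhost:8000/
```

Because the page is generated per request, a new row in the table shows up on the next page load without editing any HTML, which is the "dynamic creation of a webpage" in the definition above.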
21. Why an Application Server?
• Relevant data in log files needs to be in a database to be analyzed.
• You need your own DBMS to create new tables and queries.
23. Daily and Weekly Email Reports from the Application Server
Circ Fines Audit Daily Report - Daily at 6:05 AM.
Dupe Patron Record Report - Daily at 5:56 AM.
Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM.
Media Service Scheduling Rooms Report - Daily at 6:02 AM.
Media Services Equipment Pickup Summary - Daily at 7:00 AM.
Received Title Alert - Daily at 6:59 AM.
Reserves Overdues - Daily at 5:59 AM.
Scheduled LIS Tasks - Daily at 6:00 AM.
ILL Borrowing Overdues Report - Weekly at 5:59 AM.
ILL Lending Reports - Weekly at 6:15 AM.
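Each scheduled report above amounts to a query whose results are formatted into an email. A minimal sketch of the message-building step with Python's standard `email.message.EmailMessage`; the addresses, column layout and sample row are hypothetical, and sending the message (for example with `smtplib`) and scheduling it (for example with cron) are left out.

```python
from email.message import EmailMessage

def build_report(subject, rows, sender, recipients):
    """Format (patron, title, due date) rows as a plain-text email report."""
    lines = [f"{patron:<20} {title:<30} {due}" for patron, title, due in rows]
    body = "\n".join(lines) if lines else "No items to report."
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(body)
    return msg

msg = build_report(
    "Reserves Overdues",
    [("Doe, Jane", "Intro to Media Studies", "2008-02-14")],
    "reports@library.example.edu",
    ["reserves@library.example.edu"],
)
print(msg)
```

Sending an explicit "No items to report." message on empty days confirms to staff that the job ran.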
24. Monthly Email Reports from the Application Server
Circ Fines Audit - Monthly at 6:10 AM.
Circulation by Location and Item Type - Monthly at 6:21 AM.
Circulation Lost and Paid - Monthly at 6:25 AM.
Circulation Online Renewal Count - Monthly at 6:30 AM.
Media Circulation - Monthly at 6:35 AM.
Reserve Circulation - Monthly at 6:40 AM.
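A monthly report that runs early in the month would typically cover the previous calendar month. A small sketch of computing that date range, assuming (hypothetically) that the report query selects transactions with `start <= date < end`; the half-open range sidesteps differing month lengths and leap years.

```python
from datetime import date, timedelta

def previous_month(run_date):
    """Return (first day of previous month, first day of current month) for a report run."""
    first_of_this_month = run_date.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), first_of_this_month

start, end = previous_month(date(2008, 3, 1))
# A query would then filter with: WHERE charge_date >= start AND charge_date < end
print(start, end)
```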
27. Lending Services Reports
Lists of patrons with fines between $10 and $19.99
• Student and Alumni fines list - Sorted by either Name, Amount or Notice Date.
• PALS and Courtesy Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with fines over $19.99
• Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or Notes.
• PALS and Courtesy Patron fines list - Sorted by Name.
• VALE Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with overdues older than 30 days
• Student and Alumni overdues list - Sorted by either Name, IID or Notes.
• PALS and Courtesy Patron overdues list - Sorted by Name.
• All other Patron overdues list except VALE - Sorted by Name.
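The fine and overdue lists above reduce to two filters (a fine range or an overdue age) plus a patron group and a sort. A minimal Python sketch, assuming a hypothetical flattened patron row; `Decimal` keeps the $19.99 boundary exact.

```python
from datetime import date, timedelta
from decimal import Decimal

# Hypothetical patron rows: (name, patron group, fine balance, oldest overdue due date or None).
patrons = [
    ("Smith, Ann", "Student", Decimal("12.50"), None),
    ("Jones, Bo", "Student", Decimal("25.00"), date(2008, 1, 2)),
    ("Lee, Cy", "PALS", Decimal("19.99"), date(2008, 2, 20)),
    ("Park, Di", "Student", Decimal("9.99"), date(2007, 12, 1)),
]

def fine_bucket(amount):
    """Place a fine balance in the ranges used above: $10-$19.99 or over $19.99."""
    if amount > Decimal("19.99"):
        return "over 19.99"
    if amount >= Decimal("10"):
        return "10-19.99"
    return None

def fine_list(rows, bucket, group):
    """Names in one fine range and patron group, sorted by name."""
    return sorted(name for name, grp, fines, _due in rows
                  if grp == group and fine_bucket(fines) == bucket)

def overdue_list(rows, today, days=30):
    """Names with an overdue item due more than `days` days before `today`, sorted by name."""
    cutoff = today - timedelta(days=days)
    return sorted(name for name, _grp, _fines, due in rows
                  if due is not None and due < cutoff)

print(fine_list(patrons, "10-19.99", "Student"))
print(overdue_list(patrons, today=date(2008, 3, 1)))
```

Passing a larger `days` value gives a longer window, such as an approximation of the six-month VALE cutoff.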
28. Lending Services Reports, cont.
Lists of VALE patrons with overdues older than 6 months
• VALE patron overdues list - Sorted by Name.
Miscellaneous Reports
• Patrons with the phrase "Collection Agency" or the code "CA" in their notes.
• Patrons with the word "FINE" in one of their notes.
• Patrons with the word "SOILS" in their notes.
• Patrons with the phrase "FALL07 SOILS" in their notes.
• Patrons with the word "HOLD" in their notes.
• Combined list of HOLD, FINE, and CA.
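The note-based lists are keyword searches over patron notes. A minimal sketch with Python's `re` module, assuming staff enter the codes in upper case as shown above (matching is case-sensitive); the patron names and notes are hypothetical. Word boundaries (`\b`) keep the short code "CA" from matching inside words such as "CALL".

```python
import re

# One pattern per list; \b anchors keep short codes from matching inside longer words.
NOTE_PATTERNS = {
    "CA": re.compile(r"\bCollection Agency\b|\bCA\b"),
    "FINE": re.compile(r"\bFINE\b"),
    "SOILS": re.compile(r"\bSOILS\b"),
    "FALL07 SOILS": re.compile(r"\bFALL07 SOILS\b"),
    "HOLD": re.compile(r"\bHOLD\b"),
}

def flag_patrons(patron_notes, keys):
    """Return names whose notes match any of the given keyword patterns, sorted by name."""
    matched = set()
    for name, notes in patron_notes.items():
        if any(NOTE_PATTERNS[key].search(note) for key in keys for note in notes):
            matched.add(name)
    return sorted(matched)

notes = {
    "Doe, Jane": ["FINE paid 3/1", "HOLD placed"],
    "Roe, Rick": ["Sent to Collection Agency"],
    "Poe, Pat": ["CALL patron re: lost book"],
    "Moe, Max": ["FALL07 SOILS"],
}
print(flag_patrons(notes, ["HOLD", "FINE", "CA"]))  # the combined list
```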
Circulation Reports by Item Type from 2003 to the present
• All Staff.
• All Colleges
• Undergraduates by Major.
• Graduates by Major
• Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009 and 30-Nov-2009
29. One of Our Projects
• Mining EZProxy logfiles and linking to patron statistical categories from the Voyager Patron Database
– What majors and departments are accessing which database services?
– What majors and departments are accessing the ILL services?
30. EZProxy via LDAP authenticates access to:
• Databases
• Electronic journals
• ILL/Doc Delivery forms
31. Example of EZProxy log entry
• IP address: nj.dhcp.embarqhsd.net
• (Not used): -
• User ID: theuser
• Date/time: 1/1/2008 4:25:15 AM
• Method: GET
• Page retrieved: http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
• Version: HTTP/1.1
• Response code: 302
• No. of bytes: 537
• Referring URL: http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
• User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
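These fields match the NCSA combined log layout (host, ident, user, time, request, status, bytes, referrer, user agent), which EZProxy's `LogFormat` setting can produce. A sketch of parsing one line with a regular expression; the raw line is a hypothetical reconstruction from the fields above (the bracketed timestamp format and the `-0500` offset are assumptions), and a site's own `LogFormat` determines the real layout.

```python
import re
from datetime import datetime

# Combined log format: host ident user [time] "method url version" status bytes "referrer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical raw line rebuilt from the example entry (timezone offset assumed).
SAMPLE = (
    'nj.dhcp.embarqhsd.net - theuser [01/Jan/2008:04:25:15 -0500] '
    '"GET http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url='
    'http://www.wpunj.edu/scripts/webscript.exe?fs.scr HTTP/1.1" 302 537 '
    '"http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

entry = parse_line(SAMPLE)
print(entry["user"], entry["time"], entry["status"], entry["url"])
```

The parsed `user` value is what would later be matched against the patron statistical categories.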
34. • Active Circ transactions are stored in a table with the patron ID and are linked to statistical categories.
• Completed Circ transactions are moved to another table without the patron ID, but are still linked to the patron statistical categories.
• The Patron Table contains the total count of transactions for each patron, but no link to the individual transactions.
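The arrangement described above, where a completed transaction loses the patron ID but keeps the statistical category, can be mimicked in a small `sqlite3` sketch. The table and column names are hypothetical simplifications, not Voyager's actual schema; Voyager does this internally, and the sketch only illustrates the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE active_circ (item_id INTEGER, patron_id INTEGER, stat_category TEXT, charge_date TEXT);
CREATE TABLE archived_circ (item_id INTEGER, stat_category TEXT, charge_date TEXT, discharge_date TEXT);
CREATE TABLE patron (patron_id INTEGER PRIMARY KEY, total_charges INTEGER DEFAULT 0);
INSERT INTO patron (patron_id) VALUES (1);
""")

def charge(item_id, patron_id, stat_category, when):
    """Record an active loan and bump the patron's running total."""
    with conn:
        conn.execute("INSERT INTO active_circ VALUES (?, ?, ?, ?)",
                     (item_id, patron_id, stat_category, when))
        conn.execute("UPDATE patron SET total_charges = total_charges + 1 WHERE patron_id = ?",
                     (patron_id,))

def discharge(item_id, when):
    """Move a completed loan to the archive, keeping the category but dropping the patron ID."""
    with conn:
        conn.execute(
            "INSERT INTO archived_circ "
            "SELECT item_id, stat_category, charge_date, ? FROM active_circ WHERE item_id = ?",
            (when, item_id),
        )
        conn.execute("DELETE FROM active_circ WHERE item_id = ?", (item_id,))

charge(501, 1, "HISTORY", "2008-02-01")
discharge(501, "2008-02-15")
print(conn.execute("SELECT * FROM archived_circ").fetchall())
```

After the discharge, the patron's count of 1 survives in `patron`, and the archived row can still be reported by category, but nothing links the two.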
35. MySQL operations
• Extract patron statistical categories from Voyager and build them into the MySQL database.
• EZProxy transactions would be stored in one table and linked to patron statistical categories via the user ID.
• Once linking is complete, the user IDs are deleted.
• Logs are collected, loaded, and deleted monthly.
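The monthly load, link, and de-identify cycle can be sketched with `sqlite3` in place of MySQL (the SQL is close to what MySQL accepts, but this sketch has not been run there). Table names, columns, hosts and rows are hypothetical, and copying the categories onto each transaction before deleting the user IDs is an assumption about how the link survives the deletion. The order of operations is what matters: load, link via user ID, delete user IDs, then report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patron_category (user_id TEXT PRIMARY KEY, major TEXT, department TEXT);
CREATE TABLE ezproxy_txn (user_id TEXT, host TEXT, request_time TEXT, major TEXT, department TEXT);
""")

# Step 1: statistical categories extracted from the patron database (hypothetical rows).
conn.executemany("INSERT INTO patron_category VALUES (?, ?, ?)",
                 [("theuser", "HISTORY", "History"), ("other", "COMM", "Communication")])

# Step 2: one month of parsed EZProxy requests (hypothetical rows).
conn.executemany("INSERT INTO ezproxy_txn (user_id, host, request_time) VALUES (?, ?, ?)",
                 [("theuser", "search.example.com", "2008-01-01 04:25:15"),
                  ("theuser", "search.example.com", "2008-01-02 10:00:00"),
                  ("other", "journals.example.org", "2008-01-03 09:30:00")])

with conn:
    # Step 3: copy each user's categories onto their transactions via the user ID.
    conn.execute("""
        UPDATE ezproxy_txn SET
          major = (SELECT major FROM patron_category p WHERE p.user_id = ezproxy_txn.user_id),
          department = (SELECT department FROM patron_category p WHERE p.user_id = ezproxy_txn.user_id)
    """)
    # Step 4: once linked, delete the user IDs.
    conn.execute("UPDATE ezproxy_txn SET user_id = NULL")

# Which majors are using which hosts?
report = conn.execute("""
    SELECT major, host, COUNT(*) FROM ezproxy_txn
    GROUP BY major, host ORDER BY major, host
""").fetchall()
print(report)
```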
50. Reporting and Standards
• Reporting
– Emailed periodically, e.g., daily dossiers and other event-triggered reports.
– On demand, via email, web pages or a printer.
• Standards
– Share data for comparative research.
– Groups of libraries and consortia