Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations

Understanding Library Systems and Software Applications
School of Communication and Information, Rutgers University, New Brunswick, New Jersey
Thursday, October 25, 2012

Ray Schwartz, Systems Specialist Librarian
Cheng Library, William Paterson University, Wayne, New Jersey, USA
schwartzr2 @ wpunj.edu
Outline
• What is Data Mining and Data
  Warehousing and Why Do We Do It?
• Our Library and University
• Patron Statistical Categories
• Application Server
• Reporting



What is Data Mining and Data Warehousing
• Extracting data from legacy systems and other
  resources;
• cleaning, scrubbing and preparing data for decision
  support;
• maintaining data in appropriate data stores;
• accessing and analysing data using a variety of end
  user tools;
• and mining data for significant relationships.


 •   Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing:
     Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.

• The primary purpose of these efforts is
  to provide easy access to specifically
  prepared data that can be used with
  decision support applications such as
  management reports, queries, decision
  support systems, executive information
  systems and data mining.




•   Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing:
    Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.

Our University
•   9000 undergraduates
•   1000 graduates (mostly education majors)
•   400 faculty
•   800 adjuncts
•   1000 staff




Our Library
•   19 librarians and 26 library staff
•   350,000 volumes
•   18,000 audiovisual items
•   47,000 print and electronic periodicals
•   124 general and subject specific databases
•   $1,100,000 Non-Salary Allocations




Our Transactions
•   600,000 Database Searches
•   413,000 Gate Counts
•   40,000 Library Materials Circulation
•   34,000 Equipment Circulation
•   19,000 Reference Queries
•   3,000 Interlibrary Loans
•   5,000 Documents Delivered



Our Systems
•   Voyager ILS
•   Clio ILL Software
•   EZProxy Server
•   Banner – University ERP
•   University Networked Drive K:
•   University Email Server
•   University Web Server

Vendor Services
• Serials Solutions
    • A to Z list
    • MARC Record Service
    • Link Resolver
• OCLC – Bibliographic Utility
    • Worldcat Collection Analysis
•   Coutts (was Blackwell) – Book Jobber
•   Ebsco – Subscription Agent
•   Marcive – Authority Control
•   Database Vendors
Voyager Overdue and Fine Notices - Daily




Quarterly Extract for Serials Solutions AtoZ Service




What would we like to see?
• Breakdowns by department and majors.

• Combined usage by department/majors
  of more than one library service.

• Which categories of patrons are
  accessing which services?


Patron Statistical Categories
• Voyager Patron Database allows a maximum of 10
  statistical categories per patron record.

• Worked with our University Information Systems
  Department to extract the relevant data from the
  relevant sources.

• Weekly extract from SIS and HRS to load into
  Voyager

Groups and Services
Groups:
• Major
• Status
     – Undergrad or Grad
     – Faculty, Adjunct Faculty or Staff
• Department
• College
• Degree
• No. of Credits
• Year of Study
• Campus Location
Services:
• Circulation
     – Books
     – Media
     – Reserve
     – By Fund Code
     – Location
• ILL / Document Delivery
• Databases
• Library Web Pages
     – Subject Area Resource Guides
     – Reference Requests
• Catalog
• Other Vendor Services
     – Serials Solutions



From Students
• College and Mercer Identifier
• Class Level (Freshman, Sophomore, Junior, Senior, Graduate)
• Total Hours Registered for Current Semester
• Major
• 2nd Major
• Degree
• CA-Collection Agency
• SOILS
• Student Entrance Level (New Non-Traditional Freshman, New First Time Transfer, etc.)
• Department
From Faculty / Staff / Adjuncts
• College
• Full or Part-Time
• Status (Faculty, Adjunct, Staff, Professional Staff, Tenured, Tenure-Track)
• Division
• Departments
History Department - 12 months - Feb. 2008

PATRON STATUS           BOOK CIRC  MEDIA CIRC  EQUIP CIRC  TOTAL CIRC  MEMBERS  BORROWERS  % BORROWING  CIRC/MEMBER  CIRC/BORROWER
UNDERGRADUATE STUDENTS      2,715         250         698       3,663      238        186          78%        15.39          19.69
GRADUATE STUDENTS             419          13          76         508       14         13          93%        36.29          39.08
ADJUNCT FACULTY               100          65          20         185       32         20          63%         5.78           9.25
FULL-TIME FACULTY             159         115         194         468       24         23          96%        19.50          20.35
HISTORY TOTALS              3,393         443         988       4,824      308        242          79%        15.66          19.93
LIBRARY TOTALS             23,370       8,713      20,703      52,756    7,418      4,981          67%         7.11          10.59



DEFINITIONS:
BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge
MEDIA CIRCULATION = audio & video materials, including media reserves

EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
MEMBER = declared major or department member
BORROWER = any member who borrowed materials
Library Total = declared undergrad & grad majors, adjuncts & full time faculty borrowers
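
The derived columns in the History Department table follow directly from the definitions above. A minimal sketch of that arithmetic, using the undergraduate row; the class and field names are illustrative assumptions and do not come from the original reports:

```python
from dataclasses import dataclass


@dataclass
class GroupCirculation:
    """Circulation counts for one patron group (names are illustrative)."""
    book: int
    media: int
    equipment: int
    members: int      # declared majors or department members
    borrowers: int    # members who borrowed any materials

    @property
    def total(self) -> int:
        # TOTAL CIRC = book + media + equipment circulation
        return self.book + self.media + self.equipment

    @property
    def pct_borrowing(self) -> float:
        # % BORROWING = borrowers as a share of members
        return 100 * self.borrowers / self.members

    @property
    def circ_per_member(self) -> float:
        return self.total / self.members

    @property
    def circ_per_borrower(self) -> float:
        return self.total / self.borrowers


# Undergraduate row from the History Department table.
undergrads = GroupCirculation(book=2715, media=250, equipment=698,
                              members=238, borrowers=186)
print(undergrads.total)                         # 3663
print(round(undergrads.pct_borrowing))          # 78
print(round(undergrads.circ_per_member, 2))     # 15.39
print(round(undergrads.circ_per_borrower, 2))   # 19.69
```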



Communications Majors FY08/09

Statistical Categories // Item Type / Location / Call No Type / Call No  Communications Majors  Freshman  Sophomore  Junior  Senior
M- DVD / Media Services / Other / DVD                                                      194        17         31      52      94
M- VideoCass / Media Services / Other / VC                                                 228        11         40      67     110
T- Book / 2nd Floor - Circulating / Library of Congress / B                                 34         9          8      11       6
T- Book / 2nd Floor - Circulating / Library of Congress / BD                                 3         1                          2
T- Book / 2nd Floor - Circulating / Library of Congress / BF                                30         5          5      12       8
...
2nd Floor Circulating                                                                     1531       222        310     403     596
T- Juvenile / CMC /                                                                        125        14         26      20      35
T- NJDoc / Askew Documents Room / Other /                                                    1                                    1
New Jersey History                                                                          10         0          2       7       1
T- ReserveBk / Reserves Desk /                                                             189        13         46      68      62
T- SpecColl / Special Collection / Library of Congress / LC                                  3                            3
T- Book-McNaughton / Leisure Lounge / Library of Congress / F                                2                            1       1
T- Book-McNaughton / Leisure Lounge / Library of Congress / HF                               1                    1
T- Book-McNaughton / Leisure Lounge / Library of Congress / HS                               2                            2
T- Book-McNaughton / Leisure Lounge / Library of Congress / HV                               5         1                  2       2
T- Book-McNaughton / Leisure Lounge / Library of Congress / ML                               1                            1
T- Book-McNaughton / Leisure Lounge / Library of Congress / PN                               3         3
T- Book-McNaughton / Leisure Lounge / Library of Congress / PS                              29         4                 10      15
T- Book-McNaughton / Leisure Lounge / Library of Congress / RC                               2         1                          1
T- Book-McNaughton / Leisure Lounge / Library of Congress / TL                               1                                    1
Leisure Lounge                                                                              49         9          1      19      20

Challenges with combining data from various services
• Little to no linkage of data

• Multiple user IDs for authentication




Application Server
• A machine or its software that works in
  conjunction with a web server to deliver
  application services such as the dynamic
  creation of a webpage from content stored in a
  database. (Source: http://www.webtools.ca.gov/help/Glossary.asp)

• Web Server Software (Apache or IIS)
• Database Management System – DBMS (MySQL,
  Oracle, MS SQL Server)
• Scripting Language (Perl, PHP, ColdFusion, ASP)

Why an Application Server?
• Relevant data in logfiles need to be in
  a database to be analyzed.

• Need your own DBMS to create new
  tables and queries.




Authentication for ILL and other forms is routed through the EZProxy server




Daily and Weekly Email Reports from the Application Server
Circ Fines Audit Daily Report - Daily at 6:05 AM.
Dupe Patron Record Report - Daily at 5:56 AM.
Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM.
Media Service Scheduling Rooms Report - Daily at 6:02 AM.
Media Services Equipment Pickup Summary - Daily at 7:00 AM.
Received Title Alert - Daily at 6:59 AM.
Reserves Overdues - Daily at 5:59 AM.
Scheduled LIS Tasks - Daily at 6:00 AM.

ILL Borrowing Overdues Report - Weekly at 5:59 AM.
ILL Lending Reports - Weekly at 6:15 AM.



Monthly Email Reports from the Application Server
Circ Fines Audit - Monthly at 6:10 AM.
Circulation by Location and Item Type - Monthly at 6:21 AM.
Circulation Lost and Paid - Monthly at 6:25 AM.
Circulation Online Renewal Count - Monthly at 6:30 AM.
Media Circulation - Monthly at 6:35 AM.
Reserve Circulation - Monthly at 6:40 AM.




On Demand Reports




Lending Services Reports

Lists of patrons with fines between $10 and $19.99
•   Student and Alumni fines list - Sorted by either Name, Amount or Notice Date.
•   PALS and Courtesy Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with fines over $19.99
•   Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or
    Notes.
•   PALS and Courtesy Patron fines list - Sorted by Name.
•   VALE Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with overdues older than 30 days
•   Student and Alumni overdues list - Sorted by either Name, IID or Notes.
•   PALS and Courtesy Patron overdues list - Sorted by Name.
•   All other Patron overdues list except VALE - Sorted by Name.


Lending Services Reports, cont.
Lists of VALE patrons with overdues older than 6 months
• VALE patron overdues list - Sorted by Name.
Miscellaneous Reports
• Patrons with the word "Collection Agency" or "CA" in their notes.
•   Patrons with the word "FINE" in one of their notes.
•   Patrons with the word "SOILS" in their notes.
•   Patrons with the word "FALL07 SOILS" in their notes.
•   Patrons with the word "HOLD" in their notes.
• Combined list of HOLD, FINE, and CA.
Circulation Reports by Item Type from 2003 to the present
• All Staff.
•   All Colleges
•   Undergraduates by Major.
•   Graduates by Major
•   Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009
    and 30-Nov-2009
One of Our Projects
• Mining EZProxy logfiles and linking to patron
  statistical categories from the Voyager Patron
  Database

  – What majors and departments are accessing
    which database services?

  – What majors and departments are accessing
    the ILL services?


EZProxy via LDAP authenticates access to:
• Databases
• Electronic journals
• ILL/Doc Delivery forms
Example of EZProxy log entry
•   IP address       nj.dhcp.embarqhsd.net
•   (Not used)       -
•   User ID          theuser
•   Date/time        1/1/2008 4:25:15 AM
•   Method           GET
•   Page retrieved   http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
•   Version          HTTP/1.1
•   Response code    302
•   No. of bytes     537
•   Referring URL    http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
•   User agent       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
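
The fields in the example entry can be pulled out of a raw log line with a regular expression. The sketch below assumes a combined-log-style line; the actual layout depends on how the EZProxy log format is configured, and the sample line (including its bracketed timestamp and the -0500 offset) is a hypothetical reconstruction from the field values listed here, not a line from the real logfile.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed layout: host ident user [time] "method url version" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Hypothetical raw line built from the field values on the slide.
sample = (
    'nj.dhcp.embarqhsd.net - theuser [01/Jan/2008:04:25:15 -0500] '
    '"GET http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa'
    '&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr HTTP/1.1" '
    '302 537 '
    '"http://ezproxy.wpunj.edu:2048/login?url='
    'http://www.wpunj.edu/scripts/webscript.exe?fs.scr" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"'
)

entry = parse_line(sample)
print(entry["user"], entry["method"], entry["status"], entry["bytes"])
```

Lines that do not match the assumed layout return None, so they can be counted and inspected rather than silently loaded.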


Patron Privacy and Standards




Using Voyager as the model for Patron Privacy




• Active Circ transactions are stored in a
  table with patron ID and linked to
  statistical categories.
• Completed Circ transactions are moved
  to another table without the patron ID,
  but still linked with the patron statistical
  categories.
• The Patron Table contains the total
  counts of transactions for each patron,
  but no link to which transactions they are.
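
The three bullets above describe a pattern that can be sketched in a few lines. This is only an in-memory illustration of the idea; it is not Voyager's actual schema, and every class, field, and identifier name here is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Patron:
    patron_id: str
    stat_categories: tuple        # e.g. ("M- History", "Undergraduate")
    total_charges: int = 0        # running count only; no link to specific items


@dataclass
class CircStore:
    patrons: dict = field(default_factory=dict)     # patron_id -> Patron
    active: dict = field(default_factory=dict)      # item_id -> patron_id
    completed: list = field(default_factory=list)   # (item_id, stat_categories)

    def check_out(self, patron_id: str, item_id: str) -> None:
        # Active transactions keep the patron ID so the loan can be managed.
        self.active[item_id] = patron_id
        self.patrons[patron_id].total_charges += 1

    def check_in(self, item_id: str) -> None:
        # On completion, drop the patron ID and keep only the categories.
        patron_id = self.active.pop(item_id)
        categories = self.patrons[patron_id].stat_categories
        self.completed.append((item_id, categories))


store = CircStore()
store.patrons["p1"] = Patron("p1", ("M- History", "Undergraduate"))
store.check_out("p1", "item-42")
store.check_in("item-42")
print(store.completed)                     # history keeps categories, not the patron
print(store.patrons["p1"].total_charges)   # the patron keeps a count, not the items
```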


MySQL operations
• Extract patron statistical categories out of
  Voyager and build them into the MySQL
  database.
• EZProxy transactions would be stored in one
  table and linked to patron statistical
  categories via the user ID.
• Once the linking is complete, user IDs are deleted.
• Logs are collected, loaded, and deleted monthly.
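
A minimal sketch of the link-then-delete steps listed above. The slides describe a MySQL database; Python's built-in sqlite3 module stands in here only so the example is self-contained. Table names, column names, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Statistical categories extracted from the patron database (hypothetical schema).
    CREATE TABLE patron_stats (user_id TEXT PRIMARY KEY, major TEXT, status TEXT);
    -- One row per EZProxy transaction.
    CREATE TABLE ezproxy_hits (
        hit_id  INTEGER PRIMARY KEY,
        user_id TEXT,   -- cleared once the categories are attached
        url     TEXT,
        major   TEXT,
        status  TEXT
    );
""")
conn.executemany("INSERT INTO patron_stats VALUES (?, ?, ?)", [
    ("theuser", "M- History", "Undergraduate"),
    ("otheruser", "M- Psychology", "Graduate"),
])
conn.executemany("INSERT INTO ezproxy_hits (user_id, url) VALUES (?, ?)", [
    ("theuser", "http://example.org/db1"),
    ("theuser", "http://example.org/ill"),
    ("otheruser", "http://example.org/db1"),
])

# Step 1: attach the statistical categories to each transaction via the user ID.
conn.execute("""
    UPDATE ezproxy_hits
    SET major  = (SELECT p.major  FROM patron_stats p WHERE p.user_id = ezproxy_hits.user_id),
        status = (SELECT p.status FROM patron_stats p WHERE p.user_id = ezproxy_hits.user_id)
""")

# Step 2: delete the user IDs so transactions can no longer be tied to a person.
conn.execute("UPDATE ezproxy_hits SET user_id = NULL")
conn.commit()

rows = conn.execute(
    "SELECT major, COUNT(*) FROM ezproxy_hits GROUP BY major ORDER BY major"
).fetchall()
print(rows)   # [('M- History', 2), ('M- Psychology', 1)]
```

After step 2 the table still supports counts by major or status, which is what the reports need, while the identifying column is empty.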
Slide removed for Privacy Reasons
Slide removed for Privacy Reasons




ILL request form authentications by major
Article                            Book
Count  Major                       Count  Major
   62  M- Psychology                  90  M- History
   60  M- Sociology                   28  M- Non-Degree
   42  M- Applied Clinical Psych      25  M- Pub Pol & Intl Affairs
   35  M- Education                   20  M- Spanish
   31  M- History                     18  M- English
   30  M- Spanish                     16  M- Undecided
   29  M- Nursing                     14  M- Art
   19  M- Communication Disorders     14  M- Education
   19  M- Communication               11  M- Sociology
   14  M- Biotechnology               10  M- Biology
   14  M- Counseling                   9  M- Music
   14  M- English                      9  M- Special Programs
   12  M- Non-Degree                   8  M- Psychology
   10  M- Community/Sch Health         7  M- Biotechnology
    7  M- Biology                      7  M- Political Science
    7  M- Political Science            6  M- Anthropology
    6  M- Undecided                    6  M- Music - Jazz Studies
    5  M- Comm Media Studies           4  M- Business
    5  M- Reading                      4  M- Communication
    4  M- Business                     4  M- Nursing
Reporting and Standards
• Reporting
     –   Emailed periodically (e.g., daily
         dossiers), plus other event-triggered reports.
     –   On demand, via email, web pages or a
         printer.
• Standards
     –   Share data for comparative research.
     –   Groups of libraries and consortia




Further Reading

• Coombs, Karen A. (2005). Lessons learned from analyzing library database usage data. Library Hi Tech, 23:4, 598.

• Diana, Birkin James. dashboard_beta. http://library.brown.edu/dashboard/info/

• Metridoc. http://code.google.com/p/metridoc/

• Morton-Owens, Emily (2011). Trends at a glance. LITA 2011. http://connect.ala.org/files/79651/trends_at_a_glance_dashboards_pdf_12068.pdf
Questions?


             Ray Schwartz,
      Systems Specialist Librarian
Cheng Library, William Paterson University,
        Wayne, New Jersey, USA
        schwartzr2 @ wpunj.edu




  • 2. Outline • What is Data Mining and Data Warehousing and Why Do We Do It? • Our Library and University • Patron Statistical Categories • Application Server • Reporting
  • 3. What is Data Mining and Data Warehousing • Extracting data from legacy systems and other resources; • cleaning, scrubbing and preparing data for decision support; • maintaining data in appropriate data stores; • accessing and analysing data using a variety of end user tools; • and mining data for significant relationships. • Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.
  • 4. • The primary purpose of these efforts is to provide easy access to specifically prepared data that can be used with decision support applications such as management reports, queries, decision support systems, executive information systems and data mining. • Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.
  • 5. Our University • 9000 undergraduates • 1000 graduates (mostly education majors) • 400 faculty • 800 adjuncts • 1000 staff
  • 6. Our Library • 19 librarians and 26 library staff • 350,000 volumes • 18,000 audiovisual items • 47,000 print and electronic periodicals • 124 general and subject specific databases • $1,100,000 Non-Salary Allocations
  • 7. Our Transactions • 600,000 Database Searches • 413,000 Gate Counts • 40,000 Library Materials Circulation • 34,000 Equipment Circulation • 19,000 Reference Queries • 3,000 Interlibrary Loans • 5,000 Documents Delivered
  • 8. Our Systems • Voyager ILS • Clio ILL Software • EZProxy Server • Banner – University ERP • University Networked Drive K: • University Email Server • University Web Server
  • 9. Vendor Services • Serials Solutions • A to Z list • MARC Record Service • Link Resolver • OCLC – Bibliographic Utility • Worldcat Collection Analysis • Coutts (was Blackwell) – Book Jobber • Ebsco – Subscription Agent • Marcive – Authority Control • Database Vendors
  • 10. Voyager Overdue and Fine Notices - Daily
  • 11. Quarterly Extract for Serials Solutions AtoZ Service
  • 12. What would we like to see? • Breakdowns by department and majors. • Combined usage by department/majors of more than one library service. • Which categories of patrons are accessing which services?
  • 13. Patron Statistical Categories • Voyager Patron Database allows a maximum of 10 statistical categories per patron record. • Worked with our University Information Systems Department to extract the relevant data from the relevant sources. • Weekly extract from SIS and HRS to load into Voyager
  • 14. Groups and Services
      Groups
      • Major
      • Status
        – Undergrad or Grad
        – Faculty, Adjunct Faculty or Staff
      • Department
      • College
      • Degree
      • No. of Credits
      • Year of Study
      • Campus Location
      Services
      • Circulation
        – Books
        – Media
        – Reserve
        – By Fund Code
        – Location
      • ILL / Document Delivery
      • Databases
      • Library Web Pages
        – Subject Area Resource Guides
        – Reference Requests
      • Catalog
      • Other Vendor Services
        – Serials Solutions
  • 15. From Students •College and Mercer Identifier •Class Level (Freshman, Sophomore, Junior, Senior, Graduate) •Total Hours Registered for Current Semester •Major •2nd Major •Degree •CA-Collection Agency •SOILS •Student Entrance Level (New Non-Traditional Freshman, New First Time Transfer, etc.) •Department
  • 16. From Faculty / Staff / Adjuncts •College •Full or Part-Time •Status (Faculty, Adjunct, Staff, Professional Staff, Tenured, Tenure-Track) •Division •Departments
  • 17. History Department - 12 months - Feb. 2008

      PATRON STATUS            BOOK CIRC  MEDIA CIRC  EQUIP CIRC  TOTAL CIRC  MEMBERS  BORROWERS  % BORROWING  CIRC/MEMBER  CIRC/BORROWER
      UNDERGRADUATE STUDENTS       2,715         250         698       3,663      238        186          78%        15.39          19.69
      GRADUATE STUDENTS              419          13          76         508       14         13          93%        36.29          39.08
      ADJUNCT FACULTY                100          65          20         185       32         20          63%         5.78           9.25
      FULL-TIME FACULTY              159         115         194         468       24         23          96%        19.50          20.35
      HISTORY TOTALS               3,393         443         988       4,824      308        242          79%        15.66          19.93
      LIBRARY TOTALS              23,370       8,713      20,703      52,756    7,418      4,981          67%         7.11          10.59

      DEFINITIONS:
      BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge
      MEDIA CIRCULATION = audio & video materials, including media reserves
      EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
      MEMBER = declared major or department member
      BORROWER = any member who borrowed materials
      LIBRARY TOTAL = declared undergrad & grad majors, adjuncts & full-time faculty borrowers
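The three ratio columns in the table above appear to follow directly from the counts: % BORROWING as borrowers divided by members, CIRC/MEMBER as total circulation divided by members, and CIRC/BORROWER as total circulation divided by borrowers. The slide does not state these formulas, so the short Python sketch below treats them as an assumption and checks them against the undergraduate row. The function name `circulation_ratios` is hypothetical.

```python
def circulation_ratios(total_circ, members, borrowers):
    """Return (% borrowing, circ per member, circ per borrower).

    Assumed formulas, inferred from the column names and definitions:
      % borrowing   = borrowers / members
      circ/member   = total circulation / members
      circ/borrower = total circulation / borrowers
    """
    pct_borrowing = round(100 * borrowers / members)
    circ_per_member = round(total_circ / members, 2)
    circ_per_borrower = round(total_circ / borrowers, 2)
    return pct_borrowing, circ_per_member, circ_per_borrower


# Undergraduate row from the History Department table.
print(circulation_ratios(total_circ=3663, members=238, borrowers=186))
# (78, 15.39, 19.69)
```

The graduate row (508 total circulation, 14 members, 13 borrowers) reproduces the published 93%, 36.29 and 39.08 in the same way.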
  • 18. Communications Majors FY08/09 Communications Statistical Categories // Item Type / Location / Call No Type / Call No Majors Freshman Sophomore Junior Senior M- DVD / Media Services / Other / DVD 194 17 31 52 94 M- VideoCass / Media Services / Other / VC 228 11 40 67 110 T- Book / 2nd Floor - Circulating / Library of Congress / B 34 9 8 11 6 T- Book / 2nd Floor - Circulating / Library of Congress / BD 3 1 2 T- Book / 2nd Floor - Circulating / Library of Congress / BF 30 5 5 12 8 ... 2nd Floor Circulating 1531 222 310 403 596 T- Juvenile / CMC / 125 14 26 20 35 T- NJDoc / Askew Documents Room / Other / 1 1 New Jersey History 10 0 2 7 1 T- ReserveBk / Reserves Desk / 189 13 46 68 62 T- SpecColl / Special Collection / Library of Congress / LC 3 3 T- Book-McNaughton / Leisure Lounge / Library of Congress / F 2 1 1 T- Book-McNaughton / Leisure Lounge / Library of Congress / HF 1 1 T- Book-McNaughton / Leisure Lounge / Library of Congress / HS 2 2 T- Book-McNaughton / Leisure Lounge / Library of Congress / HV 5 1 2 2 T- Book-McNaughton / Leisure Lounge / Library of Congress / ML 1 1 T- Book-McNaughton / Leisure Lounge / Library of Congress / PN 3 3 T- Book-McNaughton / Leisure Lounge / Library of Congress / PS 29 4 10 15 T- Book-McNaughton / Leisure Lounge / Library of Congress / RC 2 1 1 T- Book-McNaughton / Leisure Lounge / Library of Congress / TL 1 1 Leisure Lounge 49 9 1 19 20 18
  • 19. Challenges with combining data from various services • Little to no linkage of data • Multiple user IDs for authentication
  • 20. Application Server • A machine or its software that works in conjunction with a web server to deliver application services such as the dynamic creation of a webpage from content stored in a database. From http://www.webtools.ca.gov/help/Glossary.asp • Web Server Software (Apache or IIS) • Database Management System – DBMS (MySQL, Oracle, MS SQL Server) • Scripting Language (Perl, PHP, ColdFusion, ASP)
  • 21. Why an Application Server? • Relevant data in logfiles need to be in a database to be analyzed. • Need your own DBMS to create new tables and queries.
  • 22. Authentication of ILL and other forms are routed through the EZProxy server
  • 23. Daily and Weekly Email Reports from the Application Server Circ Fines Audit Daily Report - Daily at 6:05 AM. Dupe Patron Record Report - Daily at 5:56 AM. Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM. Media Service Scheduling Rooms Report - Daily at 6:02 AM. Media Services Equipment Pickup Summary - Daily at 7:00 AM. Received Title Alert - Daily at 6:59 AM. Reserves Overdues - Daily at 5:59 AM. Scheduled LIS Tasks - Daily at 6:00 AM. ILL Borrowing Overdues Report - Weekly at 5:59 AM. ILL Lending Reports - Weekly at 6:15 AM.
  • 24. Monthly Email Reports from the Application Server Circ Fines Audit - Monthly at 6:10 AM. Circulation by Location and Item Type - Monthly at 6:21 AM. Circulation Lost and Paid - Monthly at 6:25 AM. Circulation Online Renewal Count - Monthly at 6:30 AM. Media Circulation - Monthly at 6:35 AM. Reserve Circulation - Monthly at 6:40 AM.
  • 27. Lending Services Reports Lists of patrons with fines between $10 and $19.99 • Student and Alumni fines list - Sorted by either Name, Amount or Notice Date. • PALS and Courtesy Patron fines list - Sorted by Name. • All other Patron fines list - Sorted by Name. Lists of patrons with fines over $19.99 • Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or Notes. • PALS and Courtesy Patron fines list - Sorted by Name. • VALE Patron fines list - Sorted by Name. • All other Patron fines list - Sorted by Name. Lists of patrons with overdues older than 30 days • Student and Alumni overdues list - Sorted by either Name, IID or Notes. • PALS and Courtesy Patron overdues list - Sorted by Name. • All other Patron overdues list except VALE - Sorted by Name.
  • 28. Lending Services Reports, cont. Lists of VALE patrons with overdues older than 6 months • VALE patron overdues list - Sorted by Name. Miscellaneous Reports • Patrons with the word "Collection Agency" or "CA" in their notes. • Patrons with the word "FINE" in one of their notes. • Patrons with the word "SOILS" in their notes. • Patrons with the word "FALL07 SOILS" in their notes. • Patrons with the word "HOLD" in their notes. • Combined list of HOLD, FINE, and CA. Circulation Reports by Item Type from 2003 to the present • All Staff. • All Colleges • Undergraduates by Major. • Graduates by Major • Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009 and 30-Nov-2009
  • 29. One of Our Projects • Mining EZProxy logfiles and linking to patron statistical categories from the Voyager Patron Database – What majors and departments are accessing which database services? – What majors and departments are accessing the ILL services?
  • 30. EZProxy via LDAP authenticates access to: Databases Electronic journals ILL/Doc Delivery forms
  • 31. Example of EZProxy log entry
      • IP address: nj.dhcp.embarqhsd.net
      • (Not used): -
      • User ID: theuser
      • Date/time: 1/1/2008 4:25:15 AM
      • Method: GET
      • Page retrieved: http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
      • Version: HTTP/1.1
      • Response code: 302
      • No. of bytes: 537
      • Referring URL: http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
      • User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
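The fields listed above match the usual NCSA combined-style web log layout, so a log line can be split into those fields with one regular expression. The Python sketch below is a minimal illustration, not the author's ColdFusion/MySQL loader. It assumes the raw line uses the bracketed `dd/Mon/yyyy:HH:MM:SS zone` timestamp (the slide shows an already reformatted date), and the `-0500` offset in the sample, the names `LOG_PATTERN` and `parse_line`, and the reassembled sample line are all hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "method url version" status bytes
# optionally followed by "referer" "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Sample line reassembled from the slide's field values (timezone assumed).
SAMPLE = (
    'nj.dhcp.embarqhsd.net - theuser [01/Jan/2008:04:25:15 -0500] '
    '"GET http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa'
    '&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr HTTP/1.1" 302 537 '
    '"http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/'
    'webscript.exe?fs.scr" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"'
)

entry = parse_line(SAMPLE)
print(entry["user"], entry["status"], entry["bytes"])
```

Returning `None` for lines that do not match lets a loader skip or count malformed entries instead of stopping partway through a monthly log file.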
  • 32. Patron Privacy and Standards
  • 33. Using Voyager as the model for Patron Privacy
  • 34. • Active Circ transactions are stored in a table with patron ID and linked to statistical categories. • Completed Circ transactions are moved to another table without the patron ID, but still linked with the patron statistical categories. • The Patron Table contains the total counts of transactions for each patron, but no link to which transactions they are.
  • 35. MySQL operations • Extract patron statistical categories out of Voyager and build them into the MySQL database. • EZProxy transactions would be stored in one table and linked to patron statistical categories via the user ID. • Once completed, user ids are deleted. • Logs are collected monthly and loaded and deleted monthly.
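The link-then-delete sequence described above can be sketched in a few SQL statements. The Python example below uses the standard library's `sqlite3` as a stand-in for MySQL so that it runs anywhere. The table and column names (`patron_stats`, `ezproxy_hits`, `usage_by_category`), the sample rows, and the choice to copy linked rows into a separate table before deleting the raw ones are all assumptions made for illustration, not the presenter's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE patron_stats (user_id TEXT PRIMARY KEY, major TEXT, status TEXT);
CREATE TABLE ezproxy_hits (user_id TEXT, resource TEXT, hit_date TEXT);
-- Holds statistical categories only; no user ID column.
CREATE TABLE usage_by_category (major TEXT, status TEXT, resource TEXT, hit_date TEXT);
"""


def link_and_anonymize(patrons, hits):
    """Load both extracts, link them by user ID, then drop the identified rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patron_stats VALUES (?, ?, ?)", patrons)
    conn.executemany("INSERT INTO ezproxy_hits VALUES (?, ?, ?)", hits)
    # Copy each transaction with the patron's categories, leaving the ID behind.
    conn.execute(
        """
        INSERT INTO usage_by_category (major, status, resource, hit_date)
        SELECT p.major, p.status, h.resource, h.hit_date
        FROM ezproxy_hits AS h
        JOIN patron_stats AS p ON p.user_id = h.user_id
        """
    )
    # Once linked, the rows that still carry user IDs are removed.
    conn.execute("DELETE FROM ezproxy_hits")
    conn.commit()
    return conn


def hits_by_major(conn):
    """Count linked transactions per major."""
    return conn.execute(
        "SELECT major, COUNT(*) FROM usage_by_category "
        "GROUP BY major ORDER BY major"
    ).fetchall()


# Hypothetical sample data.
PATRONS = [("u1", "History", "Undergraduate"), ("u2", "Psychology", "Graduate")]
HITS = [
    ("u1", "database-a", "2008-01-01"),
    ("u1", "ill-form", "2008-01-02"),
    ("u2", "database-b", "2008-01-02"),
]

conn = link_and_anonymize(PATRONS, HITS)
print(hits_by_major(conn))
```

This mirrors the Voyager model on the preceding slide: the reporting table keeps the statistical categories for each transaction but nothing that identifies the patron.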
  • 37. Slide removed for Privacy Reasons
  • 38. Slide removed for Privacy Reasons
  • 48. ILL request form authentications by major

      Article
      Count  Major
         62  M- Psychology
         60  M- Sociology
         42  M- Applied Clinical Psych
         35  M- Education
         31  M- History
         30  M- Spanish
         29  M- Nursing
         19  M- Communication Disorders
         19  M- Communication
         14  M- Biotechnology
         14  M- Counseling
         14  M- English
         12  M- Non-Degree
         10  M- Community/Sch Health
          7  M- Biology
          7  M- Political Science
          6  M- Undecided
          5  M- Comm Media Studies
          5  M- Reading
          4  M- Business

      Book
      Count  Major
         90  M- History
         28  M- Non-Degree
         25  M- Pub Pol & Intl Affairs
         20  M- Spanish
         18  M- English
         16  M- Undecided
         14  M- Art
         14  M- Education
         11  M- Sociology
         10  M- Biology
          9  M- Music
          9  M- Special Programs
          8  M- Psychology
          7  M- Biotechnology
          7  M- Political Science
          6  M- Anthropology
          6  M- Music - Jazz Studies
          4  M- Business
          4  M- Communication
          4  M- Nursing
  • 50. Reporting and Standards • Reporting – Emailed periodically - e.g., daily dossiers, and other event triggered reports. – On demand, via email, web pages or a printer. • Standards – Share data for comparative research. – Groups of libraries and consortia
  • 54. Further Reading • Coombs, Karen A. (2005). Lessons learned from analyzing library database usage data. Library Hi Tech, 23:4, 598. • Diana, Birkin James. dashboard_beta. http://library.brown.edu/dashboard/info/ • Metridoc. http://code.google.com/p/metridoc/ • Morton-Owens, Emily (2011). Trends at a glance. LITA 2011. http://connect.ala.org/files/79651/trends_at_a_glance_dashboards_pdf_12068.pdf
  • 55. Questions? Ray Schwartz, Systems Specialist Librarian Cheng Library, William Paterson University, Wayne, New Jersey, USA schwartzr2 @ wpunj.edu