This presentation was provided by Joe Zucca of the University of Pennsylvania, during Session Five of the NISO event "Assessment Practices and Metrics for the 21st Century," held on November 22, 2019.
1. Assessment Practices & Metrics for the 21st Century
A 2019 NISO Training Series
Session V. Technology & Services
November 22, 2019
Joe Zucca
Associate University Librarian
for Technology Services
University of Pennsylvania Libraries
2. Digital Transformation, Its Impact on Assessment
The Need for Infrastructure(s)
Data Governance
Technology’s Contribution to Organizational Learning
4 units for discussion
3. Digital Transformation, Its Impact on Assessment
Digital transformation involves the use of technologies to create new
or modified business processes with the goal of improving user
experience and optimizing organizational resources.
8. Millions of dollars of information in a host of formats
Access to massive quantities of print & e-content
Discovery systems
Supply chains [local circulation & ILL]
Cultural preservation and curation
Information and digital literacy programs
Digital scholarship [data wrangling, storage, IT selection, statistics, etc]
Courseware service
Facilities for learning/study/creation
Publishing assistance for scholars
Open access
Educational technologies (e.g., desktops/laptops, software provisioning)
Digital conversion services (e.g., Scan-on-Demand, 3D printing/scanning)
International and local Knowledge Bases (e.g. OCLC)
Data analysis for administrators (e.g., bibliometrics)
Content and preservation repositories
Research compliance & dissemination aids (e.g., Elements, DMPtool, ORCID)
Software design and deployment
Enterprise-level applications (e.g., The OPAC, EZProxy, VIVO)
Collaborative programs with peers and vendors
Emerging technology (e.g. VR & AR)
….
Products and services that libraries offer or directly support
16. Is Assessment sustainable?
Expanding number & complexity of sources
Segregation of elements (lack of integration)
Data resolution
Governance: non-standard, denormalized, fragile data models and policies for managing mountains of complex data
Insufficient control of data
17.
Need for statistics
Lack of consistency
Irrelevant stats
Difficulties in data extract
Appoint a committee
Urgency of action
Wait for the report
19. DATA ECOSYSTEMS
EZproxy log, ILS, Apache log, COUNTER, Solr log, Elements, VIVO, Scopus, ILLiad, Aeon, Ares, Relais D2D, Workday, Banner, link resolver, Fedora, ERP ($), Canvas, Outlook, LibGuides, LibCal, LibAnswers, bepress, FreshService, Pingdom, N.A.S., LeanLibrary, Google Analytics, Suma, CASB
MetriDoc
20. Characteristics of Infrastructure
NOT an analytics module (that’s just another silo)
Supports data set integration for enriched resolution
Reduces overhead / creates efficient workflow
Safeguards privacy
Promotes scalability/sustainability
Creates service opportunities
22. ISO 16439:2014
NISO Z39.7
Internet2 eduPerson
IPEDS Classification of Instructional Programs
Coupled analytics modules
Policy-based Infrastructure
The community can positively influence data standards and data access
23. Production Systems [e.g., ILS]
→ Event Data + Metadata
→ Extract/Transform/Load
→ ETL Repository (stored jobs and analytics)
→ Analytics Layer (Assessment Platform, Analytics Service)
Technical Infrastructure [e.g., MetriDoc]
25. Data Governance
Managing the availability, quality, usability, consistency, and
security of an organization's data
26. Data flow relevant to analytics, and thus to governance
Harvest → Refine → Integrate → Deploy → Reuse
Governance conditions: events, not aggregations; supplement, define & attribute; who, what, where, when; resource management; regulation
Managing the availability, quality, usability, consistency, and security of an organization’s data.
27. Managing the availability, quality, usability, consistency, and security of an organization's data.
Technical and Policy Dimensions
Pipelines & systems architecture
Quality assurance
Data models / schema
Business logic
Standardized data definition
Encryption, auth/security
Roles & permissions
Relevance of sources
Statistical models
Retention
DMP & PII Protocols
29. “…the key requirement for institutional
success is to move from scalable
efficiency to scalable learning.
Said differently, the rate of learning, innovation, and
performance improvement within the institution must
match (or exceed) that of the surrounding environment
if the institution is to survive (or thrive).”
-- John Hagel III and John Seely Brown
The New Organization Model: Learning at Scale
HBR Blog Network
Scalable Learning
30. How can technology assist?
Heighten the observational power of our organizations.
Reveal how users work, what they need and expect, and what’s expected of them.
Help negotiate the morass of strategic choices.
Improve our response to change.
31. Feedback is Welcome. Thank You
Editor’s Notes
INTRODUCTION
This is a talk about assessment practices from the point of view of Technology and Services. It comprises four units: 1) the impact of digital transformation on assessment, 2) the role of infrastructure in enabling sustainable, actionable assessment programs, 3) the place of data governance as a founding, load-bearing condition for building assessment infrastructure, and 4) some thoughts on IT’s contribution to organizational learning, which is the goal and, hopefully, the outcome of the first three topics.
WHAT IS DIGITAL TRANSFORMATION
I’M STARTING HERE BECAUSE SERVICE and SERVICE QUALITY ARE INCREASINGLY COUPLED WITH DIGITAL TRANSFORMATION, OR THE DIGITAL EXPERIENCE, AND THIS TRANSFORMATION CAN BE DISRUPTIVE OR AT LEAST COMPLICATING FOR ASSESSMENT.
Digital transformation involves the use of technologies to create new or modified business processes, with the goal of improving user experience and optimizing organizational resources.
I thought it would be useful to start with some concrete examples of data wrangling for assessment purposes and work up to some general conclusions about the effects of the digital experience.
TREND OVERHEAD 1
This graph shows the trajectory of the laptop lending service at Penn, which we inaugurated in the early 2000s, first in the main library (called Van Pelt) and, by 2010, in several other locations. The downward slope is clear, a reflection of the rising availability of portable technology. Then, in 2017, something dramatic occurred. (click) In that year we installed a self-service laptop kiosk in Van Pelt and saw a sudden surge in laptop interest. Curiously, that surge appeared system-wide, but with Van Pelt accounting for an unusual percentage of the growth. Absent other environmental factors, the kiosk variable, a digital transformation, quite arguably accounted for the uptick in service that continues to spread.
So the graph captures a changing signal in user behavior. While they point to more interesting questions, the data here are pretty limited in their explanatory power, i.e., their resolution.
LONG TAIL OVERHEAD 2
In addition to tracking circulation, we also monitor the use of applications. This graph clearly shows the pervasiveness of web use and Acrobat among laptop service events. That’s a beguiling finding. How are these devices used in web browsing? Are students accessing Penn’s courseware site (managed by the libraries), or our electronic resources, or just casually browsing videos? It would be tremendously helpful to parse that 80% of activity. Unfortunately, we lack the power, at the present time, to peer any deeper into this measure and thereby increase the resolution of our data about laptop use.
IMAGINED TREND OVERHEAD 3
For the moment, we also lack the ability to analyze WHO uses what with a loaned laptop, and how that might relate to other desktop appliances. Until we have more robust data capture and processing in place later this year, I can only conjecture what a graph might look like that follows a demographic model. Again, the resolution of our data is at present disappointingly fuzzy.
DATA INTEGRATION:
Even these simple statistical insights carry a high cost in terms of workflow complexity and staff time. They are expensive and low-yield. And that’s a central challenge to assessment. How might we achieve a higher, sharper resolution when even the fairly simple metrics we have (rate of circulation and application interest) are expensive to calculate?
The first data element came from Alma, the second from our central software controller, called KeyServer. Each of these systems required a different analyst to manually harvest the relevant data, build them into spreadsheets, make the necessary pivots, and attempt an aggregated picture. And for all of that, the information quotient, as we have seen, is not terribly rich.
Demographic dimensions would improve the picture, and for that we’d need to call in more staff and systems specialization to harvest metadata from Penn’s identity management system and join it with transaction measures. And were we to add a vector about budget (say, to calculate cost per session), we’d need yet another system and business analyst to fill out the picture.
But the integration of all this data would surely make for a high-resolution, actionable analysis of a significant portion of computing activity. I say significant because our 1,100 desktop units clock in more than 2 million hours of service annually. That’s 1,800 hours per unit.
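The kind of join described in these notes can be sketched in a few lines. This is a minimal, illustrative example only: the field names, school labels, and dollar figure are all hypothetical, standing in for the Alma, KeyServer, and identity-management feeds mentioned above.

```python
from collections import Counter

def sessions_by_school(loans, patrons):
    """Join loan events (from the ILS) to patron records (from identity
    management) and count service events per school."""
    school_of = {p["id"]: p["school"] for p in patrons}
    return Counter(school_of.get(l["patron_id"], "unknown") for l in loans)

def cost_per_session(annual_cost, total_sessions):
    """Add the budget vector: unit cost of the service."""
    return annual_cost / total_sessions if total_sessions else 0.0

# Hypothetical data standing in for the two manually harvested feeds.
loans = [{"patron_id": "p1"}, {"patron_id": "p1"}, {"patron_id": "p2"}]
patrons = [{"id": "p1", "school": "SEAS"}, {"id": "p2", "school": "Wharton"}]

by_school = sessions_by_school(loans, patrons)          # Counter({'SEAS': 2, 'Wharton': 1})
unit_cost = cost_per_session(300.0, sum(by_school.values()))  # 100.0
```

The point of the sketch is that once the feeds land in one place, the demographic pivot and the cost-per-session figure fall out of a simple join, rather than requiring a separate analyst and spreadsheet per system.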
Throughout this presentation, I’ll return to certain questions related to the sustainability of analytics, questions about how to improve the ease, the rapidity, the integration, the cost, and the relevance of data collection, particularly as our services become more and more deeply tied to technology.
SCALE OF POTENTIAL INTEGRATIONS
And not only technology. In thinking about sustained and scalable analytics, it’s especially enlightening to remember the breadth and variety of products that issue from a typical research library. This raises the bar substantially. How do we render multivariate analysis across a very wide range of services and interesting data targets? Here’s a non-exhaustive list of library “products” from my own institution, starting with the millions of dollars in information procurement. Access to massive amounts of print & e-content...and so on. It’s a daunting challenge to plug many of these services into the lens of the data model on the preceding slide.
APPLYING ASSESSMENT CYCLES (1) TO THE SCALE OF SERVICES AND POTENTIAL DATA TARGETS
Data is a high-value product of these services. Once it may have been the discarded detritus of systems, but in the age of big data and deep learning, its asset value is indisputable.
ASSESSMENT CYCLE 2
The product of data flowing from services is assessment and [Click]
ASSESSMENT CYCLE 3
And the product of assessment is organizational intelligence … [Click]
ASSESSMENT CYCLE 4
Intelligence that should shape and inform the quality of library services and the experience of our users. If we’re to use data for the betterment of user experience, for achieving cost-efficiency, for improving outcomes, particularly service quality, we’ll need to complete this cycle against a large number of that trove of services in our list.
SO HOW DO WE COMPLETE THE ASSESSMENT CYCLE WITH CURRENT DATA RESOURCES? WHAT’S KEY ARE THE DATA, AND THE DATA ARE ELUSIVE.
Early in the library’s digital transformation, the sources of statistical interest were few: essentially the ILS and, if we could break them down efficiently, our web server logs and the data provided through Project COUNTER. [click] With fleeting success, attempts were made to mine data from EZproxy for a wider view of electronic use. [CLICK] But today, the number of data silos that contain useful (and practically inaccessible) business intelligence is rapidly growing. Adding to the complexity of this picture is the emergence of cloud systems mixed with on-premise ones, further isolating data and analytics modules.
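Part of why those EZproxy mining attempts were laborious is that the evidence sits in raw web-style logs. Below is a minimal sketch of parsing one line, assuming an NCSA combined-style format; EZproxy's actual LogFormat is configurable per site, so the pattern and the sample line are illustrative assumptions, not the definitive layout.

```python
import re

# Illustrative pattern for an NCSA combined-style log line. Real EZproxy
# LogFormat directives vary by deployment, so treat this as an assumption.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3})'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None on no match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

# Hypothetical sample line in the assumed format.
sample = ('203.0.113.7 - jdoe [22/Nov/2019:10:00:00 -0500] '
          '"GET https://publisher.example.org/article/123 HTTP/1.1" 200 5120')
event = parse_line(sample)
```

Each parsed line becomes an event record (who, when, which resource), which is exactly the raw material the silo otherwise locks away.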
SUMMARY OF DATA TRANSFORMATION: CHALLENGE AND OPPORTUNITY
So, while we can appreciate the cycle connecting service to data to assessment to business intelligence, our ability to complete that cycle in scalable, sophisticated ways is hampered. Every service activity, whether it involves physical or digital commodities, is associated with an application, stored data, servers, and networks (many beyond our control) rich with the evidence of digital transformation and, like valuable ore, expensive, laborious, and time-consuming to liberate.
TO SCALE AND SUSTAIN THE ASSESSMENT CYCLE WE MUST ADDRESS THESE CHALLENGES
Infrastructure is the antidote to these challenges. Whether we focus on the concept of data lakes or data warehousing platforms, the course to successful assessment is technological and that technology will require investment comparable to the more familiar business systems, like the ILS, operating in our libraries. Click
THE CHALLENGE IS AN ANCIENT ONE (witness this 70-year-old story at Penn…)
Just a cautionary note before shifting gears to infrastructure. In case we think this is peculiarly a problem of the digital age, here’s a reminder of how long-standing the library’s struggle has been with measurement. The modalities of our problem change but the underlying challenges have an eerie persistence.
WHICH TAKES US TO INFRASTRUCTURE: KEY TO FUTURE SOLUTIONS
So I’ll shift gears at this point and begin to point to potential opportunities for firming up the ground under assessment practice.
RETURN TO THE PROFUSION OF SILOS (MANY WITH ANALYTICS ENGINES): TOWARD A GENERALIZED, STANDARDS-BASED DATA ECOSYSTEM THAT’S COMMENSURATE WITH THE DIGITAL TRANSFORMATION GOING ON AROUND US.
Essentially, we’ll have to move from a profusion of purpose-built silos to a generalized, standards-based data ecosystem that’s commensurate with the digital transformation going on around us. Penn has been making efforts in this direction with a platform we call MetriDoc. With funding from the IMLS and the Ivy Plus Libraries Confederation, Penn has used MetriDoc as a laboratory for exploring data governance issues and a variety of workflow challenges. And we’re presently building applications that support analytics for the Ivy Plus confederation and our local assessment needs.
THE CHARACTERISTICS OF INFRASTRUCTURE
ROLES FOR THE COMMUNITY IN BUILDING INFRASTRUCTURE: POLICY AND TECHNOLOGY. Not all parts of the community are able to tackle the software and systems challenge, but everyone in the community of practice, the SMEs of assessment, needs to help direct the course of standards and of the analytical tools that vendors want to bolt onto our systems before declaring victory in analytics.
AREAS OF COMMUNITY ENGAGEMENT
Standards of practice, and standards applicable to metadata such as demographic and institutional attributes
ASSESSMENT LABORATORIES: one instance is work on an Extract/Transform/Load (ETL) platform.
INFRASTRUCTURE AND DATA GOVERNANCE ARE TIGHTLY COUPLED
Data governance and infrastructure are intimately linked concepts. Data governance provides a foundation for the successful implementation and management of assessment infrastructure and, indeed, for the effectiveness of the assessment program, at least as it relates to the vast skein of activity data that’s needed to comprehend the digital transformation of service. Within the context of organizations like libraries, data governance is generally construed as the set of activities designed to manage the availability, quality, usability, consistency, and security of organizational data, and it’s best to add privacy to the notion of security. How does this play out in practice, for example in the setting of the MetriDoc initiative at Penn?
BACK TO THE METRIDOC MODEL: POINTS OF GOVERNANCE ACTIVITY
A good way to view data governance is in terms of the flow, or cycles, of data that are relevant to analytics. From the point of view of MetriDoc, that flow starts with the harvesting of data. Here a governing principle is to collect data in its rawest possible form; the very act of aggregating data limits its explanatory potential. Collecting data as close to an event as possible ensures the availability of a variety of pivot points. That pivoting may require supplementary information, which is part of the Refine step pictured here, and an intentional standardization of data definitions and attributes. If the raw data feed contains two dozen variables for describing a faculty user in an event, or that user’s departmental affiliation, you’ll either overspend on normalizing in refinement or lose the benefit of refinement altogether. Governance is also key to managing system resources like storage, memory, and software performance factors. Governance plays a critical role in determining who can use data, for what purposes, in what settings, and at what times. And it is necessary for regulating the future use and stewardship of the products of analysis. As data flows through infrastructure, decisions will need to be made to ensure effective management of the resource.
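To make the Refine step concrete, here is a minimal sketch of the principle described above: keep the raw event intact and attach a normalized attribute alongside it, rather than aggregating the pivot points away. The field names and the departmental spelling variants are hypothetical.

```python
# Hypothetical lookup from raw departmental spellings to one canonical
# label, applied at the Refine step. The variants shown are invented.
CANONICAL_DEPT = {
    "hist": "History",
    "history": "History",
    "history dept.": "History",
}

def refine_event(event):
    """Return a copy of the raw event with a normalized department
    attribute added; the original fields survive for later pivoting."""
    refined = dict(event)
    raw = event.get("dept", "").strip().lower()
    refined["dept_norm"] = CANONICAL_DEPT.get(raw, "unknown")
    return refined

raw_event = {"when": "2019-11-22T10:00", "action": "loan", "dept": "HISTORY"}
refined = refine_event(raw_event)  # dept stays "HISTORY"; dept_norm is "History"
```

The design choice matters: because the raw value is preserved next to the normalized one, a later change to the canonical mapping can be re-run over stored events without any loss.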
IN SUMMARY: TWO DIMENSIONS OF DATA GOVERNANCE
So, in summary, data governance has both technical and policy dimensions that have to be worked out in any programmatically designed data environment for assessment. On the technical side they include: list… And on the policy side: list…
CLOSING THOUGHT
WHAT’S THE MISSION OF ASSESSMENT IF IT’S NOT ORGANIZATIONAL LEARNING?
AS IT’S BEEN OBSERVED BY JOHN HAGEL AND JOHN SEELY BROWN….
IF SUCCESS IS CONTINGENT ON LEARNING, Technology CAN HAVE AN IMPACT BY..