UKSG Conference 2016 Breakout Session - Who’s reading your valuable content and did they really pay for it?, Keith Abbott, Charles White and Andrew Pitts
This session will be presented jointly by Publisher Solutions International Ltd, Wiley and SAGE Publications, sharing experiences of the work done to combat Subscription Fraud, IP Address Abuse, and Bribery and Corruption in academic publishing. The presenters will be exploring the challenges faced by publishers and the steps taken to monitor and clean up growing and ever changing volumes of data.
2. Publisher Solutions International
• Established in 2005
• Initial focus on the identification, case
development, and remediation efforts
relating to subscription abuse.
• Specifically created to serve as an
independent third party enabling STM
publishing industry to benefit from the
aggregation and analysis of confidential data
without competitive or anti-trust concerns.
3. Transition to IP Address Verification Work
• PSI customers asked us to expand fraud
identification work to include IP Address/
Site License business.
• As part of this effort, PSI and Wiley created JV to
conduct global clean-up of IP address data.
• Proprietary database of >50k institutions & >1
billion IPs
• Data from 150+ publishers
• US data (last significant territory) to be
completed by March 2016
4.
5. Key Takeaway
• State of IP Address data and management
of same within the STM industry is poor.
• ca. 58% of IP Address data requires further
investigation
– e.g.
Territory    Lines of Data   Red   Amber   Green
France              63,071    4%     58%     38%
Germany             58,145    5%     43%     52%
China              149,435    1%     64%     35%
Avg/Total          270,651    3%     58%     39%
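The Avg/Total row appears to be weighted by lines of data rather than a simple mean of the three territories. A minimal Python sketch that reproduces it from the figures in the table (the variable and function names are illustrative, not from the presentation):

```python
# Lines of data and Red/Amber/Green percentages per territory, copied from the table.
rows = {
    "France": (63_071, 4, 58, 38),
    "Germany": (58_145, 5, 43, 52),
    "China": (149_435, 1, 64, 35),
}

def weighted_shares(rows):
    """Return total lines and the line-weighted Red/Amber/Green percentages."""
    total = sum(r[0] for r in rows.values())
    shares = []
    for i in range(3):
        # Weight each territory's percentage by its number of lines.
        weighted = sum(r[0] * r[1 + i] for r in rows.values())
        shares.append(round(weighted / total))
    return total, tuple(shares)

total, (red, amber, green) = weighted_shares(rows)
print(total, red, amber, green)  # 270651 3 58 39
```

Because China contributes more than half of the lines, its 64% Amber share pulls the overall figure up to 58%.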
6. Takeaways from completion
of IP Address Clean Up
• Publisher and even library data is universally poor.
• Poor IP Address data extends far beyond initial
expectation that problems would be primarily attributed to
fraud.
• Neither publishers nor libraries are equipped to address the
problem and maintain long-term solution.
• Resource requirements for publishers and libraries alike are
overwhelming for current systems, processes, and budgets –
even at existing levels of inaccuracy.
• Keeping IP Address data clean provides no competitive
advantage – but not doing so presents significant risk on
many levels.
7. Associated Risks/Problems
• Easy to insert false IP addresses into systems
with no inherent checks
• Wrong IP addresses on accounts result in false usage
reporting
• Incorrect usage reporting carries significant implications for
pricing and widely used marketing metrics across industry
• Fraud can go undetected for years
• IP data errors create “openings” for illegal
proxy/downloading
• Open Access publishers have little or no idea where usage
is coming from
• Data gets dirty as fast as it is cleaned
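The first risk, systems with no inherent checks, can be illustrated with a few basic sanity checks on a submitted range. This is only a sketch using Python's standard `ipaddress` module. The function name, the IPv4-only scope and the /16 breadth threshold are assumptions for illustration, not anything PSI or the publishers describe.

```python
import ipaddress

def check_submission(cidr, min_prefixlen=16):
    """Return a list of problems with a submitted IPv4 range (empty list = passes).

    `min_prefixlen` is an assumed policy threshold: anything broader than
    a /16 is flagged for manual review.
    """
    try:
        net = ipaddress.ip_network(cidr, strict=False)
    except ValueError:
        return ["not a valid IP address or range"]
    problems = []
    if net.is_private:
        # Private/reserved space cannot identify an institution on the public internet.
        problems.append("private or reserved range; not usable for authentication")
    if net.prefixlen < min_prefixlen:
        problems.append("range is unusually broad; review manually")
    return problems

print(check_submission("134.245.0.0/16"))  # []
print(check_submission("134.245.300.1"))   # ['not a valid IP address or range']
```

Checks like these catch only malformed or implausible entries. They do not show that a well-formed range actually belongs to the institution named on the account.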
8. CURRENT STATE: UNVETTED IP ADDRESS CHANGES/ADDITIONS
(Largely Manual Data Entry)
[Diagram: Institutions A, B, C and Publishers 1, 2, 3, 4]
Institutions   Changes   Publishers   Unvetted Changes
70K            1         5.5K         385M Annual
70K            5         5.5K         1.93B Annual
70K            10        5.5K         3.85B Annual
10. Brief Introduction
• Keith Abbott, 25 years in industry from a journals
fulfilment background
• Current emphasis is on content licensing and underlying
data supporting access to content
• Team of two people checking licenses and IP address data
• My focus is on IP address data issues confronting industry
• Working with PSI for eleven years to audit IP addresses
12. Is online access any better?
• University of Kiel
• GEOMAR
• IPN
• ZBW (Kiel)
• ZBW (Hamburg)
• Christian Albrechts Universität zu Kiel
• UKSH
• Helmholtz-Zentrum für Ozeanforschung Kiel
• German National Library of Economics
• University Hospital Schleswig Holstein (Kiel)
• Institut für die Pädagogik der
Naturwissenschaften und Mathematik an der
Universität Kiel
• HWWA
• And they all have the same IP
address range
• 134.245.*.*
13. Getting Better – we have got it down to six!
• University of Kiel
• University Hospital Schleswig Holstein (Kiel)
• GEOMAR
• IPN
• German National Library of Economics (Hamburg)
• German National Library of Economics (Kiel)
• But they are all still sharing the
same IP address
• 134.245.*.*
14. IP addresses must be split out per
location
• University Hospital Schleswig Holstein (Kiel): 134.245.121-255.*
• German National Library of Economics (Kiel): 134.245.101-110.*
• GEOMAR: 134.245.1-50.*
• IPN: 134.245.51-60.*
• German National Library of Economics (Hamburg): 134.245.110-120.*
• University of Kiel: 134.245.61-100.*
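Once the shared range is split, each request can be attributed to a single location, and the split itself can be checked for overlaps. The sketch below encodes the third-octet ranges exactly as they appear on the slide. The function names are illustrative. As transcribed, the two German National Library of Economics ranges both include 110, which the overlap check flags. Whether that is a slip on the slide or in the underlying data is not known.

```python
import ipaddress

# Third-octet ranges within 134.245.*.*, as transcribed from the slide.
SPLIT = {
    "GEOMAR": (1, 50),
    "IPN": (51, 60),
    "University of Kiel": (61, 100),
    "German National Library of Economics (Kiel)": (101, 110),
    "German National Library of Economics (Hamburg)": (110, 120),
    "University Hospital Schleswig Holstein (Kiel)": (121, 255),
}

def owners(ip):
    """Return every institution whose range contains `ip`."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        return []
    first, second, third, _ = addr.packed
    if (first, second) != (134, 245):
        return []
    return [name for name, (lo, hi) in SPLIT.items() if lo <= third <= hi]

def overlaps():
    """Return pairs of institutions whose ranges share at least one third octet."""
    names = list(SPLIT)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if SPLIT[a][0] <= SPLIT[b][1] and SPLIT[b][0] <= SPLIT[a][1]
    ]

print(owners("134.245.75.9"))  # ['University of Kiel']
print(overlaps())
```

An address that maps to more than one owner has the same effect as the original shared range: usage is counted against more than one account.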
15. What can we learn from this example?
• Data is complex and confusing, with multiple names,
acronyms, and English/native-language variants
• IP addresses, in addition to database accounts, must be accurately
segmented
• Failure to maintain correct IP address information could lead to access
being inappropriately shared or customers losing access
• A publisher must check that their underlying data matches their license
agreements
• Bad IP address data will lead to incorrect usage statistics
16. Introduction
• Charlie White, Senior Customer Service Advisor.
• Working on a day to day basis with Institutions, Individuals and
Agents
• SAGE has been working with PSI on both Print and IP Fraud
investigations for the past 7 years.
• I will be focusing on IP Fraud
17. What is IP Fraud?
• Fraud definition - wrongful or criminal deception intended
to result in financial or personal gain.
• How is it achieved in Publishing? It starts with data.
• Publisher contacted by agent with a list of IP ranges for a mutual
customer.
• Publisher trusts the IP ranges are correct and uploads onto their
system.
• Hidden in the customer’s genuine IPs is a range owned by the
agent.
• Publisher has unknowingly opened all the customer’s content to
the agent.
• Back to our definition. What does the agent gain?
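The scenario above can be sketched as a check of an agent-supplied list against networks already verified for the customer. The registry contents and function name are hypothetical. 134.245.0.0/16 stands in for the customer's genuine network, and 203.0.113.0/24 (a range reserved for documentation) stands in for a range owned by the agent.

```python
import ipaddress

# Hypothetical registry of networks already verified as belonging to the customer.
VERIFIED = [ipaddress.ip_network("134.245.0.0/16")]

def unverified_ranges(submitted, verified=VERIFIED):
    """Return submitted ranges that are not inside any verified network."""
    flagged = []
    for cidr in submitted:
        net = ipaddress.ip_network(cidr)
        # subnet_of() requires both networks to be the same IP version.
        if not any(net.version == v.version and net.subnet_of(v) for v in verified):
            flagged.append(cidr)
    return flagged

agent_list = ["134.245.61.0/24", "134.245.100.0/24", "203.0.113.0/24"]
print(unverified_ranges(agent_list))  # ['203.0.113.0/24']
```

A flagged range is not proof of fraud. It only marks an entry that should be confirmed with the institution before it is loaded.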
18. Case Study
• “Agent X” was a large subscription agent based
in Thailand.
• Agent investigated by PSI initially for Print Fraud leading to
publishers stopping all business with the agent.
• Agent X attempts to get around the ban.
• Despite many negotiations, Agent X fails to settle and their
accounts are put on hold for good.
• PSI approached by a ‘whistleblower’ with information
concerning the agent’s business practices.
• The agent was also involved in IP Fraud.
21. What can we do to prevent this?
• IP Audits.
• Stop ‘rogue’ Subscription Agents from placing orders
with us.
• A greater understanding within the Industry as a whole
of IP abuse and the importance of keeping accurate and
up-to-date information.
22. Moving Forward
• Industry needs a practical, economically viable,
and effective solution for managing IP Address
data and enabling publishers to gain a better
understanding of who their customers are:
– Institutions accessing data
– Potential customers visiting publishers
– Authors contributing to publishers
23. On-line IP Register
[Diagram: user types and the actions available to them]
User types: Unrestricted internet users; Institution; Registered Institution; Publisher or Agent; Registered Publisher or Agent
Actions: Basic lookup; Request to be added to DB; Register themselves; Detailed lookup of their data; Request change to their data; Detailed lookup of any data; Request to register themselves; Request to add an institution; Request change to any data; All requests
24. LONGTERM SOLUTION:
CENTRALIZED IP ADDRESS REGISTRY
• Create a global IP address database for all
Publishers to use and establish long-term
industry standard
• Clean up all publisher authentication databases
• Verify all new IP additions and changes
• Check Publisher Log Files against IP database for
abuse detection and usage anomalies
• Enlist support from library community to keep the IP
address database current and accurate
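Checking publisher log files against the IP database could look roughly like the sketch below. The registry entries, the log format (IP address as the first field) and the function name are all hypothetical. A real registry would hold verified ranges, not these simplified /24 examples.

```python
import ipaddress
from collections import Counter

# Hypothetical registry: network -> institution licensed to use it.
REGISTRY = {
    ipaddress.ip_network("134.245.61.0/24"): "University of Kiel",
    ipaddress.ip_network("134.245.1.0/24"): "GEOMAR",
}

def attribute(log_lines, registry=REGISTRY):
    """Count requests per institution; unmatched IPs count as 'UNREGISTERED'."""
    counts = Counter()
    for line in log_lines:
        ip = ipaddress.ip_address(line.split()[0])  # assumes the IP is the first field
        owner = next((name for net, name in registry.items() if ip in net), "UNREGISTERED")
        counts[owner] += 1
    return counts

log = [
    "134.245.61.7 GET /doi/10.1000/example1",
    "134.245.61.9 GET /doi/10.1000/example2",
    "134.245.1.20 GET /doi/10.1000/example3",
    "198.51.100.4 GET /doi/10.1000/example4",
]
print(attribute(log))
```

Requests that fall into the UNREGISTERED bucket, or sudden spikes for one institution, are the kind of usage anomalies that would then be investigated.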
25. PSI-PROACTIVE/CENTRALIZED
VETTED IP ADDRESS VALIDATION
(Largely Automated Process)
[Diagram: Institution (e.g. University of Oxford); New IP address / Delete IP address; PSI Verify IP; PSI “Cube” IP Registry; API/unique IP; Publishers 1, 2, 3, 4]
Institutions   Changes   Publishers   Transactions
70K            1         5.5K         70K
70K            5         5.5K         350K
70K            10        5.5K         700K
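The difference between the two models is plain arithmetic. In the current state every change is entered at every publisher, so the volume is institutions × changes × publishers. With a central registry each change is handled once, so the volume is institutions × changes. A short sketch using the slide inputs (the function names are illustrative):

```python
INSTITUTIONS = 70_000
PUBLISHERS = 5_500

def unvetted_updates(changes_per_institution):
    """Current state: every change is re-entered at every publisher."""
    return INSTITUTIONS * changes_per_institution * PUBLISHERS

def registry_transactions(changes_per_institution):
    """Centralized registry: every change is submitted and verified once."""
    return INSTITUTIONS * changes_per_institution

for changes in (1, 5, 10):
    print(changes, unvetted_updates(changes), registry_transactions(changes))
# 1 385000000 70000
# 5 1925000000 350000
# 10 3850000000 700000
```

With one change per institution per year, 70K × 1 × 5.5K works out to 385 million manual updates against 70 thousand registry transactions. At every change rate the reduction factor equals the number of publishers.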
26. Any Questions?
Andrew Pitts: andrew@publishersolutionsint.com
Charlie White: charlie.white@sagepub.co.uk
Keith Abbott: kabbott@wiley.com
Editor's Notes
Give a shout out to Ryan here (Ryan Jones attending Tuesday). Mention that Charlie will cover fraud aspects.
Messy, difficult to control, things get lost
Acronyms, German names, two different cities, university and a hospital, all share the same IP address range, do we have a license for this? Good luck to CS Rep in Bangalore.
Institutional names cleaned, but they are still sharing the same IP address
In point 3 ask if anyone has lost access to publisher content? Not good having a signed license or content if data quality behind it is bad. Originally we had 12 institutions, access counted 12 times, then 6, counted 6 times. Economics titles counted at a hospital and medical titles counted at an economics library. Librarians should not assume publishers understand their data.