A Database of Riches:
Measuring the Options for Google’s Book
         Settlement Roll Out




  Michael Cairns – Managing Partner, Information Media Partners

             michael.cairns@infomediapartners.com

                       Tel: 908 938 4889
A Database of Riches: Measuring the Options for Google’s Book Settlement Roll Out
Author: Michael Cairns – Information Media Partners
Page 2 of 19

Author:

Michael Cairns has been a publishing executive and consultant for over 25 years. As
President of R.R. Bowker, he led the team that transitioned the company from a print-based
organization to one reliant on web subscription products, and also successfully broadened
the company’s revenue base. During his tenure at Bowker, he managed the sale of Bowker
from Reed Elsevier and,
once that transaction was completed, he executed a strategic plan resulting in the
acquisition and integration of five companies in three years. As a consultant, he has
managed projects for many large media companies including Thomson Learning (Cengage),
Simon & Schuster, Reed Elsevier, The Interpublic Group of Companies, Ogilvy & Mather,
Hearst, Gruner + Jahr, Online Computer Library Center (OCLC), AARP and others. In
addition, Michael has held executive positions at PricewaterhouseCoopers, Berlitz
International, Inc., Macmillan, Inc., and MyWire.com.

In his current role at Information Media Partners, Michael consults with a wide spectrum of
publishing and media companies helping them define market opportunities, develop
business strategies, identify acquisition opportunities and manage through crisis. Potential
clients are encouraged to contact Michael for more information (tel: 908 938 4889).

Notes on this Report:

In the summer of 2009, I started to wonder at the potential market opportunity that the
Google Book Settlement could represent. Fellow industry consultant Mike Shatzkin and I
began to discuss the agreement and I agreed to pull together a spreadsheet that could
represent an ‘order of magnitude’ estimate of the market opportunity. This report does not
rely on any direct interviews with Google or with representatives of the Book Rights Registry
(BRR) and, as such, it represents only a structured approach to analyzing the opportunity.
Nor is this report a definitive statement of the pricing, market penetration or approach
by which this market opportunity may be exploited.

In addition to this report on market opportunity, I also constructed an estimate of the
potential size of the orphan works population. This material has been available for some
time on my blog (personanondata) and in several presentations I have made. I have
included this analysis as an attachment to this report. (Other than a few minor punctuation
edits, there have been no changes to my original.)

Several people helped in the review of this document and, for their time and effort, I am
especially grateful. A special thanks to Mike Shatzkin of The Idea Logical Company who
originally prompted me to look at the market potential of the Google Book Settlement and
helped me organize my thoughts.

Both OCLC’s WorldCat and Bowker’s Books In Print were invaluable in developing some of
the conclusions formulated in this document. Specific citations are noted where applicable.

Readers of this report may be interested in discussing the findings with me directly and in
more detail. Please contact me to arrange a time: michael.cairns@infomediapartners.com
or 908 938 4889. Find me on LinkedIn, Twitter and Scribd.




Copyright: Michael Cairns – Replication and Distribution By Permission                         2

Introduction:

Almost five years ago, Google embarked on the most ambitious library development project
ever conceived: To create a “Noah’s Ark” of every book ever published and to start by
digitizing books held by a rarefied group of five major academic libraries. The immediate
response from US publishers was muted, until the implications of the project became clear:
That Google proposed no boundaries to the digitization effort and initiated the scanning of
books both in and out of copyright and in and out of print. Adding to publishers’ concerns,
Google planned to display “snippets” (small selections) of the book’s content in search
results. Despite some hurried conversations among publishers, author groups and Google,
Google remained convinced that what they were doing represented a social ‘good’ and the
partial display of the scanned books was legally within the boundaries of fair use.

From the publisher perspective, this was a make-or-break moment, and the implications
were more acutely felt by trade publishers who saw the potential for their business models
to be obliterated by easy and ready access to high-quality content via a Google search over
which they would exert little or no control. Even worse was the fear that rampant piracy of
content would also develop – a debated and contentious point - given the easy access to a
digitized version of a work that could be e-mailed or printed at will. The publishers
determined that if Google were to ‘get away with it’ without challenge, then anyone would
be able to digitize publisher content and possibly replicate what has been going on in the
music and motion picture industries for almost ten years. In mid-2005, prompted by a
lawsuit filed by The Authors Guild, the Association of American Publishers (AAP) led by four
primary publishers filed suit against Google in an effort to halt the scanning of in-copyright
materials. (The Authors Guild and AAP ultimately combined their filings).

The initial Google Book Settlement (GBS) agreement, given preliminary approval by a court
in October 2008, generated a vast amount of argument both in support of the agreement
and in challenges to it. A revised agreement was drafted after Judge Chin of the Federal
District Court of Southern New York agreed to delay adjudication; final arguments were
heard in late February 2010. To date, Judge Chin has given neither a timetable nor an
indication of when and how he will decide the case.

From the perspective of the early leading library participants, Google’s arrival and promise
to digitize their purposefully conserved print collections looked like a miracle. Faced with
forced declines in the dollars spent on monographs and the ever-rising expense of
maintaining over 100 years of print archives, the Google digitization program provided a
possible solution to many problems. All libraries believe they hold a social covenant to
collect, maintain and preserve the most relevant materials of interest to their communities
but maintaining that covenant becomes a challenge in an environment of increasing
expenses while also enduring the challenges of migrating to an on-line world.1



1
 It is important to acknowledge that, initially, the GBS may have been seen as a solution to libraries’ conservation and preservation
needs; however, subsequently, libraries have determined that they need to develop their own preservation options in which The Hathi
Trust is a clear leader.



The library world is typically segmented into public and academic institutions and while
these often varied ‘communities’ may differ in their philosophy towards, for example,
collection development or preservation, they do share some common practices. Most
importantly, all libraries are committed to resource sharing and while materials use has
historically and primarily been ‘local’ to the library, every institution wants to make its
collections available to virtually any patron and institution who requests them. In short,
these library collections were always ‘accessible’ to all regardless of geography or copyright:
First US Mail, FedEx, e-mail and then the Internet progressively made this sharing easier
but, until Google arrived with their digitization program, any sharing beyond the local
institution was via physical distribution2. In effect, it could be argued that the Google
scanning program simply makes an existing practice vastly more efficient.

Even though the approval of the Google Book Settlement (GBS) hangs in the balance under
review by Judge Chin of the Federal District Court of Southern New York, an Executive
Director has been named to head the Book Rights Registry (BRR)3 and is preparing the
groundwork to establish the organization (BRR) in advance of approval. This report
represents an attempt to analyze the market size opportunity for Google as it seeks to
exploit the Google Book Settlement. Following are our summary findings which are
discussed in more detail in the ensuing pages of this report.

Summary Findings of the Report:

   •   Libraries will see tremendous advantages – both immediate and over time – from
       the GBS, although concerns have been voiced (notably from Robert Darnton of
       Harvard4)

   •   Google’s annual subscription revenue for licensing to libraries could approach
       $260 million by year three of launch

   •   Over time, publishers (and content owners) will recognize the GBS service as an
       effective way to reach the library community and are likely to add titles to the
       service5

   •   Google will add services and may open the platform for other application
       providers to enhance and broaden the user experience




2
 Resource sharing and improvements in the ‘logistics’ provided by OCLC (WorldCat) or via consortia such as OhioLink has made
physical distribution effective and comparatively efficient.

3
 The BRR is the management body tasked with administering the GBS and representing the interests of authors and publishers once
approval has been granted by the court.

4
    Robert Darnton, NY Review of Books

5
 The settlement doesn’t provide for adding content prior to 1/5/09; however, we are suggesting that, by mutual consent, additional
published content may be added as an expedient method of reaching the library market.




   •   The manner in which the GBS deals with orphan works will provide a roadmap for
       other communities of ‘orphans’ in photography, arts, and similar content and
       intellectual property





Business Analysis:

By mid-2008, the lawsuit was background noise adding to the general malaise and
discomfort characterizing the media industry and the announcement that the parties had
agreed to settle their differences was initially greeted with support, relief and some surprise.
Yet, as the implications of the complex settlement agreement became clearer, a strong
(and, at times, strident) opposition developed to argue for substantial revisions to, or the
elimination of, key sections of the agreement. Importantly, this opposition also succeeded in
persuading the Department of Justice (DoJ) to voice ‘strong opposition’ to segments of the
agreement. When combined with the concerns expressed by DoJ, the opposition to the
agreement was able to exact significant changes to the agreement’s terms. A ‘revised
agreement’ was presented to and is now pending approval by Judge Denny Chin of the
Federal District Court of Southern New York.

Among the principal arguments against approval of the original settlement agreement were
the following:

       •    Opponents argued Google would attain an insurmountable monopoly over in-
            copyright but out-of-print works

       •    The obligation to ‘opt-out’ of the agreement places an undue burden on the copyright
            holder (author)

       •    Foreign rights holders were underrepresented (or insufficiently consulted) and thus
            disadvantaged by the original agreement

       •    Monies collected on behalf of copyright holders but never disbursed would be paid
            into a ‘general expenses’ fund to benefit the Book Rights Registry6

       •    Some authors believed their moral rights to determine the use and replication of
            their works were circumvented.

       •    The agreement itself will in effect create copyright ‘legislation’ which should be the
            purview of Congress

The revision to the agreement has partially addressed these issues (excepting the last item)
but the settlement revision has not fully incorporated all of the challenges supported by the
settlement opposition and the Department of Justice.

Two aspects of the agreement which generated attention and hyperbole concerned the
number of “orphan works” and the revenue model Google would implement to market their
full-text database. Both of these issues are used by settlement opponents to justify the
agreement’s rejection by the Court. In each case, very little real analysis has been




6
    Changed in the second version of the settlement so that uncollected funds would eventually be distributed to designated charities.




conducted to determine the true parameters of both the ‘orphan’ issue and the market
opportunity.

In August 2009, we published an estimate of the potential number of orphan works that
may exist. We are unaware of any other detailed analysis that attempts to quantify the
collection of titles which remain in copyright but whose copyright holder has not been
located. This analysis is included as an attachment to this document7. The following chart
summarizes the findings of potential orphan works:

          Estimate of                               Percent of Title Output:
          Orphan Works        Scenario              1920 – 2000

          580,388             Base Case             24%
          824,553             High/Aggressive       34%



In summary, the orphan analysis estimated a potential orphan population of 580,388 based
on a review of pre-existing statistical information documenting the numbers of new titles
published in the US since 1920. While we estimated that ‘orphans’ would be more prevalent
among older titles, the total annual title output only exceeded 15,000 for the first time in
1960 (according to our source data); therefore, the universe of all titles published between
1920 and 1980 is actually relatively small. Publishing output only rapidly increased during
the late 1980s and it is assumed that the majority of these titles will not be ‘orphans’
because copyright information is readily available and confirmable. As noted, the full report
is included as an attachment to this report. We believe our analysis to be sound and the
results were supported by a different methodology based on data from OCLC’s WorldCat
database (as noted in the full report).
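
As a quick consistency check on the chart above, the two scenarios can be divided by their stated percentages to see what total title output for 1920 – 2000 they imply. The sketch below uses only the four figures from the chart; the function name is ours, and because the percentages are rounded to whole numbers in the chart, the result should be read as approximate.

```python
def implied_title_output(orphans: int, share: float) -> float:
    """Total title output implied by an orphan count and its share of output."""
    return orphans / share


# Figures taken from the chart above; shares are rounded to whole percent there.
scenarios = {
    "Base Case": (580_388, 0.24),
    "High/Aggressive": (824_553, 0.34),
}

for name, (orphans, share) in scenarios.items():
    total = implied_title_output(orphans, share)
    print(f"{name}: {orphans:,} / {share:.0%} = about {total:,.0f} titles")
```

Both scenarios imply a total of roughly 2.4 million titles, which suggests the two estimates were drawn against the same underlying title count.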

After estimating the total number of ‘orphans’ we also estimated the number of foreign
works that could potentially be included in the GBS. This analysis is more tenuous
statistically because we relied entirely on the OCLC WorldCat database8 and made several
key assumptions and extrapolations. Based on this conditional estimate, we determined
there could be approximately 1.2 million titles from the ten largest languages published and
an additional 0.2 million from all other languages.

Currently, the content potentially covered by the GBS represents over 12 million titles scanned.
Multiple versions of the same work are included in this total; however, even if all foreign
works are to be excluded from the database and authors and publishers voluntarily remove



7
 A related analysis that extrapolates the potential number of foreign language titles that may fall under the umbrella of the settlement
has also been completed but is not included in this document.

8
    This is not to assert that the WorldCat data is inaccurate in any way; rather, our assumptions should be considered ‘best-guess’.




their titles from inclusion, the Google Book subscription product will remain a compelling
database for the academic and public library market as well as schools and certain
corporations. A significant change adopted in the amended settlement agreement has
narrowed the class to UK, Australian and Canadian published books in addition to those
registered with the US copyright office.9

The Google Books Database Subscription and Revenue Model

Opponents have suggested that Google will be in a position to exercise monopolistic pricing
and to ‘overcharge’ to extract maximum revenues from their customers. We agree that their
market position could be abused; however, we believe there is a counter-balance included
in the agreement that obviates this tendency. Google seeks maximum exposure for the
content - not only to support its stated mission of providing wide and broad access to this
‘hidden’ content, but also to support other business opportunities they may implement
(such as advertising programs). We believe Google will see overly aggressive pricing as an
inhibitor to wide market acceptance of the product. The Book Rights Registry will represent
the interests of authors and publishers who will argue for pricing that maximizes their
opportunity. Together, balancing wide access (Google’s position) with pricing considerations
will result in an optimal pricing matrix.

In developing our financial and market analysis, there are several key assumptions we have
relied upon10:

        •     Pricing will be variable based on type of institution

        •     This will be considered a ‘must have’ database product for all libraries

        •     The Google product will effectively “level the playing field” from small to large
              academic libraries for the types of books covered by the Settlement

        •     Google will continue to invest in the Book database product by adding content,
              functionality and applications/tools to aid usage over time and may raise pricing

        •     Penetration will not reach 100% for any segment, but is likely to grow over time

        •     Corporations will be important customers (e.g., science, aeronautics and
              engineering-based firms)




9
     As an upper limit, the number of ‘non-English’ language titles could be 50% of the total books scanned.

10
  Business models that include advertising are not assumed in this analysis. It may be possible that Google will use the scanned
content as content around which they can tailor advertising offers; however, the second amended version has narrowed the
application of varied business models and it is difficult to determine that any model other than a subscription-based service will be the
primary revenue generator to Google and the BRR. Over time, this may change but that circumstance is not anticipated in this
analysis.




In the following analysis, we attempt to define the Google Books Database market
opportunity and estimate the potential annual revenues the company may be able to
generate each year from database subscriptions. Google currently markets several services
to publishers which include Google Scholar, Google Partner Program and Google Editions
(which will be launched in mid-2010). These current products and services are not included
or assumed in this analysis.

In estimating the market potential for the Google Settlement database product, we have
taken three primary components (or drivers) into account: Market segmentation,
penetration and pricing.

Market Segment

The agreement provides Google with the right to exploit certain markets including
academic, public and special libraries, corporate customers, print-on-demand (POD)11 and
direct-to-consumer sales. In our analysis, we have used American Library Association data
itemizing the type and number of libraries in the US and used “best guess” estimates of the
market opportunity represented by corporations and consumers. Most commentary to date
has focused on the library community, which is where this analysis is strongest in its
estimates and where we concentrate our discussion.

An important accommodation of the Settlement is the provision of free access to the
database product for all public libraries and certain “Carnegie” classed libraries. Each library
accepting this access will receive the equivalent of a single user sign-on that will allow
patrons and/or staff to access the Settlement database without restriction. While an
important accommodation for some libraries, for the majority of libraries this access will not
be appropriately functional and, thus, site-wide and unlimited user access provided under
the terms of the subscription product will remain the better option. We do not believe this
free access will materially impact the revenue opportunity for Google and have allowed for
this circumstance in our financial model.

In our opinion, academic libraries will consider a subscription to the Google Books database
as a competitive necessity. For the first time, any subscribing library within the United
States may gain direct access to the collections of some of the largest and most renowned
academic collections in North America12. In addition, this access will far surpass the inter-
library loan process of years past simply because the content is completely indexed.
Researchers will no longer have to ‘guess’ that a title may be relevant to their research
based on an index or table of contents and, moreover, they eliminate the risk that upon
requesting the title be delivered to them, they discover the content to be irrelevant.




11
 POD is a right that may be granted to Google in the future pending approval of the Book Rights Registry and the rightsholders they
will represent.

12
     The amended settlement has narrowed the class and effectively excludes non-English titles from the database.




Many academic library collections have been built over centuries and titles in their
collections are often unique, which is another compelling reason supporting the argument
that the Google database represents a singular opportunity for all academic institutions to
“narrow the gap” between their research capabilities and those of the country’s largest and
best endowed institutions. While some academic collections’ titles are available via inter-
library loan, many older, fragile and unique works are only available at the institution itself
by special request. The digitization of many (not all) of these works significantly broadens
access to and distribution of this content. Undoubtedly, researchers, educators and students
at all academic institutions will pressure their administrators and librarians to subscribe to
the product13.

The following chart represents our construct for the potential addressable market segments
for the Google book database14:

                                   Total Number of Academic Libraries                     3,617
                                   Total Public Libraries                                 9,198
                                   School Libraries                                      99,783
                                   Special Libraries                                      9,066
                                   Armed Forces                                             296
                                   Government                                             1,159
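
To put the segment counts above in proportion, the short sketch below totals them and computes each segment's share of institutions. It uses only the ALA counts from the chart; these are counts of institutions, not customers or revenue, and no penetration or pricing assumption is applied.

```python
# Institution counts from the chart above (source: American Library Association).
segments = {
    "Academic libraries": 3_617,
    "Public libraries": 9_198,
    "School libraries": 99_783,
    "Special libraries": 9_066,
    "Armed Forces": 296,
    "Government": 1_159,
}

total = sum(segments.values())
shares = {name: count / total for name, count in segments.items()}

for name, count in segments.items():
    print(f"{name:<20}{count:>9,}{shares[name]:>8.1%}")
print(f"{'Total':<20}{total:>9,}")
```

School libraries account for about four in five institutions by count, so the penetration assumed for each segment matters far more to the revenue estimate than the raw totals do.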


Market Penetration:

We estimate that sales penetration will vary considerably across the segments; however, for
the reasons presented earlier, we believe penetration into the academic library segment will
lead all other markets. Public libraries (particularly metropolitan library systems) will find
value in the database and, as a group, will represent the largest concentration of customers
overall. School libraries are unlikely to subscribe to the database in great numbers for
budgetary or relevance reasons and, moreover, students will be encouraged to gain access
to the product via their public library remote-access facilities.

We expect larger research public libraries (such as The New York Public Library) will be
treated as academic libraries for the sake of pricing. We also expect some corporations to
access the database product and, while pricing for these ‘for profit’ entities should be
comparatively high, the absolute number of customers in this segment will be small.

Pricing:

Database subscription pricing can be complicated and confusing. Models can be based on
population served, purchasing budgets and/or enrollment, and then be subject to



13
 It is likely that an extensive database of user behavior may be generated by usage of this database. This is data that publishers (and
authors) may be interested in mining for product development and/or insights into consumer behavior.

14
     Source: American Library Association




multiplication factors such as number of simultaneous users, number of physical locations
and other factors. We don’t know which method Google will choose; however, in order to
keep our analysis as simple and transparent as possible, we have built our pricing model on
the basis of the following criteria:

       •     Unlimited users per location

       •     Branch public libraries priced at 25% of base fee per additional branch

       •     3% price increases per year

       •     Institution ‘classification’ based on ALA data

       •     Full ramp-up will occur over the first three years

Additionally, we expect Google will sell to the ‘highest’ administrative level possible15. For
example, the University System of Georgia manages licensing contracts under their Galileo
program for both public and academic libraries and, therefore, this agency would be the
customer rather than individual or local libraries. In New York, Google would license access
to the library authorities in each borough. In New York City (Manhattan), this would mean
the main library and roughly 50 satellite libraries would have unlimited access via one
contract and, based on our pricing matrix, the NYPL would pay approximately $340,000 per
year for access ($25,000 for the main library plus $6,250 for each of 50 branch locations, or $337,500).
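
The NYPL figure can be reproduced from the pricing criteria listed earlier. The sketch below is illustrative only: the $25,000 base fee is the figure quoted for the NYPL main library, the function names are ours, and we assume the 3% annual increase compounds from the first year of the contract, which the criteria do not state explicitly.

```python
BASE_FEE = 25_000        # main library fee quoted in the NYPL example
BRANCH_RATE = 0.25       # each additional branch is priced at 25% of the base fee
ANNUAL_INCREASE = 0.03   # 3% price increase per year


def system_fee(branches: int, base_fee: float = BASE_FEE) -> float:
    """First-year fee for a main library plus its branch locations."""
    return base_fee + branches * base_fee * BRANCH_RATE


def fee_in_year(first_year_fee: float, year: int) -> float:
    """Fee in a given contract year (year 1 = launch), assuming compounding."""
    return first_year_fee * (1 + ANNUAL_INCREASE) ** (year - 1)


nypl_year_one = system_fee(branches=50)
print(f"Year 1: ${nypl_year_one:,.0f}")                   # $337,500
print(f"Year 3: ${fee_in_year(nypl_year_one, 3):,.0f}")
```

On these assumptions the first-year fee is $337,500, which rounds to the $340,000 cited above, and the same contract would cost about $358,000 by year three.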

For-profit organizations (corporations and businesses) will have a pricing matrix higher than
for non-profit libraries and institutions (generally standard practice). We would expect that
only a relatively small percentage of businesses would subscribe to the entire database and
we have segmented the target market into Fortune 500, 1,000 and all others. The corporate
customers most likely to subscribe would be those companies with large research needs
such as pharmaceutical, aeronautics, engineering and the like. Options to better address
this market may include shorter subscription terms, usage based on metering systems or
topic/subject specific packages.

Market Opportunity Summary:

We believe Google and the Book Rights Registry (a proxy for authors, authors’ heirs and
publishers) will be motivated to maximize access to the Google database in order to
maximize viewing of the content which will, in turn, result in optimal revenues for both. We
do not believe Google will implement a monopolistic approach to pricing and, in comparison
with smaller and more segmented databases, we believe the Google pricing will appear
reasonable considering the breadth and depth of content in the database.

Approach to the Market:


15
     Consortia pricing, while an important consideration, would represent a discount to the pricing matrix we present and would be
negotiated on a case-by-case basis. We have not made accommodations for Consortia pricing.




In our view, Google has several options for marketing and selling this database product:

   •   Google sells the product themselves with their own sales force

   •   Google designates one supplier for each segment

   •   Google allows all vendors to integrate the books database product into their existing
       database products and pay Google a defined fee per user.

In our view, it is unlikely that Google will establish their own sales force to sell into the
library and corporate marketplaces. While Google does have an ad sales force supporting its
SEM program(s), this activity is vastly different from building a sales team to call on
libraries and corporate clients. Additionally, given Google’s predilection for automation, the
hiring of a human sales team doesn’t seem culturally acceptable. Lastly, and possibly more
important, we believe licensing this product will become more a ‘renewal’ business as the
market matures (after 3-4 years) which could require far less sales effort – or one significantly
different than that required in the first three years. We estimate a fully staffed Google sales
force could cost the company $15 million annually but, in short, Google is unlikely to want
the headache.

Given the limitations of the above approach, we believe it is more likely Google will contract
with one or more of the established players and pay a standard sales commission to the
provider. In this model, Google will be able to set prices and targets and retain a degree of
control over both the provider of this sales effort and the market delivery (pricing) of the
product. Existing providers would bid on the right to sell this database on behalf of Google
and, because the product will be highly valued, the bidding would likely be highly
competitive. Likely providers to Google would include ProQuest, Gale/Cengage, OCLC or
EBSCO. It is also possible that an ‘outlier’ such as Ingram, Baker & Taylor or Hudson News
(LibreDigital) would see representing this database as a significant opportunity. For an
established player, it is likely the provider would see increased sales in their current offering
– simply representing the Google Books database would open new market opportunities. For
an ‘outlier’, the Google Books product may represent an opportunity to enter the market
using the Google product as a foundation.

In our estimation, the above scenario is not only practical (not having to administer their
own sales force is a major advantage), but may also be cost effective. Given the ‘prize’ of
representing the Google database, we believe the average cost to Google may be less than
10% of revenues. (“Renewal” sales may also earn a lower commission than initial sales.)

Working with a single provider thus represents an effective solution for Google, but this
strategy may not be the most efficient one. In order to achieve greater efficiency in reaching
its target market while also eliminating possible “political” issues caused by selecting one
vendor over the others, the company may consider allowing any provider to sign a standard
distribution agreement with the company and sell and market the product into all markets.
This approach has several advantages:





   •   Immediately leverages the competitive position of all major providers that otherwise
       may be mutually exclusive

   •   Gives a library subscriber a choice of provider and/or allows them to work with an
       existing ‘preferred’ vendor

   •   Potentially enables providers to integrate the Google product with their existing
       products thus providing rapid development initiatives and built-in content ‘handcuffs’
       supporting renewals

   •   Minimizes Google’s exposure to any supplier limitations and negative customer
       support issues

   •   Provides maximum exposure to all market segments virtually immediately

   •   As part of these agreements, Google may gain access to index all content supplied
       by their third-party sales partners

Approach to the Market Summary:

Based on this review of Google’s tactical options, we believe the company will enable
multiple (initially ‘preferred’) vendors to market and sell the product into the market.
Google will establish pricing and the vendors will be required to pay Google based on this
set price schedule (less vendor commission). Under this model, any vendor will be free to
charge the end-customer less than the ‘set price’; however, the vendor would still pay
Google based on the higher ‘full’ price. (Selling below the set price could occur when the
vendor bundles the database with other products it provides.)
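The payment rule described above can be sketched in a few lines of Python. This is only an illustration of the mechanism, not a term of any actual agreement: the function name `settle_sale` is hypothetical, the $55,000 set price is borrowed from the academic average in our pricing estimates, and the 10% commission is simply the upper bound we suggested earlier.

```python
def settle_sale(set_price, commission_rate, price_charged):
    """Split one subscription sale between Google and the vendor.

    Google is paid on the set price less the vendor's commission,
    regardless of what the vendor actually charged the customer.
    """
    google_receipts = set_price * (1 - commission_rate)
    vendor_margin = price_charged - google_receipts
    return google_receipts, vendor_margin


# Hypothetical inputs: $55,000 set price, 10% commission.
at_set_price = settle_sale(55_000, 0.10, 55_000)
discounted = settle_sale(55_000, 0.10, 50_000)

print(at_set_price)
print(discounted)
```

In this sketch a vendor who discounts by $5,000 absorbs the whole discount out of its own margin, while Google's receipts are unchanged.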

Forecasted Revenue Expectations:

Based on our assumptions documented above, we believe the revenue Google may
generate from the Google Books database product could approach $260 million per year. Our
revenue model was based on the following set of assumptions:

   •   Base pricing by segment

   •   Price discounts based on size of library holdings or population served

   •   Penetration levels based on library size

   •   Revenue represents full implementation, which we expect by year three





The following chart documents our estimates:

Segment         Total Market    Avg. Penetration    Avg. Pricing    Revenue ($MM)
Academics              3,617                 65%         $55,000           $130.1
Publics                9,198                 47%         $21,000           $112.8
School                99,783                0.5%         $10,000             $4.9
Special                9,066                0.5%         $25,000             $1.1
Armed Forces             296                  5%         $11,000             $0.1
Government             1,159                 25%         $11,000             $3.1
Corporate            100,000                  2%         $37,500             $7.5
Total                                                                      $260.0
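
As a quick cross-check, the segment revenues can be summed and expressed as shares of the total. The sketch below takes the revenue column exactly as printed; it does not attempt to recompute each row from the average penetration and average pricing columns, since those averages are blended across size tiers. The variable names are our own.

```python
# Revenue by segment in $MM, copied from the chart as printed.
revenue_mm = {
    "Academics": 130.1,
    "Publics": 112.8,
    "School": 4.9,
    "Special": 1.1,
    "Armed Forces": 0.1,
    "Government": 3.1,
    "Corporate": 7.5,
}

total_mm = round(sum(revenue_mm.values()), 1)
share = {seg: rev / total_mm for seg, rev in revenue_mm.items()}

print(f"Sum of segments: ${total_mm}MM")
for seg, s in share.items():
    print(f"{seg:<13}{s:6.1%}")
```

The segments sum to $259.6MM, which rounds to the $260MM total shown, and academic and public libraries together account for roughly 94% of it.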


As noted, we believe it will take Google three years to ramp up this full implementation
revenue (we do not see this as a limitation on Google’s part, rather, a typical expectation
for a new-product roll out). At the above levels, we believe pricing is not only reasonable
and affordable, but compares favorably with existing database publishers’ pricing. There are
few, if any, other publishers who have products which serve as many (all) segments as the
Google Book database.

At this revenue level, each of the 12mm titles in the Google database has a nominal value
of $22 (per year) to Google. More importantly, the per-unit price paid by each library will be
less than $0.05 (five cents). On a pure cost-avoidance basis, licensing the Google Books
database appears to be good value given current costs. If the costs of handling, cataloging,
special requests (such as interlibrary loans) and storage are added to the base wholesale
price of any title, the title’s full ‘carrying costs’ can double. Some studies have indicated
that fulfilling an interlibrary loan request can cost $25 for each segment of the trip from
the library to the requestor and back. This cost far exceeds the original (or, in many
instances, the replacement) cost of the title [16].
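
The per-title figures follow from simple division. The sketch below uses the 12mm title count and $260 million revenue estimate from the text and, as an assumption on our part, takes the highest average price in our estimates (the $55,000 academic figure) as the worst case for a single library.

```python
TITLES = 12_000_000            # titles in the database (from the text)
ANNUAL_REVENUE = 260_000_000   # full-implementation revenue (from the text)
HIGHEST_AVG_PRICE = 55_000     # academic average price (assumed worst case)

# Nominal annual value of each title to Google.
value_per_title = ANNUAL_REVENUE / TITLES

# Annual cost per title to a library paying the highest average price.
cost_per_title = HIGHEST_AVG_PRICE / TITLES

print(f"Value per title to Google: ${value_per_title:.2f}")
print(f"Cost per title to a library: ${cost_per_title:.4f}")
```

Even at the highest average price, the cost per title comes out well under the five-cent ceiling quoted above.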

While we believe this database to be an important acquisition for most academic and many
public libraries, we do expect that Google will need to sell this product aggressively in the
early years to achieve the penetration levels we anticipate. There are several reasons for
this. Firstly, the content of the database is largely unknown and, while it is representative
of many important library collections, Google will need to market this collection as important
and complementary to the library customers in question. Secondly, the sheer size of the
database could be an inhibiting (or intimidating) factor and therefore the navigation,
bibliographic data quality and the delivery of subject ‘collections’ will be important customer
acquisition and retention areas for the company to focus on.

[16] Users may print all or portions of the titles they select (although the functionality to do this may be a subsequent grant
provided by the BRR to Google) and there is a cost to these activities; however, we maintain that the utility of the database and the
ability of the user to be precise in their printing requests will produce only a marginal cost (if any) relative to the avoided costs
that are endemic to the current solution.

In summary, we believe Google will be able to successfully launch their Book Database
product into the market with fair and reasonable pricing that will encourage a broad base of
target customers to subscribe.

Future Market Growth Opportunities:

While the launch of this product is the current focus of attention, we do believe the company has
numerous opportunities to expand the product over time. We do not expect the Google
Books database product to ‘stand still’; rather, we believe this product could become the
primary access point for textual (monograph) materials into the library market.

Future market opportunities [17]:

       •     The addition of other content: Publishers may see this product as a viable library
             market entrance point for all their book content

       •     Provision of usage data to publishers (and others) for business and product
             development needs

       •     Price increases over time and growing penetration

       •     Inclusion of international/non-US market content – English language

       •     Inclusion of international/non-US market content – Non-English language

       •     Access to international markets

       •     Addition of more in-copyright materials closer to current pub dates; the product
             perhaps becomes a major distribution mechanism for book content

       •     Topic/segmented collections

       •     Potential to open the database for third party application development




[17] We expect these opportunities to ‘evolve’ over time based on discussion, negotiation and mutual agreement of the parties.




Summary:

This analysis argues that the Google Books Database product will be seen as a ‘must have’
product for a large proportion of academic and public libraries and is, thus, valuable on its
merits. Google will price this product at levels that are both lower than those of existing
database providers and ‘economically viable’ given cost avoidance justifications. The company
retains flexibility in how it will approach selling and marketing the product; however, we
believe it will contract out these services. Lastly, we believe there is potential upside to the
revenue model based on adding new markets and expanding content.





Addendum A – Orphan Works Analysis

580,388 Orphans (Give or Take)

Clearly one of the most (if not the most) contentious issues regarding the Google Book
Settlement (GBS) centers on the nebulous community of “orphans and orphan titles”. And
yet, through the entirety of the discussion since the Google Book Settlement agreement was
announced, no one has attempted to define how many orphans there really are. Allow me:
580,388. How do I know? Well, I admit, I do my share of guesswork to get to this
estimate, but I believe my analysis is based on key facts from which I have extrapolated a
conclusion. Interestingly, I completed this analysis starting from two very different points
and the first results were separated from the second by only 3,000 works (before I made
some minor adjustments).

Before I delve into my analysis, it might be useful to make some observations about the
current discussion on the number of orphans. First, when commentators discuss this issue,
they refer to the ‘millions’ of orphan titles. This is both deliberate obfuscation and lazy
reporting: Most notably, the real issue is not titles but the number of works. My analysis
attempts to identify the number of ‘works’; titles are a multiple of works. A work will often
have multiple manifestations or derivations (paperback, library version, large print, etc.)
and, thus, while the statement that there may be ‘millions of orphan titles’ may be partially
correct, it is entirely misleading when the true measure applicable to the GBS discussion is
how many orphan works exist. It is the owner (or parent) of the work we want to find.

To many reporters and commentators, suggesting there are millions of orphans makes
sense because of the sheer number of books scanned by Google but, again, this is laziness.
Because Google has scanned 7-10 million titles then, so the logic goes, there must be
‘millions of orphans’. However, as a 2005 report (which I understand they are updating) by
OCLC noted, many definitional disclaimers are applied to this universe of titles such as titles
in foreign languages, titles distributed in the US, titles published in the UK, to name a few.
Accounting for these disclaimers significantly reduces the population of titles at the core of
this orphan discussion. These points were made in the 2005 OCLC report (although they
were not looking specifically at orphans) when they looked at the overlap in title holdings
among the first five Google libraries. (And, if you like this stuff, this was pretty interesting).
Prognosticators unfamiliar with the industry may also believe there are millions and millions
of published titles since, well, there are just lots and lots in their local B&N and town library.

The two methods I chose to try to estimate the population of orphans relied, firstly, on data
from Bowker’s BooksinPrint and OCLC’s Worldcat databases and, secondly, on industry data
published by Bowker since 1880 on title output. I accessed BooksinPrint via NYPL (Bowker
cut off my sub) and Worldcat is free via the web. The Bowker title data has been published
and referred to numerous times over the years and I found this data via Google Book
Search; I also purchased an old copy of The Bowker Annual from Alibris.

In using these databases, my goal was to determine whether there are consistencies across
the two databases that I could then apply to the Google title counts. In addition to the ‘raw
data’ I extracted from the databases, OCLC (Dempsey) also noted some specific numbers of
‘books’ in their database (91mm), titles from the US (13mm) and non-corporate ‘Authors’
(4mm). Against the title counts from both sets of data, I attributed percentages which I
then applied to the Google universe of titles (7mm). (My analysis also ‘limits’ these numbers
to print books excluding, for example, dissertations.)

In order to complete the analysis to determine a specific orphan population, I reduced my
raw results based on “best guess” estimates for non-books in the count, public domain titles
and titles where the copyright status is known. These final calculations result in a potential
orphan population of 600,000 works. I also stress-tested this calculation by manipulating
my percentages resulting in a possible universe of 1.6mm orphan works. This latter
estimate is (in my view) illogical, as I will show in my second analysis.

An important point should be made here: I am calculating the potential orphan population,
not the number of orphans. These numbers represent a total before any effort is made to
find the copyright holder. These efforts are already underway and will get easier once
money collected by the Book Rights Registry starts to be distributed.

My second approach emanated from a desire to validate the first approach. If I could
determine how many works had been published each year since 1924, then I could attribute
percentages to this annual output based on my estimate of how likely it was that the
copyright status would be in doubt. Simply put, my supposition was that the older the work,
the more likely it was that it could be an orphan.

Bowker has consistently calculated the number of works published in the US since 1880
(give or take) and the methodology for these calculations remained consistent through the
mid-1990s. According to their numbers, approximately 2mm works were published between
1920 and 2000. Unsurprisingly, a look at the distribution of these numbers confirms that
the bulk of those works were published recently. If there were (only) 2mm works published
since the 1920s, it is impossible to conclude there are millions of orphan works.

To complete this analysis, I aggressively estimated the percentage of works published each
decade since 1920 which could be orphan works. The analysis suggests a total of 580K
potential orphan works which, as a subset of the approximately 2mm works published in the
US during this period, seems a reasonable estimate. My objective was to ‘validate’ my first
approach (using OCLC and BIP data), and the two approaches, using different
methodologies, do reach similar conclusions.

There are several conclusions that can be drawn from this analysis. Firstly, since the
universe of works is finite then, beyond a certain point, the Google scanning operation will
begin to find ‘new’ orphans at a decreasing rate. I don’t know if this number is 5mm
scanned titles or 12mm; my estimate is 7mm because, according to Worldcat, there are
3mm authors to 12mm titles. If you apply this ratio to the Bowker estimate of total works
published, the number is around 7-8mm titles. Secondly, publishing output accelerated in
the latter part of the 20th century. While my estimates in percentage terms of the number
of more recent orphans were comparatively lower than the percentages applied in the early
part of the century for ‘older orphans’, the base number of published titles is much higher;
therefore the number of possible orphans is higher. Common sense dictates that it will be
far easier to find the parents of these later ‘orphans’.
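
The titles estimate in the first conclusion can be reproduced as follows. This is a rough sketch of my reasoning rather than a precise model: it assumes the Worldcat author count can stand in for a count of works, and it uses the approximate 2mm Bowker total for works published between 1920 and 2000. The variable names are illustrative only.

```python
worldcat_titles = 12_000_000
worldcat_authors = 3_000_000
bowker_works = 2_000_000  # approx. works published 1920-2000, per Bowker

# Titles per author, used here as a proxy for titles per work.
titles_per_author = worldcat_titles / worldcat_authors

# Apply that ratio to the Bowker count of works.
estimated_titles = bowker_works * titles_per_author

print(f"Titles per author: {titles_per_author:.1f}")
print(f"Estimated titles: {estimated_titles / 1e6:.1f}mm")
```

With these inputs the ratio is four titles per author and the estimate lands at the upper end of the 7-8mm range; a slightly lower works count or ratio would bring it toward 7mm.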

In the aggregate, the 600K potential orphans may still seem high against a “work”
population of 2.2mm (25%). I disagree, given the distribution of the ‘orphan’ works (above
paragraph) and because I have assumed no estimate of the BRR’s effort to find and identify
the parents. In my view, true orphans will be a much lower number than 600,000, which
leads me to my final point. Money collected on behalf of unidentified orphan owners will
eventually be disbursed to cover the costs of the BRR or to other publishers. There has been some
controversy on this point and it derives, again, from the idea that there are millions of
orphans and thus the pool of undisbursed revenues will be huge. The true numbers don’t
support this conclusion. There will not be a huge pool of royalty revenues to be ultimately
disbursed to publishers who don’t ‘deserve’ this windfall because there won’t be very many
true orphans. The other point here is that royalty revenues will be calculated on usage and,
almost by definition, true orphan titles for the most part are not going to be popular titles
and therefore will not generate significant revenues in comparison with all other titles.

This analysis is not definitive; it is directional. Until someone else can present an argument
that examines the true numbers and works in more detail, I think this analysis is more
useful to the Google Settlement discussion than referring by rote to the ‘millions of
orphans’. The prevailing approach is lazy, misleading and inaccurate.




Nor is this report a definitive declaration of pricing, market penetration or approach in the manner in which this market opportunity may be leveraged. In addition to this report on market opportunity, I also constructed an estimate of the potential size of the orphan works population. This material has been available for some time on my blog (personanondata) and in several presentations I have made. I have included this analysis as an attachment to this report. (Other than a few minor punctuation edits, there have been no changes to my original). Several people helped in the review of this document and, for their time and effort, I am especially grateful. A special thanks to Mike Shatzkin of The Idea Logical Company who originally prompted me to look at the market potential of the Google Book Settlement and helped me organize my thoughts. Both OCLC’s WorldCat and Bowker’s Books In Print were invaluable in developing some of the conclusions formulated in this document. Specific citations are noted where applicable. Readers of this report may be interested in discussing the findings with me directly and in more detail. Please contact me to arrange a time: michael.cairns@infomediapartners.com or 908 938 4889. Find me on LinkedIn, Twitter and Scribd. Copyright: Michael Cairns – Replication and Distribution By Permission 2
  • 3. Introduction:

Almost five years ago, Google embarked on the most ambitious library development project ever conceived: to create a “Noah’s Ark” of every book ever published, starting by digitizing books held by a rarefied group of five major academic libraries. The immediate response from US publishers was muted until the implications of the project became clear: Google proposed no boundaries to the digitization effort and had begun scanning books both in and out of copyright and in and out of print. Adding to publishers’ concerns, Google planned to display “snippets” (small selections) of a book’s content in search results. Despite some hurried conversations among publishers, author groups and Google, Google remained convinced that what it was doing represented a social ‘good’ and that the partial display of the scanned books fell within the boundaries of fair use.

From the publisher perspective, this was a make-or-break moment. The implications were felt most acutely by trade publishers, who saw the potential for their business models to be obliterated by ready access to high-quality content via a Google search over which they would exert little or no control. Even worse was the fear (a debated and contentious point) that rampant piracy of content would develop, given the easy access to a digitized version of a work that could be e-mailed or printed at will. The publishers determined that if Google were to ‘get away with it’ without challenge, then anyone would be able to digitize publisher content and possibly replicate what has been going on in the music and motion picture industries for almost ten years.
In mid-2005, prompted by a lawsuit filed by The Authors Guild, the Association of American Publishers (AAP), led by four primary publishers, filed suit against Google in an effort to halt the scanning of in-copyright materials. (The Authors Guild and AAP ultimately combined their filings.) The initial Google Book Settlement (GBS) agreement, given preliminary approval by a court in October 2008, generated a vast amount of argument both in support of the agreement and in challenge to it. A revised agreement was drafted after Judge Chin of the Federal District Court of Southern New York agreed to delay adjudication; final arguments were heard in late February 2010. To date, Judge Chin has given neither a timetable nor an indication of how he will decide the case.

From the perspective of the early leading library participants, Google’s arrival and promise to digitize their purposefully conserved print collections looked like a miracle. For libraries faced with forced declines in the dollars spent on monographs and the ever-rising expense of maintaining over 100 years of print archives, the Google digitization program offered a possible solution to many problems. All libraries believe they hold a social covenant to collect, maintain and preserve the most relevant materials of interest to their communities, but maintaining that covenant becomes a challenge in an environment of increasing expenses and migration to an on-line world. [1]

[1] It is important to acknowledge that, initially, the GBS may have been seen as a solution to libraries’ conservation and preservation needs; subsequently, however, libraries have determined that they need to develop their own preservation options, among which The Hathi Trust is a clear leader.
  • 4. The library world is typically segmented into public and academic institutions and, while these often varied ‘communities’ may differ in their philosophy towards, for example, collection development or preservation, they do share some common practices. Most importantly, all libraries are committed to resource sharing: while materials use has historically been primarily ‘local’ to the library, every institution wants to make its collections available to virtually any patron or institution that requests them. In short, these library collections were always ‘accessible’ to all regardless of geography or copyright. First US Mail, then FedEx, e-mail and the Internet progressively made this sharing easier but, until Google arrived with its digitization program, any sharing beyond the local institution was via physical distribution. [2] In effect, it could be argued that the Google scanning program simply makes an existing practice vastly more efficient.

Even though approval of the Google Book Settlement (GBS) hangs in the balance, under review by Judge Chin of the Federal District Court of Southern New York, an Executive Director has been named to head the Book Rights Registry (BRR) [3] and is preparing the groundwork to establish the organization in advance of approval.

This report represents an attempt to analyze the size of the market opportunity for Google as it seeks to exploit the Google Book Settlement. Our summary findings follow and are discussed in more detail in the ensuing pages of this report.
Summary Findings of the Report:

    • Libraries will see tremendous advantages, both immediate and over time, from the GBS, although concerns have been voiced (notably from Robert Darnton of Harvard [4])
    • Google’s annual subscription revenue for licensing to libraries could approach $260 million by year three of launch
    • Over time, publishers (and content owners) will recognize the GBS service as an effective way to reach the library community and are likely to add titles to the service [5]
    • Google will add services and may open the platform for other application providers to enhance and broaden the user experience

[2] Resource sharing and improvements in the ‘logistics’ provided by OCLC (WorldCat) or via consortia such as OhioLink have made physical distribution effective and comparatively efficient.
[3] The BRR is the management body tasked with administering the GBS and representing the interests of authors and publishers once approval has been granted by the court.
[4] Robert Darnton, NY Review of Books
[5] The settlement doesn’t provide for adding content prior to 1/5/09; however, we are suggesting that, by mutual consent, additional published content may be added as an expedient method of reaching the library market.
  • 5.
    • The manner in which the GBS deals with orphan works will provide a roadmap for other communities of ‘orphans’ in photography, arts, and similar content and intellectual property
  • 6. Business Analysis:

By mid-2008, the lawsuit was background noise adding to the general malaise and discomfort characterizing the media industry, and the announcement that the parties had agreed to settle their differences was initially greeted with support, relief and some surprise. Yet, as the implications of the complex settlement agreement became clearer, a strong (and, at times, strident) opposition developed to argue for substantial revisions to, or the elimination of, key sections of the agreement. Importantly, this opposition also succeeded in persuading the Department of Justice (DoJ) to voice ‘strong opposition’ to segments of the agreement. Combined with the concerns expressed by the DoJ, the opposition was able to exact significant changes to the agreement’s terms. A ‘revised agreement’ was presented to, and is now pending approval by, Judge Denny Chin of the Federal District Court of Southern New York.

Among the principal arguments against approval of the original settlement agreement were the following:

    • Opponents argued Google would attain an insurmountable monopoly over in-copyright but out-of-print works
    • The obligation to ‘opt out’ of the agreement places an undue burden on the copyright holder (author)
    • Foreign rights holders were under-represented (or insufficiently consulted) and thus disadvantaged by the original agreement
    • Monies collected on behalf of copyright holders but never disbursed would be paid into a ‘general expenses’ fund to benefit the Book Rights Registry [6]
    • Some authors believed their moral rights to determine the use and replication of their works were circumvented
    • The agreement itself will, in effect, create copyright ‘legislation’, which should be the purview of Congress

The revision to the agreement has partially addressed these issues (excepting the last item), but it has not fully incorporated all of the challenges raised by the settlement opposition and the Department of Justice.

Two aspects of the agreement that generated attention and hyperbole concerned the number of “orphan works” and the revenue model Google would implement to market its full-text database. Both of these issues are used by settlement opponents to justify the agreement’s rejection by the Court. In each case, very little real analysis has been

[6] Changed in the second version of the settlement so that uncollected funds would eventually be distributed to designated charities.
  • 7. conducted to determine the true parameters of both the ‘orphan’ issue and the market opportunity.

In August 2009, we published an estimate of the potential number of orphan works that may exist. We are unaware of any other detailed analysis that attempts to quantify the collection of titles which remain in copyright but whose copyright holder has not been located. This analysis is included as an attachment to this document. [7] The following chart summarizes the findings of potential orphan works:

    Scenario           Estimate of Orphan Works   Percent of Title Output: 1920 – 2000
    Base Case          580,388                    24%
    High/Aggressive    824,553                    34%

In summary, the orphan analysis estimated a potential orphan population of 580,388 based on a review of pre-existing statistical information documenting the numbers of new titles published in the US since 1920. While we estimated that ‘orphans’ would be more prevalent among older titles, the total annual title output only exceeded 15,000 for the first time in 1960 (according to our source data); therefore, the universe of all titles published between 1920 and 1980 is actually relatively small. Publishing output only rapidly increased during the late 1980s and it is assumed that the majority of these titles will not be ‘orphans’ because copyright information is readily available and confirmable. As noted, the full report is included as an attachment to this report.

We believe our analysis to be sound and the results were supported by a different methodology based on data from OCLC’s WorldCat database (as noted in the full report). After estimating the total number of ‘orphans’ we also estimated the number of foreign works that could potentially be included in the GBS.
This analysis is more tenuous statistically because we relied entirely on the OCLC WorldCat database [8] and made several key assumptions and extrapolations. Based on this conditional estimate, we determined there could be approximately 1.2 million titles from the ten largest languages published and an additional 0.2 million from all other languages.

Currently, the content potentially covered by the GBS represents over 12 million titles scanned. Multiple versions of the same work are included in this total; however, even if all foreign works are to be excluded from the database and authors and publishers voluntarily remove

[7] A related analysis that extrapolates the potential number of foreign language titles that may fall under the umbrella of the settlement has also been completed but is not included in this document.
[8] This is not to assert that the WorldCat data is inaccurate in any way; rather, our assumptions should be considered ‘best-guess’.
  • 8. their titles from inclusion, the Google Book subscription product will remain a compelling database for the academic and public library market as well as schools and certain corporations. A significant change adopted in the amended settlement agreement has narrowed the class to UK, Australian and Canadian published books in addition to those registered with the US copyright office. [9]

The Google Books Database Subscription and Revenue Model

Opponents have suggested that Google will be in a position to exercise monopolistic pricing and to ‘overcharge’ to extract maximum revenues from its customers. We agree that its market position could be abused; however, we believe the agreement includes a counter-balance to this tendency. Google seeks maximum exposure for the content, not only to support its stated mission of providing wide and broad access to this ‘hidden’ content, but also to support other business opportunities it may implement (such as advertising programs). We believe Google will see overly aggressive pricing as an inhibitor to wide market acceptance of the product. The Book Rights Registry will represent the interests of authors and publishers, who will argue for pricing that maximizes their opportunity. Balancing wide access (Google’s position) against these pricing considerations should result in an optimal pricing matrix.
In developing our financial and market analysis, there are several key assumptions we have relied upon [10]:

    • Pricing will be variable based on type of institution
    • This will be considered a ‘must have’ database product for all libraries
    • The Google product will effectively “level the playing field” from small to large academic libraries for the types of books covered by the Settlement
    • Google will continue to invest in the Book database product by adding content, functionality and applications/tools to aid usage over time and may raise pricing
    • Penetration will not reach 100% for any segment, but is likely to grow over time
    • Corporations will be important customers (e.g., science, aeronautics and engineering-based firms)

[9] As an upper limit, the number of ‘non-English’ language titles could be 50% of the total books scanned.
[10] Business models that include advertising are not assumed in this analysis. It may be possible that Google will use the scanned content as content around which they can tailor advertising offers; however, the second amended version has narrowed the application of varied business models and it is difficult to determine that any model other than a subscription-based service will be the primary revenue generator to Google and the BRR. Over time, this may change but that circumstance is not anticipated in this analysis.
  • 9. In the following analysis, we attempt to define the Google Books Database market opportunity and estimate the potential annual revenues the company may be able to generate each year from database subscriptions. Google currently markets several services to publishers which include Google Scholar, Google Partner Program and Google Editions (which will be launched in mid-2010). These current products and services are not included or assumed in this analysis.

In estimating the market potential for the Google Settlement database product, we have taken three primary components (or drivers) into account: market segmentation, penetration and pricing.

Market Segment

The agreement provides Google with the right to exploit certain markets including academic, public and special libraries, corporate customers, print-on-demand (POD) [11] and direct-to-consumer sales. In our analysis, we have used American Library Association data itemizing the type and number of libraries in the US and used “best guess” estimates of the market opportunity represented by corporations and consumers. Most commentary to date has focused on the library community, which is where this analysis is strongest in its estimates and where we concentrate our discussion.

An important accommodation of the Settlement is the provision of free access to the database product for all public libraries and certain “Carnegie” classed libraries. Each library accepting this access will receive the equivalent of a single user sign-on that will allow patrons and/or staff to access the Settlement database without restriction.
While this is an important accommodation for some libraries, for the majority of libraries this access will not be appropriately functional and, thus, the site-wide, unlimited user access provided under the terms of the subscription product will remain the better option. We do not believe this free access will materially impact the revenue opportunity for Google and have allowed for this circumstance in our financial model.

In our opinion, academic libraries will consider a subscription to the Google Books database a competitive necessity. For the first time, any subscribing library within the United States may gain direct access to some of the largest and most renowned academic collections in North America. [12] In addition, this access will far surpass the inter-library loan process of years past simply because the content is completely indexed. Researchers will no longer have to ‘guess’ that a title may be relevant to their research based on an index or table of contents and, moreover, they eliminate the risk of requesting delivery of a title only to discover that the content is irrelevant.

[11] POD is a right that may be granted to Google in the future pending approval of the Book Rights Registry and the rightsholders they will represent.
[12] The amended settlement has narrowed the class and effectively excludes non-English titles from the database.
  • 10. Many academic library collections have been built over centuries and titles in their collections are often unique, which is another compelling reason supporting the argument that the Google database represents a singular opportunity for all academic institutions to “narrow the gap” between their research capabilities and those of the country’s largest and best endowed institutions. While some academic collections’ titles are available via inter-library loan, many older, fragile and unique works are only available at the institution itself by special request. The digitization of many (not all) of these works significantly broadens access to and distribution of this content. Undoubtedly, researchers, educators and students at all academic institutions will pressure their administrators and librarians to subscribe to the product. [13]

The following chart represents our construct for the potential addressable market segments for the Google book database [14]:

    Segment                               Number
    Total Number of Academic Libraries     3,617
    Total Public Libraries                 9,198
    School Libraries                      99,783
    Special Libraries                      9,066
    Armed Forces                             296
    Government                             1,159

Market Penetration:

We estimate that sales penetration will vary considerably across the segments; however, for the reasons presented earlier, we believe penetration into the academic library segment will lead all other markets. Public libraries (particularly metropolitan library systems) will find value in the database and, as a group, will represent the largest concentration of customers overall. School libraries are unlikely to subscribe to the database in great numbers for budgetary or relevance reasons and, moreover, students will be encouraged to gain access to the product via their public library remote-access facilities.
We expect larger research public libraries (such as The New York Public Library) will be treated as academic libraries for the sake of pricing. We also expect some corporations to access the database product and, while pricing for these ‘for profit’ entities should be comparatively high, the absolute number of customers in this segment will be small.

Pricing:

Database subscription pricing can be complicated and confusing. Models can be based on population served, purchasing budgets and/or enrollment, and then be subject to

[13] It is likely that an extensive database of user behavior may be generated by usage of this database. This is data that publishers (and authors) may be interested in mining for product development and/or insights into consumer behavior.
[14] Source: American Library Association
  • 11. multiplication factors such as number of simultaneous users, number of physical locations and other factors. We don’t know which method Google will choose; however, in order to keep our analysis as simple and transparent as possible, we have built our pricing model on the basis of the following criteria:

    • Unlimited users per location
    • Branch public libraries priced at 25% of base fee per additional branch
    • 3% price increases per year
    • Institution ‘classification’ based on ALA data
    • Full ramp-up will occur over the first three years

Additionally, we expect Google will sell to the ‘highest’ administrative level possible. [15] For example, the University System of Georgia manages licensing contracts under their Galileo program for both public and academic libraries and, therefore, this agency would be the customer rather than individual or local libraries. In New York, Google would license access to the library authorities in each borough. In New York City (Manhattan), this would mean the main library and roughly 50 satellite libraries would have unlimited access via one contract and, based on our pricing matrix, the NYPL would pay approximately $340,000 per year for access ($25,000 for the main library plus $6,250 for each of roughly 50 branch locations).

For-profit organizations (corporations and businesses) will have a pricing matrix higher than that for non-profit libraries and institutions (generally standard practice). We would expect that only a relatively small percentage of businesses would subscribe to the entire database and we have segmented the target market into Fortune 500, Fortune 1,000 and all others. The corporate customers most likely to subscribe would be those companies with large research needs such as pharmaceutical, aeronautics, engineering and the like.
Options to better address this market may include shorter subscription terms, usage based on metering systems or topic/subject specific packages.

Market Opportunity Summary:

We believe Google and the Book Rights Registry (a proxy for authors, authors’ heirs and publishers) will be motivated to maximize access to the Google database in order to maximize viewing of the content which will, in turn, result in optimal revenues for both. We do not believe Google will implement a monopolistic approach to pricing and, in comparison with smaller and more segmented databases, we believe the Google pricing will appear reasonable considering the breadth and depth of content in the database.

Approach to the Market:

[15] Consortia pricing, while an important consideration, would represent a discount to the pricing matrix we present and would be negotiated on a case-by-case basis. We have not made accommodations for Consortia pricing.
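The pricing criteria and the NYPL illustration above can be checked with a short calculation. The following is a minimal sketch only: the function name `annual_fee` and its parameters are hypothetical, and it assumes the 3% annual increase compounds from year one, which the report does not state explicitly. The $25,000 base fee, the 25% branch rate and the figure of roughly 50 branches come from the text.

```python
def annual_fee(base_fee, branches, branch_rate=0.25, year=1, annual_increase=0.03):
    """Illustrative subscription fee for one library system.

    base_fee:        fee for the main location (unlimited users)
    branches:        number of additional branch locations
    branch_rate:     each branch is priced at this fraction of the base fee
    year:            subscription year, starting at 1
    annual_increase: assumed compounding yearly price increase
    """
    first_year = base_fee + branches * base_fee * branch_rate
    return first_year * (1 + annual_increase) ** (year - 1)


# NYPL illustration from the text: $25,000 main library, roughly 50 branches.
nypl_year_one = annual_fee(25_000, 50)
print(f"${nypl_year_one:,.0f}")  # $337,500, i.e. approximately $340,000
```

Under these assumptions the first-year figure is $337,500, which agrees with the "approximately $340,000" quoted in the pricing discussion.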
  • 12. In our view, Google has several options for marketing and selling this database product:

    • Google sells the product itself with its own sales force
    • Google designates one supplier for each segment
    • Google allows all vendors to integrate the books database product into their existing database products and pay Google a defined fee per user

In our view, it is unlikely that Google will establish its own sales force to sell into the library and corporate marketplaces. While Google does have an ad sales force supporting its SEM program(s), this activity is vastly different from building a sales team to call on libraries and corporate clients. Additionally, given Google’s predilection for automation, the hiring of a human sales team doesn’t seem culturally acceptable. Lastly, and possibly more important, we believe licensing this product will become more of a ‘renewal’ business as the market matures (after three to four years), which could require far less sales effort, or one significantly different from that required in the first three years. We estimate a fully staffed Google sales force could cost the company $15 million annually but, in short, Google is unlikely to want the headache.

Given the limitations of the above approach, we believe it is more likely Google will contract with one or more of the established players and pay a standard sales commission to the provider. In this model, Google will be able to set prices and targets and retain a degree of control over both the provider of this sales effort and the market delivery (pricing) of the product. Existing providers would bid on the right to sell this database on behalf of Google and, because the product will be highly valued, the bidding would likely be highly competitive. Likely providers to Google would include ProQuest, Gale/Cengage, OCLC or EBSCO.
It is also possible that an ‘outlier’ such as Ingram, Baker & Taylor or Hudson News (LibreDigital) would see representing this database as a significant opportunity. An established player would likely see increased sales in its current offering, since simply representing the Google Books database would open new market opportunities. For an ‘outlier’, the Google Books product may represent an opportunity to enter the market using the Google product as a foundation.

In our estimation, the above scenario is not only practical (not having to administer its own sales force is a major advantage), but may also be cost effective. Given the ‘prize’ of representing the Google database, we believe the average cost to Google may be less than 10% of revenues. (“Renewal” sales may also carry a lower commission than initial sales.)

Working with a single provider thus represents an effective solution for Google, but this strategy may not be the most efficient. In order to achieve greater efficiency in reaching its target market while also eliminating possible “political” issues caused by selecting one vendor over the others, the company may consider allowing any provider to sign a standard distribution agreement with the company and sell and market the product into all markets. This approach has several advantages:
• Immediately leverages the competitive position of all major providers that otherwise may be mutually exclusive
• Gives a library subscriber a choice of provider and/or allows them to work with an existing ‘preferred’ vendor
• Potentially enables providers to integrate the Google product with their existing products, allowing rapid development and creating built-in content ‘handcuffs’ that support renewals
• Minimizes Google’s exposure to any supplier limitations and negative customer support issues
• Provides maximum exposure to all market segments virtually immediately
• As part of these agreements, Google may gain access to index all content supplied by its third-party sales partners

Approach to the Market Summary:

Based on this review of Google’s tactical options, we believe the company will enable multiple (initially ‘preferred’) vendors to market and sell the product. Google will establish pricing and the vendors will be required to pay Google based on this set price schedule (less vendor commission). Under this model, any vendor will be free to charge the end-customer less than the ‘set price’; however, the vendor would still pay Google based on the higher ‘full’ price. (Selling below the set price could occur when a vendor bundles the database with its other products.)

Forecasted Revenue Expectations:

Based on our assumptions documented above, we believe the revenue Google may generate from the Google Books database product could approach $260 million per year.
Our revenue model was based on the following set of assumptions:

• Base pricing by segment
• Price discounts based on size of library holdings or population served
• Penetration levels based on library size
• Revenue represents full implementation, which we expect by year three
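As a rough illustration of how these assumptions combine, the sketch below multiplies market size, average penetration and average price for each segment, using the figures from the report's estimates chart. It is a simplification, not the author's model: the report does not publish its size-based discount schedule or its penetration-by-size tiers, so the naive product should not be expected to match every stated revenue figure. The function and variable names are illustrative, and the Corporate price is omitted because the source prints it ambiguously.

```python
# Figures copied from the report's estimates chart.
# Tuple layout: (total market, avg. penetration, avg. price in USD, stated revenue in $MM).
# The Corporate price is printed ambiguously in the source ("$37.500"), so it is
# left as None here rather than guessed.
SEGMENTS = {
    "Academics":    (3_617,   0.65,  55_000, 130.1),
    "Publics":      (9_198,   0.47,  21_000, 112.8),
    "School":       (99_783,  0.005, 10_000, 4.9),
    "Special":      (9_066,   0.005, 25_000, 1.1),
    "Armed Forces": (296,     0.05,  11_000, 0.1),
    "Government":   (1_159,   0.25,  11_000, 3.1),
    "Corporate":    (100_000, 0.02,  None,   7.5),
}

TITLES_IN_DATABASE_MM = 12  # "12mm titles", as quoted in the report


def naive_revenue_mm(market, penetration, price):
    """Undiscounted estimate in $MM: subscribers times average price."""
    if price is None:
        return None
    return market * penetration * price / 1_000_000


def stated_total_mm(segments):
    """Sum of the revenue column as printed in the chart."""
    return sum(stated for *_, stated in segments.values())


if __name__ == "__main__":
    for name, (market, penetration, price, stated) in SEGMENTS.items():
        naive = naive_revenue_mm(market, penetration, price)
        shown = "n/a" if naive is None else f"{naive:.1f}"
        print(f"{name:<13} naive={shown:>6}  stated={stated:>6.1f}")
    total = stated_total_mm(SEGMENTS)
    print(f"Stated total: {total:.1f} $MM")
    print(f"Nominal value per title: ${total / TITLES_IN_DATABASE_MM:.2f} per year")
```

The stated revenue column sums to 259.6, which the report rounds to $260 million. Where the naive product differs from a stated figure, the gap presumably reflects the unpublished discount and penetration tiers.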
The following chart documents our estimates:

Segment        Total Market   Avg. Penetration   Avg. Pricing   Revenue ($MM)
Academics             3,617                65%        $55,000          $130.1
Publics               9,198                47%        $21,000          $112.8
School               99,783               0.5%        $10,000            $4.9
Special               9,066               0.5%        $25,000            $1.1
Armed Forces            296                 5%        $11,000            $0.1
Government            1,159                25%        $11,000            $3.1
Corporate           100,000                 2%        $37.500            $7.5
Total                                                                  $260.0

As noted, we believe it will take Google three years to ramp up to this full-implementation revenue (we do not see this as a limitation on Google’s part; rather, it is a typical expectation for a new-product roll out). At the above levels, we believe pricing is not only reasonable and affordable, but compares favorably with existing database publishers’ pricing. There are few, if any, other publishers who have products which serve as many (all) segments as the Google Book database.

At this revenue level, each of the 12mm titles in the Google database has a nominal value of $22 (per year) to Google. More importantly, the per-unit price paid by each library will be less than $0.05 (five cents).

On a pure cost-avoidance basis, licensing the Google Books database appears good value given current costs. If the costs of handling, cataloging, special requests (such as interlibrary loans) and storage are added to the base wholesale price of any title, the title’s full ‘carrying costs’ can double. Some studies have indicated that fulfilling an interlibrary loan request can cost $25 for each segment from the library to requestor and back. This cost far exceeds the original (or, in many instances, the replacement) cost of the title [16].

While we believe this database to be an important acquisition for most academic and many public libraries, we do expect that Google will need to sell this product aggressively in the early years to achieve the penetration levels we anticipate.
There are several reasons for this. Firstly, the content of the database is largely unknown and, while it is representative of many important library collections, Google will need to market this collection as important and complementary to the library customers in question. Secondly, the sheer size of the database could be an inhibiting (or intimidating) factor, and therefore the navigation, bibliographic data quality and the delivery of subject ‘collections’ will be important customer acquisition and retention areas for the company to focus on.

In summary, we believe Google will be able to successfully launch its Book Database product into the market with fair and reasonable pricing that will encourage a broad base of target customers to subscribe.

[16] Users may print all or portions of the titles they select (although the functionality to do this may be a subsequent grant provided by the BRR to Google) and there is a cost to these activities; however, we maintain that the utility of the database and the ability of users to be precise in their printing requests will produce only a marginal negative cost (if any) relative to the avoided costs that are endemic to the current solution.

Future Market Growth Opportunities:

While the launch of this product is the focus of attention, we believe the company has numerous opportunities to expand the product over time. We do not expect the Google Books database product to ‘stand still’; rather, we believe this product could become the primary access point for textual (monograph) materials in the library market.

Future market opportunities [17]:

• The addition of other content: publishers may see this product as a viable library market entrance point for all their book content
• Provision of usage data to publishers (and others) for business and product development needs
• Price increases over time and increased penetration
• Inclusion of international/non-US market content – English language
• Inclusion of international/non-US market content – Non-English language
• Access to international markets
• Addition of more in-copyright materials closer to current publication dates; the product perhaps becomes a major distribution mechanism for book content
• Topic/segmented collections
• Potential to open the database for third-party application development

[17] We expect these opportunities to ‘evolve’ over time based on discussion, negotiation and mutual agreement of the parties.
Summary:

This analysis argues that the Google Books Database product will be seen as a ‘must have’ product for a large proportion of academic and public libraries and is, thus, valuable on its merits. Google will price this product at levels both lower than those of existing database providers and ‘economically viable’ given cost avoidance justifications. The company retains flexibility in how it will approach selling and marketing the product; however, we believe it will contract out these services. Lastly, we believe there is potential upside to the revenue model based on adding new markets and expanding content.
Addendum A – Orphan Works Analysis

580,388 Orphans (Give or Take)

Clearly one of the most contentious issues (if not the most contentious) regarding the Google Book Settlement (GBS) centers on the nebulous community of “orphans and orphan titles”. And yet, throughout the discussion since the Google Book Settlement agreement was announced, no one has attempted to define how many orphans there really are. Allow me: 580,388.

How do I know? Well, I admit, I do my share of guesswork to get to this estimate, but I believe my analysis is based on key facts from which I have extrapolated a conclusion. Interestingly, I completed this analysis starting from two very different points and the first results were separated from the second by only 3,000 works (before I made some minor adjustments).

Before I delve into my analysis, it might be useful to make some observations about the current discussion on the number of orphans. First, when commentators discuss this issue, they refer to the ‘millions’ of orphan titles. This is both deliberate obfuscation and lazy reporting: most notably, the real issue is not titles but the number of works. My analysis attempts to identify the number of ‘works’; titles are a multiple of works. A work will often have multiple manifestations or derivations (paperback, library version, large print, etc.) and, thus, while the statement that there may be ‘millions of orphan titles’ may be partially correct, it is entirely misleading when the true measure applicable to the GBS discussion is how many orphan works exist. It is the owner (or parent) of the work we want to find.

To many reporters and commentators, suggesting there are millions of orphans makes sense because of the sheer number of books scanned by Google but, again, this is laziness.
Because Google has scanned 7-10 million titles then, so the logic goes, there must be ‘millions of orphans’. However, as a 2005 OCLC report (which I understand is being updated) noted, many definitional caveats apply to this universe of titles, such as titles in foreign languages, titles distributed in the US and titles published in the UK, to name a few. Accounting for these caveats significantly reduces the population of titles at the core of this orphan discussion. These points were made in the 2005 OCLC report (although its authors were not looking specifically at orphans) when it examined the overlap in title holdings among the first five Google libraries. (And, if you like this stuff, this was pretty interesting.) Prognosticators unfamiliar with the industry may also believe there are millions and millions of published titles since, well, there are just lots and lots in their local B&N and town library.

The two methods I chose to estimate the population of orphans relied, firstly, on data from Bowker’s BooksinPrint and OCLC’s Worldcat databases and, secondly, on industry data on title output published by Bowker since 1880. I accessed BooksinPrint via NYPL (Bowker cut off my subscription) and Worldcat is free via the web. The Bowker title data has been published and referred to numerous times over the years and I found this data via Google Book Search; I also purchased an old copy of The Bowker Annual from Alibris.

In using these databases, my goal was to determine whether there are consistencies across the two databases that I could then apply to the Google title counts. In addition to the ‘raw data’ I extracted from the databases, OCLC (Dempsey) also noted some specific numbers of ‘books’ in its database (91mm), titles from the US (13mm) and non-corporate ‘Authors’ (4mm). Against the title counts from both sets of data, I attributed percentages which I then applied to the Google universe of titles (7mm). (My analysis also ‘limits’ these numbers to print books, excluding, for example, dissertations.)

To complete the analysis and determine a specific orphan population, I reduced my raw results based on “best guess” estimates for non-books in the count, public domain titles and titles where the copyright status is known. These final calculations result in a potential orphan population of 600,000 works. I also stress-tested this calculation by manipulating my percentages, resulting in a possible universe of 1.6mm orphan works. This latter estimate is (in my view) illogical, as I will show in my second analysis.

An important point should be made here: I am calculating the potential orphan population, not the number of orphans. These numbers represent a total before any effort is made to find the copyright holder. These efforts are already underway and will get easier once money collected by the Book Rights Registry is due to be distributed.

My second approach emanated from a desire to validate the first approach. If I could determine how many works had been published each year since 1924, then I could attribute percentages to this annual output based on my estimate of how likely it was that the copyright status would be in doubt. Simply put, my supposition was that the older the work, the more likely it was that it could be an orphan.
Bowker has consistently calculated the number of works published in the US since 1880 (give or take) and the methodology for these calculations remained consistent through the mid-1990s. According to its numbers, approximately 2mm works were published between 1920 and 2000. Unsurprisingly, a look at the distribution of these numbers confirms that the bulk of those works were published recently. If there were (only) 2mm works published since the 1920s, it is impossible to conclude there are millions of orphan works.

To complete this analysis, I aggressively estimated the percentage of works published each decade since 1920 which could be orphan works. The analysis suggests a total of 580K potential orphan works which, as a subset of the approximately 2mm works published in the US during this period, seems a reasonable estimate. My attempt to ‘validate’ my first approach (which used OCLC and BIP data) shows that both approaches, using different methodologies, reach similar conclusions.

There are several conclusions that can be drawn from this analysis. Firstly, since the universe of works is finite then, beyond a certain point, the Google scanning operation will begin to find ‘new’ orphans at a decreasing rate. I don’t know if this number is 5mm scanned titles or 12mm; my estimate is 7mm because, according to Worldcat, there are 3mm authors to 12mm titles. If you apply this ratio to the Bowker estimate of total works published, the number is around 7-8mm titles. Secondly, publishing output accelerated in the latter part of the 20th century. While my estimates in percentage terms of the number of more recent orphans were comparatively lower than the percentages applied in the early part of the century for ‘older orphans’, the base number of published titles is much higher, and therefore the number of possible orphans is higher. Common sense dictates that it will be far easier to find the parents of these later ‘orphans’.

In the aggregate, the 600K potential orphans may still seem high against a “work” population of 2.2mm (25%). I disagree, given the distribution of the ‘orphan’ works (see the paragraph above) and because I have made no allowance for the BRR’s effort to find and identify the parents. In my view, true orphans will be a much lower number than 600,000, which leads me to my final point.

Money collected on behalf of unidentified orphan owners will eventually be disbursed to cover the costs of the BRR or to other publishers. There has been some controversy on this point and it derives, again, from the idea that there are millions of orphans and thus the pool of undisbursed revenues will be huge. The true numbers don’t support this conclusion. There will not be a huge pool of royalty revenues to be ultimately disbursed to publishers who don’t ‘deserve’ this windfall because there won’t be very many true orphans. The other point here is that royalty revenues will be calculated on usage and, almost by definition, true orphan titles for the most part are not going to be popular titles and therefore will not generate significant revenues in comparison with all other titles.

This analysis is not definitive; it is directional.
Until someone else can present an argument that examines the true numbers and works in more detail, I think this analysis is more useful to the Google Settlement discussion than referring by rote to the ‘millions of orphans’. The prevailing approach is lazy, misleading and inaccurate.
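The ratio arithmetic behind the addendum's cross-check can be sketched in a few lines. This is only a restatement of the figures quoted in the text, with illustrative names; it treats Worldcat's authors-to-titles ratio as a proxy for works-to-titles, as the addendum does, and it does not attempt to reproduce the decade-by-decade orphan percentages, which the report does not publish.

```python
# Figures quoted in the addendum; the constant and function names are illustrative.
WORLDCAT_TITLES = 12_000_000    # "12mm titles"
WORLDCAT_AUTHORS = 3_000_000    # "3mm authors"
US_WORKS_PUBLISHED = 2_000_000  # Bowker: approx. 2mm works, 1920 to 2000
POTENTIAL_ORPHANS = 580_000     # result of the second (Bowker-based) approach


def titles_per_work(titles, authors):
    """Titles-to-authors ratio, used here as a proxy for titles per work."""
    return titles / authors


def implied_title_universe(works, ratio):
    """Number of titles implied by a count of works and a titles-per-work ratio."""
    return works * ratio


def orphan_share(orphans, works):
    """Potential orphans as a fraction of all works published."""
    return orphans / works


if __name__ == "__main__":
    ratio = titles_per_work(WORLDCAT_TITLES, WORLDCAT_AUTHORS)
    universe = implied_title_universe(US_WORKS_PUBLISHED, ratio)
    share = orphan_share(POTENTIAL_ORPHANS, US_WORKS_PUBLISHED)
    print(f"Titles per work (proxy): {ratio:.1f}")
    print(f"Implied title universe: {universe:,.0f}")
    print(f"Potential orphan share of works: {share:.0%}")
```

With these inputs the ratio is 4 titles per work and the implied universe is 8mm titles, the upper end of the 7-8mm range given in the text.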