NoSQL technologies from an STM
publishing perspective
Bradley P. Allen, Elsevier Labs
Presentation at NoSQL Now 2011
San Jose, CA, USA
2011-08-25
Peak physical media: is it here?




 • “Music Sales”, New York Times, 1 August 2009.
   http://www.nytimes.com/imagepages/2009/08/01/opinion/01blow.ready.html
 • “Initial Circs per student”, William Denton, 31 January 2011.
   http://www.miskatonic.org/2011/01/31/initial-circs-student
 • “Rise of e-book Readers to Result in Decline of Book Publishing Business”, Steven Mather, iSuppli, 28 April 2011.
   http://www.isuppli.com/Home-and-Consumer-Electronics/News/Pages/Rise-of-e-book-Readers-to-Result-in-Decline-of-Book-Publishing-Business.aspx
In any case, the challenge to STM publishers is clear


 • Print revenue is softening
 • Online channels are exploding
    – Changing the way customers create and consume
      our content
    – Leading to new requirements and market
      opportunities for online products




Additional challenges in STM publishing


 • Academic context and tradition inhibit
   business model innovation
 • Technology and business are traditionally
   separate concerns
 • Acquisitions create content and data silos
 • Global market drives lowest common
   denominator technology choices


A simple model of the evolution of STM publishing



 Print era: 1600s – 1980
      • Packaged as books and journals
      • Physically distributed
      • Access and discovery through libraries

 Digital Library era: 1980 – 2010s
      • Packaged as books and journals
      • Digitally distributed
      • Access and discovery through search engines

 Platform-as-a-service era: 2010s
      • Packaged as apps
      • Digitally distributed
      • Access and discovery through social networks




STM publishing use cases in transition
Use case: A new medical term relevant to an emerging healthcare issue (e.g. a new type of avian flu virus) needs to be incorporated into a search index immediately
 • Digital Library era: Organizational governance issues about how taxonomies are updated, coupled with manually intensive workflows and ad hoc approaches to content tagging, inhibit rapid response
 • Platform-as-a-service era: A single, automated and standardized taxonomy management and content enhancement workflow allows rapid and timely update of search applications

Use case: Application developers want to mash up epidemiological data with medical journal articles to create a topic-specific Web resource
 • Digital Library era: Data silos without easy means of programmatic access by developers, coupled with governance and business model questions, inhibit data reuse
 • Platform-as-a-service era: Content API and single-point-of-access repository allow data and content to be accessed, discovered and reused across multiple applications

Use case: Digital library developers want to stage content into a single repository for unified search index generation
 • Digital Library era: Duplication of core content leads to synchronization and quality control issues
 • Platform-as-a-service era: Consolidation of duplicate repositories into a single point of truth across all content, accessible and discoverable through a Content API, eliminates the need for duplication and synchronization

Use case: Third-party solutions providers want to integrate content (e.g. tagged medical journal articles, medical taxonomies) into point-of-care solutions
 • Digital Library era: No standards, no APIs for point-of-care content integration across all content and data
 • Platform-as-a-service era: Standards and APIs that scale across multiple partners, for all content types, for all delivery formats

Use case: Publishers want to deliver their content to tablets and e-readers in delivery formats that take advantage of the displays and interaction modalities on those devices
 • Digital Library era: No clear standard or approach for targeting emerging eReader and tablet devices; multiple and divergent approaches lead to siloed solutions and duplication of effort
 • Platform-as-a-service era: Web and industry standards for eReader and tablet devices supported as part of standard automated processing into delivery-channel-specific formats, regularly updated and exposed through a Content API

Use case: Journal publisher wants to integrate content enhancements across multiple subject matter areas to add value to products leveraging Article of the Future technology
 • Digital Library era: No single point of access to content enhancements, no standards for content enhancement suppliers and partners to deliver enhancements for integration
 • Platform-as-a-service era: Easy access to multiple opportunities for content enhancements embedded in standard next-generation article formats and provided using standard content enhancement formats
Facets of STM publishing processes

 Process Type
      • Acquisition
      • Transformation
      • Access and discovery
      • Enhancement
      • Composition
      • Delivery

 Entity
      • author, supplier, Web site, typesetter, automated process, subject matter expert, search engine, content repository, entity registry
      • product catalog, editor, reviewer, user, designer, developer, e-book, mobile app, mobile-enhanced Web site, API

 Activity
      • submitting, crawling, syndicating, formatting, mapping, cleansing, indexing, querying, updating, storing, annotating, subject tagging, classification, entity recognition
      • entity extraction, fact extraction, clustering, aggregating, ordering, summarizing, filtering, analysis, rendering, design, publishing, accessing, retrieving, deleting

 Content Type
      • article, book, media object, entity record, taxonomy, ontology, user-generated content


Emerging content requirements

 •   Broad range of content types
      –   Must treat as first-class objects video, audio, images, datasets, metadata and knowledge organization systems in addition to articles and books
 •   Standards-based
      –   Web-standard formats to support ease of integration and interoperability
 •   Fine-grained
      –   Must be decomposable into and addressable in fragments smaller than the unit of publication; e.g., down to the level of specific words, phrases, images, table cells in articles or book chapters, key frames and segments in videos
 •   Discoverable
      –   Must be easily located across all levels of granularity
 •   Accessible
      –   Must be easily accessed through content creation, retrieval, update and deletion (CRUD) services
 •   Flexible
      –   New content types and associated schemas must be easily added through configuration
 •   Reusable
      –   It must be efficient for product developers to aggregate and compose content fragments into new products
 •   Modifiable
      –   Support the enhancement and correction of content at any time following creation
 •   Broad range of delivery formats
      –   Content standards and services must support fulfillment, delivery and presentation across desktop, notebook, tablet and mobile computing devices
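
The "fine-grained", "accessible" and "modifiable" requirements can be pictured with a small sketch. It is an illustration only, not a description of any Elsevier system: the `ContentRepository` class, the `id#path/to/fragment` address syntax and the sample article are all hypothetical, and an in-memory dict stands in for a real store.

```python
class ContentRepository:
    """In-memory stand-in for a content store exposing CRUD services."""

    def __init__(self):
        self._items = {}

    @staticmethod
    def _split(address):
        """Split 'id#a/b/0' into ('id', ['a', 'b', '0'])."""
        content_id, _, fragment = address.partition("#")
        return content_id, [key for key in fragment.split("/") if key]

    @staticmethod
    def _step(node, key):
        # Lists are indexed by integer position, dicts by key.
        return node[int(key)] if isinstance(node, list) else node[key]

    def create(self, content_id, content):
        if content_id in self._items:
            raise KeyError(f"{content_id} already exists")
        self._items[content_id] = content

    def retrieve(self, address):
        """Return the item or fragment at `address`, or None if absent."""
        content_id, keys = self._split(address)
        try:
            node = self._items[content_id]
            for key in keys:
                node = self._step(node, key)
        except (KeyError, IndexError, ValueError, TypeError):
            return None
        return node

    def update(self, address, value):
        """Replace the item or fragment at `address` with `value`."""
        content_id, keys = self._split(address)
        if content_id not in self._items:
            raise KeyError(content_id)
        if not keys:
            self._items[content_id] = value
            return
        parent = self._items[content_id]
        for key in keys[:-1]:
            parent = self._step(parent, key)
        last = keys[-1]
        if isinstance(parent, list):
            parent[int(last)] = value
        else:
            parent[last] = value

    def delete(self, content_id):
        del self._items[content_id]


# Hypothetical sample content: one article containing one small table.
repo = ContentRepository()
repo.create("article-1", {
    "type": "article",
    "title": "Example article",
    "tables": [{"rows": [["dose", "response"], ["10 mg", "0.4"]]}],
})

# Address a single table cell, then correct it after publication.
cell_address = "article-1#tables/0/rows/1/1"
original_cell = repo.retrieve(cell_address)
repo.update(cell_address, "0.5")
corrected_cell = repo.retrieve(cell_address)
```

The point of the sketch is that the unit of addressing (a table cell) is smaller than the unit of publication (the article), and that the same four operations apply at either level.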



Emerging content architecture

 [Figure: content architecture diagram. Labels: Linked data; Document, Entity record and Media object, with "Relational metadata" repeated between them; Acquire; Transform, Enhance, Compose; Deliver]

Content acquisition and transformation




Content enhancement and analytics




Content composition and delivery




Why NoSQL is important to STM publishing


 • NoSQL emphasizes design choices that focus on
   delivering robust, scalable Web applications
   –   Document-centric
   –   Schemaless
   –   Support for analytics
   –   Read/write at Web scale
   –   Move scale-out from development to operations
 • As we shift to the platform-as-a-service era,
   these features become an important part of the
   STM publishing technology stack
How NoSQL addresses STM publishing’s needs

 • Schemaless, document-centric stores
     – Ease repository extension to accommodate expanding range of new, finer-
       grained content types
     – Fit HTML5/JS/CSS content stack providing web-based alternatives to native apps
     – Expedite application stack refresh in support of authoring and editorial workflow
       portals and tools
 • Support for analytics eases innovation in scientometrics
 • Read/write at Web scale accommodates solutions incorporating content
   at a more dynamic, fine-grained scale
     –   Entity records
     –   Annotations
     –   Other forms of community-contributed content
     –   Linked data integration of heterogeneous information resources across the Web
         for mashups/solutions
 • Moving scale-out from development to operations reduces time-to-market
   and cost of failure for emerging, niche publishing opportunities
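
To make the first point concrete, here is a minimal sketch of what "schemaless, document-centric" buys a repository. The `DocumentStore` class is a toy written for this illustration, not the API of any particular NoSQL product, and the sample documents are invented.

```python
import json
import uuid


class DocumentStore:
    """Toy schemaless store: any JSON-serializable dict is accepted."""

    def __init__(self):
        self._docs = {}

    def put(self, doc, doc_id=None):
        doc_id = doc_id or str(uuid.uuid4())
        # Round-trip through JSON so the store keeps an independent copy
        # and rejects values that are not JSON-serializable.
        self._docs[doc_id] = json.loads(json.dumps(doc))
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Return documents whose top-level fields equal all criteria."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(k) == v for k, v in criteria.items())
        ]


store = DocumentStore()
store.put({"type": "article", "title": "Example article", "authors": ["A. Author"]})
store.put({"type": "entity_record", "label": "Example virus", "synonyms": ["EV-1"]})
# A new content type arrives later; nothing about the store has to change.
store.put({"type": "annotation", "target": "article-1#tables/0", "body": "Check units"})

annotations = store.find(type="annotation")
```

Each document carries its own shape, so adding annotations alongside articles and entity records needs no schema migration. The trade-off is that validation, if wanted, has to happen in the application or in a configuration layer.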


Where STM publishing can drive NoSQL requirements


 • Integrated support for search
    – Free text retrieval
    – Faceted navigation
 • Query language functionality
    – Nearest-neighbor matching
    – Joins vs. join-free
 • Primitives/support for analytics design patterns
    – Clustering
    – Classification
    – Entity resolution
 • Primitives/support for semantic enhancement
    – Linked data
    – Language processing
 • Versioning for document stores
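
As an illustration of the first item, the sketch below combines free-text retrieval with faceted navigation over a handful of documents. It is deliberately naive (substring matching over titles, counts computed in application code) and the sample records are invented; the request on this slide is for stores to offer such operations natively.

```python
from collections import Counter


def _as_list(value):
    """Normalize a field value to a list so multi-valued facets work."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def search(docs, text="", **filters):
    """Free-text match on the title, narrowed by exact facet filters."""
    needle = text.lower()
    hits = []
    for doc in docs:
        if needle and needle not in doc.get("title", "").lower():
            continue
        if all(wanted in _as_list(doc.get(f)) for f, wanted in filters.items()):
            hits.append(doc)
    return hits


def facet_counts(docs, facet_fields):
    """Count how many documents carry each value of each facet field."""
    counts = {f: Counter() for f in facet_fields}
    for doc in docs:
        for f in facet_fields:
            counts[f].update(_as_list(doc.get(f)))
    return counts


# Hypothetical sample records.
DOCS = [
    {"title": "Avian influenza surveillance", "type": "article",
     "subject": ["virology", "epidemiology"]},
    {"title": "Influenza vaccine handbook", "type": "book",
     "subject": ["virology"]},
    {"title": "Graph theory primer", "type": "book",
     "subject": ["mathematics"]},
]

hits = search(DOCS, text="influenza")
counts = facet_counts(hits, ["type", "subject"])
books_only = search(DOCS, text="influenza", type="book")
```

The facet counts are computed over the current result set, so each count tells the user how many results would remain after selecting that value.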

Elsevier applications of NoSQL technologies


 •   Entity registries
 •   Metadata repositories
 •   Big data analytics
 •   User-built apps




Linked Data Repository




SciVal




SciVerse




Conclusions


 • STM publishing is in transition
 • This is driving new requirements for content
 • Many of these requirements are well met by
   NoSQL solutions
 • Some requirements point to areas of future
   work for NoSQL technologists and vendors



