New Trends in Data Management in the Information Industries

Presentation from the Copyright Clearance Center Distinguished Speaker Series, February 26, 2015. As the publishing industry transforms from form-based, single-purpose products into information providers that curate data and content and tailor its delivery to the role, action and location of each user, there has been a parallel transformation in how the data and content that are the raw materials for these products are managed. Matt Turner, MarkLogic’s CTO for Media and Publishing, will talk about the new generation of information management technology, focusing on how it is helping transform the information industries and revolutionize how people think about managing data and content. Topics covered include NoSQL/new-generation databases, search, semantic technology, and information product trends, with examples of innovative teams leveraging these new capabilities.

© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
New Trends in Data Management in the Information Industries
Presented by: Matt Turner, CTO Media and Publishing
February 2015

SLIDE: 2
Agenda
• Introduction
• Information Industries Trends
• Top 5 Challenges in the Industry
• New Approaches and Solutions

SLIDE: 3
Data Drives the Need for a New Generation Database
Hierarchical Era: “For your application data!”
• Application- and hardware-specific
Relational Era: “For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control
Any Structure Era: “For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Heterogeneous data
• Faster time-to-results

SLIDE: 4
Harnessing Data & Reimagining Applications
• Reduce Risk
• Manage Compliance
• Create New Value from Data
• Optimize Operations
• Lower TCO / Better IT Economics
• Better Decision-making

SLIDE: 5
MarkLogic: Best Operational Data Warehouse (Aug 2014)

Enterprise NoSQL Database Platform
• Flexible Data Model: Store and manage JSON, XML, RDF, and geospatial data with a document-centric, schema-agnostic database
• Scalability and Elasticity: Scale to petabytes of data without over-provisioning or over-spending
• ACID Transactions: Avoid data loss, data corruption, and stale reads, even at speed and scale
• Search and Query: Lightning-fast, sophisticated, sub-second search and query across all of your data
• Semantics: Store linked data as RDF and query it with SPARQL
• Certified Security: Government-grade, granular, role-based security
• Hadoop Integration: Make your Hadoop better by connecting it to MarkLogic
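Granular, role-based security means the permission check happens in the data layer, before results reach the application. The minimal Python sketch below illustrates only that idea; the `Document` class, the role names, the URIs, and `visible_to` are hypothetical and are not MarkLogic's security API.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    uri: str
    body: str
    # Roles permitted to read this document (hypothetical permission model).
    read_roles: frozenset = field(default_factory=frozenset)

def visible_to(docs, user_roles):
    """Keep only documents that share at least one role with the user,
    so restricted content is filtered out before results are returned."""
    roles = set(user_roles)
    return [d for d in docs if d.read_roles & roles]

docs = [
    Document("/filings/acme-10k.xml", "Annual filing", frozenset({"subscriber", "analyst"})),
    Document("/reports/acme-deep-dive.xml", "Premium report", frozenset({"analyst"})),
    Document("/news/public-note.xml", "Public note", frozenset({"public", "subscriber", "analyst"})),
]

print([d.uri for d in visible_to(docs, ["subscriber"])])
```

Because the filter runs wherever results are produced, every product built on the same store inherits the same access rules instead of re-implementing them.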

DECADE+ OF INNOVATION
Working Together To Reimagine Applications

PUBLISHING: CHANGE IS THE ONLY CONSTANT

FROM PUBLISHERS TO
INFORMATION PROVIDERS

TRADITIONAL PUBLISHING
FORM-BASED PRODUCTS
DEDICATED PRODUCT INFRASTRUCTURE
Product A, Product B, Product C: each with dedicated infrastructure (database + search engine)
Company Data, Industry Data, Filings, Reports

INFORMATION DELIVERY PLATFORM
FORMAT INDEPENDENT
INFORMATION CENTRIC
DYNAMIC DELIVERY
Company Data, Industry Data, Filings, Reports
Deliver the right content, to the right user, in the right format, in real time
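The "format independent, information centric" idea can be illustrated by keeping one canonical record and rendering it per channel only at delivery time. This is a hedged sketch, assuming a plain Python dict as the stored record; the field names, sample values, and the `render` function are hypothetical.

```python
import json
from html import escape

# One format-independent record (hypothetical fields and values).
record = {
    "company": "Acme Corp",
    "filing": "10-K",
    "year": 2014,
    "summary": "Revenue grew & margins held steady.",
}

def render(rec, fmt):
    """Render the same stored record in whatever format a channel needs."""
    if fmt == "json":
        return json.dumps(rec, sort_keys=True)
    if fmt == "html":
        items = "".join(
            f"<li><b>{escape(k)}</b>: {escape(str(v))}</li>" for k, v in rec.items()
        )
        return f"<ul>{items}</ul>"
    if fmt == "text":
        return "\n".join(f"{k}: {v}" for k, v in rec.items())
    raise ValueError(f"unsupported format: {fmt}")

for fmt in ("json", "html", "text"):
    print(render(record, fmt))
```

Adding a new channel means adding a renderer, not re-keying or re-storing the content.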

SLIDE: 13
Top 5 Requirements for Information Providers
1. Getting data IN fast isn’t the problem; it’s getting insights OUT faster!
2. Data is complex, but users want complexity hidden!
3. Not everyone has permission to access all the data…
4. Repurpose, repurpose, repurpose. Repeat.
5. Once you attract them, you must be reliable.

SLIDE: 14
Traditional Technology
• Rows and columns for content strip information

Title | Publication Date | Category | Abstract | Section | Section 2?
Science Article 1 | 3/1/14 | Biology | Abstract text . . . | Section text | Section text
Research Book | 6/4/13 | Surgery | Abstract text . . . | Section text | Section text
Science Article 2 | 6/4/05 | Chemistry | Abstract text . . . | Section text | Section text
?
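To make the slide's point concrete, here is a minimal Python sketch that forces an article with three titled sections into the fixed columns of the table above. The section headings and texts are hypothetical. The third section and every heading have no column to go into, while the document form keeps them.

```python
# Fixed columns from the table above: there are only two section slots.
COLUMNS = ["title", "publication_date", "category", "abstract", "section", "section_2"]

# The same content as a document: any number of titled sections.
article = {
    "title": "Science Article 1",
    "publication_date": "3/1/14",
    "category": "Biology",
    "abstract": "Abstract text ...",
    "sections": [
        {"heading": "Introduction", "text": "Intro text"},
        {"heading": "Methods", "text": "Methods text"},
        {"heading": "Results", "text": "Results text"},
    ],
}

def to_row(doc):
    """Flatten a document into the fixed columns; whatever does not fit is dropped."""
    texts = [s["text"] for s in doc["sections"]]
    row = {col: doc.get(col) for col in COLUMNS[:4]}
    row["section"] = texts[0] if len(texts) > 0 else None
    row["section_2"] = texts[1] if len(texts) > 1 else None
    return row

row = to_row(article)
print(row)
print("Sections with no column:", [s["heading"] for s in article["sections"][2:]])
```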

SLIDE: 15
Traditional Technology
• Rows and columns for content strip information
• Hierarchical taxonomies overlap and don’t capture the complexity

Title | Publication Date | Category | Abstract | Section | Section 2?
Science Article 1 | 3/1/14 | Biology | Abstract text . . . | Section text | Section text
Research Book | 6/4/13 | Surgery | Abstract text . . . | Section text | Section text
Science Article 2 | 6/4/05 | Chemistry | Abstract text . . . | Section text | Section text
?

Overlapping taxonomies:
Research, Medicine, Science, Surgery, Orthopedics, Cell Biology, Biochemistry, …
Life Sciences, Biomedical Sciences, Cell Biology, Biology, Biochemistry, …
Chemistry, Microbiology, Biochemistry, …
?
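Overlapping taxonomies are really a polyhierarchy: a term such as Biochemistry can sit under more than one broader term, which a single strict tree cannot express. The sketch below reuses terms from the slide, but the parent links in `BROADER` are illustrative assumptions, not the slide's actual taxonomies.

```python
# Each term maps to its broader terms; several parents are allowed.
# These links are hypothetical, chosen only to show the overlap.
BROADER = {
    "Biochemistry": ["Biology", "Chemistry"],
    "Cell Biology": ["Biology", "Biomedical Sciences"],
    "Biology": ["Life Sciences"],
    "Biomedical Sciences": ["Life Sciences", "Medicine"],
    "Chemistry": ["Science"],
    "Life Sciences": ["Science"],
    "Medicine": ["Science"],
    "Surgery": ["Medicine"],
    "Orthopedics": ["Surgery"],
    "Science": [],
}

def paths_to_root(term):
    """Return every path from `term` up to a top-level term."""
    parents = BROADER.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path for parent in parents for path in paths_to_root(parent)]

# Terms that would have to be duplicated or cut off in a strict tree.
multi_parent = sorted(t for t, ps in BROADER.items() if len(ps) > 1)

for path in paths_to_root("Biochemistry"):
    print(" > ".join(reversed(path)))
print(multi_parent)
```

Storing the relationships as a graph lets each product choose its own path through the same terms instead of forcing one hierarchy on everyone.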

SLIDE: 16
The Functional Solution Silos & Treadmill
1. Design infrastructure, services & applications
2. Analyze data formats: Articles, Books, Industry Data, Reports
3. Define queries & service APIs
4. Define schemas, indexes and services
5. Build databases, middleware and services infrastructure
6. Define & implement ETL processes
7. Load and normalize data
8. Develop, integrate and test infrastructure & applications
? ?

SLIDE: 17
Data Drives the Need for a New Generation Database
Hierarchical Era: “For your application data!”
• Application- and hardware-specific
Relational Era: “For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control
Any Structure Era: “For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Heterogeneous data
• Faster time-to-results

SLIDE: 18
Flexible Data Model
Schema-agnostic, structure-aware
• No need to define a schema up front
• Matched to complex content and metadata modeling
• Data is managed in its most accessible, natural form
• XML, JSON, RDF, geospatial
Result: Product content and data from multiple sources available to be tailored to any purpose and product
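A minimal sketch of "schema-agnostic, structure-aware" storage, assuming Python dicts stand in for JSON documents: nothing is declared up front, yet a query can still use each document's own structure and find a field wherever it is nested. The URIs, fields, and the `insert`/`walk`/`find` helpers are hypothetical.

```python
from typing import Any, Iterator

# Documents of different shapes are stored as-is; no schema is declared up front.
store: dict[str, Any] = {}

def insert(uri: str, doc: Any) -> None:
    store[uri] = doc

def walk(node: Any, path: tuple = ()) -> Iterator[tuple[tuple, Any]]:
    """Yield (path, value) for every leaf value, following the document's own structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, path + (key,))
    elif isinstance(node, list):
        for child in node:
            yield from walk(child, path)
    else:
        yield path, node

def find(field: str, value: Any) -> list[str]:
    """URIs of documents holding `value` under a key named `field`, at any depth."""
    return sorted(
        uri for uri, doc in store.items()
        if any(path and path[-1] == field and leaf == value for path, leaf in walk(doc))
    )

insert("/articles/a1.json", {"title": "Science Article 1", "category": "Biology",
                             "sections": [{"heading": "Intro", "text": "..."}]})
insert("/books/b1.json", {"title": "Research Book", "meta": {"category": "Surgery"}})
insert("/data/acme.json", {"name": "Acme Corp", "filings": [{"type": "10-K", "year": 2014}]})

print(find("category", "Surgery"))
print(find("year", 2014))
```

A new source with a new shape can be loaded immediately; only the queries that care about its fields need to know they exist.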

SLIDE: 19
Search and Query
Search to find answers in documents, relationships, and metadata
• Automatic indexing of every data value, text, and data structure
• Specialized indexes for data values (analytics, facets, sorting), geospatial data, and triples
• All updated within ACID transactions to ensure data integrity and real-time access
• Accessible via a fully programmable search API with full-text search, type-ahead suggestions, facets, snippeting, highlighted search terms, proximity boosting, relevance ranking, and language support
JavaScript | XQuery | SPARQL
Rich Query Capability: In-database MapReduce, Full-text Search, Semantic Search, Geospatial Search
Result: simplified architecture with a single component for search and database
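The idea of indexing every word and value at load time, so that full-text search, value filters, and facets all come from one component, can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions (simple `\w+` tokenization, exact-match values on top-level fields); it is not how MarkLogic's indexes are implemented, and it provides no transactional guarantees.

```python
import re
from collections import Counter, defaultdict

class UniversalIndex:
    """Toy model: every word and every top-level (field, value) pair is indexed on insert."""

    def __init__(self):
        self.docs = {}
        self.words = defaultdict(set)   # word -> URIs containing it
        self.values = defaultdict(set)  # (field, value) -> URIs with that exact value

    def insert(self, uri, doc):
        # Index entries are written in the same call as the document,
        # so new content is searchable as soon as insert() returns.
        self.docs[uri] = doc
        for field, value in doc.items():
            if isinstance(value, str):
                for word in re.findall(r"\w+", value.lower()):
                    self.words[word].add(uri)
            if isinstance(value, (str, int)):
                self.values[(field, value)].add(uri)

    def search(self, text, **filters):
        """AND together full-text terms, then narrow by exact field values."""
        hits = set(self.docs)
        for word in re.findall(r"\w+", text.lower()):
            hits &= self.words.get(word, set())
        for field, value in filters.items():
            hits &= self.values.get((field, value), set())
        return sorted(hits)

    def facet(self, field, uris):
        """Count the values of `field` across a result set."""
        return Counter(self.docs[u][field] for u in uris if field in self.docs[u])

idx = UniversalIndex()
idx.insert("/a1", {"title": "Cell signalling in biology", "category": "Biology", "year": 2014})
idx.insert("/a2", {"title": "Biology of bone healing", "category": "Surgery", "year": 2013})
idx.insert("/a3", {"title": "Reaction kinetics", "category": "Chemistry", "year": 2005})

hits = idx.search("biology")
print(hits, idx.facet("category", hits))
print(idx.search("biology", year=2013))
```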

SLIDE: 20
Semantics
Enterprise triple store, document store, and database combined
• Store and query billions of facts and relationships
• Leverage ontologies for domain- and role-specific contextual access to data and documents
• Efficient metadata management with relationships to ontologies
• Standards-based for ease of use and integration: RDF, SPARQL, and standard REST interfaces
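To show what storing facts as triples and querying them by pattern involves, here is a small pure-Python sketch: a set of (subject, predicate, object) tuples, a matcher for patterns with `?variables` (the idea behind a SPARQL basic graph pattern), and a forward-chaining rule that infers new facts from a transitive property. The `ex:` identifiers and the assumption that `ex:subTopicOf` is transitive are hypothetical; this is neither MarkLogic's triple store nor a SPARQL engine.

```python
Triple = tuple[str, str, str]

# Hypothetical facts: documents, people, and a small topic hierarchy.
TRIPLES: set[Triple] = {
    ("ex:article1", "ex:about", "ex:CellBiology"),
    ("ex:article2", "ex:about", "ex:Orthopedics"),
    ("ex:article1", "ex:author", "ex:drSmith"),
    ("ex:drSmith", "ex:affiliatedWith", "ex:StateUniversity"),
    ("ex:CellBiology", "ex:subTopicOf", "ex:Biology"),
    ("ex:Orthopedics", "ex:subTopicOf", "ex:Surgery"),
    ("ex:Surgery", "ex:subTopicOf", "ex:Medicine"),
}

def infer_transitive(prop, data):
    """Forward-chain: add (a, prop, c) whenever (a, prop, b) and (b, prop, c) hold."""
    closure = set(data)
    while True:
        edges = [(s, o) for s, p, o in closure if p == prop]
        new = {(a, prop, d) for a, b in edges for c, d in edges if b == c} - closure
        if not new:
            return closure
        closure |= new

def match(pattern, data):
    """Match one triple pattern; terms starting with '?' are variables."""
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

def query(patterns, data):
    """Join patterns left to right, like a basic graph pattern in a SPARQL WHERE clause."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for binding in results:
            bound = tuple(binding.get(term, term) for term in pattern)
            next_results.extend({**binding, **extra} for extra in match(bound, data))
        results = next_results
    return results

facts = infer_transitive("ex:subTopicOf", TRIPLES)
# Roughly: SELECT ?doc WHERE { ?doc ex:about ?t . ?t ex:subTopicOf ex:Medicine }
rows = query([("?doc", "ex:about", "?t"), ("?t", "ex:subTopicOf", "ex:Medicine")], facts)
print(sorted(r["?doc"] for r in rows))
```

The Medicine query only finds `ex:article2` because of the inferred fact that Orthopedics is a sub-topic of Medicine; that fact was never stored explicitly.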

SLIDE: 21
Semantics
Documents, data and triples provide a complete picture of content
Result: context to tailor information to your user’s role, activity and location

SLIDE: 22
Scalability, Elasticity and Cloud
Massive enterprise scalability and elasticity
• Scale horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents
• Process thousands of multi-document, multi-statement transactions per second
• Start small and scale up or down to meet capacity and performance demands without over-provisioning or over-spending
• Fully cloud-enabled for automated deployment and management on Amazon EC2
• Leverage dynamic configurations with Tiered Storage
[Diagram: a cluster of two E-nodes (evaluator nodes) and three D-nodes (data nodes)]
Result: Enterprise-ready to power mission-critical products
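The E-node/D-node split can be pictured as scatter-gather: D-nodes each hold part of the data, and an E-node sends a query to all of them and merges the partial answers. The sketch below is a simplified in-process model; the hash-based placement rule, class names, and sample documents are hypothetical, and MarkLogic's actual document placement and rebalancing work differently and are not modeled here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

class DNode:
    """Data node: stores one partition of the documents and searches it locally."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def search(self, needle):
        return [uri for uri, text in self.docs.items() if needle in text.lower()]

class ENode:
    """Evaluator node: routes writes and fans queries out to every D-node."""

    def __init__(self, dnodes):
        self.dnodes = dnodes

    def _owner(self, uri):
        # Hypothetical placement rule: a stable hash of the URI picks the D-node.
        digest = int(hashlib.sha256(uri.encode()).hexdigest(), 16)
        return self.dnodes[digest % len(self.dnodes)]

    def insert(self, uri, text):
        self._owner(uri).docs[uri] = text

    def search(self, term):
        needle = term.lower()
        # Scatter: each D-node searches its own partition in parallel.
        with ThreadPoolExecutor(max_workers=len(self.dnodes)) as pool:
            partials = list(pool.map(lambda d: d.search(needle), self.dnodes))
        # Gather: merge the partial results into one answer.
        return sorted(uri for part in partials for uri in part)

cluster = ENode([DNode("d1"), DNode("d2"), DNode("d3")])
for uri, text in {
    "/filings/1.xml": "Annual filing for Acme",
    "/reports/2.xml": "Industry report on chemistry",
    "/filings/3.xml": "Quarterly filing",
    "/reports/4.xml": "Surgery report",
}.items():
    cluster.insert(uri, text)

print(cluster.search("filing"))
print({d.name: len(d.docs) for d in cluster.dnodes})
```

Because each D-node only scans its own share, adding D-nodes spreads both storage and query work; the E-node layer can be scaled separately for query concurrency.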

SLIDE: 23
With MarkLogic…
1. Design infrastructure, services & applications
3. Define queries & service APIs
8. Develop, integrate and test infrastructure & applications
? ?
When something changes… it’s no big deal.

INFORMATION DELIVERY PLATFORM EXTENDED
Content and Customers
Complete Picture of Business
Metrics Driving Product Development and Sales
Company Data, Industry Data, Filings, Reports, Catalogs, Lists, Authors, Institutions, Social Media + Usage

SLIDE: 25
Use Case: Master Data
• Foundational data for digital products
• Industry topology and trends to drive innovation
• User and content metrics to drive product development

SLIDE: 26
Use Case: Enhance Digital Products
• Present information based on relationships
• Go beyond traditional technology with depth of content
• Drive efficiency using a semantic approach to tagging

SLIDE: 27
Use Case: Go Beyond Search
• Concept instead of keyword search
• Related content and information drive content discovery and new interactions
  – SNL40 continuous viewing
• Dynamically tailored to the user’s specific attributes or activity
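Concept search can be sketched as query expansion: the user picks a concept, and the engine matches every label of that concept and of its narrower concepts, so documents that never use the literal keywords are still found. The concept scheme, documents, and helper names below are hypothetical; a production system would typically draw labels from a managed ontology.

```python
import re

# Hypothetical concept scheme: labels plus narrower concepts.
CONCEPTS = {
    "bone-surgery": {"labels": ["bone surgery", "orthopedic surgery"],
                     "narrower": ["joint-replacement"]},
    "joint-replacement": {"labels": ["joint replacement", "arthroplasty"],
                          "narrower": []},
}

DOCS = {
    "/a1": "Outcomes of arthroplasty in older patients",
    "/a2": "A review of orthopedic surgery techniques",
    "/a3": "Bone density in astronauts",
}

def expand(concept):
    """Collect the labels of a concept and, recursively, of its narrower concepts."""
    entry = CONCEPTS[concept]
    labels = list(entry["labels"])
    for child in entry["narrower"]:
        labels.extend(expand(child))
    return labels

def keyword_search(phrase):
    """Literal phrase match, as a plain keyword engine would do."""
    return sorted(u for u, t in DOCS.items() if phrase.lower() in t.lower())

def concept_search(concept):
    """Match any label of the concept or its narrower concepts, as whole words."""
    patterns = [re.compile(rf"\b{re.escape(label)}\b", re.IGNORECASE)
                for label in expand(concept)]
    return sorted(u for u, t in DOCS.items() if any(p.search(t) for p in patterns))

print(keyword_search("bone surgery"))
print(concept_search("bone-surgery"))
```

The keyword query finds nothing, while the concept query returns the arthroplasty and orthopedic surgery articles and correctly skips the unrelated "bone density" paper.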

SLIDE: 28
Use Case: ‘Everything Else’
• Tailor views and access to information with multiple ontologies
• Example: follow a scientist from research to the workbench to conferences to publishing
• Content delivery tailored to the user’s role, activity and location

SLIDE: 29
Top 5 Requirements for Information Providers
1. Getting data IN fast isn’t the problem; it’s getting insights OUT faster!
2. Data is complex, but users want complexity hidden!
3. Not everyone has permission to access all the data…
4. Repurpose, repurpose, repurpose. Repeat.
5. Once you attract them, you must be reliable.

SLIDE: 30
Any Questions?

Speaker Notes

These are the key features to focus on when introducing MarkLogic, and each of them is covered in this deck. The previous slide showed ALL of the features that MarkLogic includes, but here we focus on the top 7 key features to help explain what MarkLogic is and what makes the technology so unique and powerful. There is no other database in the world that has this list of features. To start, if you only know 2 things about MarkLogic, they should be the flexible data model and search and query. These two features are core to how MarkLogic works, and they underpin many of the other features, such as MarkLogic’s ability to scale while still maintaining complex and consistent transactions. In MarkLogic 7 we introduced semantics. MarkLogic is a native document store and also a native triple store. Triples are stored as RDF and queried with SPARQL, both W3C standards for linked data. With semantics, you can store and query billions of facts and relationships, and even infer new facts. These facts and relationships provide context for better search and flexible data modeling to integrate and link data from different sources. Scalability and elasticity, ACID transactions, and security are three of MarkLogic’s key “enterprise” features, ensuring you can store and manage all of your data without breaking the bank, losing any data, or letting data get into the wrong hands. These features should not be taken for granted, because they are really hard to do right. MarkLogic has spent a decade building a hardened, trusted platform, and these features are some of the reasons why MarkLogic is the leading enterprise NoSQL database. Lastly, MarkLogic integrates easily with Hadoop and will make Hadoop better. Hadoop has gained traction lately, but most people now realize that it’s not a database. It’s a great place to put your data, and MarkLogic has many unique ways of doing more with your data if you currently have it in Hadoop.
[Celebrate the success of what our customers have been able to achieve over the last decade] MarkLogic recently celebrated 10 years on the market, and it’s been 10 years working side by side with publishers and media to reimagine what publishing is. Over those 10 years your businesses have changed dramatically, and not surprisingly: with the web, Kindles, and iPads, it was digitize or die. Because you were forced to reimagine your business and the technology that drives it, publishing and media have led the way in doing more with data. I’m often surprised at how many other industries are just waking up to the notion that there are other ways to store and use data than the traditional relational way they’ve been doing it for 30 years. Rather than treating your content as flat files or cramming it into database cells, you’ve been using the right tool for the job, which has allowed you to do more with your content, repurpose it, be agile, move quickly, and create and deliver products fast. It’s amazing to see how many organizations stick to using a hammer to get the screw out of the board. With the right tools, you can do more. You can create more. You can repurpose more. Development cycles take months, not years, with handfuls of developers, not armies. Re-emphasize the benefit of using the right tool.
Or: the Top 10 Search Requirements we hear from today’s most successful information providers… This is a subject we love: change is the only constant, and this is what we ene
“New” badge: indicates features that are new in MarkLogic 8. MarkLogic 8 also includes enhancements to other features, such as the REST API, the Java Client API, and incremental/customizable backup. These features are currently available in the Early Access program and are not discussed in detail in this deck; they are highlighted here only for awareness. All of the other features on this slide are fully available in MarkLogic 7.
Powerful: deliver more value, build better apps. **These are all of MarkLogic’s unique features.** MarkLogic is designed for today’s data, helping you find answers in documents, relationships, and metadata by storing and managing JSON, XML, RDF, geospatial data, and more. MarkLogic serves as an intelligent data layer, giving you the freedom to do more.
Agile: prepare for and respond to change. **These are all of the features that focus on ease of use and flexibility.** Enjoy the flexibility of NoSQL to integrate data, and deploy in any environment, whether on Amazon Web Services, virtual machines, or on-premises hardware. With the agility and adaptability of MarkLogic, you can build applications fast.
Trusted: enterprise-ready for mission-critical uses. **These are all the features that ensure MarkLogic meets enterprise requirements.** MarkLogic is a hardened platform that is trusted to run mission-critical applications. It has higher security certifications than any other NoSQL database, and it has uncompromised data resiliency, with features that ensure you will never lose data.
I think about this in terms of the move to being an information provider: putting the value of information ahead of the form of delivery. And I’m not alone.
*Note: The example above shows an XML document, but in MarkLogic 8, JSON will be stored natively, so we could replace it with a similar-looking JSON document. With MarkLogic, you can load all of your data as-is and define a schema only when you need it. You can even change your schema without having to redefine your entire data model. MarkLogic is also structure-aware, so you can even query the structure of documents. In MarkLogic, data is stored as self-contained documents, not in rows and columns, which means no foreign keys and no normalization: the data doesn’t have to be shredded across tables. Also, data is often already in a document format, such as XML, SGML, FpML, HTML, or JSON. When handling a document, MarkLogic starts by parsing and indexing the document contents, converting the document from its serialized format into a compressed binary fragment representation. Because the compression is highly efficient, the data takes up much less space than it would as a typical file. The example above shows how MarkLogic ‘sees’ an XML document in its hierarchical tree structure. Shown like this, you can see how the document model is self-describing. This example shows a “Suspicious Activities Report”, but you could easily imagine it as a trade document, medical record, book chapter, email, or metadata file: hundreds of different things that model well in a document structure. The example above also shows something else that’s unique about MarkLogic: it contains various types of data, including values, geospatial data, unstructured full text, and semantic triples. All of this is indexed and can be queried.
More Information on Schemas
A database schema is a blueprint, or set of constraints, that defines how data is structured and organized in the database. In the relational world, the schema is defined before ingesting data, and it has relations, tuples, and attributes represented as tables, rows, and columns.
In the non-relational world, the relational mathematics at work in SQL does not apply, and the schema is less rigid and does not have to be predefined. Well-formed XML, for example, can be parsed at ingestion, and the database will use the inherent XML structure as the schema.
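To make the “self-describing” point concrete, here is a minimal Python sketch (standard library only) that parses a well-formed XML document with no schema declared anywhere and recovers its structure as element paths and values. The sample report and its element names are invented for illustration; this shows the general idea of structure discovered at parse time, not how MarkLogic parses, compresses, or stores documents internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: element names and values are invented.
SAR_XML = """
<report>
  <subject><name>John Smith</name><city>London</city></subject>
  <location><lat>51.5</lat><lon>-0.16667</lon></location>
  <narrative>Observed several large cash deposits.</narrative>
</report>
"""


def element_paths(xml_text):
    """Return (path, text) pairs for every leaf element of an XML document.

    No schema is supplied: the structure comes entirely from the parsed
    tree, which is what "self-describing" means here.
    """
    root = ET.fromstring(xml_text.strip())
    pairs = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if children:
            for child in children:
                walk(child, path)
        else:
            # Leaf element: record its path and (possibly empty) text value.
            pairs.append((path, (elem.text or "").strip()))

    walk(root, "")
    return pairs


for path, value in element_paths(SAR_XML):
    print(path, "=", value)
```

Each printed path (for example `/report/subject/name`) is exactly the kind of structural information a document database can index and query without anyone having defined a table for it first.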
There is a change-control cost between each one of these steps: not just doing the same job multiple times, but also incurring change costs! The bad guy is all the tools you’re using to do this: RDBMS, ETL, etc. Change-control processes are what’s stopping you from being productive, with lots of paperwork involved…
Short Description: MarkLogic has built-in search and query capabilities. MarkLogic’s sophisticated indexes provide the power to search and query across hundreds of terabytes of documents, relationships, and metadata, with the flexibility of multiple query languages.
*Note: Server-side JavaScript is a MarkLogic 8 feature.
Longer Description: Most databases separate search and query into two distinct functions. MarkLogic changes that, starting with the idea that you should be able to ask your database what’s inside it. This means not having to bolt on a separate search solution, and not having to worry about when and how to build the right indexes, or how those indexes can be used to perform certain queries. MarkLogic is designed with over 30 sophisticated indexes that can be adjusted and tuned to make even the most complex queries as fast as possible without requiring data duplication, and data is ingested as-is and is immediately searchable. The sophisticated indexes mean that developers can ask harder questions and get faster responses. MarkLogic offers multiple query languages, each suited to a data type (JavaScript for JSON, XQuery for XML, and SPARQL for RDF). These query languages enable full-text search across unstructured content; the rich query capability needed to make complex queries fast; geospatial search for multiple formats and types (including connections to ESRI ArcGIS and Google Maps); semantic search across linked data (similar to graph search, and MarkLogic 8 even includes inferencing); and in-database MapReduce for running massively parallel queries. One of MarkLogic’s unique capabilities is that the indexes are designed so that developers can write complex queries that run across multiple indexes without causing a performance bottleneck. With MarkLogic, you can query data as-is, or transform and manage data in place, all with the reliability of a transactional system that maintains full ACID properties.
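The idea of answering a combined full-text search and structured query from indexes alone can be illustrated with a toy inverted index. In this hypothetical Python sketch, `build_index` and `search` are illustrative names and the sample documents are invented; it is a teaching model of the general technique, not MarkLogic’s index design (which the notes describe as 30+ tunable indexes).

```python
from collections import defaultdict

# Toy inverted index: each key maps to the set of document IDs containing it.
# Keys are ("word", w) for full-text terms or ("value", element, v) for
# element-value pairs. Illustrative only; not MarkLogic's implementation.


def build_index(docs):
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            index[("word", word.strip(".,"))].add(doc_id)
        for element, value in doc["fields"].items():
            index[("value", element, value)].add(doc_id)
    return index


def search(index, words=(), values=()):
    """Return IDs of docs matching ALL words and ALL (element, value) pairs.

    Each condition is resolved from the index alone; combining conditions is
    a set intersection, so no document is opened to filter the results.
    """
    keys = [("word", w.lower()) for w in words]
    keys += [("value", e, v) for e, v in values]
    if not keys:
        return set()
    result = set(index.get(keys[0], set()))  # copy so the index is not mutated
    for key in keys[1:]:
        result &= index.get(key, set())
    return result


# Hypothetical sample documents.
docs = {
    1: {"text": "Chapter on drug interactions.", "fields": {"type": "chapter", "year": "2014"}},
    2: {"text": "Journal article on drug pricing.", "fields": {"type": "article", "year": "2014"}},
    3: {"text": "Chapter on pricing models.", "fields": {"type": "chapter", "year": "2013"}},
}

index = build_index(docs)
print(search(index, words=["drug"], values=[("type", "chapter")]))  # {1}
```

Because each condition resolves to a set of document IDs, adding another condition is just one more intersection, which is the intuition behind running a single query across several indexes without scanning every document.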
But it’s important not to overlook the enterprise search experience. Many of MarkLogic’s first customers, such as Elsevier, were publishers who simply needed a way to search quickly across massive amounts of content. The user experience is not too different from that of the major web search engines; in fact, MarkLogic’s founder, Christopher Lindblad, came from the search world, having been the architect of Ultraseek Server, an early enterprise search application developed at Infoseek. MarkLogic has many of the features that users now expect in an enterprise search application, such as type-ahead suggestions, relevance ranking, and snippeting. MarkLogic also includes language support for over 200 languages, with advanced support (tokenization, stemming, and collation) for some of the most common languages. And, just to reiterate, all of this comes built into MarkLogic; you don’t have to bolt on any other solution. This simplifies your architecture and makes things much easier for DBAs and developers. Integrated search means one less platform to worry about. Developers don’t have to use a “lite” version of other search software during testing, and additional, unnecessary ETL procedures are eliminated, which reduces risk. System-wide settings such as security are set up once and applied everywhere. If permissions are updated on documents, those updates are reflected automatically and immediately in searches.
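Type-ahead suggestions, one of the search features listed above, can be sketched with a sorted term list and binary search. This is a minimal Python illustration under the assumption that suggestions come from a simple lexicon of indexed terms; the `TypeAhead` class and sample terms are hypothetical, and real engines typically also weight suggestions by frequency or relevance.

```python
import bisect


class TypeAhead:
    """Prefix suggestions from a sorted list of indexed terms (hypothetical).

    Terms are kept sorted, so every term sharing a prefix sits in one
    contiguous run that bisect can locate in O(log n).
    """

    def __init__(self, terms):
        self.terms = sorted({t.lower() for t in terms})

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)
        out = []
        for term in self.terms[start:]:
            # Stop at the first non-matching term or once the limit is hit.
            if not term.startswith(prefix) or len(out) == limit:
                break
            out.append(term)
        return out


ta = TypeAhead(["semantics", "search", "security", "scalability", "SPARQL", "snippet"])
print(ta.suggest("se"))  # ['search', 'security', 'semantics']
```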
Short Description: Store RDF triples and query them using SPARQL, providing meaning and context to your data with the only database that can handle a combination of documents, data, and triples.
*Note: MarkLogic 8 extends its use of standard SPARQL so you can do analytics (aggregates) over triples, explore semantic graphs using property paths, and update semantic triples, all using the standard SPARQL 1.1 language over standard protocols. In addition, MarkLogic 8 lets you discover new facts and relationships with automatic inference.
Long Description: Semantics provides a universal framework to describe and link different data so that it can be better understood and searched holistically, allowing both people and computers to see and discover relationships in the data. MarkLogic provides the capability to store and query linked data, including a native RDF triple store for storing and managing hundreds of billions of triples that can be queried with SPARQL, all right inside MarkLogic. Not only that, MarkLogic combines the triple store with its document store, providing the capability to store and manage documents, data, and triples together so you can discover, understand, and make decisions in context.
Script for Presenting:
Enterprise triple store, document store, database… combined
MarkLogic Semantics adds the capabilities of an enterprise triple store to its document store and database.
Store and query billions of facts and relationships; infer new facts
The triple store lets you store and query billions of facts (assertions) and relationships. Facts and relationships are represented as triples, made up of a subject, a predicate, and an object. For example, we can represent the facts "John lives in London" and "London is in England" as triples like this:
Subject   Predicate   Object
John      livesIn     London
London    isIn        England
We can also infer new facts. From what we (as humans) know about "livesIn" and "isIn", we can infer that John lives in England.
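The livesIn/isIn inference in this script can be sketched in plain Python. The triples and predicate names come from the example above; the rule is hand-coded here as an assumption for illustration, whereas a triple store would take it from a ruleset. Following the notes, the rule is applied at query time: nothing new is written to the triple set, and this is not MarkLogic’s inference API.

```python
# Facts as (subject, predicate, object) triples, from the example above.
TRIPLES = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
}


def objects(triples, subject, predicate):
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def infer_lives_in(triples, person):
    """Answer 'where does <person> live?' by applying, at query time, the
    hand-coded rule

        (x livesIn y) AND (y isIn z)  =>  (x livesIn z)

    repeatedly until no new places appear, so chains of isIn are followed.
    """
    found = set()
    frontier = objects(triples, person, "livesIn")
    while frontier:
        place = frontier.pop()
        if place in found:
            continue
        found.add(place)
        frontier |= objects(triples, place, "isIn") - found
    return found


print(sorted(infer_lives_in(TRIPLES, "John")))  # ['England', 'London']
```

Applying the rule until nothing changes means that adding one more fact, such as England isIn United Kingdom, is enough for the same query to return the new place as well.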
The triple store can do that too: you can specify rules that say exactly what a predicate means, and the triple store will infer new facts when querying. Many of these rules are specified in the RDFS and OWL specifications and can be applied in MarkLogic queries out of the box.
Facts and relationships provide context for better search
Imagine how much better a search application can be if the app has access to billions of facts and relationships. The app can leverage those facts in several ways (see future slide):
• Find more relevant information by expanding the terms the user typed in
• Present more or better information about whatever the user is searching for
• Publish information dynamically to web, print, or mobile
Flexible data modeling: integrate and link data from different sources
Triples are atomic and schemaless, so they are easy to share and easy to combine. When you model data as triples, it’s easy to load the data as-is and query across all of your data. You can also link data from different sources by creating new triples. For example, if you have information about the same customer from two sources, and one source calls the customer "cust123" while the other calls the same customer "cus_id_456", simply add the triple cust123 sameAs cus_id_456, and you can query across all the information about that customer in a single, simple query. As well as creating and extracting your own triples, you can use the billions of triples available on the open Linked Data web.
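The single linking triple cust123 sameAs cus_id_456 can be illustrated with a small Python sketch. The customer facts are hypothetical, and union-find is a technique chosen here to group aliases; it is one simple way to honor sameAs links, not a description of how MarkLogic or OWL reasoners implement them.

```python
from collections import defaultdict

# Hypothetical customer facts from two sources that use different IDs.
TRIPLES = [
    ("cust123", "name", "Acme Ltd"),
    ("cust123", "city", "London"),
    ("cus_id_456", "openOrders", "3"),
    ("cust123", "sameAs", "cus_id_456"),  # the single linking triple
]


def build_canonicalizer(triples):
    """Union-find over sameAs links; returns a function id -> canonical id."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "sameAs":
            parent[find(s)] = find(o)
    return find


def describe(triples, entity):
    """Collect every (predicate, object) about entity or any sameAs alias."""
    canonical = build_canonicalizer(triples)
    target = canonical(entity)
    facts = defaultdict(set)
    for s, p, o in triples:
        if p != "sameAs" and canonical(s) == target:
            facts[p].add(o)
    return dict(facts)


print(describe(TRIPLES, "cust123"))
```

Asking about either ID returns the merged view, which is the "single, simple query" benefit described above.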
For example, you can download sections of DBpedia (the triples version of Wikipedia):
• Einstein was born in Germany
• Buzz Aldrin was on the crew of Apollo 11
• A Labrador is a type of dog
Or you can download facts from GeoNames:
• London is in England
• London has a population of 7,504,800
• London is at lat/long position 51.5/-0.16667
Or you can go to data.gov to get facts about food from the Dept. of Agriculture (http://data-gov.tw.rpi.edu/wiki/Dataset_1294):
• Pineapple juice has 140 calories per serving
See http://www.w3.org/wiki/DataSetRDFDumps for a partial listing of RDF data available for download and ingestion into MarkLogic. See http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete for a listing of Open Government RDF datasets.
Standards-based for ease of use and integration
MarkLogic Semantics is based on W3C standards. RDF describes the data model for facts and relationships (http://www.w3.org/RDF/). MarkLogic can load RDF files in all the popular RDF formats: RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, and TriG (http://docs.marklogic.com/guide/semantics/loading#id_70682). SPARQL is the W3C standard language for querying RDF. MarkLogic supports SPARQL 1.1, which includes paths, aggregates, and inserts/deletes (http://www.w3.org/TR/sparql11-query/ and http://www.w3.org/TR/sparql11-update/). MarkLogic also supports standard interfaces: http://www.w3.org/TR/sparql11-protocol/ defines a SPARQL endpoint, which is a standard REST endpoint for SPARQL queries, and http://www.w3.org/TR/sparql11-http-rdf-update/ defines the Graph Store HTTP Protocol, which is a standard REST endpoint for managing RDF graphs.
Even better with search, bitemporal
The real power of MarkLogic comes not from any single feature, but from the ability to combine features in a single, powerful query. Semantics isn’t a product; it’s a feature of a product. MarkLogic Semantics works particularly well with search (including geospatial search) and bitemporal.
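Because the SPARQL 1.1 Protocol is a plain HTTP convention, any client can talk to a standard endpoint. This Python sketch builds (but does not send) a query-via-GET request: the query text goes URL-encoded in the `query` parameter, and the Accept header asks for the SPARQL JSON results format. The endpoint URL and the example.org vocabulary are hypothetical placeholders; substitute the SPARQL endpoint your own server exposes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint: substitute the SPARQL endpoint your server exposes.
ENDPOINT = "http://localhost:8000/sparql"

# Hypothetical vocabulary (example.org) mirroring the livesIn example.
QUERY = """PREFIX ex: <http://example.org/>
SELECT ?place WHERE { ex:John ex:livesIn ?place }"""


def sparql_get_request(endpoint, query):
    """Build, but do not send, a SPARQL 1.1 Protocol query-via-GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


req = sparql_get_request(ENDPOINT, QUERY)
print(req.get_method(), req.full_url.split("?")[0])

# Round trip: decoding the URL gives back exactly the query text that was sent.
sent = parse_qs(urlsplit(req.full_url).query)["query"][0]
print(sent == QUERY)

# Against a live endpoint you would send it with urllib.request.urlopen(req).
```

The protocol also allows sending the query in a POST body, which avoids URL-length limits for long queries; GET is shown here because it is the simplest form to inspect.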
In MarkLogic, you can embed triples in XML or JSON documents and run combination queries. You can combine SPARQL and cts:query in two ways: run a SPARQL query that is filtered by a cts:query condition, or embed a cts:triple-range-query (which returns a cts:query) in a cts:search. For example, you might want to ask "show me all the people who met with John". If you have triples of the form "john metWith X", that’s a simple SPARQL query. But if those triples are embedded in the documents where that fact was asserted or discovered (say, a police report or an e-mail exchange), you can ask much richer questions, such as "show me all the people who met with John, where the fact was discovered in the last 6 months, the source is a police report from a county in the eastern US, and that report also mentions some kind of weapon and some kind of controlled substance". Or you might want to ask "how many emails and tweets in my sample are generally positive?" If you have triples of the form "message1002 hasSentiment +9", that’s a simple SPARQL query. But if those triples are embedded in the messages, you can ask much richer questions, such as "show me snippets of all the messages that were overwhelmingly positive, were sent by someone who is an executive of a Fortune 500 company between these dates, and mention the companies ‘IBM’ and ‘Oracle’ as well as a word that has something to do with takeovers or acquisitions".
Bitemporal (MarkLogic 8 feature): Bitemporal data management handles historical data along two different timelines, making it possible to rewind the information “as it actually was” in combination with “as it was recorded” at some point in time. It facilitates the creation of a complete audit trail of data. Since you can compose SPARQL and cts:query, you can do a bitemporal SPARQL query: simply run the SPARQL query with a cts:query constraint over one or both bitemporal axes.
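The “people who met with John” example can be modeled in a few lines of Python to show why embedding triples in their source documents matters: a document-level condition narrows which triples count. The documents, field names, and the `met_with`/`rich_filter` helpers are hypothetical. "Some kind of weapon" and "some kind of controlled substance" are simplified to literal keywords, and "the last 6 months" to a fixed cutoff date. The document filter stands in for the role a cts:query plays in MarkLogic; this is not MarkLogic’s API.

```python
from datetime import date

# Hypothetical documents, each carrying metadata, text, and embedded triples.
DOCS = [
    {"source": "police report", "region": "eastern US", "found": date(2015, 1, 10),
     "text": "Suspect seen with a handgun and cocaine.",
     "triples": [("john", "metWith", "alice")]},
    {"source": "email", "region": "eastern US", "found": date(2015, 1, 20),
     "text": "Lunch meeting notes.",
     "triples": [("john", "metWith", "bob")]},
    {"source": "police report", "region": "eastern US", "found": date(2013, 5, 2),
     "text": "Handgun recovered; cocaine seized.",
     "triples": [("john", "metWith", "carol")]},
]


def met_with(docs, person, doc_filter=lambda d: True):
    """Match the triple pattern (person metWith ?x), counting only triples
    embedded in documents that pass doc_filter."""
    return {o for d in docs if doc_filter(d)
            for (s, p, o) in d["triples"] if s == person and p == "metWith"}


def rich_filter(d, since):
    """Document-level conditions: source, region, date, and keywords."""
    words = d["text"].lower()
    return (d["source"] == "police report" and d["region"] == "eastern US"
            and d["found"] >= since
            and "handgun" in words and "cocaine" in words)


print(sorted(met_with(DOCS, "john")))  # ['alice', 'bob', 'carol']
print(sorted(met_with(DOCS, "john", lambda d: rich_filter(d, date(2014, 8, 1)))))  # ['alice']
```

The triple pattern stays the same in both calls; only the document constraint changes, which is the essence of a combination query.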
Short Description: MarkLogic scales horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents, and still processes thousands of transactions per second.
Longer Description: Elasticity and scalability are critical to address growing volumes of data. By 2020, the digital universe will grow to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child). The need to process petabytes of data quickly and with low overhead already exists. MarkLogic allows you to start small or go big. From 3-node clusters to 250+ node clusters, or from 10,000 documents to 1 billion, MarkLogic scales horizontally as your data grows or shrinks. You can add or remove nodes easily, helping you keep the database in line with performance needs without over-provisioning. And MarkLogic doesn’t need “big iron”: run it on cost-effective commodity hardware in any environment, whether in the cloud, virtualized, or on-premises. MarkLogic also handles thousands of transactions per second, even at scale, all while maintaining full ACID properties. This unique capability positioned MarkLogic as the best choice to run healthcare.gov and a large operational trade store at a top investment bank. Performance usually suffers at scale with most databases, but MarkLogic scales easily to handle hundreds of terabytes using a shared-nothing architecture. Data partitions are completely independent of each other and can act independently, so when you need more partitions, you just add more, and queries run just as efficiently as they did with the first cluster. Changing cluster configurations is a pain with most databases, but MarkLogic provides easy administration to add or remove clusters. Another feature that helps you manage your data at scale is tiered storage.
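The shared-nothing idea (independent partitions, each answering from its own data, with a coordinator merging the results) can be sketched as a scatter-gather loop. This Python toy is hypothetical throughout: hash-based placement is an assumption made for illustration, not MarkLogic’s document-assignment policy, and a real system also needs a rebalancing strategy when partitions are added, which this sketch omits.

```python
import hashlib


class ShardedStore:
    """Toy shared-nothing store (hypothetical).

    Each partition holds its own documents and evaluates queries using only
    its own data; a coordinator fans a query out and merges partial results.
    """

    def __init__(self, n_partitions):
        self.partitions = [dict() for _ in range(n_partitions)]

    def _home(self, uri):
        # Hash placement: an illustrative assumption, stable across runs.
        digest = hashlib.sha1(uri.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def insert(self, uri, doc):
        self.partitions[self._home(uri)][uri] = doc

    def query(self, predicate):
        # Scatter: every partition evaluates the predicate independently.
        partials = [[uri for uri, doc in part.items() if predicate(doc)]
                    for part in self.partitions]
        # Gather: merge the partial result lists.
        return sorted(uri for partial in partials for uri in partial)


store = ShardedStore(n_partitions=4)
for i in range(1000):
    store.insert(f"/docs/{i}.json", {"year": 2000 + i % 15})

hits = store.query(lambda d: d["year"] == 2014)
print(len(hits))  # 66
print([len(p) for p in store.partitions])  # documents held by each partition
```

Since no partition needs data from another to evaluate its share, the per-partition work can run in parallel on separate nodes, and adding partitions spreads the same work more thinly.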
MarkLogic tiered storage provides the ability to store and manage data in different tiers based on cost and performance trade-offs, whether flash storage, traditional local or shared disk storage, HDFS, or Amazon cloud storage. With tiered storage, data is easily migrated between these tiers without any ETL, additional software, or expensive infrastructure changes. Organizations can easily balance performance and capacity through the information lifecycle, meeting performance SLAs and making data governance easy.
MarkLogic Large Deployment Example
• 4 clusters
• 16 databases
• 200 D-Nodes
• 50 E-Nodes
• 800 forests
• 1.2B+ documents
• 22K QPS
• 45 racks
• 1 PB of storage
• 57 TB of RAM
• 15K cores of compute
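A quick back-of-the-envelope check helps put the large deployment figures in per-node terms. This Python sketch only divides the numbers listed above. The assumptions are labeled: the load is spread evenly (the slide does not say so), decimal units are used (1 PB = 1,000 TB), storage is counted against D-Nodes, RAM and cores against all 250 nodes, and queries against E-Nodes. "1.2B+" and "15K" are themselves approximate.

```python
# Figures from the "Large Deployment Example" above.
D_NODES, E_NODES = 200, 50
FORESTS = 800
DOCUMENTS = 1.2e9   # "1.2B+", so a lower bound
QPS = 22_000
STORAGE_TB = 1_000  # 1 PB, assuming decimal units (1 PB = 1,000 TB)
RAM_TB = 57
CORES = 15_000      # "15K", an approximate figure

nodes = D_NODES + E_NODES

# Averages only: they assume an even spread, which the slide does not state.
averages = {
    "forests per D-Node": FORESTS / D_NODES,
    "documents per forest": DOCUMENTS / FORESTS,
    "storage per D-Node (TB)": STORAGE_TB / D_NODES,
    "RAM per node (GB)": RAM_TB * 1_000 / nodes,
    "cores per node": CORES / nodes,
    "queries/s per E-Node": QPS / E_NODES,
}

for name, value in averages.items():
    print(f"{name}: {value:,.0f}")
```

On these assumptions, the deployment averages 4 forests per D-Node, about 1.5 million documents per forest, and 440 queries per second per E-Node.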
With MarkLogic, keep going: no traffic lights!
• We’ve got a single platform with a database, built-in search, and application services, so there’s less work up front.
• We don’t analyze data formats; just load ’em in! When it comes to schemas, it’s evolution, not revolution: you don’t have to stop, and if you pull a wire out, the thing doesn’t break (“sustainable evolution” is a way to describe semantics).
• You’ve only got one database and infrastructure, so there’s nothing to do there…
• There’s no complicated ETL or data normalization required…
• And our robust single-stack platform of database, search, and application services means there’s less to test.
LESS TO TEST / LESS CHANGE, FASTER TESTING, LESS COST, FASTER TO VALUE: “GO FASTER” STRIPES ON HERE

Speaker Notes