Presentation from the Copyright Clearance Center Distinguished Speaker Series, February 26th, 2015.
As the publishing industry transforms from producing form-based, single-purpose products to acting as information providers that curate data and content and tailor delivery to the role, action, and location of the user, there has been a parallel transformation in how the data and content that are the raw materials for these products are managed.
Matt Turner, MarkLogic’s CTO for Media and Publishing, will talk about the new generation of information management technology, focusing on how it is helping transform the information industries and revolutionize how people think about managing data and content.
Topics covered include NoSQL / new generation databases, search, semantic technology, and information product trends, with examples of innovative teams leveraging these new capabilities.
6. Enterprise NoSQL Database Platform
Flexible Data Model: Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database
Scalability and Elasticity: Scale to petabytes of data without over-provisioning or over-spending
ACID Transactions: Avoid data loss, data corruption, and stale reads, even at speed and scale
Search and Query: Lightning fast, sophisticated, sub-second search and query across all of your data
Semantics: Store and query linked data as RDF and SPARQL
Certified Security: Government-grade, granular, role-based security
Hadoop Integration: Make your Hadoop better by connecting it to MarkLogic
24. INFORMATION DELIVERY PLATFORM EXTENDED
Content and Customers: Complete Picture of Business
Metrics Driving Product Development and Sales
Company Data, Industry Data, Filings, Reports, Catalogs, Lists, Authors, Institutions, Social Media + Usage
These are the key features to focus on when introducing MarkLogic, and each of these is covered in this deck.
The previous slide showed ALL of the features that MarkLogic includes, but here we are focusing on the top 7 key features to help explain what MarkLogic is, and what makes the technology so unique and powerful. There is no other database in the world that has this list of features. To start, if you only know 2 things about MarkLogic, it’s the flexible data model and search and query. These two features are core to how MarkLogic works, and underpin a lot of the other features such as MarkLogic’s ability to scale while still maintaining complex and consistent transactions.
In MarkLogic 7 we introduced semantics. MarkLogic is a native document store, and also a native triple store. Triples are stored as RDF and queried with SPARQL, both of which are W3C standards for linked data. With semantics, you can store and query billions of facts and relationships, and even infer new facts. These facts and relationships provide context for better search and provide flexible data modeling to integrate and link data from different sources.
Scalability and elasticity, ACID transactions, and security are three of MarkLogic’s key “enterprise” features to ensure you can easily store and manage all of your data while not breaking the bank, losing any data, or allowing data to get into the wrong hands. It turns out that these features are not to be taken for granted, because they are really hard to do right. MarkLogic has spent a decade building a hardened, trusted platform, and these features are some of the reasons why MarkLogic is the leading enterprise NoSQL database.
Lastly, MarkLogic integrates easily with Hadoop and will make Hadoop better. Hadoop has gotten traction lately but most people realize now that it’s not a database. It’s a great place to put your data, and MarkLogic has a lot of unique ways for doing more with your data if you currently have it in Hadoop.
[Celebrate the success of what our customers have been able to achieve over the last decade]
MarkLogic recently celebrated 10 years on the market. And it’s been 10 years of working side by side with publishers and media to reimagine what publishing is. Over 10 years, your businesses have changed dramatically, and not surprisingly: with the web, Kindles, and iPads, it was digitize or die.
Because you were forced into reimagining your business and the technology that drives it, publishing and media have led the way in doing more with data. I’m often surprised at how many other industries are just waking up to the notion that there are other ways to store and use data than the traditional relational way they’ve been doing it for 30 years.
Rather than treating your content as flat files, or cramming it into database cells, you’ve been using the right tool for the job, which has allowed you to do more with your content, repurpose it, be agile, move quickly, and create and deliver products fast. It is amazing to see how many organizations stick to using the hammer to get the screw out of the board.
With the right tools, you can do more. You can create more. You can repurpose more. Development cycles take months, not years, with handfuls of developers, not armies.
Re-emphasize the benefit of using the right tool.
Or Top 10 Search Requirements we hear from today’s most successful information providers…
This is a subject we love – change is the only constant and this is what we ene
“New” badge – indicates features that are new in MarkLogic 8. MarkLogic 8 also includes enhancements to other features such as the REST API, Java Client API, and Incremental/Customizable backup. These features are currently available in the Early Access program and are not discussed in detail in this deck. They are only highlighted here for awareness. All of the other features on this slide are fully available in MarkLogic 7.
Powerful - Deliver more value, build better apps **these are all of MarkLogic’s unique features**
MarkLogic is designed for today’s data, helping you find answers in documents, relationships, and metadata by storing and managing JSON, XML, RDF, Geospatial data, and more. MarkLogic serves as an intelligent data layer, giving you the freedom to do more.
Agile - Prepare for and respond to change **these are all of the features that focus on ease-of-use and flexibility**
Enjoy the flexibility of NoSQL to integrate data, and deploy in any environment—whether using Amazon Web Services, virtual machines, or on-premise hardware. With the agility and adaptability of MarkLogic, you can build applications fast.
Trusted - Enterprise-ready for mission-critical uses **these are all the features that ensure MarkLogic meets enterprise requirements**
MarkLogic is a hardened platform that is trusted to run mission-critical applications. It has higher security certifications than any other NoSQL database, and has uncompromised data resiliency with features that ensure you will never lose data.
I think about this in terms of the move to information provider
Putting the value of information in front of the form of delivery
And I’m not alone
*Note: The example above shows an XML document, but in MarkLogic 8, JSON is stored natively, and we could replace this with a similar-looking JSON document.
With MarkLogic, you can load all of your data as-is and only define a schema when you need it. You can even change your schema without having to redefine your entire data model. MarkLogic is also structure-aware, and you can even query the structure of documents. In MarkLogic, data is stored as self-contained documents – not in rows and columns – which means no foreign keys and no normalization. The data doesn’t have to be shredded across tables. Also, data is often in a document format already, such as XML, SGML, FpML, HTML, and JSON. When handling a document, MarkLogic starts by parsing and indexing the document contents, converting the document from serialized document format to a compressed binary fragment representation. Due to highly efficient compression, the data is much smaller than you would find with a typical file.
The example above shows how MarkLogic ‘sees’ an XML document in its hierarchical tree structure. Shown like this, you can see how the document model is self describing. This example shows a “Suspicious Activities Report”, but you could easily imagine how it could also be a trade document, medical record, book chapter, email, metadata file—hundreds of different things that model well in a document structure. The example above shows something else that’s unique about MarkLogic as well. It shows various types of data including values, geospatial, unstructured full text, and semantic triples. All of this is indexed and can be queried.
More Information on Schemas
A database schema is a blueprint, or set of constraints, that define how data is structured and organized in the database. In the relational world, the schema is defined before ingesting data, and it has relations, tuples, and attributes represented as tables, rows, and columns. In the non-relational world, the relational mathematics at work with SQL do not apply, and schema is less rigid and does not have to be pre-defined. Well-formed XML, for example, can be parsed at ingestion and the database will use the inherent XML structure as the schema.
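To make the point about inherent structure concrete, here is a minimal Python sketch (not MarkLogic code) that parses a well-formed XML document with no schema defined in advance and reads the structure off the document itself. The sample document and the helper names `element_paths` and `values_at` are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, loosely modeled on the "Suspicious Activities Report" slide.
REPORT = """
<report>
  <title>Suspicious Activities Report</title>
  <location><lat>37.52</lat><long>-122.25</long></location>
  <summary>Subject was seen near the facility.</summary>
</report>
"""

def element_paths(xml_text):
    """Return the sorted set of element paths found in a well-formed XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return sorted(paths)

def values_at(xml_text, path):
    """Return the text of every element matching a simple path relative to the root."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]

if __name__ == "__main__":
    for path in element_paths(REPORT):
        print(path)
    print(values_at(REPORT, "location/lat"))
```

Nothing here declares tables or columns up front; the paths come from the document, which is the sense in which the structure can serve as the schema.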
There is a change control cost in between each one of these steps – not just doing the same job multiple times but also incurring change costs!
Bad guy = all the tools you’re using to do this – RDBMS, ETL, etc.
Change control processes are what’s stopping you from being productive! Lots of paperwork involved…
Short Description:
MarkLogic has built-in search and query capabilities. MarkLogic’s sophisticated indexes provide the power to search and query across hundreds of terabytes worth of documents, relationships, and metadata with the flexibility of multiple query languages.
*Note: Server-side JavaScript is a MarkLogic 8 feature.
Longer Description:
Most databases separate search and query into two distinct functions. MarkLogic changes that, starting with the idea that you should be able to ask your database what’s inside of it. This means not having to bolt-on a separate search solution, and not having to worry about when and how to build the right indexes, or how those indexes can be utilized to perform certain queries. MarkLogic is designed with over 30 sophisticated indexes that can be adjusted and tuned to make even the most complex queries as fast as possible without requiring data duplication, and data is ingested as-is and immediately searchable.
The sophisticated indexes mean that developers can ask harder questions and get faster responses. MarkLogic uses multiple query languages, one for each data type (JavaScript for JSON, XQuery for XML, and SPARQL for RDF). These query languages enable full-text search across unstructured content, rich query capability needed to make complex queries fast, Geospatial search for multiple formats and types (including connections to ESRI ArcGIS and Google Maps), Semantic search across linked data (similar to graph search, and MarkLogic 8 even includes inferencing), and also in-database MapReduce for running massive parallelized queries. One of the unique capabilities of MarkLogic is that the indexes are designed so that developers can write complex queries that run across multiple indexes without causing a performance bottleneck. With MarkLogic, you can query data as-is, or transform and manage data in place, all with the reliability of a transactional system that maintains full ACID properties.
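As a rough illustration of what "a query that runs across multiple indexes" can mean, the Python sketch below builds two toy inverted indexes (one over words, one over field-scoped words) and answers a query by intersecting their posting sets. This is a teaching sketch under simplifying assumptions, not a description of MarkLogic's actual index design; the documents, URIs, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents keyed by URI; each is a flat mapping of field name to text.
DOCS = {
    "/reports/1.json": {"title": "Suspicious activity", "body": "Subject met John in London"},
    "/reports/2.json": {"title": "Routine patrol", "body": "Nothing suspicious in London"},
    "/emails/7.json": {"title": "Lunch", "body": "John suggests lunch on Friday"},
}

def build_indexes(docs):
    """Build two toy indexes: word -> URIs, and (field, word) -> URIs."""
    word_index = defaultdict(set)
    field_index = defaultdict(set)
    for uri, fields in docs.items():
        for field, text in fields.items():
            for word in text.lower().split():
                word_index[word].add(uri)
                field_index[(field, word)].add(uri)
    return word_index, field_index

def search(word_index, field_index, words=(), field_words=()):
    """AND together word terms and (field, word) terms by intersecting posting sets."""
    postings = [word_index.get(w.lower(), set()) for w in words]
    postings += [field_index.get((f, w.lower()), set()) for f, w in field_words]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

if __name__ == "__main__":
    words, fields = build_indexes(DOCS)
    # Documents mentioning "london" anywhere AND "suspicious" in the title.
    print(search(words, fields, words=["london"], field_words=[("title", "suspicious")]))
```

The query never scans document text; it only combines lookups, which is why adding a second kind of constraint does not require a second pass over the data.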
But it’s important not to overlook the enterprise search experience. Many of MarkLogic’s first customers, such as Elsevier, were publishers who just needed a way to quickly search across massive amounts of content. The user experience is not too different from that of any major Web search engine, and in fact, MarkLogic’s founder Christopher Lindblad came from the search world, having been the architect on Ultraseek Server, an early enterprise search application developed at Infoseek. MarkLogic has many of the same features that users now expect in an enterprise search application, such as type-ahead suggestions, relevance ranking, and snippeting. MarkLogic also includes language support for over 200 languages, including advanced support with tokenization, stemming, and collation for some of the most common languages.
And, just to reiterate, all of this comes built in with MarkLogic; you don’t have to bolt on any other solution. This simplifies your architecture and makes things much easier for DBAs and developers. Having integrated search means one less platform to worry about. Developers don’t have to use a “lite” version of other search software during testing, and you eliminate additional, unnecessary ETL procedures, which reduces risk. System-wide settings such as security are set up once and applied everywhere. If permissions are updated on documents, those updates are reflected automatically and immediately in searches.
Short Description:
Store RDF triples and query them using SPARQL—providing meaning and context to your data using the only database that can handle a combination of documents, data, and triples.
*Note: MarkLogic 8 extends the use of standard SPARQL so you can do analytics (aggregates) over triples; explore semantics graphs using property paths; and update semantic triples; all using the standard SPARQL 1.1 language over standard protocols. In addition, MarkLogic 8 lets you discover new facts and relationships with automatic inference.
Long Description:
Semantics provides a universal framework to describe and link different data so that it can be better understood and searched holistically, allowing both people and computers to see and discover relationships in the data. MarkLogic provides the capability to store and query linked data, including a native RDF Triple Store for storing and managing hundreds of billions of triples that can be queried with SPARQL—all right inside MarkLogic. Not only that, but MarkLogic combines the triple store with its document store, providing the capability to store and manage documents, data, and triples together so you can discover, understand, and make decisions in context.
Script for Presenting:
Enterprise triple store, document store, database …combined
MarkLogic Semantics adds the capabilities of an Enterprise Triple Store to its document store and database.
Store and query billions of facts and relationships; infer new facts
The triple store lets you store and query billions of facts (assertions) and relationships.
Facts/relationships are represented as triples, made up of subject, predicate, and object
For example, we can represent the facts "John lives in London" and "London is in England" as triples like this:
Subject Predicate Object
John livesIn London
London isIn England
We can also infer new facts. From what we (as humans) know about "livesIn" and "isIn", we can infer that John lives in England.
The triple store can do that too – you can specify rules that say exactly what a predicate means, and the triple store will infer new facts when querying.
Many of these rules are specified in the RDFS and OWL specifications, and can be applied in MarkLogic queries out of the box.
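The John/London example can be sketched in a few lines of Python. This is a toy forward-chaining loop over an in-memory set of triples, with one hand-written rule; it is not how MarkLogic implements inference (the notes above describe rules being applied at query time), and the rule and function names are hypothetical.

```python
# The two asserted facts from the example.
TRIPLES = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
}

def infer(triples):
    """Apply one hypothetical rule until no new triples appear:
    (x livesIn y) and (y isIn z)  =>  (x livesIn z)."""
    known = set(triples)
    while True:
        new = {
            (x, "livesIn", z)
            for (x, p1, y) in known if p1 == "livesIn"
            for (y2, p2, z) in known if p2 == "isIn" and y2 == y
        }
        if new <= known:
            return known
        known |= new

def objects(triples, subject, predicate):
    """Return the sorted objects for a given subject and predicate."""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)

if __name__ == "__main__":
    print(objects(TRIPLES, "John", "livesIn"))         # asserted facts only
    print(objects(infer(TRIPLES), "John", "livesIn"))  # asserted plus inferred
```

The inferred triple is never stored in `TRIPLES`; it is derived from the rule, which is the point of the example.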
Facts and relationships provide context for better search
Imagine how much better a search application can be if the app has access to billions of facts and relationships.
The app can leverage those facts in several ways (see future slide):
Find more relevant information by expanding the terms the user typed in
Present more/better information about whatever the user is searching for
Publish information dynamically to web or print or mobile
Flexible data modeling - integrate and link data from different sources
Triples are atomic and schemaless – so they are easy to share, easy to combine.
When you model data as triples, it's easy to load the data as-is, and query across all your data.
You can also link data from different sources by creating new triples.
For example, if you have information about the same customer from two sources, and one source calls the customer "cust123" while the other calls the same customer "cus_id_456",
Simply add a triple
cust123 sameAs cus_id_456
and you can query across all the information about that customer in a single simple query.
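A small Python sketch of the sameAs idea: two hypothetical sources describe the same customer under different identifiers, and one linking triple lets a single lookup return facts from both. The data values and function names are invented for illustration, and the alias resolution here is a simplification of what a real triple store would do.

```python
TRIPLES = [
    ("cust123", "name", "Alice Example"),       # from hypothetical source A
    ("cus_id_456", "lastOrder", "2015-01-15"),  # from hypothetical source B
    ("cust123", "sameAs", "cus_id_456"),        # the linking triple
]

def aliases(triples, identifier):
    """Return every identifier linked to `identifier` through sameAs, in either direction."""
    found = {identifier}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p != "sameAs":
                continue
            if s in found and o not in found:
                found.add(o)
                changed = True
            elif o in found and s not in found:
                found.add(s)
                changed = True
    return found

def facts_about(triples, identifier):
    """Return (predicate, object) pairs for the identifier and all of its aliases."""
    ids = aliases(triples, identifier)
    return sorted((p, o) for s, p, o in triples if s in ids and p != "sameAs")

if __name__ == "__main__":
    print(facts_about(TRIPLES, "cus_id_456"))
```

Asking about either identifier returns the same combined set of facts, without rewriting either source.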
As well as creating and extracting your own triples, there are billions of triples available on the Open Linked Data web.
For example, you can download sections of DBpedia (the triples version of Wikipedia):
Einstein was born in Germany
Buzz Aldrin was on the crew of Apollo 11
A labrador is a type of dog
Or you can download facts from Geonames:
London is in England
London has a population of 7,504,800
London is at lat/long position 51.5/-0.16667
Or you can go to data.gov to get facts about food from the Dept of Agriculture (http://data-gov.tw.rpi.edu/wiki/Dataset_1294)
Pineapple juice has 140 calories per serving
See http://www.w3.org/wiki/DataSetRDFDumps for a partial listing of RDF data available for download and ingestion into MarkLogic.
See http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete for a listing of Open Government RDF datasets.
Standards-based for ease of use and integration
MarkLogic Semantics is based on W3C standards.
RDF describes the data model for facts and relationships (http://www.w3.org/RDF/).
MarkLogic can load RDF files in all the popular RDF formats – RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, and TriG
(http://docs.marklogic.com/guide/semantics/loading#id_70682)
SPARQL is the W3C standard language for querying RDF.
MarkLogic supports SPARQL 1.1, which includes paths, aggregates, and inserts/deletes.
(http://www.w3.org/TR/sparql11-query/ and http://www.w3.org/TR/sparql11-update/)
MarkLogic also supports standard interfaces.
http://www.w3.org/TR/sparql11-protocol/ defines a SPARQL endpoint, which is a standard REST endpoint for SPARQL queries.
http://www.w3.org/TR/sparql11-http-rdf-update/ defines the Graph Store HTTP Protocol, which is a standard REST endpoint for managing RDF graphs.
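For a sense of what a call to a SPARQL endpoint looks like, the Python sketch below builds (but does not send) a "query via GET" request as described in the SPARQL 1.1 Protocol, using only the standard library. The host, port, and path in `ENDPOINT` and the example.org IRIs are assumptions for illustration; check your own server's documentation for the actual endpoint address and authentication.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder address for wherever a SPARQL endpoint is exposed.
ENDPOINT = "http://localhost:8000/v1/graphs/sparql"

# Hypothetical IRIs; a real dataset would use its own vocabulary.
QUERY = """
SELECT ?person
WHERE { ?person <http://example.org/livesIn> <http://example.org/London> }
"""

def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol 'query via GET' request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )

if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
```

Because the protocol is standard, the same request shape should work against any compliant endpoint; only the address and credentials change.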
Even better with search, bitemporal
The real power of MarkLogic comes not from a single feature, but in the ability to combine features in a single, powerful query.
Semantics isn't a product, it's a feature of a product.
MarkLogic Semantics works particularly well with search (including GeoSpatial search) and bitemporal.
In MarkLogic, you can embed triples in XML or JSON documents and run combination queries.
You can combine SPARQL and cts:query in two ways: run a SPARQL query that is filtered by a cts:query condition; or embed a cts:triple-range-query (which returns a cts:query) in a cts:search.
For example, you might want to ask "show me all the people who met with John".
If you have triples of the form "john metWith X", that's a simple SPARQL query.
But if those triples are embedded in the documents where that fact was asserted or discovered – say, a police report or e-mail exchange – you can ask much richer questions such as
"show me all the people who met with John, where the fact was discovered in the last 6 months and the source is a police report from a county in the eastern US and that report also mentions some kind of weapon and some kind of controlled substance".
Or you might want to ask "how many emails and tweets in my sample are generally positive?"
If you have triples of the form "message1002 hasSentiment +9", that's a simple SPARQL query.
But if those triples are embedded in the messages, you can ask much richer questions such as "show me snippets of all the messages that were overwhelmingly positive, and were sent by someone who is an executive of a Fortune 500 company, between these dates, and which mention the companies ‘IBM’ and ‘Oracle’, and mention a word that has something to do with takeovers or acquisitions".
Bitemporal (MarkLogic 8 feature):
Bitemporal Data Management handles historical data along two different timelines, making it possible to rewind the information “as it actually was” in combination with “as it was recorded” at some point in time. It facilitates the creation of a complete audit trail of data.
Since you can compose SPARQL and cts:query, you can do a bitemporal SPARQL query!
Simply run the SPARQL query with a cts:query constraint over one or both bitemporal axes.
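The two timelines are easier to see with a toy example. In the Python sketch below, each record version carries a valid interval (when the fact was true) and a system interval (when the database held that version), and a query constrains both axes. The field names, dates, and the `as_of` helper are hypothetical; this illustrates the bitemporal idea only, not MarkLogic's temporal API.

```python
from datetime import date

# Each hypothetical record version carries two half-open intervals:
#   valid_*  : when the fact was true in the real world
#   system_* : when the database held this version
FOREVER = date.max
VERSIONS = [
    # Recorded on Jan 10: John lives in London from Jan 1.
    {"city": "London", "valid_from": date(2015, 1, 1), "valid_to": FOREVER,
     "system_from": date(2015, 1, 10), "system_to": date(2015, 2, 1)},
    # Correction recorded on Feb 1: he was in London only until Jan 20...
    {"city": "London", "valid_from": date(2015, 1, 1), "valid_to": date(2015, 1, 20),
     "system_from": date(2015, 2, 1), "system_to": FOREVER},
    # ...and has lived in Paris since Jan 20.
    {"city": "Paris", "valid_from": date(2015, 1, 20), "valid_to": FOREVER,
     "system_from": date(2015, 2, 1), "system_to": FOREVER},
]

def as_of(versions, valid_on, recorded_on):
    """Return the cities believed true on `valid_on`, given what was recorded by `recorded_on`."""
    return [
        v["city"] for v in versions
        if v["valid_from"] <= valid_on < v["valid_to"]
        and v["system_from"] <= recorded_on < v["system_to"]
    ]

if __name__ == "__main__":
    # Where did we think John lived on Jan 25, as recorded on Jan 15?
    print(as_of(VERSIONS, date(2015, 1, 25), date(2015, 1, 15)))
    # Same real-world date, but using what was recorded by Feb 15.
    print(as_of(VERSIONS, date(2015, 1, 25), date(2015, 2, 15)))
```

The superseded version is kept rather than overwritten, which is what makes the audit trail possible.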
Short Description:
MarkLogic scales horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents—and still processes thousands of transactions per second.
Longer Description:
Elasticity and scalability are critical to address the growing volumes of data. By 2020, the digital universe will grow to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child). The need already exists to process petabytes worth of data fast and with low overhead. MarkLogic allows you to start small or go big. From 3 node clusters to 250+ node clusters or 10,000 documents to 1 Billion—MarkLogic scales horizontally as your data grows or shrinks. You can add or remove nodes easily, helping you keep the database in line with performance needs without over-provisioning. And, MarkLogic doesn’t need “big iron.” Run it on cost-effective commodity hardware in any environment—in the cloud, virtualized, or on-premises. MarkLogic also handles thousands of transactions per second, even at scale—all while maintaining full ACID properties. This unique capability positioned MarkLogic as the best choice to run healthcare.gov and a large operational trade store at a top investment bank.
Performance usually suffers at scale with most databases. But MarkLogic scales easily to handle hundreds of terabytes using a shared-nothing architecture. Data partitions are completely independent of each other and can act independently. So, when you need more partitions, you just add more, and queries run just as efficiently as they did with the first cluster. Changing cluster configurations is a pain with most databases, but MarkLogic provides easy administration to add or remove clusters.
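The shared-nothing idea can be sketched in Python as a set of independent partitions, each owning its own documents, with queries scattered to all partitions and the results gathered. This is a simplified model under stated assumptions: the class names are hypothetical, placement is a plain hash of the URI, and rebalancing when partitions are added or removed is not modeled.

```python
import hashlib

class Partition:
    """One independent partition holding its own documents (no shared state)."""

    def __init__(self):
        self.docs = {}

    def search(self, word):
        return [uri for uri, text in self.docs.items() if word in text.lower().split()]

class Cluster:
    """A toy cluster that routes writes by hash and fans reads out to every partition."""

    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]

    def _owner(self, uri):
        # Stable hash so the same URI always maps to the same partition.
        digest = hashlib.sha256(uri.encode("utf-8")).digest()
        return self.partitions[int.from_bytes(digest[:8], "big") % len(self.partitions)]

    def insert(self, uri, text):
        self._owner(uri).docs[uri] = text

    def search(self, word):
        # Scatter the query to every partition, then gather and merge the results.
        hits = []
        for partition in self.partitions:
            hits.extend(partition.search(word.lower()))
        return sorted(hits)

if __name__ == "__main__":
    cluster = Cluster(partition_count=4)
    cluster.insert("/a", "John lives in London")
    cluster.insert("/b", "London is in England")
    cluster.insert("/c", "Paris is in France")
    print(cluster.search("London"))
```

Each partition answers from its own data only, so the per-partition work stays the same size as more partitions are added.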
Another feature that helps you manage your data at scale is tiered storage. MarkLogic tiered storage provides the ability to store and manage data in different tiers based on cost and performance trade-offs—whether it’s flash storage, traditional local or shared disk storage, HDFS, or Amazon cloud storage. With tiered storage, data is easily migrated between these tiers without any ETL, additional software, or expensive infrastructure changes. Organizations can easily balance performance and capacity through the information lifecycle—meeting performance SLAs and making data governance easy.
MarkLogic Large Deployment Example
4 clusters
16 databases
200 D-Nodes
50 E-Nodes
800 Forests
1.2B+ documents
22k QPS
45 racks
1PB of storage
57TB of RAM
15K cores of compute
With MarkLogic – keep going / no traffic lights
- We’ve got a single platform with database, built-in search, and application services so there’s less work up front
- We don’t analyze data formats, just load ‘em in!
When it comes to schemas – evolution not revolution – don’t have to stop, and if you pull a wire out the thing doesn’t break; “sustainable evolution” (way to describe semantics)
You’ve only got one database and infrastructure, so nothing to do there….
There’s no complicated ETL or data normalization required…
And our robust single stack platform of database, search, and application services means there’s less to test - LESS TO TEST / LESS CHANGE, FASTER TESTING, LESS COST – FASTER TO VALUE – “GO FASTER” STRIPES ON HERE