12 Things the Semantic Web Should
Know about Content Analytics
Seth Grimes, Alta Plana Corporation
June 2011 | Sponsored by OpenText
Content analytics is sense-making technology. It semanticizes online, social, and
enterprise content. It facilitates semantic data integration, search, and information
management and is an underappreciated foundational technology for building the
Semantic Web. Technologists and business leaders alike will benefit from understanding
the role content analytics plays in semantic computing, starting with 12 essential points.
The Semantic Web and Content Analytics
1. Entity extraction is a form of content analytics
2. There are more entities than are dreamt of in DBpedia, Freebase, and the like
3. Content analytics discovers, annotates, and extracts the broad range of
information in content, far beyond entities
4. Content analytics handles subjectivity: Sentiment, opinion, and emotion
5. Content covers more than just text managed in a content management
system and published to the web
6. Content analytics is part of a collection of complementary and overlapping
analytical technologies
7. Content analytics generates semantic and structural metadata
8. Content analytics facilitates semantic search and semantic data integration
9. Content analytics scales from individual messages to wide data spaces
and large corpora
10. Content analytics can operate in real time for a wide variety of business
goals and business domains
11. Content analytics is delivered installed, on the cloud, and as-a-service:
Your choice
12. Content analytics can be customized, extended, and configured via
inclusion of controlled vocabularies, taxonomies, and ontologies
Semantic computing exploits machine-represented meaning to enhance search, data
integration, knowledge management, and information-centered business processes. The
ultimate goal is to enable automated knowledge discovery and business-process
execution across a linked data web. However, this goal will not be reachable in any
meaningful sense unless and until a broad set of information-rich endpoints is available
for major business and personal purposes. These Semantic Web endpoints – triple
stores that capture entities and relationships, supporting distributed query and inference –
and other forms of semantically annotated content aren’t instantiated and populated by
some magical process. They must be created.
The creation of meaning – the generation of structured information from “unstructured”
sources – is the province of content analytics. Content analytics, modern applications
that couple content production with annotation, and efforts to map databases into
linked-data repositories are the foundational technologies that facilitate
semantic computing and populate the Semantic Web.
So long as the Semantic Web lacks a critical mass of usable data from online, social, and
enterprise sources, it will have form but not function. The core Semantic Web
technologies, a stack of standards and protocols, are not enough on their own.
The Semantic Web and broader semantic computing need data, yet almost no
historical information, and very little of the information being produced today, is in
semantic formats. Content analytics can extract semantics from that mass of
“unstructured” information to provide semantic structure. By semanticizing the range of
existing content, content analytics can and will fuel the realization of the Semantic Web.
The Semantic Web and Content Analytics
Despite its very important (and as yet mostly potential) Semantic Web role, and despite
the business value being delivered today by content analytics, the technology, solutions,
and broader applications are not sufficiently well understood; hence this paper, 12 Things
the Semantic Web (and semantic computing practitioners) Should Know about Content
Analytics. Let us start with a fundamental point:
1. Entity extraction is a form of content analytics
Entities are concrete things, often named in some form of lexicon; for example, people
(Thor, Barack Obama), companies (IBM, General Motors), places (Paris, Canada), events
(the World Series), enzymes (hexokinase), and even research papers (“The
Unreasonable Effectiveness of Data”). Entity extraction is a process that starts by finding
entities in source materials, whether web pages, email, audio streams, images, or some
other material of interest. Once discerned, the entity is disambiguated (Is “Ford” a car, an
industrial company, an actor [which?], a theater, or a place to cross a river?). Then it is
typed (Person, Organization, etc.), and (perhaps) mapped into a canonical form
according to a controlled vocabulary. It may be designated with a uniform resource
identifier that facilitates associating diverse information to the source material.
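The steps just described (find, disambiguate, type, map to a canonical form, assign a URI) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how any particular product works: the lexicon entries, the context cue words, and the example.org URIs are all invented for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: surface form -> candidate senses. Each sense carries a
# type, a canonical name, an illustrative URI, and context words that hint at
# that sense. Every value here is an assumption made for the sketch.
LEXICON = {
    "ford": [
        {"type": "Organization", "canonical": "Ford Motor Company",
         "uri": "http://example.org/entity/ford-motor-company",
         "cues": {"car", "cars", "truck", "automaker", "sales"}},
        {"type": "Person", "canonical": "Harrison Ford",
         "uri": "http://example.org/entity/harrison-ford",
         "cues": {"actor", "film", "movie", "starred"}},
    ],
    "ibm": [
        {"type": "Organization", "canonical": "IBM",
         "uri": "http://example.org/entity/ibm",
         "cues": set()},
    ],
}

@dataclass
class Entity:
    surface: str    # text as it appeared in the source
    type: str       # Person, Organization, ...
    canonical: str  # controlled-vocabulary form
    uri: str        # identifier for linking

def extract_entities(text):
    """Find lexicon entries in text and pick a sense using context words."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    context = {t.lower() for t in tokens}
    found = []
    for token in tokens:
        senses = LEXICON.get(token.lower())
        if not senses:
            continue
        # Disambiguate: choose the sense whose cue words overlap the context
        # most; on a tie, the first listed sense wins.
        best = max(senses, key=lambda s: len(s["cues"] & context))
        found.append(Entity(token, best["type"], best["canonical"], best["uri"]))
    return found

for entity in extract_entities("Ford starred in the film"):
    print(entity.canonical, entity.type, entity.uri)
```

Real systems replace the cue-word overlap with statistical or linguistic models, but the shape of the output, a typed and canonicalized entity carrying an identifier, is the same.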
Entity extraction is a form of content analysis. It involves reaching into the content,
whatever its form, and understanding the inherent structure that is apparent to any
educated human reader: the “chunks” into which text and other content is separated, the
word morphology, grammar, and larger-scale structure that humans grasp without
conscious reflection. The parsing steps may seem simple, but tasks such as
disambiguation, which entails consideration of context and usage, decidedly are not.
Vikings in a sports article are different from Vikings in a history text; beyond document
type, the word sequence “the Vikings lost their fourth straight game” tells us which sense of
Vikings is in play. Yet –
2. There are more entities than are dreamt of in DBpedia,
Freebase, and the like
Common entity sources do not cover all business, scientific, news, or cultural domains.
An entity annotation service designed foremost for financial news sources won’t help you
much with laboratory science or understanding Iraqi Arabic blog chatter.
Content analytics tools support a variety of techniques that allow you to go beyond the
common sources. Tools may allow you to import and apply your own lexicons and
taxonomies, and they may infer new entities via syntactic analysis and machine learning
(techniques that decode grammar and apply pattern analyses to build or expand on a list
of features of interest). Further, content analytics may resolve anaphora, including
pronouns as well as other forms of co-reference, accepting different ways of referring to a
single thing. The application of natural-language processing helps us understand that in
the text –
“Sarkozy's desire to become the new President's main international partner – and,
indeed, personal friend – was palpable. Consequently, the famously passionate and
emotive Frenchman responded to Obama's reserved personality…”
– “the new President” is Obama and “the famously passionate and emotive Frenchman”
is Sarkozy. But entities are not all that content analytics can find.
3. Content analytics discovers, annotates, and extracts the
broad range of information in content, far beyond entities
RDF schemas capture relationships among entities: FriendOf, EmployedBy, OwnerOf,
and so on; the lists are long, varying by data space. Entity relationships may be
engineered in a top-down, prescriptive manner, or they may be mapped from sources
such as relational databases that capture relationships. Wherever they originate,
relationships are the key to knowledge and raw material for inference.
If your approach is to extract entities and restrict yourself to relationships expressed in
ontologies or other knowledge repositories, you may be leaving vast amounts of valuable
information unanalyzed. Source materials capture and express relationships. After all, a
blog posting, a tweet, an article, an e-mail message, a video: every form of content was
created to communicate. It would be silly to parse a news article and report that country
X, person Y, and company Z were mentioned without also extracting the entity
relationships present in the text.
Content may contain conventional data, and not just in marked-up data tables. Consider
a passage from a datelined article:
“The Dow Jones Industrial Average finished the trading day at 12,605.32, up 45.14
points (0.36 percent). The S&P 500 closed at 1,343.6, up 2.92 points (0.22 percent).”
Content analytics can extract this data, to RDF or to a database table, along with
metadata such as the names of the article author and publication, the publication date,
the article’s URL, as well as other available information from HTML meta tags and
page-embedded FOAF, RDFa, or other semantic markup. Content analytics can infer from the text –
“Among actively traded Colorado stocks, Accelr8 Technology Corp. (AXK)...”
– that (possibly) named entities Accelr8 Technology Corp., AXK, and Colorado are
related; sophisticated content analytics will ascribe the ticker symbol AXK to Accelr8 and
capture that Accelr8 is located in the geographic area Colorado. Beyond these facts and
relationships, strong content analytics will associate the conceptual class “stock market
index” with the DJIA and S&P 500 and will identify topics such as “financial markets
reporting” and themes such as “the economy” with the source article.
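As a rough illustration of pulling conventional data out of prose, the sketch below applies a regular expression to the market-close passage quoted above and turns each match into a table row and a set of subject-predicate-object tuples. The pattern is hand-written for this one phrasing and the field and predicate names are invented; production tools would rely on far more general linguistic analysis.

```python
import re

TEXT = ("The Dow Jones Industrial Average finished the trading day at 12,605.32, "
        "up 45.14 points (0.36 percent). The S&P 500 closed at 1,343.6, "
        "up 2.92 points (0.22 percent).")

# Hand-tuned pattern for this phrasing only (an assumption of the sketch).
PATTERN = re.compile(
    r"The (?P<index>[A-Z][\w&]*(?: [A-Z0-9][\w&]*)*) "
    r"(?:finished the trading day|closed) at (?P<close>[\d,]+(?:\.\d+)?), "
    r"(?P<direction>up|down) (?P<points>[\d.]+) points "
    r"\((?P<percent>[\d.]+) percent\)"
)

def extract_index_moves(text):
    """Return one dict per index mentioned, with numeric fields parsed."""
    rows = []
    for m in PATTERN.finditer(text):
        sign = 1 if m.group("direction") == "up" else -1
        rows.append({
            "index": m.group("index"),
            "close": float(m.group("close").replace(",", "")),
            "change_points": sign * float(m.group("points")),
            "change_percent": sign * float(m.group("percent")),
        })
    return rows

def to_triples(rows):
    """Flatten rows into (subject, predicate, object) tuples; predicate names are made up."""
    triples = []
    for row in rows:
        for key in ("close", "change_points", "change_percent"):
            triples.append((row["index"], key, row[key]))
    return triples

for triple in to_triples(extract_index_moves(TEXT)):
    print(triple)
```

The same rows could be written to a database table or serialized as RDF once real predicate URIs are chosen.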
How far beyond entities?
4. Content analytics handles subjectivity: Sentiment,
opinion, and emotion
We can classify information as factual or as subjective. Attitudinal information –
sentiment, opinions, emotions – is very important to business applications that include
customer service and support, marketing, product and service quality, contextual
advertising placement, and policy and politics. A business that is listening will pick up on
tweets such as –
@robwolfeusa Wow, at #Hilton in Long Island. Exec floor room guaranteed not
available and no rooms clean and available at 4:30PM.
– that indicate problems. Content analytics, in this instance, will understand what hotel
property is being referred to, what the issue was, and who was posting (the potential
often exists to match a social handle to a name or other identifying information and from
there to actual business transactions); this facilitates processing and quick responses.
This example looks at and matches individual records; content analytics is also applied to
aggregate sentiment, classified by familiar categories such as location, age, and sex as
well as by company-specific dimensions such as product and location.
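The record-level and aggregate views can be sketched together. The example below scores each message with a tiny polarity lexicon and a crude negation rule, then averages scores by a chosen dimension. The word list, the two-token negation window, and the sample records (other than the tweet quoted above) are assumptions for illustration; commercial sentiment analysis is considerably more sophisticated.

```python
import re
from collections import defaultdict

# Tiny illustrative polarity lexicon; words and weights are assumptions.
POLARITY = {"great": 1, "love": 1, "clean": 1, "friendly": 1,
            "dirty": -1, "rude": -1, "broken": -1, "terrible": -1}
NEGATORS = {"no", "not", "never"}

def score(text):
    """Sum word polarities, flipping a word if a negator appears just before it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, tok in enumerate(tokens):
        if tok in POLARITY:
            value = POLARITY[tok]
            # Look back at most two tokens for a negator.
            if any(t in NEGATORS for t in tokens[max(0, i - 2):i]):
                value = -value
            total += value
    return total

def aggregate(records, dimension):
    """Average sentiment score per value of the given dimension."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[dimension]].append(score(rec["text"]))
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

TWEET = ("@robwolfeusa Wow, at #Hilton in Long Island. Exec floor room guaranteed not "
         "available and no rooms clean and available at 4:30PM.")

RECORDS = [
    {"text": TWEET, "location": "Long Island"},
    {"text": "Rude desk clerk, broken elevator", "location": "Long Island"},
    {"text": "Staff were friendly and the room was clean", "location": "Boston"},
]

print(aggregate(RECORDS, "location"))
```

Swapping the dimension name for product, age band, or any other attribute attached to the records gives the other aggregate views mentioned above.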
This class of subjectivity analysis looks for the voice of the customer (or prospect,
influencer, voter, patient, or market) as expressed online in blogs, forum postings,
reviews, email, surveys, contact-center conversations, and a range of other feedback
sources. It is sensitive to the identity of the person who is posting, to the needs of the
person who may be consuming the information, to context, and to plans or intent captured
in text. While subjective information cannot always be matched to particular
persons, the benefits of knowing who is posting are prompting entity-analytics R&D into
identity resolution based on clues found in text.
Our next point should be obvious by now:
5. Content covers more than just text managed in a content
management system and published to the web
We have user-generated content online in the form of articles, blogs and comments,
status updates, profiles, and forum postings. And certainly, we have content in the
conventional sense, material that is created and published via formal, managed
processes. But the content label also extends to email, corporate documents, and
reports; to SMS/IM text and contact-center notes and transcripts; and, as mentioned, to
audio streams, images, and video. This includes the above in original, as-created form
and in derived (duplicated, quoted, sampled, distorted, and otherwise reworked) forms.
Consider rich-media content in particular. Content analytics solutions are already in use
to search, analyze, and mine audio streams for contact-center applications. They can
search not only on speech transcribed to text but also on phonemes, the fragments from
which speech is composed, with advanced abilities to distinguish among speakers in a
conversation and to detect emotion. A consumer-grade electronic camera’s ability to
identify people within the photo frame and to detect whether a subject is smiling or
blinking is content analytics; automated image recognition capabilities, and not just via
externally applied tags, are advancing rapidly, as is the ability to decode image changes
in a video stream.
Content analytics, whether coupled with (other) SemWeb technology or operating
independently, can be applied to the spectrum of information types across organizational
barriers. Analytics, broadly drawn, provides the key.
6. Content analytics is part of a collection of
complementary and overlapping analytical technologies
Analytics is the search for business insight in online, social, and enterprise data.
Analytics comes in many forms, under a variety of names. The definition common to
them all is that analytics transforms source data to derive business information that is
stored to databases and communicated in the form of numbers, tables, charts, and
Data mining discerns patterns in data in structured forms, typically in databases, to
produce predictive models suitable for classification, forecasting, and other functions. BI
typically applies dimensional models to data and supports reporting and interactive data
analysis, but it may also include predictive-model deployment and in some instances, will
also subsume the data mining process. Web analytics is not typically grouped under the
BI umbrella, but it is BI, drawing from web server log files to mine behavior patterns from
click-stream data, presented in familiar BI dashboards, reports, and charts and feeding
data-mining processes that seek to model quantities such as website conversion (a fancy
name for sales) and shopping-cart or session abandonment. Social-network analysis
looks at the dynamic graph of connections and message propagation across social and
enterprise platforms. Lastly, location intelligence is a special sort of BI with data types,
structures, analysis, and presentation methods tailored for geospatial data.
These analytics variants operate on numerical, quantified data. Content analytics
complements them, in some cases by extracting data (e.g., geographic locations and
numbers from data tables) from textual sources and in other cases by using their
capabilities for exploratory analysis of text-sourced information; for instance, when
classified by geographic source or topic and rendered in a map, when presented in BI
dashboards and charts, and when incorporated in predictive securities-trading models.
But content analytics can do more than just quantify free-form sources, as shown in our
next two points.
7. Content analytics generates semantic and structural
metadata
Metadata is descriptive information. If content is likened to a letter, the writing and
postmark on the envelope are metadata. Consider electronic examples: the values of the
To, From, CC, Subject, and routing header fields of an email message; the author, file
name, file type, last-saved date, title, language, and tags applied to a document; values
annotated with web page META tags, and so on. Some of this metadata is structural,
some of it is semantic.
The Dublin Core Metadata Initiative is perhaps the most prominent metadata-standards
proponent, providing for natural-language and formal semantic shared vocabularies that
facilitate interoperability. The natural-language processing (NLP) components of content
analytics solutions can and do discern and extract metadata from free-form and
semi-structured source materials. They can do so in conformance with Dublin Core and can
meet particular, situational needs by extracting advanced metadata such as topics
and themes.
Content analytics tools will, depending on the provider and on the user’s needs, create
and store an XML-/RDF-/FOAF-annotated version of source materials, extract information
of interest to a file or database, or, when invoked as-a-service, return results marked
up in XML, JSON, or a similar format.
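To make the idea of returned metadata concrete, here is a minimal sketch that reads the header fields of an email message and emits a JSON record keyed by Dublin Core element names (title, creator, date, format, subject). The sample message, the "dc:" key prefix, and the topics argument, which stands in for whatever an NLP step would extract, are assumptions for the example and not any vendor's actual response format.

```python
import json
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample message used only for this sketch.
RAW = """From: alice@example.org
To: bob@example.org
Subject: Q2 content analytics pilot
Date: Mon, 6 Jun 2011 09:30:00 -0400

The pilot extracted entities and topics from support tickets.
"""

def describe(raw, topics):
    """Build a Dublin Core style JSON record from message headers plus extracted topics."""
    msg = message_from_string(raw)
    record = {
        # Structural metadata, read directly from the headers
        "dc:title": msg["Subject"],
        "dc:creator": msg["From"],
        "dc:date": parsedate_to_datetime(msg["Date"]).isoformat(),
        "dc:format": msg.get_content_type(),
        # Semantic metadata, assumed to come from an upstream NLP step
        "dc:subject": sorted(topics),
    }
    return json.dumps(record, indent=2)

print(describe(RAW, ["pilot", "content analytics"]))
```

The header-derived fields are structural metadata; the topic list is the semantic part that content analytics contributes.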
Here’s where we come to search and linking.
8. Content analytics facilitates semantic search and
semantic data integration
Web pages annotated with concepts, topics, synonyms, etc., and with key information
content micro-formatted – this is Search Engine Optimization (SEO) – will be more directly
accessible as search evolves into information access. For both web search and local
enterprise search, that extracted information can be indexed as the basis for concept and
faceted search (two varieties of semantic search) and for faceted navigation,
where users and site visitors see results classified into high-level categories known as
facets (facets may be predetermined or they may have been discovered in source
materials via NLP and clustering).
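Faceted navigation is easy to picture with a small sketch: once documents carry extracted facet values, a search can both filter on them and report how many results fall under each value. The documents, facet names, and substring matching below are invented for illustration and assume an upstream content analytics step has already assigned the facet values.

```python
from collections import Counter

# Hypothetical index entries with facet values assumed to have been extracted upstream.
DOCS = [
    {"title": "Dow closes higher on earnings", "topic": "financial markets", "place": "New York"},
    {"title": "Colorado stocks mixed at close", "topic": "financial markets", "place": "Colorado"},
    {"title": "Hotel chain expands in Colorado", "topic": "travel", "place": "Colorado"},
]

def search(docs, query, **filters):
    """Return docs whose title contains the query and that match every facet filter."""
    q = query.lower()
    return [d for d in docs
            if q in d["title"].lower()
            and all(d.get(facet) == value for facet, value in filters.items())]

def facet_counts(results, facet):
    """Count how many results fall under each value of one facet."""
    return Counter(d[facet] for d in results)

hits = search(DOCS, "colorado")
print(facet_counts(hits, "topic"))           # counts shown beside each facet value
print(search(DOCS, "close", place="Colorado"))  # drilling down by one facet
```

A production search engine would use an inverted index and relevance ranking, but the facet counts it displays are computed on the same principle.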
Content analytics also enables similarity search, where we can search for documents,
messages, or objects that are statistically or semantically similar to one we’re viewing,
and for similar searches, which are search queries similar to the one we have issued.
Similarity measurement is useful beyond interactive search; for instance, for tracking the
diffusion of content – messages, press releases, quotations, and so on – across news,
social, and interpersonal messages, whether for media measurement, copyright
enforcement, or research. Given content’s complexity, content analytics’ ability to
“fingerprint” content and measure similarity is an asset in tracking efforts.
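One simple way to picture fingerprinting and similarity measurement is word shingling with Jaccard similarity: each text is reduced to its set of overlapping word sequences, and two texts are compared by how much those sets overlap. The paper does not name a specific technique, so this choice, along with the three-word shingle size, is an assumption made for the sketch.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word sequences in text, a crude content fingerprint."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

original = "content analytics makes sense of the mess of content"
quoted = "as the paper says, content analytics makes sense of the mess of content online"
unrelated = "the Vikings lost their fourth straight game"

print(jaccard(original, quoted))     # substantial overlap: the quote reuses the original
print(jaccard(original, unrelated))  # no shared three-word sequences
```

Because a quotation or lightly edited copy keeps most of the original's shingles, a high score flags reuse even when the surrounding text differs, which is the property that tracking efforts depend on.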
Lastly, while annotation is great for SEO and semantic search, it also facilitates data
integration, also known as data fusion and record linkage. For Semantic Web
applications, annotations would include URIs; for other applications, integration could be
accomplished via other content-extracted key information.
Automatic summarization and abstracting are under the content analytics umbrella.
9. Content analytics scales from individual messages to
wide data spaces and large corpora
Content analytics scales through the use of high-throughput technologies such as
Hadoop and deployment on grid-based, scalable hardware. Further –
10. Content analytics can operate in real time for a wide
variety of business goals and business domains
The choice of particular techniques and tools, where scalability, the need for speed, and
other capabilities are concerned, will depend on the information sources, business goals,
the type of insights to be sought, and the skills of the users. If the business need is for
real-time news and social monitoring for brand and reputation management, security, or
military intelligence, the appropriate class of solution will be very different in
application from one chosen to provide semantic search and navigation for an
online commerce site.
Focusing on real-time capabilities and also the ability to handle noisy social text (replete
with slang, idiom, misspellings, abbreviations, sarcasm, and the like), we see that content
analytics’ capabilities are a neat complement to the structured Semantic Web, which
would be hard-pressed to keep up with today’s flood of raw, chaotic information. The
pairing of structured sources and ad-hoc analyses can be especially powerful.
11. Content analytics is delivered installed, on the cloud,
and as-a-service: Your choice
Most members of the semantics community are familiar with a few as-a-service
annotation services, accessible via web services APIs. They represent only the visible
tip of a much larger, metaphorical, content analytics iceberg. First, the content
analytics world offers many more annotation services, with capabilities that extend far
beyond English-language entity analytics to encompass deep information extraction. The
only barrier to their semantics-world and Semantic Web use is lack of awareness.
Further, content analytics is available on the cloud, in hosted form, or may be installed on
your own hardware.
12. Content analytics can be customized, extended, and
configured via inclusion of controlled vocabularies,
taxonomies, and ontologies
Analytics means flexibility, the ability to square formal methods and structures with ad-
hoc, situational needs and to rely both on shared, standardized resources and on
protocols. It is also the ability to depend on proprietary assets and materials not yet
brought into compliance with modern forms and into the Semantic Web.
We have examined 12 Things the Semantic Web (and Semantic Computing Practitioners)
Should Know about Content Analytics. But really, they reduce to a single paragraph:
Content analytics makes sense of the mess of content – of online, social, and enterprise
text, and moving forward, of rich media including images, audio, and video – for purposes
that extend to semantic data integration, search, and information management. Content
analytics, by helping semanticize existing data, is a foundation technology for the
Semantic Web and semantic computing. Content analytics is delivering business value
today, complementing BI, web analytics, location intelligence, and predictive analytics.
Prospective users can look to a variety of technologies and tools to find or craft a solution
that best meets particular needs, whether for individual, embedded, or enterprise use.
Given that hosted and as-a-service (as well as installed) options are available, getting
started is not difficult; given the breadth of capabilities, standards adherence, and
customizability, there are few adoption barriers. Semantics practitioners will readily see
the value of the technology and will find it well worth trying.
Visit for more information about OpenText solutions. OpenText is a publicly traded company on both NASDAQ (OTEX) and the TSX (OTC) Copyright © 2010 by OpenText Corporation. Trademarks or registered
trademarks of OpenText Corporation. This list is not exhaustive. All other trademarks or registered trademarks are the property of their respective owners. All rights reserved. 11PROD0234EN
Seth Grimes is an analytics strategist with Washington, DC-based Alta Plana Corporation,
founding chair of the Text Analytics Summit and the Sentiment Analysis Symposium, and
contributing editor at TechWeb's InformationWeek. He consults, writes, and speaks on
business intelligence, data management and analysis systems, text mining, visualization,
and related topics. Follow him on Twitter
About OpenText
OpenText is the world’s largest independent provider of Enterprise Content Management
(ECM) software. The Company's solutions manage information for all types of business,
compliance and industry requirements in the world's largest companies, government
agencies and professional service firms. OpenText supports approximately 46,000
customers and millions of users in 114 countries and 12 languages. For more information
about OpenText, visit

Mais conteúdo relacionado

Mais procurados

A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...
A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...
A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...Michael Mortenson
Trends, Tools and Tips for Technology Careers
Trends, Tools and Tips for Technology CareersTrends, Tools and Tips for Technology Careers
Trends, Tools and Tips for Technology CareersMichael Mortenson
Marketing Analytics using R/Python
Marketing Analytics using R/PythonMarketing Analytics using R/Python
Marketing Analytics using R/PythonSagar Singh
Sentiment Analysis: The Marketplace and Providers
Sentiment Analysis: The Marketplace and ProvidersSentiment Analysis: The Marketplace and Providers
Sentiment Analysis: The Marketplace and ProvidersSeth Grimes
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...ijtsrd
1. Data Analytics-introduction
1. Data Analytics-introduction1. Data Analytics-introduction
1. Data Analytics-introductionkrishna singh
Data Science Highlights
Data Science Highlights Data Science Highlights
Data Science Highlights Joe Lamantia
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
Data analytics presentation- Management career institute
Data analytics presentation- Management career institute Data analytics presentation- Management career institute
Data analytics presentation- Management career institute PoojaPatidar11

Mais procurados (20)

Data analytics
Data analyticsData analytics
Data analytics
A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...
A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...
A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...
Trends, Tools and Tips for Technology Careers
Trends, Tools and Tips for Technology CareersTrends, Tools and Tips for Technology Careers
Trends, Tools and Tips for Technology Careers
Analytics 2
Analytics 2Analytics 2
Analytics 2
Marketing Analytics using R/Python
Marketing Analytics using R/PythonMarketing Analytics using R/Python
Marketing Analytics using R/Python
Sentiment Analysis: The Marketplace and Providers
Sentiment Analysis: The Marketplace and ProvidersSentiment Analysis: The Marketplace and Providers
Sentiment Analysis: The Marketplace and Providers
Data analytics
Data analyticsData analytics
Data analytics
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Classification of data
Classification of dataClassification of data
Classification of data
Data Analytics
Data AnalyticsData Analytics
Data Analytics
Data analytics
Data analyticsData analytics
Data analytics
Data analytics
Data analyticsData analytics
Data analytics
1. Data Analytics-introduction
1. Data Analytics-introduction1. Data Analytics-introduction
1. Data Analytics-introduction
Data Science Highlights
Data Science Highlights Data Science Highlights
Data Science Highlights
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
Data analytics
Data analyticsData analytics
Data analytics
Data analytics presentation- Management career institute
Data analytics presentation- Management career institute Data analytics presentation- Management career institute
Data analytics presentation- Management career institute
Sample Sample
Text analytics
Text analyticsText analytics
Text analytics
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics


The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social SentimentSeth Grimes
Text Analytics Today
Text Analytics TodayText Analytics Today
Text Analytics TodaySeth Grimes
Technology Frontiers: Text, Sentiment, and Sense
Technology Frontiers: Text, Sentiment, and SenseTechnology Frontiers: Text, Sentiment, and Sense
Technology Frontiers: Text, Sentiment, and SenseSeth Grimes
The State of Semantics
The State of SemanticsThe State of Semantics
The State of SemanticsSeth Grimes
Smart Content = Smart Business
Smart Content = Smart BusinessSmart Content = Smart Business
Smart Content = Smart BusinessSeth Grimes
Social Media AND THE Enterprise Business Intelligence/Analytics Connection
Social Media AND THE Enterprise Business Intelligence/Analytics ConnectionSocial Media AND THE Enterprise Business Intelligence/Analytics Connection
Social Media AND THE Enterprise Business Intelligence/Analytics ConnectionSeth Grimes
Text, Content, and Social Analytics: BI for the New World
Text, Content, and Social Analytics: BI for the New WorldText, Content, and Social Analytics: BI for the New World
Text, Content, and Social Analytics: BI for the New WorldSeth Grimes
Search, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled VisionSearch, Signals & Sense: An Analytics Fueled Vision
Search, Signals & Sense: An Analytics Fueled VisionSeth Grimes
Social Data Sentiment Analysis
Social Data Sentiment AnalysisSocial Data Sentiment Analysis
Social Data Sentiment AnalysisSeth Grimes
Knowledge Extraction from Social Media
Knowledge Extraction from Social MediaKnowledge Extraction from Social Media
Knowledge Extraction from Social MediaSeth Grimes
An Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationAn Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationSeth Grimes
Design of multichannel attribution model using click stream data
12 Things the Semantic Web Should Know about Content Analytics

  • 1. TOGETHER, WE ARE THE CONTENT EXPERTS WHITE PAPER 12 Things the Semantic Web Should Know about Content Analytics Seth Grimes, Alta Plana Corporation June 2011 | Sponsored by OpenText Abstract Content analytics is sense-making technology. It semanticizes online, social, and enterprise content. It facilitates semantic data integration, search, and information management and is an underappreciated foundational technology for building the Semantic Web. Technologists and business leaders alike will benefit from understanding the role content analytics plays in semantic computing, starting with 12 essential points.
Contents

Introduction
The Semantic Web and Content Analytics
1. Entity extraction is a form of content analytics
2. There are more entities than are dreamt of in DBpedia, Freebase, and the like
3. Content analytics discovers, annotates, and extracts the broad range of information in content, far beyond entities
4. Content analytics handles subjectivity: Sentiment, opinion, and emotion
5. Content covers more than just text managed in a content management system and published to the web
6. Content analytics is part of a collection of complementary and overlapping analytical technologies
7. Content analytics generates semantic and structural metadata
8. Content analytics facilitates semantic search and semantic data integration
9. Content analytics scales from individual messages to wide data spaces and large corpora
10. Content analytics can operate in real time for a wide variety of business goals and business domains
11. Content analytics is delivered installed, on the cloud, and as-a-service: Your choice
12. Content analytics can be customized, extended, and configured via inclusion of controlled vocabularies, taxonomies, and ontologies
Conclusion

Introduction

Semantic computing exploits machine-represented meaning to enhance search, data integration, knowledge management, and information-centered business processes. The ultimate goal is to enable automated knowledge discovery and business-process execution across a linked data web. However, this goal will not be reachable in any meaningful sense unless and until a broad set of information-rich endpoints is available for major business and personal purposes.

These Semantic Web endpoints (triple stores that capture entities and relationships, supporting distributed query and inference) and other forms of semantically annotated content aren’t instantiated and populated by some magical process. They must be created. The creation of meaning, that is, the generation of structured information from “unstructured” sources, is the province of content analytics. Content analytics, modern applications that couple content production with annotation, and efforts to map databases into linked-data repositories are the foundational technologies that facilitate semantic computing and populate the Semantic Web.

So long as the Semantic Web lacks a critical mass of usable data from online, social, and enterprise sources, it will have form but not function. The set of core Semantic Web technologies, a stack of standards and protocols, is not enough on its own. The Semantic Web and broader semantic computing need data, yet almost no historical information, and very little of the information being produced today, is in semantic formats. Content analytics can extract semantics from that mass of “unstructured” information to provide semantic structure. By semanticizing the range of existing content, content analytics can and will fuel the realization of the Semantic Web.

The Semantic Web and Content Analytics

Despite its very important (and as yet mostly potential) Semantic Web role, and despite the business value being delivered today by content analytics, the technology, solutions, and broader applications are not sufficiently well understood; hence this paper, 12 Things the Semantic Web (and semantic computing practitioners) Should Know about Content Analytics. Let us start with a fundamental point:

1. Entity extraction is a form of content analytics

Entities are concrete things, often named in some form of lexicon; for example, people (Thor, Barack Obama), companies (IBM, General Motors), places (Paris, Canada), events (the World Series), enzymes (hexokinase), and even research papers (“The Unreasonable Effectiveness of Data”). Entity extraction is a process that starts by finding entities in source materials, whether web pages, email, audio streams, images, or some other material of interest. Once discerned, the entity is disambiguated (Is “Ford” a car, an industrial company, an actor [which?], a theater, or a place to cross a river?). Then it is typed (Person, Organization, etc.) and (perhaps) mapped into a canonical form according to a controlled vocabulary. It may be designated with a uniform resource identifier that facilitates associating diverse information with the source material.

Entity extraction is a form of content analysis. It involves reaching into the content, whatever its form, and understanding the inherent structure that is apparent to any educated human reader: the “chunks” into which text and other content are separated, the word morphology, grammar, and larger-scale structure that humans grasp without conscious reflection. The parsing steps may seem simple, but tasks such as disambiguation, which entails consideration of context and usage, decidedly are not. Vikings in a sports article are different from Vikings in a history text; beyond document type, the word sequence “the Vikings lost their fourth straight game” tells us which sense of Vikings is in play. Yet:

2. There are more entities than are dreamt of in DBpedia, Freebase, and the like

Common entity sources do not cover all business, scientific, news, or cultural domains. An entity annotation service designed foremost for financial news sources won’t help you much with laboratory science or with understanding Iraqi Arabic blog chatter. Content analytics tools support a variety of techniques that allow you to go beyond the common sources. Tools may allow you to import and apply your own lexicons and taxonomies, and they may infer new entities via syntactic analysis and machine learning (techniques that decode grammar and apply pattern analyses to build or expand on a list of features of interest). Further, content analytics may resolve anaphora, including pronouns as well as other forms of co-reference, accepting different ways of referring to a single thing.
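
The steps described under points 1 and 2 (find a mention, disambiguate it from context, type it, and map it to a canonical identifier, using a lexicon you supply yourself) can be sketched in a few lines of Python. This is only a toy illustration, not how any particular content analytics product works: the lexicon, the type names, the cue words, and the example.org URIs are all invented for the example, and real tools rely on far richer linguistic and statistical analysis than counting cue words.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical custom lexicon: surface form -> candidate senses.
# Types, URIs, and cue words are assumptions made up for this sketch.
LEXICON = {
    "Vikings": [
        {"etype": "SportsTeam",
         "uri": "http://example.org/entity/Vikings_(football_team)",
         "cues": {"game", "season", "lost", "won", "coach"}},
        {"etype": "HistoricalPeople",
         "uri": "http://example.org/entity/Vikings_(Norse_seafarers)",
         "cues": {"raid", "longship", "norse", "century", "saga"}},
    ],
    "Ford": [
        {"etype": "Organization",
         "uri": "http://example.org/entity/Ford_Motor_Company",
         "cues": {"car", "cars", "automaker", "truck", "sales"}},
        {"etype": "Person",
         "uri": "http://example.org/entity/Harrison_Ford",
         "cues": {"actor", "film", "starred", "movie", "role"}},
    ],
}


@dataclass
class Entity:
    surface: str  # the mention as it appears in the text
    etype: str    # the assigned type
    uri: str      # canonical identifier for the chosen sense


def extract_entities(text: str) -> List[Entity]:
    """Find lexicon entries in text and pick a sense from context words."""
    # Context is approximated by the set of lowercased words in the text.
    context = {w.lower() for w in re.findall(r"[A-Za-z]+", text)}
    entities = []
    for surface, senses in LEXICON.items():
        # Step 1: find the mention (whole-word, case-sensitive match).
        if not re.search(rf"\b{re.escape(surface)}\b", text):
            continue
        # Step 2: disambiguate by counting cue words shared with the context.
        # On a tie, max() keeps the first sense listed in the lexicon.
        best = max(senses, key=lambda s: len(s["cues"] & context))
        # Steps 3 and 4: assign the type and the canonical URI.
        entities.append(Entity(surface, best["etype"], best["uri"]))
    return entities
```

Calling `extract_entities("The Vikings lost their fourth straight game")` selects the sports-team sense because "lost" and "game" appear among its cue words, while a sentence mentioning a raid and a longship selects the historical sense.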
The application of natural-language processing helps us understand that in the text

    “Sarkozy's desire to become the new President's main international partner – and, indeed, personal friend – was palpable. Consequently, the famously passionate and emotive Frenchman responded to Obama's reserved personality…”

“the new President” is Obama and “the famously passionate and emotive Frenchman” is Sarkozy. But entities are not all that content analytics can find.

3. Content analytics discovers, annotates, and extracts the broad range of information in content, far beyond entities

RDF schemas capture relationships among entities: FriendOf, EmployedBy, OwnerOf, and so on; the lists are long, varying by data space. Entity relationships may be engineered in a top-down, prescriptive manner, or they may be mapped from sources such as relational databases that capture relationships. Wherever they originate, relationships are the key to knowledge and raw material for inference.

If your approach is to extract entities and restrict yourself to relationships expressed in ontologies or other knowledge repositories, you may be leaving vast amounts of valuable information unanalyzed. Source materials capture and express relationships. After all, a blog posting, a tweet, an article, an e-mail message, a video: every form of content was created to communicate. It would be silly to parse a news article and report that country X, person Y, and company Z were mentioned without also extracting the entity relationships present in the text.

Content may contain conventional data, and not just in marked-up data tables. Consider two sentences from a datelined article: “The Dow Jones Industrial Average finished the trading day at 12,605.32, up 45.14 points (0.36 percent). The S&P 500 closed at 1,343.6, up 2.92 points (0.22 percent).” Content analytics can extract this data, to RDF or to a database table, along with metadata such as the names of the article author and publication, the publication date, and the article’s URL, as well as other available information from HTML meta tags and page-embedded FOAF, RDFa, or other microcode.

Content analytics can infer from the text “Among actively traded Colorado stocks, Accelr8 Technology Corp. (AXK)...” that the (possibly) named entities Accelr8 Technology Corp., AXK, and Colorado are related; sophisticated content analytics will ascribe the ticker symbol AXK to Accelr8 and capture that Accelr8 is located in the geographic area Colorado. Beyond these facts and relationships, strong content analytics will associate the conceptual class “stock market index” with the DJIA and S&P 500 and will associate topics such as “financial markets reporting” and themes such as “the economy” with the source article.
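
To make the point about conventional data inside prose concrete, here is a minimal Python sketch that pulls the index figures out of the two quoted sentences and emits them both as table-like rows and as simple subject-predicate-object triples. It is an illustration under stated assumptions: the regular expression is hand-written for exactly this sentence shape, and the `ex:` predicate names are hypothetical placeholders rather than terms from any real vocabulary. Production content analytics would use more general linguistic patterns.

```python
import re

TEXT = (
    "The Dow Jones Industrial Average finished the trading day at 12,605.32, "
    "up 45.14 points (0.36 percent). The S&P 500 closed at 1,343.6, "
    "up 2.92 points (0.22 percent)."
)

# Hand-written pattern for sentences of the form
# "The <index> closed at <value>, up|down <points> points (<pct> percent)".
PATTERN = re.compile(
    r"The (?P<index>[A-Z][\w&. ]+?) (?:finished the trading day|closed) at "
    r"(?P<close>[\d,]+(?:\.\d+)?), (?P<direction>up|down) "
    r"(?P<points>[\d.]+) points \((?P<percent>[\d.]+) percent\)"
)


def extract_index_rows(text):
    """Return one dict per index mention, ready to load into a table."""
    rows = []
    for m in PATTERN.finditer(text):
        sign = 1 if m["direction"] == "up" else -1
        rows.append({
            "index": m["index"],
            "close": float(m["close"].replace(",", "")),  # drop thousands separators
            "change_points": sign * float(m["points"]),
            "change_percent": sign * float(m["percent"]),
        })
    return rows


def rows_to_triples(rows):
    """Flatten rows into (subject, predicate, object) tuples.

    The "ex:" predicates are invented for illustration only.
    """
    triples = []
    for row in rows:
        for key in ("close", "change_points", "change_percent"):
            triples.append((row["index"], f"ex:{key}", row[key]))
    return triples
```

`rows_to_triples(extract_index_rows(TEXT))` yields three triples per index; in practice the article-level metadata mentioned above (author, publication, date, URL) would be attached to the same records.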
How far beyond entities?

4. Content analytics handles subjectivity: Sentiment, opinion, and emotion

We can classify information as factual or as subjective. Attitudinal information – sentiment, opinions, emotions – is very important to business applications that include customer service and support, marketing, product and service quality, contextual advertising placement, and policy and politics.

A business that is listening will pick up on tweets such as – @robwolfeusa Wow, at #Hilton in Long Island. Exec floor room guaranteed not available and no rooms clean and available at 4:30PM.
– that indicate problems. Content analytics, in this instance, will understand what hotel property is being referred to, what the issue was, and who was posting (the potential often exists to match a social handle to a name or other identifying information and from there to actual business transactions); this facilitates processing and quick responses.

This example looks at and matches individual records; content analytics is also applied to aggregate sentiment, classified by familiar categories such as location, age, and sex as well as by company-specific dimensions such as product and location.

This class of subjectivity analysis looks for the voice of the customer (or prospect, influencer, voter, patient, or market) as expressed online in blogs, forum postings, reviews, email, surveys, contact-center conversations, and a range of other feedback sources. It is sensitive to the identity of the person who is posting, to the needs of the person who may be consuming the information, to context, and to plans or intent captured in text. While subjective information cannot always be matched to particular persons, the benefits of knowing who is posting are prompting entity-analytics R&D into identity resolution based on clues found in text.

Our next point should be obvious by now:

5. Content covers more than just text managed in a content management system and published to the web

We have user-generated content online in the form of articles, blogs and comments, status updates, profiles, and forum postings. And certainly, we have content in the conventional sense, material that is created and published via formal, managed processes. But the content label also extends to email, corporate documents and reports; SMS/IM text, contact-center notes and transcripts; and also, as mentioned, to audio streams, images, and video.
This includes the above in original, as-created form and in derived (duplicated, quoted, sampled, distorted, and otherwise reworked) forms.

Consider rich-media content in particular. Content analytics solutions are already in use to search, analyze, and mine audio streams for contact-center applications. They can search not only on speech transcribed to text but also on phonemes, the fragments from which speech is composed, with advanced abilities to distinguish among speakers in a conversation and to detect emotion. A consumer-grade electronic camera’s ability to identify people within the photo frame and to detect whether a subject is smiling or blinking is content analytics; automated image recognition capabilities, and not just via externally applied tags, are advancing rapidly, as is the ability to decode image changes in a video stream.

Content analytics, coupled with (other) SemWeb technology and operating independently, can be applied to the spectrum of information types across organizational barriers. Analytics, broadly drawn, provides the key.
6. Content analytics is part of a collection of complementary and overlapping analytical technologies

Analytics is the search for business insight in online, social, and enterprise data. Analytics comes in many forms, under a variety of names. The definition common to them all is that analytics transforms source data to derive business information that is stored to databases and communicated in the form of numbers, tables, charts, and visualizations.

Data mining discerns patterns in data in structured forms, typically in databases, to produce predictive models suitable for classification, forecasting, and other functions. BI typically applies dimensional models to data and supports reporting and interactive data analysis, but it may also include predictive-model deployment and in some instances will also subsume the data mining process.

Web analytics is not typically grouped under the BI umbrella, but it is BI, drawing from web server log files to mine behavior patterns from click-stream data, presented in familiar BI dashboards, reports, and charts and feeding data-mining processes that seek to model quantities such as website conversion (a fancy name for sales) and shopping-cart or session abandonment.

Social-network analysis looks at the dynamic graph of connections and message propagation across social and enterprise platforms. Lastly, location intelligence is a special sort of BI with data types, structures, analysis, and presentation methods tailored for geospatial data.

These analytics variants operate on numerical, quantified data.
Content analytics complements them, in some cases by extracting data (e.g., geographic locations and numbers from data tables) from textual sources and in other cases by using their capabilities for exploratory analysis of text-sourced information; for instance, when classified by geographic source or topic and rendered in a map, when presented in BI dashboards and charts, and when incorporated in predictive securities-trading models.

But content analytics can do more than just quantify free-form sources, as shown in our next two points.

7. Content analytics generates semantic and structural metadata

Metadata is descriptive information. If content is a letter, then the writing and postmark on the envelope are metadata. Consider electronic examples: the values of the To, From, CC, Subject, and routing header fields of an email message; the author, file name, file type, last-saved date, title, language, and tags applied to a document; values annotated with web page META tags, and so on. Some of this metadata is structural, some of it is semantic.
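The email example lends itself to a small illustration. The sketch below uses Python's standard `email` package to read structural metadata from message headers; the message itself, the addresses, and the `structural_metadata` helper are invented for this example. Semantic metadata such as topics and themes would require NLP and is not attempted here.

```python
from email import message_from_string

# A made-up message; every address and value here is illustrative only.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Cc: carol@example.com
Subject: Q2 report draft
Date: Mon, 06 Jun 2011 09:30:00 -0400

Bob, the draft is attached. Figures are still preliminary.
"""


def structural_metadata(raw):
    """Collect the header fields named in the text as a plain dict."""
    message = message_from_string(raw)
    fields = ("From", "To", "Cc", "Subject", "Date")
    # Header lookup is case-insensitive; absent headers come back as None.
    return {name: message[name] for name in fields if message[name] is not None}


if __name__ == "__main__":
    for field, value in structural_metadata(RAW_MESSAGE).items():
        print(f"{field}: {value}")
```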
The Dublin Core Metadata Initiative is perhaps the most prominent metadata-standards proponent, providing for natural-language and formal semantic shared vocabularies that facilitate interoperability.1 The natural-language processing (NLP) components of content analytics solutions can and do discern and extract metadata from free-form and semi-structured source materials, with the possibility of Dublin Core conformance and of meeting particular, situational needs by extracting advanced metadata such as topics and themes.

Content analytics tools will, depending on the provider and on the user’s needs, create and store an XML-/RDF-/FOAF-annotated version of source materials, extract information of interest to a file or database, or, when invoked as-a-service, return results marked up as XML, JSON, etc.

Here’s where we come to search and linking.

8. Content analytics facilitates semantic search and semantic data integration

Web pages annotated with concepts, topics, synonyms, etc., and with key information content micro-formatted – this is Search Engine Optimization (SEO) – will be more directly accessible as search evolves into information access.

For both web search and local enterprise search, that extracted information can be indexed as the basis for concept and faceted search (which are two varieties of semantic search), and for faceted navigation, where users and site visitors see results classified into high-level categories known as facets (facets may be predetermined or they may have been discovered in source materials via NLP and clustering).

Content analytics also enables similarity search, where we can search for documents, messages, or objects that are statistically or semantically similar to one we’re viewing, and for similar searches, which are search queries similar to the one we have issued.
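One simple way to make "statistically similar" concrete is to compare overlapping word sequences (shingles) using Jaccard similarity. The sketch below is a swapped-in textbook technique chosen for illustration, not the method of any product mentioned in this paper; the function names and the sample sentences are made up.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts are treated as a single shingle (or none at all).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query, documents, k=3):
    """Return the document whose shingle set is closest to the query's."""
    query_shingles = shingles(query, k)
    return max(documents, key=lambda doc: jaccard(query_shingles, shingles(doc, k)))


if __name__ == "__main__":
    docs = [
        "Acme Corp announced a new content analytics service on Monday.",
        "The weather in Colorado was mild on Monday.",
    ]
    query = "Acme Corp announced a new content analytics service, reports say."
    print(most_similar(query, docs))
```

A shingle set acts as a crude fingerprint: a quoted or lightly edited copy keeps most of its shingles, while unrelated text shares almost none.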
Similarity measurement is useful beyond interactive search; for instance, for tracking the diffusion of content – messages, press releases, quotations, and so on – across news, social, and interpersonal messages, whether for media measurement, copyright enforcement, or research. Given content’s complexity, content analytics’ ability to “fingerprint” content and measure similarity is an asset in tracking efforts.

Lastly, while annotation is great for SEO and semantic search, it also facilitates data integration, also known as data fusion and record linkage. For Semantic Web applications, annotations would include URIs; for other applications, integration could be accomplished via other content-extracted key information.

Automatic summarization and abstracting are under the content analytics umbrella.

1 …-basics/

9. Content analytics scales from individual messages to wide data spaces and large corpora

Content analytics scales through the use of high-throughput technologies such as Hadoop and deployment on grid-based, scalable hardware. Further –

10. Content analytics can operate in real time for a wide variety of business goals and business domains

The choice of particular techniques and tools, where scalability, the need for speed, and other capabilities are concerned, will depend on the information sources, business goals, the type of insights to be sought, and the skills of the users. If the business need is for real-time news and social monitoring for brand and reputation management, security, or military intelligence, one class of solution will be in order, very different in application from a solution chosen to provide semantic search and navigation for an online commerce site.

Focusing on real-time capabilities and also the ability to handle noisy social text (replete with slang, idiom, misspellings, abbreviations, sarcasm, and the like), we see that content analytics’ capabilities are a neat complement to the structured Semantic Web, which would be hard-pressed to keep up with today’s flood of raw, chaotic information. The pairing of structured sources and ad-hoc analyses can be especially powerful.

11. Content Analytics is delivered installed, on the cloud, and as-a-service: Your choice

Most members of the semantics community are familiar with a few as-a-service annotation services, accessible via web services APIs. They represent only the visible top of a much larger, metaphorical, content analytics iceberg.
First, the content analytics world offers many more annotation services, with capabilities that extend far beyond English-language entity analytics to encompass deep information extraction. The only barrier to their semantics-world and Semantic Web use is lack of awareness.

Further, content analytics is available on the cloud, in hosted form, or may be installed on your own hardware.
12. Content analytics can be customized, extended, and configured via inclusion of controlled vocabularies, taxonomies, and ontologies

Analytics means flexibility, the ability to square formal methods and structures with ad-hoc, situational needs and to rely both on shared, standardized resources and on protocols. It is also the ability to depend on proprietary assets and materials not yet brought into compliance with modern forms and into the Semantic Web.

Conclusion

We have examined 12 Things the Semantic Web (and Semantic Computing Practitioners) Should Know about Content Analytics. But really, they reduce to a single paragraph:

Content analytics makes sense of the mess of content – of online, social, and enterprise text, and moving forward, of rich media including images, audio, and video – for purposes that extend to semantic data integration, search, and information management. Content analytics, by helping semanticize existing data, is a foundation technology for the Semantic Web and semantic computing.

Content analytics is delivering business value today, complementing BI, web analytics, location intelligence, and predictive analytics. Prospective users can look to a variety of technologies and tools to find or craft a solution that best meets particular needs, whether for individual, embedded, or enterprise use. Given that hosted and as-a-service (as well as installed) options are available, getting started is not difficult; given the breadth of capabilities, standards adherence, and customizability, there are few adoption barriers.

Semantics practitioners will readily see the value of the technology and will find it well worth trying.
Visit for more information about OpenText solutions. OpenText is a publicly traded company on both NASDAQ (OTEX) and the TSX (OTC).

Copyright © 2010 by OpenText Corporation. Trademarks or registered trademarks of OpenText Corporation. This list is not exhaustive. All other trademarks or registered trademarks are the property of their respective owners. All rights reserved. 11PROD0234EN

Seth Grimes is an analytics strategist with Washington DC based Alta Plana Corporation, founding chair of the Text Analytics Summit and the Sentiment Analysis Symposium, and contributing editor at TechWeb's InformationWeek. He consults, writes, and speaks on business intelligence, data management and analysis systems, text mining, visualization, and related topics. Follow him on Twitter

About OpenText

OpenText is the world’s largest independent provider of Enterprise Content Management (ECM) software. The Company's solutions manage information for all types of business, compliance and industry requirements in the world's largest companies, government agencies and professional service firms. OpenText supports approximately 46,000 customers and millions of users in 114 countries and 12 languages. For more information about OpenText, visit