Data is everywhere but, far too often, it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not providing companies with the information base they need to make the right decisions when they need to, because all this unstructured data is not actionable intelligence. As a result, although we are awash with data everywhere, we make uninformed decisions based on a very small slice of that information that is readily available to us. This white paper explores a solution strategy.
Actionable Intelligence From Unstructured Data using MDA
Call 888.453.0014
ADA SOFTWARE
SOFTWARE MODERNIZATION - POWERED BY MODELING
The automated software modernization company
Informational Primer
Actionable Intelligence from Unstructured Data
www.adasoftusa.com
Software Modernization. It’s all we do!!!
379 THORNALL STREET, WEST TOWER - 7TH FL, METROPARK, NJ 08837
EXECUTIVE SUMMARY
Data is everywhere but, far too often, it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not providing companies with the information base they need to make the right decisions when they need to, because all this unstructured data is not actionable intelligence. As a result, although we are awash with data everywhere, we make uninformed decisions based on a very small slice of that information that is readily available to us. Figure-1 shows how the Information Framework stands broken.

Worse still, all this underutilized deluge of unstructured data is actually causing companies to lose money. IDC estimated in their report titled “The High Cost of Not Finding Information” (IDC #29127) that companies with 1,000 white collar employees typically wasted in excess of $6 million per year searching for information and not finding it. Add to this the lost revenues caused by unproductive employee time.

The potential loss from unstructured data is, therefore, multi-faceted and consists of:
- Uninformed decisions
- Overlooked risks
- Loss of employee time
- Loss of opportunity
- Loss of revenues

All of these can be fixed by our metamodel-driven information management solution that can turn all this unstructured data into rich, actionable intelligence.

Fig - 1
ACTIONABLE INTELLIGENCE FROM UNSTRUCTURED DATA
Make Unstructured Data Come Alive
UNDERSTANDING “UNSTRUCTURE”
“Unstructured data” is not really unstructured. Let us take the example of a paper magazine. It has a wonderful structure. The Table of Contents offers an instant overview of the entire magazine and provides a usable index that we can use to jump to any article by page number. Within articles, there are pictures to help us visualize the information contained in the text; there are headings that are bolded and tell us what a section of text is talking about; there are blurbs (information callouts) that highlight some of the main points of the article; there might be an abstract providing a gist of the whole article; there may be footnotes, citations and references that link the ideas expressed with a world of information outside the magazine. There are advertisements, which we immediately recognize as advertisements. There is information about the editorial team, the company publishing the magazine and the authors of the various articles.

“Unstructured” data might have an excellent structure of its own - that computers do not understand.

An e-mail gives you all the information as to who wrote the e-mail; when it was written; to whom it was addressed; who received a copy; what it was all about (the Subject); the main body of the message; and any reference material provided as an attachment or a URL.
So there is, indeed, a lot of structure in what we started out identifying as “unstructured data”.
The problem is not with the data. The problem is that a machine does not understand this structure automatically unless we find a way of adding machine-readable information to all this data.

SOLUTIONS STRATEGY
The key to unleashing knowledge from all this powerful, but untapped, information lies in being able to:
- Generate the right METADATA (data about the unstructured data) that a machine can understand.
- CATEGORIZE the data using an easily understood VOCABULARY, and a TAXONOMY that indicates the data hierarchy and relationships.
- Provide a KNOWLEDGE RETRIEVAL mechanism that understands all of the above.

APPLYING OMG STANDARDS
OMG has modeling standards embodied in Model Driven Architecture that can be utilized for modeling any kind of information (though it was originally intended for modeling and understanding software systems). The Knowledge Discovery Metamodel (KDM), for instance, separates
knowledge about existing systems into four orthogonal dimensions: Structure, Behavior, Data and User Interface. Unlike software, data has no behavior. But it has associative fact patterns. So we utilize a modified version of the KDM concept adapted for understanding unstructured data, which we call mKDM.
OMG also has an initiative called the Semantics of Business Vocabulary and Rules (SBVR), which is a standard for establishing a business vocabulary and terminology system that can be used to express business models. This is very useful for defining vocabularies to understand unstructured data, taxonomies to categorize unstructured data, and rules for processing unstructured data.
Coupled together, mKDM and SBVR provide the base technology for creating metadata; defining the vocabularies, taxonomies and rules for processing the data; and retrieving useful information based on linked entities as well as “inferred” fact patterns. This helps convert unstructured data into actionable intelligence.

PARTS OF THE SYSTEM
SCANNERS AND PARSERS
Scanners and parsers will process the unstructured data with reference to the Knowledge Modeling Standards of OMG, and produce symbol tables and syntax trees. This will be an Abstract Syntax Tree Metamodel representing the unstructured data.

AUTOMATIC CATEGORIZERS
Automatic categorizers will act on the metadata and perform the following functions:
- Linguistic analysis
- Statistical inference
- Machine learning
- Rule-based processing
These will obtain the relevant vocabularies, taxonomies and rules from the Semantics of Business Vocabulary & Rules (SBVR) that is part of our reference Knowledge Modeling Standard. The SBVR will provide the relevant business vocabulary necessary to do this job properly. For instance, if we are doing this job for a stockbroker, the relevant business vocabulary will be far different from what will be relevant for a law firm. Documents will be assigned to multiple categories.
The output of the automatic categorizers will be a Metadata Repository, along with catalogs, fact patterns and indexes.

KNOWLEDGE RETRIEVAL ENGINE
Regardless of whether the user is searching or browsing or seeking information through a web service or an API, the actual retrieval will be performed by a Knowledge Retrieval Engine. It has to scan and parse the “request for information” with reference to the same vocabularies, taxonomies and rules in the SBVR that were used by the automatic categorizers.
It will then retrieve two kinds of information:
1. ENTITY EXTRACTION: Focus on identifying named entities.
2. FACT EXTRACTION: Focus on fact patterns and detecting relationships between data using “inference”.
The retrieved information will be focused and relevant to the “request for information”. It will be actionable intelligence.

PACKAGING & DELIVERY ENGINE
The retrieved information has to be packaged and delivered to the seeker of information using the right channel. The request can come from one of many channels, such as interactive search, interactive browsing, web services, or well defined APIs. The results are pushed back through the same channel. There is also more than one way of representing the results: textually or visually, through spatial diagrams or mind maps.
Figure-2 is a schematic representing the methodology.

Fig - 2

PRACTICAL APPLICATIONS
Apart from the holistic application of this methodology across an enterprise for rich productivity gains, greater revenue and informed decisions, this methodology also has many smaller practical applications on limited sets of data.

E-DISCOVERY FROM EMAILS
Email has become the standard for both
internal and external communication. A company's email contains important, and sometimes confidential, information that is today increasingly going into massive e-mail archives, whether to comply with mandatory government regulations or for information archival.

Powerful E-Mail Analytics can provide never before discovered intelligence from plain company emails.

E-discovery refers to discovery in civil litigation which deals with information in electronic format, also referred to as Electronically Stored Information (ESI). Emails can be a prime source of information in civil litigation.
Financial and other firms subject to Sarbanes-Oxley regulatory compliance need effective e-discovery mechanisms from their e-mail archives and other documents.
Our solution can help you implement a powerful information retrieval mechanism from Email Archives, resulting in the following capabilities, and more:
- Advanced search capabilities to find specific records within your complete and secure archive.
- Locate and produce evidence-quality messages with metadata in seconds.
- Analyze a complete audit trail for every message.
- Review and classify every message (based on your company's rules and permissions) that leaves or enters your organization's domains.
- Messages can easily be classified for legal hold when court or counsel requests that all data relevant to a particular case be preserved.
- E-mail analytics designed to be utilized in complex litigation or investigative matters.
- Search for and identify key individuals and assess their relationships and communication patterns. The Activity Schematic in Figure-3 displays communication patterns with a key individual placed in the center, and e-mail correspondents connected with radial spokes.
- Timeline View can be produced as a horizontal timeline to help assess critical time periods in the matter under investigation.
- E-mail Analytics help you to easily identify communications of the key players for substantive review.

Fig - 3

We can transform your email archive into rich actionable intelligence.

OTHER REGULATORY COMPLIANCE
Regulatory compliance also weighs down the life science companies (pharmaceutical, biotech and medical device companies). FDA regulations pertaining to clinical trials, manufacturing processes and drug discovery require similar diligence in preserving “evidence” for a stipulated period of time. Such evidence is also contained in unstructured data items like e-mails and documents. Our methodology equips the company and auditors with reliable and quick e-discovery processes, apart from other proactive compliance monitoring functions of interest to the company.

The Lifeblood of the Enterprise is information. The information economy thrives and survives on information.

RESEARCH AND DEVELOPMENT
Pharmaceutical companies engaged in drug development, for instance, can benefit from every bit of better intelligence and every minute of human effort saved. Drug development activities for a single product can span over ten years and involve collaboration from a wide range of professionals like research scientists, pharmacologists, chemists, biologists, chemical engineers, production floor specialists, clinical trial units and others. All the information flow amongst these diverse entities located in diverse geographical locations has a very large share of “unstructured” data.

LAW FIRMS
Law firms try to make sense of unstructured data every single minute of their existence. With the expanding Internet-driven universe, making sense out of information overload and using the results meaningfully for their clients’ benefit is an ever-expanding challenge.

CONTENT PUBLISHERS
Companies engaged in any kind of publishing, especially delivery of content over the Internet, are competing for differentiation in search capability.
Content metadata is most important, as that is indispensable for setting up the catalogs, fact patterns and indexes that, in turn, can translate into accuracy of information delivered.

INTELLIGENCE & LAW ENFORCEMENT
Especially in this age of rampant terrorism, proactive prevention of crimes is a top priority. The huge world of “unstructured” data constantly evolving on the Internet is a rich source of intelligence and alerts, but too vast for manual processing and/or informal methods.
A methodology such as ours can effectively harness and deliver untold value from the Internet.
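The flow described under PARTS OF THE SYSTEM and E-DISCOVERY FROM EMAILS (scan a message into metadata, assign it to several categories from a business vocabulary, then look at who corresponds with a key individual) can be illustrated with a minimal Python sketch. It is not the mKDM/SBVR implementation: the function names, the `VOCABULARY` terms, the addresses and the sample message are all hypothetical, and a plain keyword match stands in for the linguistic, statistical and machine-learning categorizers the paper mentions.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime

# Hypothetical vocabulary (category -> trigger terms). In the paper this role
# is played by SBVR vocabularies; these terms are invented for illustration.
VOCABULARY = {
    "trading": {"stock", "shares", "broker"},
    "legal": {"litigation", "counsel", "contract"},
}

# Hypothetical sample message, used only to exercise the functions below.
SAMPLE = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>\n"
    "Cc: carol@example.com\n"
    "Date: Mon, 02 Jan 2006 15:04:05 +0000\n"
    "Subject: Litigation hold\n"
    "\n"
    "Counsel asks that we preserve the stock records.\n"
)


def extract_metadata(raw):
    """Scan one raw, single-part e-mail and return machine-readable metadata."""
    msg = message_from_string(raw)
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return {
        "sender": parseaddr(msg.get("From", ""))[1],
        "recipients": [addr for _, addr in recipients],
        "date": parsedate_to_datetime(msg["Date"]),
        "subject": msg.get("Subject", ""),
        "body": msg.get_payload(),
    }


def categorize(meta):
    """Return every category whose terms appear in the subject or body."""
    text = (meta["subject"] + " " + meta["body"]).lower()
    words = set(re.findall(r"[a-z]+", text))
    return sorted(cat for cat, terms in VOCABULARY.items() if words & terms)


def correspondents(metas, key_person):
    """Count messages exchanged between key_person and each other address."""
    counts = Counter()
    for meta in metas:
        if meta["sender"] == key_person:
            counts.update(meta["recipients"])
        elif key_person in meta["recipients"]:
            counts[meta["sender"]] += 1
    return counts
```

With the sample message, `categorize(extract_metadata(SAMPLE))` returns both `legal` and `trading`, which mirrors the point that a document can be assigned to multiple categories. The counts from `correspondents` are the kind of data a radial activity schematic could be drawn from, and sorting the metadata by `date` would give the ordering for a timeline view.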
When one needs a heart bypass, one goes to a cardiac surgeon.
When one needs the best storage solutions, one goes to EMC, the storage specialists.
Why would you go to Accenture, Cap Gemini, Infosys or Wipro for software modernization?
WE ARE THE SOFTWARE MODERNIZATION SPECIALISTS. IT IS ALL WE DO.