This document summarizes a presentation about why metadata matters for SharePoint search and information governance. It introduces the two speakers, Cem Aykan from Microsoft and Don Miller from Concept Searching. The agenda covers Microsoft's roadmap for SharePoint search and Concept Searching's solutions for information governance. The presentation then discusses enterprise search, building an information governance concept index, and how to achieve SharePoint information governance through automated metadata generation.
Why Metadata Matters in SharePoint Search and Information Governance Webinar
1. Why Metadata Matters
in SharePoint Search
and Information Governance
Cem Aykan
Senior Product Manager of Enterprise Search
Microsoft
cem.aykan@microsoft.com
Twitter @Microsoft
Don Miller
Vice President of Commercial Accounts
Concept Searching
donm@conceptsearching.com
Twitter @conceptsearch
2. Expert Speakers
Cem Aykan – Senior Product Manager of Enterprise Search
at Microsoft has been active in the field from the early days of
FAST and joined Microsoft with the FAST Search acquisition
back in 2008. Since then, he has been involved in a range of
search projects and initiatives, and is currently working on
next-generation Search and Discovery scenarios.
Don Miller – Vice President of Sales at Concept Searching
has over 20 years’ experience in knowledge management.
He is a frequent speaker on records management and
information architecture challenges and solutions, and has
been a guest speaker at Taxonomy Boot Camp, and numerous
SharePoint events about information organization and records
management.
3. Agenda
• Microsoft
• Roadmap for SharePoint Search
• SharePoint Search 2013 features and functions
• Impact and changing landscape of Microsoft’s focus on the cloud
and Office 365
• Concept Searching
• Information Governance and why it matters – on-premises and in the
cloud
• Technologies and solutions
• Next Steps
6. More than 10 Blue Links…
Unified Search
• 10 blue links vs. visual and actionable
• Contextual and personalized
• Rich recommendations driven by user behavior
• Extensible search platform with industry standards
Business-Critical SharePoint
8. Why Search and Information Governance?
• Information overload makes it difficult to stay on top of the topics that matter most
• Finding the right information for the task at hand can be difficult and time-consuming
• Connecting with the right experts across different teams is challenging
9. Forward Looking…
• Faster release cadence on service
• A/B Instrumentation on new features
• Enhanced analytics
• Connected experiences across O365
• Mobile experiences & responsive designs
• Extract meaning and context from social
• Engaging, visual and actionable results
• Flexible on-premises, cloud and hybrid
• Consume diverse content and signals
• Don’t just search, ask questions
10. Native Integration into Search/SharePoint 2013
• SharePoint 2013 is the enabler to achieve the objectives of
Information Governance
• The conceptClassifier for SharePoint platform does not replace
SharePoint Search but augments it by providing rich multi-term
metadata to the search index, auto-classifying content, and enabling
management of content via the Term Store/taxonomy
conceptClassifier for SharePoint, coupled with the features and
functions of SharePoint Search, results in a powerful enterprise
application that can be used to significantly improve ‘findability’ and is
critical to solving Information Governance challenges
11. The Global Leader in
Managed Metadata Solutions
• Company founded in 2002
  • Product launched in 2003
  • Focus on management of structured and unstructured information
• Technology Platform
  • Delivered as a web service
  • Automatic concept identification, content tagging, auto-classification, taxonomy management
  • Only statistical vendor that can extract conceptual metadata
• KMWorld ‘100 Companies that Matter in KM’ 2009, 2010, 2011, 2012, 2013, 2014, and KMWorld Trend Setting Product of 2009, 2010, 2011, 2012, 2013
• Authority to Operate enterprise wide US Air Force and enterprise wide NETCON US Army
• Locations: US, UK, and South Africa
• Client base: Fortune 500/1000 organizations
• Microsoft Business-Critical SharePoint Program partner, Gold Certification in Application Development
• Smart Content Framework™ for Information Governance comprising
  • Five Building Blocks for success
  • Product Platforms: conceptClassifier for SharePoint, conceptClassifier for Office 365, conceptClassifier, and Concept Searching Technology
12. What is Information Governance?
Wikipedia says:
• “Information governance, or IG, is the set of multi-disciplinary structures,
policies, procedures, processes and controls implemented to manage
information at an enterprise level, supporting an organization's immediate and
future regulatory, legal, risk, environmental and operational requirements.”
• “IG encompasses more than traditional records management. It incorporates
privacy attributes, electronic discovery requirements, storage optimization, and
metadata management.”
Gartner says:
• “Information governance is the specification of decision rights and an
accountability framework to encourage desirable behavior in the valuation,
creation, storage, use, archival and deletion of information. It includes the
processes, roles, standards and metrics that ensure the effective and efficient
use of information in enabling an organization to achieve its goals.”
More simply put, the goal of Information Governance is to optimize the value of
information, while simultaneously minimizing the associated risks and costs.
13. What is Information Governance?
• Managing the information lifecycle of structured and unstructured
information to improve business performance and to address
• Regulatory Compliance
• Organizational Policy and Risk
• Privacy/Security
• At a tactical solution level
• Search
• Records Management
• Metadata Management
• Migration
• At an application solution level
• eDiscovery
• PII, PHI, FOIA
• Enterprise Metadata Platform
14. Why do you care?
• Without effective governance, most technology-focused metadata
projects will fail (Forrester Research)
• Less than 50% of content is correctly indexed, meta tagged, or
efficiently searchable
• Unstructured data and metadata are increasing at an average
annual growth rate of 62%
• Corporations will be responsible for the security, privacy, reliability,
and compliance of 85% of that information
(IDC 2010 Digital Universe Study)
• 67% of data loss in records management is due to end user error
(Prism International)
• 70% of data breaches are due to end user error (Ponemon Institute)
15. Hopefully this isn’t your job!
“Gartner predicts by 2016, 20% of CIOs in regulated industries
will lose their jobs for failing to implement the discipline of
information governance successfully.”
16. Smart Content Framework™
The whole is greater than the sum of its parts
Metadata-driven applications – the conceptClassifier for SharePoint platform has been deployed
by clients in diverse industries to automatically generate metadata and use that metadata to
apply and enforce Information Governance policies
17. Metadata
“The metadata infrastructure provides the critical glue that binds the information
infrastructure to the underlying IT infrastructure.
Sound information governance practices would take advantage of the metadata
infrastructure to ensure that content and data are managed consistently and
adhere to written policies, across on-premise and
cloud based environments.”
2010 IDC Digital Universe Study
Advantages
• Ability to develop a single repository of organizationally relevant metadata to be
made available to any application that requires the use of metadata
• Elimination of costs and errors associated with end user tagging
• Normalization of content across functional and geographic boundaries to
remove ambiguity in vocabulary
• Metadata managed and changed in one place
• Ability to apply policy consistently across diverse repositories and applications
• Provides flexibility to rapidly make changes to the repository for regulatory
compliance, with changes immediately available for use by applications
18. Insight
“Automated tools for handling the flood of information are the only solution to
coping with the increasing demands for compliance, more targeted discovery and
better business intelligence.”
IDC
Advantages
• Provides the ability to find and deliver the most relevant and granular results
from large, heterogeneous repositories
• Provides access to relevant knowledge assets that typically would not be found
• Reduces duplication of content
• Makes content available for re-use and re-purposing instead of recreating it
• Removes ambiguity in search
• Improves compliance and security of content assets
• Improves any interactive metadata application such as search, eDiscovery,
litigation support, FOIA, text analytics, social tagging, and collaboration
19. Risk
“More than 100,000 international laws and regulations are potentially relevant
to Forbes Global 1000 companies—ranging from financial disclosure
requirements to standards for data retention and privacy. Additionally, many of
these regulations are evolving and often vary or even contradict one another
across borders and jurisdictions.”
Lorrie Luellig, Of Counsel, Ryley Carlock & Applewhite, PC
Advantages
Risk is different for every organization – regulatory, intellectual property protection, cyber
security, eDiscovery, data retention, even the use of information in unintended ways
• The ability to effectively identify and validate the ‘risk’ factor
• Cost versus Benefit – you may want to assume risk in certain instances
• Provides the ability to identify risk – known and unknown – while weighing
information value
• Proactively addresses and reduces risk factors through the use of business
processes and technology
• Integrated into an organization’s enterprise objectives or functional objectives
20. Policy
“Sound information governance practices and tools would enable organizations to
align their data retention, acceptable use and communication, data privacy,
records management, and information security policies, processes, and
technical controls.”
Worldwide Governance, Risk, and Compliance Infrastructure 2010–2014
Advantages
Policy is driven by the organization – people – not by applications
• Requires the appropriate and individualized approach for the disposition of
diverse content
• Includes identifying where the content resides, cleansing the content,
identifying the relationship between content, then defining the policies
• A key component is business users’ responsibilities and their ability to adapt to
new procedures, i.e., the elimination of end user tagging
• Provides the infrastructure processes by which content is kept relevant, protected,
archived, or deleted
21. Action
Action as a pillar in the Smart Content Framework™ is the execution and
interactive management of the policies and subsequent processes that ensure
all unstructured and semi-structured content is processed in a manner that
achieves the Information Governance objectives.
Advantages
• Fulfills the defined organizational policies that reduce risk and enable
effective management of all semi-structured and unstructured content
• Enforceable and adopted by business users
• Facilitates and improves business processes
• Quantifiable and able to be measured
22. How do we get to Information Governance with SharePoint?
How do we get there with SharePoint?
1. Manually create a term set for Information Governance and manage it
2. Manually search content to validate
3. Manually add metadata to documents in alignment with Regulations, Risk and Policy
4. Apply metadata to all legacy content
5. SharePoint 2010 or 2013 and Office 365
How do we really get there?
1. Extract the semantic vocabulary from your content (conceptTaxonomyManager)
2. Validate in alignment with your vocabulary (conceptTaxonomyManager)
3. Automate applying metadata to new and legacy content (conceptClassifier for SharePoint)
4. Move documents into SharePoint 2010, 2013 or Office 365 (conceptTaxonomyWorkflow)
23. How to Achieve SharePoint Information Governance
conceptClassifier for SharePoint and conceptClassifier for Office 365 platforms:
• conceptClassifier
Both automated and manual classification are supported, to one or more term sets
within the Term Store and across content hubs.
• conceptTaxonomyManager
This is an advanced, enterprise-class, easy-to-use taxonomy and term set
development and management tool. It integrates natively with the SharePoint
Term Store, reading and writing in real time to ensure that the taxonomy/term set
definition is maintained in only one place.
• conceptSearch Compound Term Indexing Engine
Licensed for the sole use of building and refining the taxonomy/term set, the
engine provides automatic semantic metadata generation that extracts
multi-word terms or concepts along with keywords and acronyms.
Optional Product:
• conceptTaxonomyWorkflow
Can perform an action on a document following a classification decision when
certain criteria are met. The workflow source type works with SharePoint 2007,
2010, and 2013, as well as with all document types, including FILE and HTTP.
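The "classify, then act when criteria are met" pattern described for conceptTaxonomyWorkflow can be sketched as a small rule evaluator. This is a minimal, hypothetical illustration. The `Document` and `Rule` classes, the score threshold, and the library and content-type names are assumptions made for the example. They are not the product's configuration model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document plus the terms a classifier assigned to it (scores in 0..1)."""
    name: str
    terms: dict[str, float] = field(default_factory=dict)
    location: str = "Drop Off Library"
    content_type: str = "Document"


@dataclass
class Rule:
    """Hypothetical rule: if `term` scores at least `threshold`, apply the actions."""
    term: str
    threshold: float
    target_location: str | None = None
    target_content_type: str | None = None


def apply_rules(doc: Document, rules: list[Rule]) -> list[str]:
    """Apply every matching rule to the document and return a log of the actions taken."""
    actions: list[str] = []
    for rule in rules:
        score = doc.terms.get(rule.term, 0.0)
        if score < rule.threshold:
            continue  # criteria not met: leave the document untouched
        if rule.target_location:
            doc.location = rule.target_location
            actions.append(f"moved {doc.name} to {rule.target_location}")
        if rule.target_content_type:
            doc.content_type = rule.target_content_type
            actions.append(f"set content type of {doc.name} to {rule.target_content_type}")
    return actions


if __name__ == "__main__":
    rules = [Rule("Personally Identifiable Information", 0.8,
                  target_location="Secure Records",
                  target_content_type="Restricted Record")]
    doc = Document("claim-0193.docx",
                   {"Personally Identifiable Information": 0.91, "Insurance": 0.4})
    for line in apply_rules(doc, rules):
        print(line)
```

Keeping rules as data, separate from the classifier, mirrors the slide's split between the classification decision and the follow-on action. Under this design, changing a policy means editing a rule and leaves the classifier alone.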
24. Why is Metadata so hard to get right for Information Governance?
A manual metadata approach will fail 95%+ of the time
Issue – Organizational Impact
• Inconsistent: Less than 50% of content is correctly indexed, meta-tagged or
efficiently searchable, rendering it unusable to the organization (IDC)
• Subjective: Highly trained information specialists will agree on meta tags
33%-50% of the time (C. Cleverdon)
• Cumbersome and expensive: The average cost of manually tagging one item runs from
$4-$7 per document and does not factor in the accuracy of the meta tags or the
repercussions of mistagged content (Hoovers)
• Malicious compliance: End users select the first value in the list
(Perspectives on Metadata, Sarah Courier)
• No perceived value for end user: What’s in it for me? The end user creates a
document but does not see its value for the organization or the risks associated
with litigation and non-conformance to policies
• What have you seen? Metadata will continue to be a problem due to inconsistent
human behavior
25. Building an Information Governance Concept Index
Concept Searching has a unique approach to ensure success
• Concept Searching’s unique statistical concept identification underpins all technologies
• Multi-word suggestion is explicitly more valuable than single term suggestion algorithms
Concept Searching provides Automatic Concept Term Extraction
[Figure: the concept “Triple Heart Bypass” compared with single-keyword senses of its words: Baseball, Three, Organ, Center, Highway, Avoid]
• conceptClassifier for SharePoint will generate conceptual metadata by extracting
multi-word terms that identify ‘triple heart bypass’ as a concept as opposed to single keywords
• Metadata can be used by any search engine index or any application/process that uses metadata.
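A simple frequency-based n-gram extractor illustrates why multi-word concepts carry more meaning than single keywords. This is a generic sketch and not Concept Searching's statistical algorithm. The stopword list, the `max_n` and `min_count` parameters, and the sample sentences are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Assumed stopword list: candidate n-grams may not start or end with one of these words.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on", "for", "with", "after"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(docs: list[str], max_n: int = 3, min_count: int = 2) -> Counter:
    """Count word n-grams of length 2..max_n across all documents and keep
    those that occur at least `min_count` times."""
    counts: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip n-grams that begin or end with a function word.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return Counter({phrase: c for phrase, c in counts.items() if c >= min_count})


if __name__ == "__main__":
    sample = [
        "The patient underwent a triple heart bypass last spring.",
        "Recovery after a triple heart bypass takes several weeks.",
        "The batter hit a triple in the baseball game.",
    ]
    for phrase, count in candidate_phrases(sample).most_common():
        print(phrase, count)
```

On these three sentences, "triple heart bypass" and its sub-phrases "triple heart" and "heart bypass" survive the frequency cut. The baseball sense of "triple" never forms a recurring phrase. A production extractor would also need to rank overlapping candidates and score them statistically, not just by raw counts.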
26. How to create Metadata Alignment for Information Governance
conceptClassifier for SharePoint provides an automated metadata approach for
an immediate ROI and enforces Information Governance
• Create an enterprise automated metadata framework/model
  • Average return on investment is a minimum of 38% and runs as high as 600% (IDC)
• Apply consistent, meaningful metadata to enterprise content
  • Incorrect meta tags cost an organization $2,500 per user per year, in addition to
potential costs for non-compliance (IDC)
• Guide users to relevant content with taxonomy navigation
  • Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
  • 100% “Recall” of content, 35% faster access to content (“Precision”)
• Use automatic conceptual metadata generation to improve Records Management
  • Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
  • Improve compliance processes, eliminate potential privacy exposures
Life cycle: 1. Create concept index from your content → 2. Model and Validate →
3. Automate Tagging → 4. Findability → 5. Business Processes →
6. Records Management and PII → 7. Life Cycle Management
27. Enterprise Search
“By itself the search function has limited value. The real value of
search and information access technologies is in the ongoing
efforts needed to establish effective taxonomies, to index and
classify content of all kinds, in order to provide meaningful results.”
Tom Eid, Research Vice President, Gartner Group
28. Building Blocks
Metadata, Insight, Governance
Situation:
• Not-for-profit organization that contributes to the prevention and
cure of cancer
• More than 30,000 users
• Outpatient treatment programs that record more than 328,300
visits a year
Challenge:
• Portal to enable patients to access information relevant to their
specific health situations
• Accurate, medically sound, and secure information necessary
Solution:
• conceptClassifier for SharePoint platform
• SharePoint 2010
• Microsoft FAST Search
• Integrated solution with partner Aeturnum
Benefits:
• Accuracy of search
• Relevance of results
• Confidence in data
• Control and trust
“With more than 30,000 current users,
the MyMoffitt Patient Portal has seen
significant growth, and of the new
patients that come to Moffitt, 87%
register for a patient portal account. All
developments and enhancements are
about improving the patient experience.”
Jennifer Camps, Director of Portal
Technologies and Data Management,
Moffitt Cancer Center
30. Records Management
“It is simply not realistic to expect broad sets of employees to
navigate extensive classification options while referring to a
records schedule that may weigh in at more than 100 pages.”
Forrester Research/ARMA International Survey
31. Data Privacy and Cyber Security
“70% of all breaches are due to the organization’s own staff.”
Ponemon Institute
32. Building Blocks
Metadata, Policy, Governance
Situation:
• UK County Council
• Serves approximately 1 million citizens
Challenge:
• Management of digital records
• Reduction in paper records
• Reduce physical and digital storage
Solution:
• conceptClassifier for SharePoint platform
• SharePoint Search
• SharePoint Records Management
• Managed Metadata Services
“One way to manage records
whatever their medium”
Benefits:
• Migration of thousands of documents from File Shares to SharePoint to deliver
an integrated view of all information
• Cleanse content existing in multiple repositories
• Real time identification and classification from all sources of ingestion – fax,
scanned, email, etc. – as well as from all semi-structured and unstructured
content
• Improved search to quickly identify value of document/record in ‘plain English’
33. Building Blocks
Metadata, Insight, Policy, Governance
Situation:
• Budget of $6.9 billion
• Over 60,000 users
• Runs 75 hospitals and clinics providing care to more than 2.6 million
beneficiaries
Challenge:
• Data Privacy
• Intelligent Migration
• Before and after
• Records Management
• 72,000 Site Collections, 5,300 retention codes, classify 200,000
documents per hour with minimum resources
Solution:
• conceptClassifier for SharePoint platform
Benefits:
• Automatic tagging based on organizational vocabulary and descriptors
• Automatic routing and the ability to change the SharePoint content type
• Eliminated manual tagging; removes sensitive content from unauthorized access and
portability
• No security exposures or breaches in 4 years
“Concept Searching’s Taxonomy
Manager provides our Subject
Matter Experts with a user friendly
web interface enabling the
development of controlled
vocabularies that can be used to
filter search results and autoclassify content to folder
structures.”
J.D. Whitlock, Lt Col, USAF,
MSC, CPHIMS
Air Force Medical Service
35. Migration
“At the 2012 Compliance, Governance and Oversight Counsel
(CGOC) Summit, a survey of corporate CIOs and general
counsels found that, typically, 1% of corporate information is
on litigation hold, 5% is in a records-retention category and
25% has current business value. This means that
approximately 69% of the data most organizations
keep can – and should – be deleted.”
Compliance, Governance and Oversight Council (CGOC)
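The 69% figure is simple arithmetic on the survey's three "keep" categories. It treats them as non-overlapping shares of all stored information:

```latex
\[
100\% - \bigl(\underbrace{1\%}_{\text{litigation hold}}
 + \underbrace{5\%}_{\text{records retention}}
 + \underbrace{25\%}_{\text{current business value}}\bigr)
 = 100\% - 31\% = 69\%
\]
```

The categories may overlap, for example when a record on litigation hold is also under a retention schedule. In that case the share that must be kept is at most 31%, so 69% is a lower bound on what could be deleted.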
36. Building Blocks
Metadata, Governance, Migration
Situation:
• Multiple clients
Challenge:
• Simply moving content to a new location did not
provide any benefits
• Human error and time were too costly
Solution:
• conceptClassifier for SharePoint platform
Benefits:
• Cleanses irrelevant and unnecessary documents
• Dramatically reduces the time for migration
• Eliminates manual intervention
• Improves the outcome enabling improvements in:
• Search
• Records management
• Data privacy
• eDiscovery and litigation support
• Text analytics
conceptClassifier for SharePoint
identified 66,000 duplicates out of a
total of 270,000 documents,
representing a 24% reduction in disk
space.
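The case study does not say how duplicates were detected. One common, simple technique is exact-duplicate detection by content hash, sketched below under that assumption. The function names are hypothetical, and conceptClassifier may well use a different method, such as near-duplicate or concept-based matching. By document count, 66,000 / 270,000 ≈ 24.4%. That matches the stated 24% disk saving only if the duplicates are of roughly average size, so the sketch reports redundant bytes separately from redundant files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by content hash; keep only groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def duplicate_stats(groups: dict[str, list[Path]]) -> tuple[int, int]:
    """Return (redundant file count, redundant bytes), keeping one copy per group."""
    extra_files = sum(len(paths) - 1 for paths in groups.values())
    # Files in a group have identical content, so the first file's size stands for all.
    extra_bytes = sum(paths[0].stat().st_size * (len(paths) - 1) for paths in groups.values())
    return extra_files, extra_bytes
```

For example, `duplicate_stats(find_exact_duplicates(Path("/path/to/share")))` would report how many copies could be removed before migration, and how much space that would free. The path is a placeholder.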
Global Supplier of Automotive Parts
The goal was to improve search for
40,000 business users but needed to
migrate literally millions of documents.
conceptClassifier for SharePoint was
used before and after the migration and
for enabling concept-based searching
with their existing search engine and
taxonomy-based search after the
migration.
38. eDiscovery, FOIA, Litigation Support
“Law firms must keep up with the ever-increasing number of
compliance regulations for their clients. In addition, the average
Fortune 500 companies have 125 lawsuits at any given point. If law
firms and compliance departments have control of the information,
they will know where to look and be able to preserve the
information during discovery. IG can therefore also serve as an
organizational tool during litigation.”
National Law Review
39. Building Blocks
Metadata, Insight, Governance
Situation:
• Legal department
• FOIA processing
Challenge:
• Reduce costs associated with litigation support
• Overabundance of content – much was not tagged
• Increase relevance in finding appropriate information
Solution:
• conceptClassifier for SharePoint
Benefits:
• Vocabulary normalization
• Enables ‘concept based’ searching and eliminates the
construction of complex queries
• Removes the ambiguity in content
• Enables new information to be captured and identified early in the
discovery process and made immediately available to the discovery team
• Eliminates manual tagging unless authorized
• Scalable to consume terabytes of content
40. Final Comments – Q&A
• SharePoint and SharePoint 2013 Search provide the foundation to
build an Information Governance Platform
• conceptClassifier for SharePoint augments and automates the
critical components for Information Governance
• Content validation in alignment with Policy and Risk
• Auto-application of metadata in alignment with Policy and Risk
• Action/Migration of content in alignment with Policy and Risk
• Native Metadata to enhance SharePoint 2013 Search
• Native SharePoint integration into Term Store/managed metadata
service
• SharePoint 2010, 2013, Office 365
• Lowest cost to deploy, lowest cost to maintain, fast ROI
• Metadata Survey – What are industry leaders doing?
• http://www.conceptsearching.com/wp/sharepoint-survey/
41. Next Steps
Ready to explore SharePoint Search and Integration into
Line of Business Applications to Achieve Information Governance
and a Solid ROI?
Please contact Don Miller, Vice President of Commercial Accounts
at Concept Searching
donm@conceptsearching.com
42. Please join us for our Next Webinar
Climbing the Slippery Slope of SharePoint Migrations
Date: March 25th
Time: 11:30am-12:30pm EST
“A survey of corporate CIOs and general counsels found that, typically, 69% of the
data most organizations keep can – and should – be deleted.”
Compliance, Governance and Oversight Council (CGOC) Summit
So what happens to the 69%? Most likely it will be migrated without rhyme or
reason, simply because that seems easier, and the organization is still left with
mismanaged, useless information. That’s only one migration scenario. Migrations
can be fraught with delays, budget overruns, and overall frustration.
Register for this practical and informative webinar, sponsored by Portal Solutions
and Concept Searching, and learn how you can eliminate migration challenges and
reach the pinnacle of success.
To Register: https://www3.gotomeeting.com/register/937305526
43. Thank You
Cem Aykan
Senior Product Manager of Enterprise Search
Microsoft
cem.aykan@microsoft.com
Twitter @Microsoft
Don Miller
Vice President of Commercial Accounts
Concept Searching
donm@conceptsearching.com
Twitter @conceptsearch