Big Data and Content Classification
Paul Balas
How to make meaning out of Big Data
• Big Data, as the poster child for marketing open-source software built on alternative database storage structures, has become a 'Big Nothing'. The ambiguity around what Big Data means requires endless hours of explanation, and the discussion really only focuses on the problems of dealing with data of large volume, velocity, or variety (I'm waiting for more catchy v's, such as Victory or Value!). My perspective centers on the phrase 'Big Understanding', an optimistic 'View' of making sense of our data and turning it into information. The focus has to shift.
Classification = Relevance
• No matter what vendors say, the better the classification and structure of your data, the better your search and analytical capabilities will be. Even tools that help with classification require custom rules and dictionaries, and they tend to be domain-specific. If you want high-quality Big Data, you need Data Governance.
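To make "custom rules and dictionaries" concrete, here is a minimal sketch of a dictionary-plus-rules classifier. The categories, keywords, and the `ORD-` order-number pattern are hypothetical placeholders; in practice Subject Matter Experts would curate them for one domain, which is exactly why such tools tend to be domain-specific.

```python
import re

# Hypothetical domain dictionary: category -> keywords that signal it.
# Real lists would be curated and governed by Subject Matter Experts.
CATEGORY_KEYWORDS = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "shipping": {"delivery", "tracking", "shipment", "courier"},
}

# Hypothetical custom rule: an order number such as ORD-123456 implies "shipping".
CUSTOM_RULES = [
    (re.compile(r"\bORD-\d{6}\b"), "shipping"),
]


def classify(text: str) -> set:
    """Return every category whose keywords or custom rules match the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    categories = {cat for cat, words in CATEGORY_KEYWORDS.items() if tokens & words}
    for pattern, cat in CUSTOM_RULES:
        if pattern.search(text):
            categories.add(cat)
    return categories


print(sorted(classify("Where is my refund for ORD-123456?")))  # ['billing', 'shipping']
```

Swapping in another domain means replacing both the dictionary and the rules, not just retuning the code.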
Data Governance = Big Quality
• If you want a high-quality analysis, your data has to be standardized and consistent. This is especially true where there is a large degree of variety in your inputs. For example, if each input source uses a different geopolitical hierarchy, you have to align them to a standard, or your customer won't find Colorado information when they type in CO (OK, a trivial example, but valid). Data Governance requires people, process, and tools, and often requires organizational change.
Many companies would benefit more from improving the quality and 'findability' of their data than from piling more data into an already inconsistent data store.
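The Colorado/CO example can be sketched as a governed synonym table that both the stored values and the search term pass through. The table contents and function names below are hypothetical; the point is only that search matches on the canonical value rather than on the raw string.

```python
from typing import Optional

# Hypothetical governed synonym table: each accepted variant -> one canonical name.
# A real table would be maintained under Data Governance, not hard-coded.
STATE_SYNONYMS = {
    "co": "Colorado",
    "colo": "Colorado",
    "colorado": "Colorado",
    "wy": "Wyoming",
    "wyo": "Wyoming",
    "wyoming": "Wyoming",
}


def normalize_state(value: str) -> Optional[str]:
    """Map a raw state value to its canonical name, or None if it is not governed."""
    key = value.strip().lower().rstrip(".")
    return STATE_SYNONYMS.get(key)


def search_by_state(query: str, rows: list) -> list:
    """Return ids of rows whose state matches the query after normalization."""
    target = normalize_state(query)
    if target is None:
        return []  # unknown query term: nothing in the governed vocabulary to match
    return [r["id"] for r in rows if normalize_state(r["state"]) == target]


records = [
    {"id": 1, "state": "Colorado"},
    {"id": 2, "state": "CO"},
    {"id": 3, "state": "Colo."},
    {"id": 4, "state": "Wyoming"},
]

print(search_by_state("CO", records))  # [1, 2, 3]
```

Without the table, a literal match on "CO" would return only record 2.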
Data Governance Lifecycle
• Applying Data Governance to Big Data helps you to:
• Understand the quality of your data
• Categorize it into well-defined groupings with commonly shared definitions
• Look at new data and categorize it into new or existing groups
• Share it with your stakeholders
• Manage it over time
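One simple way to start "understanding the quality of your data" is to measure how many values in a field conform to its governed pick-list. This is a minimal sketch; the `status` field and its allowed values are hypothetical.

```python
from collections import Counter

# Hypothetical governed pick-list for a single "status" field.
ALLOWED_STATUS = {"open", "closed", "pending"}


def conformance(values, allowed):
    """Return (share of values found in the pick-list, Counter of non-conforming values)."""
    bad = Counter(v for v in values if v not in allowed)
    total = len(values)
    share = (total - sum(bad.values())) / total if total else 1.0
    return share, bad


observed = ["open", "closed", "Closed", "pending", "done", "open"]
share, bad = conformance(observed, ALLOWED_STATUS)
print(f"{share:.0%} conform; non-conforming: {dict(bad)}")
```

The non-conforming counts are what a data steward would review: "Closed" suggests a case-normalization rule, while "done" may be a candidate new term for the vocabulary.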
A Framework to gain perspective
• The following slides attempt to provide a framework for understanding the lifecycle of information management and understanding, from the perspective of managing and applying meaning to your data
[Figure: Data Governance Life Cycle diagram. Labeled elements: Communication between People and Processes; Content Creation & Sourcing (Structured Content, Unstructured Content); Content Mining & Classification; Vocabulary, Taxonomy, Ontology (VTO); VTO Management; Content + Governed VTO; Search; Transactions; Analytics.]
Vocabulary, Taxonomy, and Ontology (VTO)
• Humans use systems of organization to bring order to their world
• Effective experiences with Big Data are driven by Subject Matter Experts, or machines, categorizing content with a common language that consumers of the data can share and understand
• Governed Vocabularies, Taxonomies, and Ontologies are the pick-lists, hierarchies, and relationships that define content; Subject Matter Experts use them to categorize, share, and analyze data
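The three VTO layers map naturally onto three simple data structures: a set of allowed terms (the pick-list), a child-to-parent map (the hierarchy), and subject/relation/object triples (the relationships). This is a minimal sketch with a hypothetical retail example; real VTOs are usually kept in a dedicated tool or a standard such as SKOS rather than in code.

```python
# Vocabulary: the governed pick-list of allowed terms (hypothetical retail example).
vocabulary = {"Footwear", "Running Shoes", "Hiking Boots", "Socks", "Apparel"}

# Taxonomy: child -> parent, a hierarchy over vocabulary terms.
taxonomy = {
    "Running Shoes": "Footwear",
    "Hiking Boots": "Footwear",
    "Socks": "Apparel",
}

# Ontology: (subject, relation, object) triples that go beyond the hierarchy.
ontology = {
    ("Running Shoes", "oftenBoughtWith", "Socks"),
    ("Hiking Boots", "oftenBoughtWith", "Socks"),
}


def ancestors(term):
    """Walk up the taxonomy from a term to its root (assumes no cycles)."""
    path = []
    while term in taxonomy:
        term = taxonomy[term]
        path.append(term)
    return path


def related(term, relation):
    """Return the objects linked to a term by a given ontology relation."""
    return sorted(o for s, r, o in ontology if s == term and r == relation)


def validate():
    """List terms used in the taxonomy or ontology that are missing from the vocabulary."""
    used = set(taxonomy) | set(taxonomy.values())
    used |= {s for s, _, _ in ontology} | {o for _, _, o in ontology}
    return sorted(used - vocabulary)


print(ancestors("Running Shoes"), related("Running Shoes", "oftenBoughtWith"), validate())
```

The `validate` check reflects the governance idea that hierarchies and relationships should only use terms from the shared vocabulary.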
Content Creation and Sourcing
• Content is created both by people interacting with computer systems and by machines generating data
• When more than one stream of data is produced by different inputs, the rules for categorization differ between systems
• Whether you have one source system or many, understanding your data sources requires that you know how the data is produced, and therefore how it can be analyzed
• Big Data promises that you don’t need to know the meaning of your input data as you collect it
• That doesn’t mean you don’t need to define and understand it before you begin to analyze it
• If you apply meaning and structure to your data, the quality of your analysis will improve; in some cases, the analysis only becomes possible once you do
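When each source encodes the same concept differently, a per-source crosswalk to the governed standard is one common approach. In this sketch the source names `crm` and `census` are hypothetical; the census-style values use the FIPS state codes for Colorado (08) and Wyoming (56). Values with no mapping are set aside for review instead of being silently loaded.

```python
# Hypothetical crosswalks: each source's raw codes -> the governed canonical value.
CROSSWALK = {
    "crm": {"CO": "Colorado", "WY": "Wyoming"},   # postal abbreviations
    "census": {"08": "Colorado", "56": "Wyoming"},  # FIPS state codes
}


def standardize(source, raw):
    """Translate one raw value using its source's crosswalk; None if unmapped."""
    mapping = CROSSWALK.get(source)
    if mapping is None:
        raise KeyError(f"no crosswalk defined for source {source!r}")
    return mapping.get(raw)


def load(batch):
    """batch: list of (source, raw_value). Returns (standardized values, unmapped pairs)."""
    standardized, unmapped = [], []
    for source, raw in batch:
        value = standardize(source, raw)
        if value is None:
            unmapped.append((source, raw))  # needs a steward's decision
        else:
            standardized.append(value)
    return standardized, unmapped


print(load([("crm", "CO"), ("census", "08"), ("census", "99")]))
```

Raising on an unknown source is a deliberate choice in this sketch: data from a system nobody has described should not be categorized by guesswork.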
Content Mining and Classification
• Categorization of your data isn’t a one-time event unless your analysis is a one-time event
• Subject Matter Experts need the ability to analyze new data, and to revisit old data to make sure nothing has changed
• Content Mining is a technique for understanding your data and how it fits your view of the world
• Most Big Data platforms are (today) weak in this area
• For Big Data, vendor tooling is disconnected between the point where we analyze our data and the point where we categorize it and apply meaning
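Because categorization is ongoing, each run can be compared with the previous one to surface new items, items whose category changed, and items that fit no existing group. This is a minimal sketch; `simple_classify` is a hypothetical keyword rule standing in for whatever classifier you actually use.

```python
def simple_classify(text):
    """Hypothetical keyword rule; returns a category or None."""
    text = text.lower()
    if "refund" in text:
        return "billing"
    if "delivery" in text:
        return "shipping"
    return None


def reclassify(docs, previous, classify):
    """
    docs: id -> text; previous: id -> label from the last run.
    Returns current labels plus ids that changed, are new, or fit no existing group.
    """
    current, changed, new, unassigned = {}, [], [], []
    for doc_id, text in docs.items():
        label = classify(text)
        current[doc_id] = label
        if label is None:
            unassigned.append(doc_id)  # candidate for a new group
        if doc_id not in previous:
            new.append(doc_id)
        elif previous[doc_id] != label:
            changed.append(doc_id)  # rules or vocabulary moved this item
    return current, changed, new, unassigned


docs = {1: "Refund request", 2: "Late delivery", 3: "Refund after late delivery", 4: "Password reset"}
previous = {1: "billing", 2: "shipping", 3: "shipping"}
_, changed, new, unassigned = reclassify(docs, previous, simple_classify)
print(changed, new, unassigned)  # [3] [4] [4]
```

The `changed` and `unassigned` lists are the natural review queue for Subject Matter Experts.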
VTO Management
• Vocabularies, Taxonomies, and Ontologies require management over time
• They are not built in isolation; they require collaboration between Subject Matter Experts and stakeholders
• They must be easy to share, version, and apply to your data
• Applying defined VTOs to Big Data is a challenge in current vendor offerings
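Versioning a vocabulary can be as simple as keeping every published term set and diffing between versions, so stakeholders can see exactly what changed. `VersionedVocabulary` below is a hypothetical class, not a feature of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedVocabulary:
    """Hypothetical store that keeps every published version of a vocabulary."""
    versions: list = field(default_factory=list)

    def publish(self, terms):
        """Freeze and store a new version; return its 1-based version number."""
        self.versions.append(frozenset(terms))
        return len(self.versions)

    def terms(self, version):
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        return self.versions[version - 1]

    def diff(self, old, new):
        """Terms added and removed between two versions."""
        a, b = self.terms(old), self.terms(new)
        return {"added": sorted(b - a), "removed": sorted(a - b)}


vocab = VersionedVocabulary()
vocab.publish({"Footwear", "Apparel"})
vocab.publish({"Footwear", "Apparel", "Accessories"})
vocab.publish({"Footwear", "Accessories"})
print(vocab.diff(1, 3))  # {'added': ['Accessories'], 'removed': ['Apparel']}
```

A removed term is the signal to re-check any data still tagged with it.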
Search, Transactions, Analytics
• Search – keyword or navigated searching through detailed or aggregated data
• Transactions – adding data to an existing store, via people or machines
• Analytics – statistics, probabilities, creating models …
• Whether the data is big, medium, or small, each of these activities benefits from good categorization and the application of VTO standards
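Navigated search is where a governed taxonomy pays off directly: selecting a node should return items tagged with that node or anything beneath it. This is a minimal sketch with a hypothetical taxonomy and item list.

```python
# Hypothetical taxonomy: child -> parent.
TAXONOMY = {
    "Running Shoes": "Footwear",
    "Trail Running Shoes": "Running Shoes",
    "Hiking Boots": "Footwear",
    "Socks": "Apparel",
}

# Hypothetical catalog: (item id, taxonomy term it is tagged with).
ITEMS = [
    ("A1", "Trail Running Shoes"),
    ("A2", "Hiking Boots"),
    ("A3", "Socks"),
]


def descendants(node):
    """Return the node plus every term beneath it in the taxonomy."""
    result = {node}
    grew = True
    while grew:  # repeat until no new children are found
        grew = False
        for child, parent in TAXONOMY.items():
            if parent in result and child not in result:
                result.add(child)
                grew = True
    return result


def navigate(node, items):
    """Items tagged with the selected node or any of its descendants."""
    scope = descendants(node)
    return [item_id for item_id, category in items if category in scope]


print(navigate("Footwear", ITEMS))  # ['A1', 'A2']
```

Note that A1 is found under "Footwear" even though it is tagged two levels deeper, which only works if every item is tagged with a term from the same governed hierarchy.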
Conclusion
• As Big Data continues to gain momentum in a confusing vendor marketplace, don’t lose sight of the basics. Don’t give in to unbounded promises of analyzing your data to perfection without considering why you are collecting this data in the first place: to apply meaning and understanding to the problem at hand, and to share it with people who can take fruitful action that results in improvement.