The Role of Community-Driven Data Curation for Enterprises Edward Curry, Andre Freitas, Seán O'Riain  ed.curry@deri.org http://www.deri.org/ http://www.EdwardCurry.org/
Speaker Profile Research Scientist at the Digital Enterprise Research Institute (DERI) Leading international web science research organization Researching how the web of data is changing the way businesses work and interact with information Projects include studies of enterprise linked data, community-based data curation, semantic data analytics, and semantic search Investigate utilization within the pharmaceutical, oil & gas, financial, advertising, media, manufacturing, health care, ICT, and automotive industries Invited speaker at the 2010 MIT Sloan CIO Symposium to an audience of more than 600 CIOs
Web of Data
Acknowledgements Collaborators Andre Freitas & Seán O'Riain Insight from Thought Leaders Evan Sandhaus (Semantic Technologist), Rob Larson (Vice President Product Development and Management), and Gregg Fenton (Director Emerging Platforms) from the New York Times Krista Thomas (Vice President, Marketing & Communications), Tom Tague (OpenCalais initiative Lead) from Thomson Reuters Antony Williams (VP of Strategic Development) from ChemSpider Helen Berman (Director), John Westbrook (Product Development) from the Protein Data Bank Nick Lynch (Architect with AstraZeneca) from the Pistoia Alliance. The work presented has been funded by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).
Further Information 	The Role of Community-Driven 	Data Curation for Enterprises Edward Curry, Andre Freitas, & Seán O'Riain In David Wood (ed.),  Linking Enterprise Data Springer, 2010. Available Free at:  http://3roundstones.com/led_book/led-curry-et-al.html
Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning 
The Business Need
Access to the right information
Confidence in that information
Working with incomplete, inaccurate, or wrong information can have disastrous consequences
The Problems with Data Flawed data affects 25% of critical data in the world’s top companies (Gartner) Data Quality Recent banking crisis (Economist Dec’09) Inaccurate figures made it difficult to manage operations (investment exposure and risk) “asset are defined differently in different programs” “numbers did not always add up” “departments do not trust each other’s figures” “figures … not worth the pixels they were made of”
What is Data Curation? Digital Curation Selection, preservation, maintenance, collection, and archiving of digital assets Data Curation Active management of data over its life-cycle Data Curators Ensure data is trustworthy, discoverable, accessible, reusable, and fit for use Museum cataloguers of the Internet age
What is Data Curation? Data Governance Convergence of data quality, data management, business process management, and risk management Data Curation is a complementary activity Part of the overall data governance strategy for the organization Data Curator = Data Steward ?? Overlapping terms between communities
Data Quality and Curation What is Data Quality? Desirable characteristics for information resource  Described as a series of quality dimensions Discoverability, Accessibility, Timeliness, Completeness, Interpretation, Accuracy, Consistency, Provenance & Reputation Data curation can be used to improve these quality dimensions
Data Quality and Curation Discoverability & Accessibility Curate to streamline search by storing and classifying data in an appropriate and consistent manner Accuracy Curate to ensure data correctly represents the “real-world” values it models Consistency Curate to ensure data is created and maintained using standardized definitions, calculations, terms, and identifiers
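As an illustration of curating for consistency, the sketch below maps free-text values onto standardized terms and sets aside anything it does not recognize for a human curator. The vocabulary, the field name, and the function names are hypothetical and chosen only for this example; they do not come from the slides.

```python
# Hypothetical controlled vocabulary: canonical term -> accepted variants.
CANONICAL_TERMS = {
    "United States": {"us", "usa", "u.s.", "united states"},
    "United Kingdom": {"uk", "u.k.", "united kingdom"},
}


def standardize(value):
    """Return the canonical term for value, or None if it is not recognized."""
    key = value.strip().lower()
    for canonical, variants in CANONICAL_TERMS.items():
        if key in variants:
            return canonical
    return None


def check_consistency(records, field):
    """Split records into (standardized, needs_review) lists.

    Recognized values are rewritten to their canonical term; unrecognized
    values are left untouched and routed to a curator for review.
    """
    standardized, needs_review = [], []
    for record in records:
        canonical = standardize(record[field])
        if canonical is None:
            needs_review.append(record)
        else:
            standardized.append({**record, field: canonical})
    return standardized, needs_review
```

Routing unknown values to review rather than guessing keeps the accuracy dimension intact: the algorithm only rewrites what the vocabulary explicitly covers.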
Data Quality and Curation Provenance & Reputation Curate to track source of data and determine reputation Curate to include the objectivity of the source/producer Is the information unbiased, unprejudiced, and impartial? Or does it come from a reputable but partisan source? Other dimensions discussed in chapter
How to Curate Data Data Curation is a large field with sophisticated techniques and processes Section provides high-level overview on: Should you curate data? Types of Curation Setting up a curation process Additional detail and references available in book chapter
Should You Curate Data? Curation can have multiple motivations Improving accessibility, quality, consistency,… Will the data benefit from curation? Identify business case Determine if potential return supports investment Not all enterprise data should be curated Suits knowledge-centric data rather than transactional operations data
Types of Data Curation Multiple approaches to curate data, no single correct way Who? Individual Curators Curation Departments Community-based Curation How? Manual Curation (Semi-)Automated Sheer Curation
Types of Data Curation – Who? Individual Data Curators Suitable for infrequently changing small quantity of data  (<1,000 records) Minimal curation effort (minutes per record)
Types of Data Curation – Who? Curation Departments Curation experts working with subject matter experts to curate data within a formal process Can deal with large curation effort (thousands of records) Limitations Scalability: Can struggle with large quantities of dynamic data (>million records) Availability: Post-hoc nature creates delay in curated data availability
Types of Data Curation - Who? Community-Based Data Curation Decentralized approach to data curation Crowd-sourcing the curation process Leverages community of users to curate data  Wisdom of the community (crowd) Can scale to millions of records
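The rough record counts quoted on the "Who?" slides can be read as a rule of thumb. The sketch below encodes them in a hypothetical helper; the exact cut-offs between the three models are an assumption made for illustration, and in practice curation effort per record and data dynamics matter as well.

```python
def suggest_curation_model(record_count):
    """Suggest who should curate, based only on the quantity of data.

    Thresholds follow the rough figures on the slides: under 1,000 records
    for an individual curator, thousands for a curation department, and
    more than a million for community-based curation. The exact boundaries
    are illustrative assumptions.
    """
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count < 1_000:
        return "individual curator"
    if record_count <= 1_000_000:
        return "curation department"
    return "community-based curation"
```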
Types of Data Curation – How? Manual Curation Curators directly manipulate data Can tie users up with low-value-add activities (Semi-)Automated Curation Algorithms can (semi-)automate curation activities such as data cleansing, record deduplication and classification Can be supervised or approved by human curators
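A minimal sketch of semi-automated curation, assuming a simple string-similarity rule stands in for the curation algorithm: the program proposes likely duplicate records, and nothing is merged until a human curator approves the pair. The function names, the 0.9 threshold, and the `approve` callback are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def propose_duplicates(names, threshold=0.9):
    """Return index pairs (i, j) whose normalized names look alike."""
    candidates = []
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        score = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
        if score >= threshold:
            candidates.append((i, j))
    return candidates


def merge_approved(names, candidates, approve):
    """Drop the second record of each pair that the human curator approves.

    approve is a callable taking the two names and returning True or False;
    it represents the curator's decision.
    """
    dropped = {j for i, j in candidates if approve(names[i], names[j])}
    return [name for k, name in enumerate(names) if k not in dropped]
```

Keeping the proposal step and the approval step separate is what makes the process supervised: the algorithm removes the low-value scanning work, and the curator keeps the final say.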
Types of Data Curation – How? Sheer curation, or Curation at Source Curation activities integrated in normal workflow of those creating and managing data Can be as simple as vetting or “rating” the results of a curation algorithm Results can be available immediately Blended Approaches: Best of Both Sheer curation + post hoc curation department Allows immediate access to curated data Ensures quality control with expert curation
Setting up a Curation Process 5 Steps to set up a curation process: 1 - Identify what data you need to curate 2 - Identify who will curate the data 3 - Define the curation workflow 4 - Identify appropriate data-in & data-out formats 5 - Identify the artifacts, tools, and processes needed to support the curation process
Setting up a Curation Process Step 1: Identify what data you need to curate Newly created data and/or legacy data?  How is new data created?  Do users create the data, or is it imported from an external source?  How frequently is new data created/updated?  What quantity of data is created? How much legacy data exists? Is it stored within a single source, or scattered across multiple sources?
Setting up a Curation Process Step 2: Identify who will curate the data Individuals, depts, groups, institutions, community Step 3: Define the curation workflow What curation activities are required? How will curation activities be carried out? Step 4: Identify suitable data-in & -out formats What is the best format for the data? Right format for receiving and publishing data is critical Support multiple formats to maximize participation
Setting up a Curation Process Step 5: Identify the artifacts, tools, and processes needed to support curation Workflow support/Community collaboration platforms Algorithms can (semi-)automate curation activities Major factors that influence approach: Quantity of data to be curated (new and legacy data) Amount of effort required to curate the data Frequency of data change / data dynamics Availability of experts
Community–based Curation Two community approaches: Internal corporate communities External pre-competitive communities To determine the right model consider: What is the purpose of the community? Will the resulting curated dataset be publicly available? Or restricted?
Community–based Curation Internal Communities Taps potential of workforce to assist data curation Curate competitive enterprise data that will remain internal to the company May not always be the case e.g. product technical support and marketing data  Can work in conjunction with curation dept. Community governance typically follows the organization’s internal governance model
Pre-competitive Communities Pre-competitive collaboration Well-established technique for open innovation  Notable examples
What is Pre-Competitive Data? Two Types of Enterprise Data Proprietary data for competitive advantage Common data with no competitive advantage What is pre-competitive data? Has little potential for differentiation Can be shared without conferring commercial advantage to competitor Common non-competitive data Needs to be maintained and curated Companies duplicate effort in-house, incurring the full cost
Pre-competitive Communities External pre-competitive communities Share costs, risks, and technical challenges Common curation tasks carried out once in public domain rather than multiple times in each company Reduces cost required to provide and maintain data Can increase the quantity, quality, and access Focus turns to value-add competitive activity Move “competitive onus” from novel data to novel algorithms, shifting emphasis from “proprietary data” to a “proprietary understanding of data” e.g. Protein Data Bank and Pistoia Alliance in Pharma
External Pre-competitive Communities Two popular community models are Organization consortium Open community Organization consortium Operates like a private democratic club Usually closed community, members invited based on skill-set to contribute Output data - public or limited to members Consortiums follow a democratic process Member voting rights may reflect level of investment Larger players may be leaders of the consortium
External Pre-competitive Communities Open community Everyone can participate “Founder(s)” defines desired curation activity Seek public support to contribute to curation activities Wikipedia, Linux, and Apache are good examples of large open communities
Wikipedia The World's Largest Open Digital Curation Community
Wikipedia Open-source encyclopedia Collaboratively built by large community Challenges existing models of content creation More than 19,000,000 articles 270+ languages, 3,200,000+ articles in English More than 157,000 active contributors Studies show accuracy and stylistic formality are equivalent to resources developed in expert-based closed communities i.e. Columbia and Britannica encyclopedias
Wikipedia MediaWiki Wiki platform behind Wikipedia Widespread and popular technology Wikis can also support data curation Lowers entry barriers for collaborative data curation Widely used inside organizations Intellipedia covering 16 U.S. Intelligence agencies WikiProteins, curated protein data for knowledge discovery and annotation
Wikipedia Decentralized environment supports creation of high quality information with: Social organization Artifacts, tools & processes for cooperative work coordination Wikipedia collaboration dynamics highlight good practices
Wikipedia – Social Organization Any user can edit its contents Without prior registration Does not lead to a chaotic scenario In practice highly scalable approach for high quality content creation on the Web Relies on simple but highly effective way to coordinate its curation process Curation is activity of Wikipedia admins Responsibility for information quality standards
Wikipedia – Social Organization Four main types of accounts: Anonymous users Identified by their associated IP address Registered users Users with an account in the Wikipedia website Administrators/Editors Registered users with additional permissions in the system Access to curation tools Bots  Programs that perform repetitive tasks
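The four account types can be pictured as a simple permission table. The sketch below is a deliberately simplified model for illustration, not MediaWiki's actual rights system: the action names and the exact assignment of actions to account types are assumptions.

```python
from enum import Enum


class Account(Enum):
    ANONYMOUS = "anonymous"          # identified by IP address
    REGISTERED = "registered"        # has a Wikipedia account
    ADMINISTRATOR = "administrator"  # registered, with extra permissions
    BOT = "bot"                      # program performing repetitive tasks


# Hypothetical action names; the real permission model is far richer.
PERMISSIONS = {
    Account.ANONYMOUS: {"edit"},
    Account.REGISTERED: {"edit", "watchlist"},
    Account.ADMINISTRATOR: {"edit", "watchlist", "delete_page", "grant_admin"},
    Account.BOT: {"edit"},
}


def can(account, action):
    """Return True if the account type is allowed to perform the action."""
    return action in PERMISSIONS[account]
```

The point of the model is that editing is open to everyone, while the critical curation actions are reserved for a small trusted group.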
Wikipedia – Social Organization Incentives Improvement of one’s reputation Sense of efficacy Contributing effectively to a meaningful project Over time focus of editors typically changes From curators of a few articles in specific topics To more global curation perspective Enforcing quality assessment of Wikipedia as a whole
Wikipedia – Artifacts, Tools & Processes Wiki Article Editor (Tool) WYSIWYG or markup text editor Talk Pages (Tool) Public arena for discussions around Wikipedia resources Watchlists (Tool) Helps curators to actively monitor the integrity and quality of resources they contribute to Permission Mechanisms (Tool) Users with administrator status can perform critical actions such as removing pages and granting administrative permissions to new users
Wikipedia – Artifacts, Tools & Processes Automated Editing (Tool) Bots are automated or semi-automated tools that perform repetitive tasks over content Page History and Restore (Tool) Historical trail of changes to a Wikipedia Resource Guidelines, Policies & Templates (Artifact) Defines curation guidelines for editors to assess article quality Dispute Resolution (Process) Dispute mechanism between editors over the article contents Article Editing, Deletion, Merging, Redirection, Transwiking, Archival (Process) Describe the curation actions over Wikipedia resources
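The page history and restore tool can be sketched as an append-only list of revisions. This is a toy model written for illustration, not MediaWiki code; the `Page` class and its methods are hypothetical.

```python
class Page:
    """Toy wiki page that keeps every revision of its text."""

    def __init__(self, text=""):
        self._revisions = [text]

    @property
    def text(self):
        """Current text, i.e. the latest revision."""
        return self._revisions[-1]

    def edit(self, new_text):
        """Record an edit as a new revision."""
        self._revisions.append(new_text)

    def history(self):
        """Return a copy of the full trail of revisions, oldest first."""
        return list(self._revisions)

    def restore(self, index):
        """Restore an earlier revision by appending it as a new one.

        Nothing is deleted, so the bad edit stays visible in the history.
        """
        self._revisions.append(self._revisions[index])
```

Because a restore is itself a revision, any curator can undo vandalism cheaply while the audit trail stays complete, which is part of why open editing does not lead to chaos.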
Wikipedia - DBPedia DBPedia Knowledge base Inherits massive volume of curated Wikipedia data Built using information from infobox properties Indirectly uses wiki as data curation platform DBPedia provides direct access to data 3.4 million entities and 1 billion RDF triples Comprehensive data infrastructure Concept URIs, definitions, and basic types
The New York Times 100 Years of Expert Data Curation
The New York Times Largest metropolitan and third largest newspaper in the United States
Most popular newspaper website in US
100 year old curated repository defining its participation in the emerging Web of Data
The New York Times Index Department was created in 1913 Curation and cataloguing of NYT resources Since 1851 NYT had low quality index for internal use Developed a comprehensive catalog using a controlled vocabulary Covering subjects, personal names, organizations, geographic locations and titles of creative works (books, movies, etc), linked to articles and their summaries Current Index Dept. has ~15 people
The New York Times Challenges with consistently and accurately classifying news articles over time Keywords expressing subjects may show some variance due to cultural or legal constraints Identities of some entities, such as organizations and places, changed over time Controlled vocabulary grew to hundreds of thousands of categories Adding complexity to classification process
The New York Times Increased importance of Web drove need to improve categorization of online content Curation carried out by Index Department Library-time (days to weeks) Print edition can handle next-day index  Not suitable for real-time online publishing  nytimes.com needed a same-day index
The New York Times Introduced two stage curation process Editorial staff performed best-effort semi-automated sheer curation at point of online publication Several hundred journalists Index Department follows up with long-term accurate classification and archiving Benefits: Non-expert journalist curators provide instant accessibility to online users Index Department provides long-term high-quality curation in a “trust but verify” approach
NYT Curation Workflow  Curation starts with article getting out of the newsroom
NYT Curation Workflow Member of editorial staff submits article to web-based, rule-based information extraction system (SAS Teragram)
NYT Curation Workflow  Teragram uses linguistic extraction rules based on subset of Index Dept’s controlled vocab.
NYT Curation Workflow  Teragram suggests tags based on the Index vocabulary that can potentially describe the content of article
NYT Curation Workflow  Editorial staff member selects terms that best describe the contents and inserts new tags if necessary
NYT Curation Workflow  Reviewed by the taxonomy managers with feedback to editorial staff on classification process
NYT Curation Workflow  Article is published online at nytimes.com
NYT Curation Workflow At a later stage the article receives second-level curation by the Index Dept., which adds additional Index tags and a summary
NYT Curation Workflow  Article is submitted to NYT Index
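The two-stage workflow above can be sketched end to end. In this sketch a plain phrase match against a small vocabulary stands in for the rule-based extraction system; it is not how SAS Teragram works, and the vocabulary terms and function names are hypothetical.

```python
import re

# Hypothetical subset of a controlled vocabulary.
VOCABULARY = {"Protein Data Bank", "Wikipedia", "Linked Open Data"}


def suggest_tags(article_text, vocabulary):
    """Stage 1a: suggest vocabulary terms that appear in the article."""
    text = article_text.lower()
    return sorted(
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    )


def editorial_review(suggested, selected, new_tags=()):
    """Stage 1b: editorial staff keep the best suggestions and add new tags."""
    kept = [tag for tag in suggested if tag in selected]
    return kept + [tag for tag in new_tags if tag not in kept]


def index_review(first_pass_tags, extra_tags, summary):
    """Stage 2: the index department adds further tags and a summary later."""
    tags = list(first_pass_tags)
    tags += [tag for tag in extra_tags if tag not in tags]
    return {"tags": tags, "summary": summary}
```

The article can be published as soon as `editorial_review` returns, which is the sheer curation step; `index_review` runs afterwards in library time and only adds to what the journalist chose.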
The New York Times Early adopter of Linked Open Data (June ‘09)
The New York Times Linked Open Data @ data.nytimes.com Subset of 10,000 tags from index vocabulary Dataset of people, organizations & locations Complemented by search services to consume data about articles, movies, best sellers, Congress votes, real estate,… Benefits Improves traffic by third party data usage Lowers development cost of new applications for different verticals inside the website E.g. movies, travel, sports, books
Thomson Reuters Data Curation: A Core Business Competency
Thomson Reuters Thomson Reuters is an information provider Created by acquisition of Reuters by Thomson Over 50,000 employees Commercial presence in 100+ countries Provides specialist curated information and information-based services Selects most relevant information for customers Classifying, enriching and distributing it in a way that can be readily consumed
Thomson Reuters Curation process Working over approximately 1000 data sources Automatic tools provide first level triage and classification Refined by intervention of human curators Curator is a domain specialist Employs thousands of curators
Thomson Reuters OneCalais platform Reduces workload for classification of content Natural Language Processing on unstructured text Automatically derives tags for analyzed content Enrichment with machine readable structured data Provides description of specific entities (places, people, events, facts) present in the text Open Calais (free version of OneCalais) 20,000+ users, >4 million transactions per day CNET, CBS Interactive, The Huffington Post, The Powerhouse Museum of Science and Design,…
ChemSpider Structure-centric chemical community Over 300 data sources with 25 million records Provided by chemical vendors, government databases, private laboratories and individuals Pharma realizing benefits of open data Heavily leveraged by pharmaceutical companies as pre-competitive resources for experimental and clinical trial investigation GlaxoSmithKline made its proprietary malaria dataset of 13,500 compounds available
Protein Data Bank Dedicated to improving understanding of biological systems functions with 3-D structure of macromolecules Started in 1971 with 3 core members Originally offered 7 crystal structures Grown to 63,000 structures Over 300 million dataset downloads Expanded beyond curated data download service to include complex molecular visualization, search, and analysis capabilities
Best Practices from Case Study Learning Social Best Practices Participation Engagement Incentives Community Governance Models Technical Best Practices Data Representation Human- and Automated Curation Track Provenance
Social Best Practices Participation Stakeholder involvement for data producers and consumers must occur early in project Provides insight into basic questions of what they want to do, for whom, and what it will provide White papers are effective means to present these ideas, and solicit opinion from community Can be used to establish informal ‘social contract’ for community
Social Best Practices Engagement Outreach activities essential for promotion and feedback Typical consumers-to-contributors ratios of less than 5% Social communication and networking forums are useful Majority of community may not communicate using these media Communication by email still remains important
Social Best Practices Incentives Sheer curation needs line of sight from data curating activity, to tangible exploitation benefits Lack of awareness of value proposition will slow emergence of collaborative contributions Recognizing contributing curators through a formal feedback mechanism Reinforces contribution culture Directly increases output quality
Social Best Practices Community Governance Models Effective governance structure is vital to ensure success of community Internal communities and consortia perform well when they leverage traditional corporate and democratic governance models Open communities need to engage the community within the governance process Follow less orthodox approaches using meritocratic and autocratic principles

 
Towards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsTowards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsEdward Curry
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsEdward Curry
 
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesCrowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesEdward Curry
 
Crowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementCrowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementEdward Curry
 

The Role of Community-Driven Data Curation for Enterprises

  • 1. The Role of Community-Driven Data Curation for Enterprises Edward Curry, Andre Freitas, Seán O'Riain ed.curry@deri.org http://www.deri.org/ http://www.EdwardCurry.org/
  • 2. Speaker Profile Research Scientist at the Digital Enterprise Research Institute (DERI) Leading international web science research organization Researching how the web of data is changing the way businesses work and interact with information Projects include studies of enterprise linked data, community-based data curation, semantic data analytics, and semantic search Investigate utilization within the pharmaceutical, oil & gas, financial, advertising, media, manufacturing, health care, ICT, and automotive industries Invited speaker at the 2010 MIT Sloan CIO Symposium to an audience of more than 600 CIOs
  • 4. Acknowledgements Collaborators Andre Freitas & Seán O'Riain Insight from Thought Leaders Evan Sandhaus (Semantic Technologist), Rob Larson (Vice President Product Development and Management), and Gregg Fenton (Director Emerging Platforms) from the New York Times Krista Thomas (Vice President, Marketing & Communications), Tom Tague (OpenCalais initiative Lead) from Thomson Reuters Antony Williams (VP of Strategic Development) from ChemSpider Helen Berman (Director), John Westbrook (Product Development) from the Protein Data Bank Nick Lynch (Architect with AstraZeneca) from the Pistoia Alliance. The work presented has been funded by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).
  • 5. Further Information The Role of Community-Driven Data Curation for Enterprises Edward Curry, Andre Freitas, & Seán O'Riain In David Wood (ed.), Linking Enterprise Data Springer, 2010. Available Free at: http://3roundstones.com/led_book/led-curry-et-al.html
  • 6. Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning 
  • 7.
  • 8. Access to the right information
  • 9. Confidence in that information. Working with incomplete, inaccurate, or wrong information can have disastrous consequences
  • 10. The Problems with Data Flawed data affects 25% of critical data in the world’s top companies (Gartner) Data Quality Recent banking crisis (Economist Dec’09) Inaccurate figures made it difficult to manage operations (investments exposure and risk) “asset are defined differently in different programs” “numbers did not always add up” “departments do not trust each other’s figures” “figures … not worth the pixels they were made of”
  • 11. What is Data Curation? Digital Curation: Selection, preservation, maintenance, collection, and archiving of digital assets Data Curation: Active management of data over its life-cycle Data Curators: Ensure data is trustworthy, discoverable, accessible, reusable, and fit for use Museum cataloguers of the Internet age
  • 12. What is Data Curation? Data Governance: Convergence of data quality, data management, business process management, and risk management Data Curation is a complementary activity Part of the overall data governance strategy for an organization Data Curator = Data Steward ?? Overlapping terms between communities
  • 13. Data Quality and Curation What is Data Quality? Desirable characteristics for an information resource Described as a series of quality dimensions: Discoverability, Accessibility, Timeliness, Completeness, Interpretation, Accuracy, Consistency, Provenance & Reputation Data curation can be used to improve these quality dimensions
  • 14. Data Quality and Curation Discoverability & Accessibility: Curate to streamline search by storing and classifying data in an appropriate and consistent manner Accuracy: Curate to ensure data correctly represents the “real-world” values it models Consistency: Curate to ensure data is created and maintained using standardized definitions, calculations, terms, and identifiers
  • 15. Data Quality and Curation Provenance & Reputation Curate to track source of data and determine reputation Curate to include the objectivity of the source/producer Is the information unbiased, unprejudiced, and impartial? Or does it come from a reputable but partisan source? Other dimensions discussed in chapter
  • 16. How to Curate Data Data Curation is a large field with sophisticated techniques and processes This section provides a high-level overview on: Should you curate data? Types of Curation Setting up a curation process Additional detail and references available in book chapter
  • 17. Should You Curate Data? Curation can have multiple motivations Improving accessibility, quality, consistency,… Will the data benefit from curation? Identify the business case Determine whether the potential return supports the investment Not all enterprise data should be curated Suits knowledge-centric data rather than transactional operations data
  • 18. Types of Data Curation Multiple approaches to curate data, no single correct way Who? Individual Curators Curation Departments Community-based Curation How? Manual Curation (Semi-)Automated Sheer Curation
  • 19. Types of Data Curation – Who? Individual Data Curators Suitable for infrequently changing small quantity of data (<1,000 records) Minimal curation effort (minutes per record)
  • 20. Types of Data Curation – Who? Curation Departments Curation experts working with subject matter experts to curate data within a formal process Can deal with large curation effort (thousands of records) Limitations Scalability: Can struggle with large quantities of dynamic data (more than a million records) Availability: Post-hoc nature creates a delay in curated data availability
  • 21. Types of Data Curation - Who? Community-Based Data Curation Decentralized approach to data curation Crowd-sourcing the curation process Leverages community of users to curate data Wisdom of the community (crowd) Can scale to millions of records
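
The three "Who?" options above can be read as a rough sizing rule. The sketch below is only an illustration of that reading, not a method from the slides: the function name `suggest_curators` is hypothetical, the thresholds are the approximate record counts quoted above, and treating every in-between case as a curation department is an assumption.

```python
def suggest_curators(record_count: int, changes_frequently: bool) -> str:
    """Suggest who might curate a dataset, using the rough sizes quoted in the slides.

    Hypothetical helper: the cut-offs are indicative only, and the fallback
    to a curation department for in-between cases is an assumption.
    """
    # Small, infrequently changing data: a single curator is usually enough.
    if record_count < 1_000 and not changes_frequently:
        return "individual curator"
    # Very large, dynamic data: departments struggle to scale, so look to a community.
    if record_count > 1_000_000 and changes_frequently:
        return "community-based curation"
    # Everything in between: experts working within a formal process.
    return "curation department"


print(suggest_curators(500, changes_frequently=False))
print(suggest_curators(40_000, changes_frequently=False))
print(suggest_curators(5_000_000, changes_frequently=True))
```

In practice the decision also depends on the effort per record and the availability of experts, which a record count alone does not capture.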
  • 22. Types of Data Curation – How? Manual Curation Curators directly manipulate data Can tie users up with low value-add activities (Semi-)Automated Curation Algorithms can (semi-)automate curation activities such as data cleansing, duplicate-record detection, and classification Can be supervised or approved by human curators
  • 23. Types of Data Curation – How? Sheer curation, or Curation at Source Curation activities integrated in the normal workflow of those creating and managing data Can be as simple as vetting or “rating” the results of a curation algorithm Results can be available immediately Blended Approaches: Best of Both Sheer curation + post hoc curation department Allows immediate access to curated data Ensures quality control with expert curation
  • 24. Setting up a Curation Process 5 Steps to setup a curation process: 1 - Identify what data you need to curate 2 - Identify who will curate the data 3 - Define the curation workflow 4 - Identity appropriate data-in & data-out formats 5 - Identify the artifacts, tools, and processes needed to support the curation process
  • 25. Setting up a Curation Process Step 1: Identify what data you need to curate Newly created data and/or legacy data? How is new data created? Do users create the data, or is it imported from an external source? How frequently is new data created/updated? What quantity of data is created? How much legacy data exists? Is it stored within a single source, or scattered across multiple sources?
  • 26. Setting up a Curation Process Step 2: Identify who will curate the data Individuals, depts, groups, institutions, community Step 3: Define the curation workflow What curation activities are required? How will curation activities be carried out? Step 4: Identify suitable data-in & -out formats What is the best format for the data? Right format for receiving and publishing data is critical Support multiple formats to maximize participation
  • 27. Setting up a Curation Process Step 5: Identify the artifacts, tools, and processes needed to support curation Workflow support/Community collaboration platforms Algorithms can (semi-)automate curation activities Major factors that influence approach: Quantity of data to be curated (new and legacy data) Amount of effort required to curate the data Frequency of data change / data dynamics Availability of experts
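The "who curates" guidance from the slides above can be read as a rough decision rule. The sketch below is only an illustration: the function name and the `changes_frequently` flag are hypothetical, and the cut-offs (1,000 and 1,000,000 records) are taken from the approximate sizes quoted on the slides, not from any formal threshold.

```python
def suggest_curation_model(record_count: int, changes_frequently: bool) -> str:
    """Map rough data volume and dynamics to a 'who curates' option.

    Cut-offs are assumptions based on the approximate figures in the slides:
    individuals suit small, infrequently changing data (<1,000 records),
    departments handle thousands of records, and community-based curation
    can scale to millions.
    """
    if record_count < 1_000 and not changes_frequently:
        return "individual curators"
    if record_count < 1_000_000:
        return "curation department"
    return "community-based curation"


if __name__ == "__main__":
    for count, dynamic in [(500, False), (50_000, True), (5_000_000, True)]:
        print(count, dynamic, "->", suggest_curation_model(count, dynamic))
```

In practice the other factors listed on the slide (effort per record, availability of experts) would also feed into the choice; they are left out here to keep the sketch short.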
  • 28. Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning 
  • 29. Community–based Curation Two community approaches: Internal corporate communities External pre-competitive communities To determine the right model consider: What is the purpose of the community? Will resulting curated dataset be publicly available? Or restricted?
  • 30. Community–based Curation Internal Communities Taps potential of workforce to assist data curation Curate competitive enterprise data that will remain internal to the company May not always be the case e.g. product technical support and marketing data Can work in conjunction with curation dept. Community governance typically follows the organization’s internal governance model
  • 31. Pre-competitive Communities Pre-competitive collaboration Well-established technique for open innovation Notable examples
  • 32. What is Pre-Competitive Data? Two Types of Enterprise Data Proprietary data for competitive advantage Common data with no competitive advantage What is pre-competitive data? Has little potential for differentiation Can be shared without conferring commercial advantage to competitor Common non-competitive data Needs to be maintained and curated Companies duplicate effort in-house incurring full cost
  • 33. Pre-competitive Communities External pre-competitive communities Share costs, risks, and technical challenges Common curation tasks carried out once in public domain rather than multiple times in each company Reduces cost required to provide and maintain data Can increase the quantity, quality, and access Focus turns to value-add competitive activity Move “competitive onus” from novel data to novel algorithms, shifting emphasis from “proprietary data” to a “proprietary understanding of data” e.g. Protein Data Bank and Pistoia Alliance in Pharma
  • 34. External Pre-competitive Communities Two popular community models are Organization consortium Open community Organization consortium Operates like a private democratic club Usually closed community, members invited based on skill-set to contribute Output data - public or limited to members Consortiums follow a democratic process Member voting rights may reflect level of investment Larger players may be leaders of the consortium
  • 35. External Pre-competitive Communities Open community Everyone can participate “Founder(s)” defines desired curation activity Seek public support to contribute to curation activities Wikipedia, Linux, and Apache are good examples of large open communities
  • 36. Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning 
  • 37. Wikipedia The World's Largest Open Digital Curation Community
  • 38. Wikipedia Open-source encyclopedia Collaboratively built by large community Challenges existing models of content creation More than 19,000,000 articles 270+ languages, 3,200,000+ articles in English More than 157,000 active contributors Studies show accuracy and stylistic formality are equivalent to resources developed in expert-based closed communities i.e. Columbia and Britannica encyclopedias
  • 39. Wikipedia MediaWiki Wiki platform behind Wikipedia Widespread and popular technology Wikis can also support data curation Lowers entry barriers for collaborative data curation Widely used inside organizations Intellipedia covering 16 U.S. Intelligence agencies Wiki Proteins, curated protein data for knowledge discovery and annotation
  • 40. Wikipedia Decentralized environment supports creation of high quality information with: Social organization Artifacts, tools & processes for cooperative work coordination Wikipedia collaboration dynamics highlight good practices
  • 41. Wikipedia – Social Organization Any user can edit its contents Without prior registration Does not lead to a chaotic scenario In practice highly scalable approach for high quality content creation on the Web Relies on simple but highly effective way to coordinate its curation process Curation is activity of Wikipedia admins Responsibility for information quality standards
  • 42. Wikipedia – Social Organization Four main types of accounts: Anonymous users Identified by their associated IP address Registered users Users with an account in the Wikipedia website Administrators/Editors Registered users with additional permissions in the system Access to curation tools Bots Programs that perform repetitive tasks
  • 43. Wikipedia – Social Organization
  • 44. Wikipedia – Social Organization Incentives Improvement of one’s reputation Sense of efficacy Contributing effectively to a meaningful project Over time focus of editors typically changes From curators of a few articles in specific topics To more global curation perspective Enforcing quality assessment of Wikipedia as a whole
  • 45. Wikipedia – Artifacts, Tools & Processes Wiki Article Editor (Tool) WYSIWYG or markup text editor Talk Pages (Tool) Public arena for discussions around Wikipedia resources Watchlists (Tool) Helps curators to actively monitor the integrity and quality of resources they contribute Permission Mechanisms (Tool) Users with administrator status can perform critical actions such as remove pages and grant administrative permissions to new users
  • 46. Wikipedia – Artifacts, Tools & Processes Automated Edition (Tool) Bots are automated or semi-automated tools that perform repetitive tasks over content Page History and Restore (Tool) Historical trail of changes to a Wikipedia Resource Guidelines, Policies & Templates (Artifact) Defines curation guidelines for editors to assess article quality Dispute Resolution (Process) Dispute mechanism between editors over the article contents Article Edition, Deletion, Merging, Redirection, Transwiking, Archival (Process) Describe the curation actions over Wikipedia resources
  • 47. Wikipedia - DBPedia DBPedia Knowledge base Inherits massive volume of curated Wikipedia data Built using information from infobox properties Indirectly uses wiki as data curation platform DBPedia provides direct access to data 3.4 million entities and 1 billion RDF triples Comprehensive data infrastructure Concept URIs, definitions, and basic types
  • 50. The New York Times 100 Years of Expert Data Curation
  • 52. Most popular newspaper website in US
  • 54. The New York Times Index Department was created in 1913 Curation and cataloguing of NYT resources Since 1851 NYT had low quality index for internal use Developed a comprehensive catalog using a controlled vocabulary Covering subjects, personal names, organizations, geographic locations and titles of creative works (books, movies, etc), linked to articles and their summaries Current Index Dept. has ~15 people
  • 55. The New York Times Challenges with consistently and accurately classifying news articles over time Keywords expressing subjects may show some variance due to cultural or legal constraints Identities of some entities, such as organizations and places, changed over time Controlled vocabulary grew to hundreds of thousands of categories Adding complexity to classification process
  • 56. The New York Times Increased importance of Web drove need to improve categorization of online content Curation carried out by Index Department Library-time (days to weeks) Print edition can handle next-day index Not suitable for real-time online publishing nytimes.com needed a same-day index
  • 57. The New York Times Introduced two stage curation process Editorial staff performed best-effort semi-automated sheer curation at point of online pub. Several hundred journalists Index Department follows up with long-term accurate classification and archiving Benefits: Non-expert journalist curators provide instant accessibility to online users Index Department provides long-term high-quality curation in a “trust but verify” approach
  • 58. NYT Curation Workflow Curation starts with article getting out of the newsroom
  • 59. NYT Curation Workflow Member of editorial staff submits article to web-based, rule-based information extraction system (SAS Teragram)
  • 60. NYT Curation Workflow Teragram uses linguistic extraction rules based on subset of Index Dept’s controlled vocab.
  • 61. NYT Curation Workflow Teragram suggests tags based on the Index vocabulary that can potentially describe the content of article
  • 62. NYT Curation Workflow Editorial staff member selects terms that best describe the contents and inserts new tags if necessary
  • 63. NYT Curation Workflow Reviewed by the taxonomy managers with feedback to editorial staff on classification process
  • 64. NYT Curation Workflow Article is published online at nytimes.com
  • 65. NYT Curation Workflow At a later stage the article receives second-level curation by Index Dept.: additional Index tags and a summary
  • 66. NYT Curation Workflow Article is submitted to NYT Index
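The two-stage workflow on the preceding slides (automated tag suggestion, selection by editorial staff, later enrichment by the Index Department) can be sketched in a few lines. This is a minimal illustration only: the whole-phrase matching below is a stand-in for the rule-based extraction system named in the slides and does not reflect its API, and every function name, the sample vocabulary, and the sample article are hypothetical.

```python
import re


def suggest_tags(article_text, vocabulary):
    """Suggest controlled-vocabulary terms that occur in the article text.

    Assumption: simple case-insensitive whole-phrase matching stands in for
    the linguistic extraction rules used by the real system.
    """
    text = article_text.lower()
    return [
        term for term in vocabulary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    ]


def sheer_curate(article_text, vocabulary, select):
    """Stage 1: editorial staff review the suggestions at publication time.

    `select` models the human step: it receives the suggested tags and
    returns the tags the staff member keeps or adds.
    """
    return select(suggest_tags(article_text, vocabulary))


def index_curate(tags, extra_tags, summary):
    """Stage 2: the index department adds further tags and a summary."""
    return {"tags": sorted(set(tags) | set(extra_tags)), "summary": summary}


if __name__ == "__main__":
    vocabulary = ["Climate Change", "United Nations", "Baseball"]
    article = "The United Nations released a report on climate change."
    stage_one = sheer_curate(article, vocabulary, lambda tags: tags + ["Environment"])
    print(stage_one)
    print(index_curate(stage_one, ["Global Warming"], "UN report on climate."))
```

The point of the split is that stage 1 output is available the moment the article is published, while stage 2 can run later without blocking publication.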
  • 67. The New York Times Early adopter of Linked Open Data (June ‘09)
  • 68. The New York Times Linked Open Data @ data.nytimes.com Subset of 10,000 tags from index vocabulary Dataset of people, organizations & locations Complemented by search services to consume data about articles, movies, best sellers, Congress votes, real estate,… Benefits Improves traffic by third party data usage Lowers development cost of new applications for different verticals inside the website E.g. movies, travel, sports, books
  • 69. Thomson Reuters Data Curation: A Core Business Competency
  • 70. Thomson Reuters Thomson Reuters is an information provider Created by acquisition of Reuters by Thomson Over 50,000 employees Commercial presence in 100+ countries Provides specialist curated information and information-based services Selects most relevant information for customers Classifying, enriching and distributing it in a way that can be readily consumed
  • 71. Thomson Reuters Curation process Working over approximately 1000 data sources Automatic tools provide first level triage and classification Refined by intervention of human curators Curator is a domain specialist Employs thousands of curators
  • 72. Thomson Reuters OneCalais platform Reduces workload for classification of content Natural Language Processing on unstructured text Automatically derives tags for analyzed content Enrichment with machine readable structured data Provides description of specific entities (places, people, events, facts) present in the text Open Calais (free version of OneCalais) 20,000+ users, >4 million transactions per day CNET, CBS Interactive, The Huffington Post, The Powerhouse Museum of Science and Design,…
  • 73. ChemSpider Structure centric chemical community Over 300 data sources with 25 million records Provided by chemical vendors, government databases, private laboratories and individuals Pharma realizing benefits of open data Heavily leveraged by pharmaceutical companies as pre-competitive resources for experimental and clinical trial investigation GlaxoSmithKline made its proprietary malaria dataset of 13,500 compounds available
  • 74. Protein Data Bank Dedicated to improving understanding of biological systems functions with 3-D structure of macromolecules Started in 1971 with 3 core members Originally offered 7 crystal structures Grown to 63,000 structures Over 300 million dataset downloads Expanded beyond curated data download service to include complex molecular visualization, search, and analysis capabilities
  • 75. Overview Curation Background The Business Need for Curated Data What is Data Curation? Data Quality and Curation How to Curate Data Curation Communities and Enterprise Data Case Studies Wikipedia, The New York Times, Thomson Reuters, ChemSpider, Protein Data Bank Best Practices from Case Study Learning 
  • 76. Best Practices from Case Study Learning Social Best Practices Participation Engagement Incentives Community Governance Models Technical Best Practices Data Representation Human and Automated Curation Track Provenance
  • 77. Social Best Practices Participation Stakeholder involvement for data producers and consumers must occur early in project Provides insight into basic questions of what they want to do, for whom, and what it will provide White papers are effective means to present these ideas, and solicit opinion from community Can be used to establish informal ‘social contract’ for community
  • 78. Social Best Practices Engagement Outreach activities essential for promotion and feedback Typical consumers-to-contributors ratios of less than 5% Social communication and networking forums are useful Majority of community may not communicate using these media Communication by email still remains important
  • 79. Social Best Practices Incentives Sheer curation needs line of sight from data curating activity, to tangible exploitation benefits Lack of awareness of value proposition will slow emergence of collaborative contributions Recognizing contributing curators through a formal feedback mechanism Reinforces contribution culture Directly increases output quality
  • 80. Social Best Practices Community Governance Models Effective governance structure is vital to ensure success of community Internal communities and consortium perform well when they leverage traditional corporate and democratic governance models Open communities need to engage the community within the governance process Follow less orthodox approaches using meritocratic and autocratic principles
  • 81. Technical Best Practices Data Representation Must be robust and standardized to encourage community usage and tools development Support for legacy data formats and ability to translate data forward to support new technology and standards Human & Automated Curation Balancing will improve data quality Automated curation should always defer to, and never override, human curation edits Automate validating data deposition and entry Target community at focused curation tasks
  • 82. Technical Best Practices Track Provenance All curation activities should be recorded and maintained as part of the data provenance effort Especially where human curators are involved Users can have different perspectives of provenance A scientist may need to evaluate the fine grained experiment description behind the data For a business analyst the ’brand’ of data provider can be sufficient for determining quality
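Two of the technical practices above, recording every curation activity and having automated curation defer to human edits, can be combined in one small data structure. The sketch below is one possible reading under stated assumptions: the `CuratedRecord` class, its field names, and the curator identifiers are hypothetical and are not taken from any of the case-study systems.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedRecord:
    """A record that logs every edit attempt and protects human edits."""

    values: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)  # every attempt, in order
    human_edited: set = field(default_factory=set)  # fields a human has set

    def apply_edit(self, key, value, curator, is_human):
        """Apply an edit unless it is automated and would override a human.

        The attempt is written to the provenance log either way, so rejected
        automated edits remain visible to later reviewers.
        """
        accepted = is_human or key not in self.human_edited
        self.provenance.append({
            "field": key,
            "value": value,
            "curator": curator,
            "human": is_human,
            "accepted": accepted,
        })
        if accepted:
            self.values[key] = value
            if is_human:
                self.human_edited.add(key)
        return accepted


if __name__ == "__main__":
    record = CuratedRecord()
    record.apply_edit("name", "asprin", "bot-1", is_human=False)   # accepted
    record.apply_edit("name", "aspirin", "alice", is_human=True)   # accepted
    record.apply_edit("name", "Asprin", "bot-1", is_human=False)   # rejected
    print(record.values)
    print(len(record.provenance), "edit attempts logged")
```

Keeping rejected attempts in the log is a design choice made for this sketch; it lets a curator see what an algorithm wanted to change without letting it overwrite the human decision.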
  • 83. Conclusions Data curation can ensure the quality of data and its fitness for use Pre-competitive data can be shared without conferring a commercial advantage Pre-competitive data communities Common curation tasks carried out once in public domain Reduces cost, increases quantity and quality