Reasoning Over Big Data Stores
Eric Little, PhD
VP Data Science
Polytechnic School of Engineering - NYU
eric.little@osthus.com
Slide 2
Who We Are & What We Do
• OSTHUS, Inc. is the U.S. subsidiary of OSTHUS GmbH
• Global presence: offices in Germany, the U.S. and China
• Provides advanced solutions, consulting and technology services for pharmaceutical and biotech R&D
  - Technology provider for the Allotrope effort, which globally aligns several pharma and biotech companies
Slide 3
Semantic Technologies – Smart Data Piece
• Semantic technologies provide several important features for emerging technologies:
  - Controlled vocabularies
  - Taxonomies
  - Metadata structures
  - Ontology models
  - Logical inference
• Data continues to grow in both size and complexity, so we need hybrid solutions that provide real insight
  - Analytics is growing into a new kind of field: data science
  - Is data science about interacting with machines or with humans?
  - It must strike a balance between the complexity of the data and the simplicity of its presentation to the user
Slide 4
Metadata, Reference Data & Master Data
• While often lumped together, these are distinct kinds of data
• Semantic technologies can help organize these kinds of data, but should not be applied in isolation
• Scalability is achieved by using complementary approaches
[Figure: axes labeled "Increased conceptual complexity" and "Increased scalability issues"]
Slide 5
Where Semantics Can Fall Short
• Graphs are good for information, but not so good for high-bandwidth applications where speed and scalability are the primary drivers
• They can require highly specialized hardware, software techniques or engineers
• Semantics should be confined to the metadata aspects of the problem; use other technologies for the rest
Slide 6
The Big Data Problem
• Big Data is a real challenge, but it is starting to become a buzzword
  - Many “Big Data problems” can be reduced to smaller data problems
• Some applications do require complex inferencing over very large data sets
  - A current client has lab readings from 40,000+ devices
• How can this be done effectively?
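One common way to reduce a "Big Data problem" to smaller ones is to partition by a natural key (here, the device) and solve each partition independently before merging small summaries. The Python sketch below illustrates that idea under assumed conditions: the `(device_id, value)` reading format and all function names are hypothetical, and the per-device step runs sequentially here although it could be distributed across workers.

```python
"""Reduce one large aggregation to many small per-device problems.

Hypothetical data model: each reading is a (device_id, value) pair.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, float]
Partial = Tuple[int, float]  # (count, sum): a small, mergeable summary


def partition(readings: Iterable[Reading]) -> Dict[str, List[float]]:
    """Split the full stream into one small list per device."""
    parts: Dict[str, List[float]] = defaultdict(list)
    for device_id, value in readings:
        parts[device_id].append(value)
    return dict(parts)


def summarize(values: List[float]) -> Partial:
    """Solve the small problem for one device, independently of all others."""
    return len(values), sum(values)


def merge(partials: Iterable[Partial]) -> Partial:
    """Combine per-device summaries; (count, sum) pairs merge exactly."""
    count, total = 0, 0.0
    for c, s in partials:
        count += c
        total += s
    return count, total


def fleet_mean(readings: Iterable[Reading]) -> Tuple[Dict[str, Partial], float]:
    """Per-device summaries plus the overall mean across all devices."""
    per_device = {dev: summarize(vals) for dev, vals in partition(readings).items()}
    count, total = merge(per_device.values())
    return per_device, (total / count if count else float("nan"))


if __name__ == "__main__":
    readings = [("dev1", 7.0), ("dev2", 6.0), ("dev1", 8.0)]
    per_device, overall = fleet_mean(readings)
    print(per_device)  # {'dev1': (2, 15.0), 'dev2': (1, 6.0)}
    print(overall)     # 7.0
```

The mean is chosen because its partial results merge exactly; statistics such as the median do not decompose this simply and would need a different strategy.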
Slide 7
Why Not Just Build the Data Lake?
• Data lakes are fine while you are gathering and storing the data
  - But what happens later, once a lot of data is in there?
• The benefit is that data can stay in its original form, with no real ETL
• But running analytics across disparate stores is very challenging
“Without metadata, every subsequent use of data means analysts start from scratch.” (Gartner 2014)
Slide 8
Reasoning Over Big Data Is a Growing Topic
• An inordinate amount of time and energy has been spent on queries alone
  - But this is not reasoning; it is just retrieval
• What is reasoning?
  - More than automated query sets run in sequence or in parallel
  - Reasoning infers new information that is not in the raw data
  - It is heuristic in the original sense: a way to discover or learn something new for oneself
  - Deductive, inductive, abductive
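To make the three modes of reasoning concrete, here is a toy Python sketch. The rules, facts and function names are hypothetical and chosen only for illustration. Note that only deduction yields guaranteed conclusions; induction and abduction produce hypotheses that may turn out to be wrong.

```python
"""Toy illustration of deductive, inductive and abductive reasoning.

All rules, facts and names are hypothetical examples.
"""
from typing import Iterable, List, Set, Tuple

# A rule (antecedent, consequent) reads: "if antecedent holds, consequent holds".
RULES: List[Tuple[str, str]] = [
    ("overheated", "reading_out_of_range"),
    ("miscalibrated", "reading_out_of_range"),
]


def deduce(fact: str, rules: Iterable[Tuple[str, str]]) -> Set[str]:
    """Deduction: from a known fact plus rules, derive guaranteed consequences."""
    return {cons for ante, cons in rules if ante == fact}


def induce(observations: Iterable[Set[str]]) -> Set[str]:
    """Induction: generalize from observed cases to a fallible general claim.

    Returns the properties shared by every observed case.
    """
    sets = [set(obs) for obs in observations]
    return set.intersection(*sets) if sets else set()


def abduce(observation: str, rules: Iterable[Tuple[str, str]]) -> Set[str]:
    """Abduction: propose antecedents that would explain an observation."""
    return {ante for ante, cons in rules if cons == observation}


if __name__ == "__main__":
    print(sorted(deduce("overheated", RULES)))   # ['reading_out_of_range']
    samples = [{"ph_meter", "digital"}, {"ph_meter", "digital", "portable"}]
    print(sorted(induce(samples)))               # ['digital', 'ph_meter']
    print(sorted(abduce("reading_out_of_range", RULES)))  # ['miscalibrated', 'overheated']
```

The abductive step returns two competing explanations; choosing between them would need further evidence, which is exactly where it differs from deduction.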
Slide 9
Types of Reasoning One Can Use
• Logical reasoning (does not always assume set theory)
• Mathematical reasoning (which is logical reasoning, but assumes set theory as its basis)
Slide 10
Reasoning Evolution
Slide 11
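Types of Semantic Inference (Forward and Backward Chaining)
• Backward chaining (goal-driven)
  - Uses modus ponens
  - Starts from a goal (a consequent to be established) and works back to check whether a related antecedent holds, which verifies the connection
• Forward chaining (data-driven)
  - Uses modus ponens
  - Finds a true antecedent and affirms the related consequent, which yields new knowledge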
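The two chaining strategies can be sketched over simple propositional Horn rules. In the Python sketch below (rule and fact names are hypothetical), both procedures apply modus ponens; forward chaining fires rules from known facts until nothing new appears, while backward chaining starts from a goal and recursively tries to prove the antecedents of rules that conclude it.

```python
"""Minimal propositional forward and backward chaining (a sketch).

Rules are Horn clauses: (frozenset_of_antecedents, consequent).
Fact and rule names are hypothetical examples.
"""
from typing import FrozenSet, Iterable, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Data-driven: fire every rule whose antecedents are all known,
    adding its consequent, until no new facts appear (a fixpoint)."""
    known = set(facts)
    rule_list: List[Rule] = list(rules)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rule_list:
            if consequent not in known and antecedents <= known:
                known.add(consequent)  # modus ponens: A, A -> B, therefore B
                changed = True
    return known


def backward_chain(goal: str, facts: Iterable[str], rules: Iterable[Rule],
                   _seen: FrozenSet[str] = frozenset()) -> bool:
    """Goal-driven: the goal holds if it is a known fact, or if some rule
    concludes it and all of that rule's antecedents can be proven in turn."""
    fact_set = set(facts)
    rule_list: List[Rule] = list(rules)
    if goal in fact_set:
        return True
    if goal in _seen:  # guard against cyclic rules
        return False
    for antecedents, consequent in rule_list:
        if consequent == goal and all(
                backward_chain(a, fact_set, rule_list, _seen | {goal})
                for a in antecedents):
            return True
    return False


RULES: List[Rule] = [
    (frozenset({"ph_meter", "reading_above_limit"}), "out_of_spec"),
    (frozenset({"out_of_spec"}), "needs_review"),
]
FACTS = {"ph_meter", "reading_above_limit"}

if __name__ == "__main__":
    print(sorted(forward_chain(FACTS, RULES)))
    # ['needs_review', 'out_of_spec', 'ph_meter', 'reading_above_limit']
    print(backward_chain("needs_review", FACTS, RULES))         # True
    print(backward_chain("needs_review", {"ph_meter"}, RULES))  # False
```

Forward chaining materializes every derivable fact up front, which suits pre-computation; backward chaining only does the work needed to answer one question.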
Slide 12
Ontology Layering Is Important for Scale
Model layers:
• Data source models
• Multi- & single-source data integration models
• Domain models (objects, attributes, processes & relations)
• System-level models (rules)
• Upper-level models
Also shown: data traceability (provenance) and user-driven ontologies
The layers are grouped into meta-data levels (human concepts) and data-centric levels (machine language)
• Metaphysics, not just data models
• Data sources connected directly to higher classifications
• Federation allows for improved scale
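Connecting data sources directly to higher classifications can be illustrated with subclass reasoning: if a source-level class is declared a subclass of a domain class, which in turn is a subclass of an upper-level class, then any record typed only at the source level is also classified under every higher layer. The Python sketch below assumes hypothetical class names and models the layers as plain `subClassOf` edges rather than a full ontology.

```python
"""Sketch: classifying a source-level record under higher ontology layers.

Hypothetical class names; subclass edges stand in for the layered models.
"""
from collections import deque
from typing import Dict, List, Set

SUBCLASS_OF: Dict[str, List[str]] = {
    # data source level -> domain level
    "VendorX_pH_Record": ["pH_Measurement"],
    # domain level -> upper level
    "pH_Measurement": ["Measurement"],
    "Measurement": ["Process"],
}


def superclasses(cls: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Return every class reachable from `cls` via subClassOf edges (BFS)."""
    found: Set[str] = set()
    queue = deque(edges.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in found:
            found.add(parent)
            queue.extend(edges.get(parent, []))
    return found


def classify(instance_types: Dict[str, str],
             edges: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Attach all inferred types to each instance (subclass-style entailment)."""
    return {inst: {t} | superclasses(t, edges) for inst, t in instance_types.items()}


if __name__ == "__main__":
    raw = {"reading_001": "VendorX_pH_Record"}
    print(sorted(classify(raw, SUBCLASS_OF)["reading_001"]))
    # ['Measurement', 'Process', 'VendorX_pH_Record', 'pH_Measurement']
```

Only the small class hierarchy (metadata) is traversed here; the instance records themselves are not stored as a graph, which is in line with keeping semantics confined to the metadata.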
Slide 13
Combining Semantics and NoSQL
• Get your semantics experts and your big data scientists on the same page
  - Use tables where possible; avoid multi-node graph hops
  - Use graphs for metadata; leave instance data in place when possible
  - Avoid large graphs
  - Lots of columns and rows are fine; joins across tables are not
  - Break graph information into other formats wherever possible
• Pre-compute phases are important
  - Pre-compute multi-table joins based on SME input, known semantic patterns, business rules/logic, etc.
  - Use statistical methods to cluster data (e.g., normalcy calculations)
• Use the technology that is right for the job
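A pre-compute phase of this kind might look like the Python sketch below: device metadata is joined into the readings once, ahead of query time, producing wide rows so later analytics never re-join; a per-type z-score then serves as a simple normalcy check. The schema, values and the 2.0 threshold are all assumptions for illustration, and the z-score is one possible stand-in for "normalcy calculations", not a prescribed method.

```python
"""Pre-compute phase: denormalize once, then flag abnormal readings.

Hypothetical schema and values; z-score and threshold are assumptions.
"""
from statistics import mean, pstdev
from typing import Dict, List

DEVICES: Dict[str, Dict[str, str]] = {  # device_id -> device metadata
    "dev1": {"type": "pH_meter", "site": "Lab A"},
    "dev2": {"type": "pH_meter", "site": "Lab B"},
}

READINGS: List[dict] = [  # instance data, left in its original shape
    {"device_id": "dev1", "value": 7.0},
    {"device_id": "dev1", "value": 7.1},
    {"device_id": "dev1", "value": 6.9},
    {"device_id": "dev2", "value": 7.0},
    {"device_id": "dev2", "value": 7.0},
    {"device_id": "dev2", "value": 10.0},
]


def precompute_wide_rows(readings: List[dict],
                         devices: Dict[str, Dict[str, str]]) -> List[dict]:
    """Join each reading to its device metadata once (a hash lookup per row).

    Readings whose device is unknown are dropped, as in an inner join.
    """
    return [{**r, **devices[r["device_id"]]}
            for r in readings if r["device_id"] in devices]


def flag_abnormal(rows: List[dict], threshold: float = 2.0) -> List[dict]:
    """Add a z-score computed within each device type; flag |z| > threshold."""
    values_by_type: Dict[str, List[float]] = {}
    for row in rows:
        values_by_type.setdefault(row["type"], []).append(row["value"])
    stats = {t: (mean(v), pstdev(v)) for t, v in values_by_type.items()}
    flagged = []
    for row in rows:
        mu, sigma = stats[row["type"]]
        z = 0.0 if sigma == 0 else (row["value"] - mu) / sigma
        flagged.append({**row, "z": z, "abnormal": abs(z) > threshold})
    return flagged


if __name__ == "__main__":
    wide = flag_abnormal(precompute_wide_rows(READINGS, DEVICES))
    for row in wide:
        if row["abnormal"]:
            print(row["device_id"], row["site"], row["value"])  # dev2 Lab B 10.0
```

In a real deployment the wide rows would be written to a columnar or NoSQL table, so that queries become single-table scans instead of joins or graph hops.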
Slide 14
One Example of Using RDF in Cloud-Scalable Applications
• An example of a current approach being used; there are others
• Can scale across multiple cloud nodes (where triple stores have issues)
• Triples are stored as indexed items
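One general way to treat triples as indexed items is to write each triple into a sorted key-value store several times, under different key orders (SPO, POS, OSP), so that any triple pattern with bound terms becomes a contiguous range scan rather than a graph traversal. The Python sketch below is a hypothetical illustration of that technique, not the specific system on the slide: a sorted in-memory list stands in for a distributed sorted table, and it assumes terms never contain the separator character.

```python
"""Sketch: RDF triples as sorted keys in three index permutations."""
from bisect import bisect_left
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
SEP = "\x00"  # assumption: this separator never occurs inside a term

# Map each index's key order back to (subject, predicate, object).
REORDER = {
    "spo": lambda t: (t[0], t[1], t[2]),
    "pos": lambda t: (t[2], t[0], t[1]),
    "osp": lambda t: (t[1], t[2], t[0]),
}


class TripleIndex:
    def __init__(self) -> None:
        self.tables: Dict[str, List[str]] = {"spo": [], "pos": [], "osp": []}

    def add(self, s: str, p: str, o: str) -> None:
        """Write the triple into all three sorted tables (duplicates ignored)."""
        for name, key in (("spo", (s, p, o)), ("pos", (p, o, s)), ("osp", (o, s, p))):
            row, table = SEP.join(key), self.tables[name]
            i = bisect_left(table, row)
            if i == len(table) or table[i] != row:
                table.insert(i, row)

    def _scan(self, name: str, bound: Tuple[str, ...]) -> Iterator[Tuple[str, ...]]:
        """Range-scan rows whose leading terms equal `bound`."""
        prefix, table = SEP.join(bound), self.tables[name]
        i = bisect_left(table, prefix)
        while i < len(table) and table[i].startswith(prefix):
            terms = tuple(table[i].split(SEP))
            if terms[:len(bound)] == bound:  # exact term match, not just a string prefix
                yield terms
            i += 1

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Pick the index whose key order puts the bound terms first."""
        if s is not None and p is None and o is not None:
            name, bound = "osp", (o, s)
        elif s is not None:
            if p is None:
                name, bound = "spo", (s,)
            elif o is None:
                name, bound = "spo", (s, p)
            else:
                name, bound = "spo", (s, p, o)
        elif p is not None:
            name, bound = "pos", ((p,) if o is None else (p, o))
        elif o is not None:
            name, bound = "osp", (o,)
        else:
            name, bound = "spo", ()
        for terms in self._scan(name, bound):
            yield REORDER[name](terms)


if __name__ == "__main__":
    idx = TripleIndex()
    idx.add("ex:reading1", "ex:measuredBy", "ex:dev1")
    idx.add("ex:reading2", "ex:measuredBy", "ex:dev1")
    idx.add("ex:dev1", "rdf:type", "ex:pHMeter")
    # "Which readings came from dev1?" is one range scan on the POS table.
    print(sorted(idx.match(p="ex:measuredBy", o="ex:dev1")))
```

The trade-off is storage: every triple is written three times in exchange for answering each pattern shape with a single scan, which is what lets the layout spread across many nodes.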
THANK YOU – QUESTIONS?
Eric Little, PhD
VP Data Science
OSTHUS, Inc.
eric.little@osthus.com
(M) 321-480-4818
www.linkedin.com/pub/eric-little

Mais conteúdo relacionado

Mais procurados

Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...James Miranda
 
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionAI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionDr. Haxel Consult
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceANOOP V S
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovationopen_phacts
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Ilkay Altintas, Ph.D.
 
2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & VisualizationTreparel
 
Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel
 
IC-SDV 2019: Search Technology / Vantage Point
IC-SDV 2019: Search Technology / Vantage PointIC-SDV 2019: Search Technology / Vantage Point
IC-SDV 2019: Search Technology / Vantage PointDr. Haxel Consult
 
Writing a Databases Research Paper
Writing a Databases Research PaperWriting a Databases Research Paper
Writing a Databases Research PaperDamian T. Gordon
 
Real callenges in big data security
Real callenges in big data securityReal callenges in big data security
Real callenges in big data securitybalasahebcomp
 
Managing sensitive applications in the public cloud
Managing sensitive applications in the public cloudManaging sensitive applications in the public cloud
Managing sensitive applications in the public cloudieeepondy
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —swethaT16
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaBabasab Patil
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecasesSreenatha Reddy K R
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical DataPaul Agapow
 

Mais procurados (20)

Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
Paper 192. in CISTI 2021: OntoDRE: An Ontology For The Requirements...
 
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionAI-SDV 2021: Francisco Webber - Efficiency is the New Precision
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Paper presentation
Paper presentationPaper presentation
Paper presentation
 
2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation2011-12-02 Open PHACTS at STM Innovation
2011-12-02 Open PHACTS at STM Innovation
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
 
2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization2014: Treparel Big Data Text Analytics & Visualization
2014: Treparel Big Data Text Analytics & Visualization
 
Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014Treparel - KMX Patent Analytics 2014
Treparel - KMX Patent Analytics 2014
 
IC-SDV 2019: Search Technology / Vantage Point
IC-SDV 2019: Search Technology / Vantage PointIC-SDV 2019: Search Technology / Vantage Point
IC-SDV 2019: Search Technology / Vantage Point
 
Writing a Databases Research Paper
Writing a Databases Research PaperWriting a Databases Research Paper
Writing a Databases Research Paper
 
Real callenges in big data security
Real callenges in big data securityReal callenges in big data security
Real callenges in big data security
 
Managing sensitive applications in the public cloud
Managing sensitive applications in the public cloudManaging sensitive applications in the public cloud
Managing sensitive applications in the public cloud
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —
 
SciBite
SciBiteSciBite
SciBite
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mba
 
IC-SDV 2019: OntoChem
IC-SDV 2019: OntoChemIC-SDV 2019: OntoChem
IC-SDV 2019: OntoChem
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 

Semelhante a Reasoning over big data

Reinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & FasterReinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & FasterOSTHUS
 
Why Data is Becoming the Most Valuable Asset Companies Posses
Why Data is Becoming the Most Valuable Asset Companies PossesWhy Data is Becoming the Most Valuable Asset Companies Posses
Why Data is Becoming the Most Valuable Asset Companies PossesOSTHUS
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewDr. Ananth Krishnamoorthy
 
Wake up and smell the data
Wake up and smell the dataWake up and smell the data
Wake up and smell the datamark madsen
 
Becoming Datacentric
Becoming DatacentricBecoming Datacentric
Becoming DatacentricTimothy Cook
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning CCG
 
How to build a data science project in a corporate setting, by Soraya Christi...
How to build a data science project in a corporate setting, by Soraya Christi...How to build a data science project in a corporate setting, by Soraya Christi...
How to build a data science project in a corporate setting, by Soraya Christi...WiMLDSMontreal
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018mark madsen
 
Introduction to Data Analysis Course Notes.pdf
Introduction to Data Analysis Course Notes.pdfIntroduction to Data Analysis Course Notes.pdf
Introduction to Data Analysis Course Notes.pdfGraceOkeke3
 
Paradigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4
 
Data Collaboration Stack
Data Collaboration StackData Collaboration Stack
Data Collaboration StackPierre Brunelle
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration James Hendler
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6varshakumar21
 
Next generation of data scientist
Next generation of data scientistNext generation of data scientist
Next generation of data scientistTanujaSomvanshi1
 
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...PhD Assistance
 
How to Start Doing Data Science
How to Start Doing Data ScienceHow to Start Doing Data Science
How to Start Doing Data ScienceAyodele Odubela
 

Semelhante a Reasoning over big data (20)

Reinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & FasterReinventing Laboratory Data To Be Bigger, Smarter & Faster
Reinventing Laboratory Data To Be Bigger, Smarter & Faster
 
Why Data is Becoming the Most Valuable Asset Companies Posses
Why Data is Becoming the Most Valuable Asset Companies PossesWhy Data is Becoming the Most Valuable Asset Companies Posses
Why Data is Becoming the Most Valuable Asset Companies Posses
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Wake up and smell the data
Wake up and smell the dataWake up and smell the data
Wake up and smell the data
 
Becoming Datacentric
Becoming DatacentricBecoming Datacentric
Becoming Datacentric
 
Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning Afternoons with Azure - Azure Machine Learning
Afternoons with Azure - Azure Machine Learning
 
How to build a data science project in a corporate setting, by Soraya Christi...
How to build a data science project in a corporate setting, by Soraya Christi...How to build a data science project in a corporate setting, by Soraya Christi...
How to build a data science project in a corporate setting, by Soraya Christi...
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
On Big Data
On Big DataOn Big Data
On Big Data
 
Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018Architecting a Platform for Enterprise Use - Strata London 2018
Architecting a Platform for Enterprise Use - Strata London 2018
 
Introduction to Data Analysis Course Notes.pdf
Introduction to Data Analysis Course Notes.pdfIntroduction to Data Analysis Course Notes.pdf
Introduction to Data Analysis Course Notes.pdf
 
Paradigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the tableParadigm4 Research Report: Leaving Data on the table
Paradigm4 Research Report: Leaving Data on the table
 
Data Collaboration Stack
Data Collaboration StackData Collaboration Stack
Data Collaboration Stack
 
Data science unit2
Data science unit2Data science unit2
Data science unit2
 
The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration The Rensselaer IDEA: Data Exploration
The Rensselaer IDEA: Data Exploration
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6
 
Next generation of data scientist
Next generation of data scientistNext generation of data scientist
Next generation of data scientist
 
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
 
How to Start Doing Data Science
How to Start Doing Data ScienceHow to Start Doing Data Science
How to Start Doing Data Science
 

Mais de OSTHUS

The Fast Track to Fair Lab Data
The Fast Track to Fair Lab Data The Fast Track to Fair Lab Data
The Fast Track to Fair Lab Data OSTHUS
 
Challenges & Opportunities of Implementation FAIR in Life Sciences
Challenges & Opportunities of Implementation FAIR in Life SciencesChallenges & Opportunities of Implementation FAIR in Life Sciences
Challenges & Opportunities of Implementation FAIR in Life SciencesOSTHUS
 
From allotrope to reference master data management
From allotrope to reference master data management From allotrope to reference master data management
From allotrope to reference master data management OSTHUS
 
Early AI Adoption Via Advanced Analytics
Early AI Adoption Via  Advanced AnalyticsEarly AI Adoption Via  Advanced Analytics
Early AI Adoption Via Advanced AnalyticsOSTHUS
 
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...OSTHUS
 
Why paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labWhy paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labOSTHUS
 
Allotrope foundation vanderwall_and_little_bio_it_world_2016
Allotrope foundation vanderwall_and_little_bio_it_world_2016Allotrope foundation vanderwall_and_little_bio_it_world_2016
Allotrope foundation vanderwall_and_little_bio_it_world_2016OSTHUS
 
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...OSTHUS
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...OSTHUS
 
Best Practice Reference Architecture for Data Curation
Best Practice Reference Architecture for Data CurationBest Practice Reference Architecture for Data Curation
Best Practice Reference Architecture for Data CurationOSTHUS
 
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...OSTHUS
 
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015OSTHUS
 
Data Quality- How to clean up your legacy data
Data Quality- How to clean up your legacy dataData Quality- How to clean up your legacy data
Data Quality- How to clean up your legacy dataOSTHUS
 
Data Quality- How to clean up your legacy data?
Data Quality- How to clean up your legacy data?Data Quality- How to clean up your legacy data?
Data Quality- How to clean up your legacy data?OSTHUS
 

Mais de OSTHUS (14)

The Fast Track to Fair Lab Data
The Fast Track to Fair Lab Data The Fast Track to Fair Lab Data
The Fast Track to Fair Lab Data
 
Challenges & Opportunities of Implementation FAIR in Life Sciences
Challenges & Opportunities of Implementation FAIR in Life SciencesChallenges & Opportunities of Implementation FAIR in Life Sciences
Challenges & Opportunities of Implementation FAIR in Life Sciences
 
From allotrope to reference master data management
From allotrope to reference master data management From allotrope to reference master data management
From allotrope to reference master data management
 
Early AI Adoption Via Advanced Analytics
Early AI Adoption Via  Advanced AnalyticsEarly AI Adoption Via  Advanced Analytics
Early AI Adoption Via Advanced Analytics
 
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...Revolutionizing Laboratory  Instrument Data for the  Pharmaceutical Industry:...
Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry:...
 
Why paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart labWhy paperless lab is just the first step towards a smart lab
Why paperless lab is just the first step towards a smart lab
 
Allotrope foundation vanderwall_and_little_bio_it_world_2016
Allotrope foundation vanderwall_and_little_bio_it_world_2016Allotrope foundation vanderwall_and_little_bio_it_world_2016
Allotrope foundation vanderwall_and_little_bio_it_world_2016
 
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
Semantics for Integrated Analytical Laboratory Processes – the Allotrope Pers...
 
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...Semantics for integrated laboratory analytical processes - The Allotrope Pers...
Semantics for integrated laboratory analytical processes - The Allotrope Pers...
 
Best Practice Reference Architecture for Data Curation
Best Practice Reference Architecture for Data CurationBest Practice Reference Architecture for Data Curation
Best Practice Reference Architecture for Data Curation
 
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
 
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
OSTHUS-Allotrope presents "Laboratory Informatics Strategy" at SmartLab 2015
 
Data Quality- How to clean up your legacy data
Data Quality- How to clean up your legacy dataData Quality- How to clean up your legacy data
Data Quality- How to clean up your legacy data
 
Data Quality- How to clean up your legacy data?
Data Quality- How to clean up your legacy data?Data Quality- How to clean up your legacy data?
Data Quality- How to clean up your legacy data?
 

Último

Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleanscorenetworkseo
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 

Último (20)

Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
Reasoning over big data

  • 1. Reasoning Over Big Data Stores. Eric Little, PhD, VP Data Science. Polytechnic School of Engineering, NYU. eric.little@osthus.com
  • 2. Who We Are & What We Do. OSTHUS, Inc. is the U.S. subsidiary of OSTHUS GmbH, with a global presence (offices in Germany, the U.S. and China). We provide advanced solutions, consulting and technology services for pharmaceutical and biotech R&D, and are a technology provider for the Allotrope effort, which globally aligns several pharma and biotech companies.
  • 3. Semantic Technologies: The Smart Data Piece. Semantic technologies provide several important features for emerging technologies: controlled vocabularies, taxonomies, metadata structures, ontology models and logical inference. Data today continues to grow in both size and complexity, so we need hybrid solutions that can provide real insights. Analytics is growing into a new kind of field: data science. Is data science about interacting with machines or with humans? Either way, it must strike a balance between the complexity of the data and the simplicity of its presentation to the user.
  • 4. Metadata, Reference Data & Master Data. While often lumped together, these are distinct kinds of data. Semantic technologies can help organize them, but this should not be done in isolation: scalability is achieved using complementary approaches. [Figure labels: Increased conceptual complexity; Increased scalability issues.]
  • 5. Where Semantics Can Fall Short. Graphs are good for information, but not so good for high-bandwidth applications where speed and scalability are the primary drivers; they can require highly specialized hardware, software techniques or engineers. Semantics should be confined to the metadata aspects of the problem; use other technologies for the rest.
  • 6. The Big Data Problem. Big data is a real challenge, but the term is starting to become a buzzword. Many “big data problems” can be reduced to smaller data problems. Still, some applications require complex inferencing over very large data sets: a current client has lab readings from 40,000+ devices. How can this be done effectively?
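The observation that many big data problems reduce to smaller ones can be illustrated with a minimal partition-and-reduce sketch. It assumes a hypothetical `Reading` record (device id plus one numeric value) and reduces each device's readings to a compact summary, so later reasoning can work on one summary per device instead of every raw reading. The record shape and summary fields are illustrative assumptions, not the client system described on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    # Hypothetical record shape: one lab reading from one device.
    device_id: str
    value: float


@dataclass(frozen=True)
class DeviceSummary:
    count: int
    mean: float
    minimum: float
    maximum: float


def summarize_by_device(readings):
    """Partition readings by device, then reduce each partition independently."""
    partitions = defaultdict(list)
    for r in readings:
        partitions[r.device_id].append(r.value)
    # Each partition is a small, independent problem; in a real deployment
    # these reductions could be spread across workers.
    return {
        device: DeviceSummary(
            count=len(values),
            mean=sum(values) / len(values),
            minimum=min(values),
            maximum=max(values),
        )
        for device, values in partitions.items()
    }


if __name__ == "__main__":
    sample = [Reading("dev-1", 4.0), Reading("dev-1", 6.0), Reading("dev-2", 10.0)]
    for device, summary in sorted(summarize_by_device(sample).items()):
        print(device, summary)
```

Because no partition depends on another, the same shape maps naturally onto map-reduce style frameworks; the sketch only shows the decomposition, not the distribution.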
  • 7. Why Not Just Build the Data Lake? Data lakes are fine when you are gathering and storing the data, but what happens later, when a lot of data is in there? The benefit is that data can stay in its original form, with no real ETL; the drawback is that running analytics across disparate stores is very challenging. “Without metadata, every subsequent use of data means analysts start from scratch.” (Gartner 2014)
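To make the role of metadata concrete, here is a minimal sketch of a catalog that sits beside a data lake. Each dataset stays where it is, in its original format, and the catalog only records its location and which shared vocabulary term each raw column corresponds to. The class names, paths and terms are hypothetical; the point is that an analyst can find every dataset carrying, say, temperature data without starting from scratch.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    # Hypothetical catalog entry: the raw data stays in place, in its original format.
    location: str
    fmt: str
    # Maps raw column names to shared vocabulary terms.
    column_concepts: dict = field(default_factory=dict)


class MetadataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, name, entry):
        self._entries[name] = entry

    def find_by_concept(self, concept):
        """Return sorted (dataset, column) pairs whose column is tagged with `concept`."""
        return sorted(
            (name, column)
            for name, entry in self._entries.items()
            for column, term in entry.column_concepts.items()
            if term == concept
        )


if __name__ == "__main__":
    catalog = MetadataCatalog()
    catalog.register("lab_a", DatasetEntry("/lake/lab_a/", "csv", {"temp_c": "Temperature"}))
    catalog.register("lab_b", DatasetEntry("/lake/lab_b.parquet", "parquet", {"T": "Temperature"}))
    print(catalog.find_by_concept("Temperature"))
```

The two datasets use different column names (`temp_c`, `T`), and only the metadata mapping lets a single lookup find both.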
  • 8. Reasoning Over Big Data Is a Growing Topic. An inordinate amount of time and energy has been spent on queries alone. That is not reasoning, though; it is just retrieval. What is reasoning? It is more than automated query sets run in sequence or in parallel. Reasoning is about inferring new information that isn't in the raw data. It is heuristic: a process by which one discovers or learns something new for oneself. It can be deductive, inductive or abductive.
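The difference between retrieval and reasoning can be shown with a tiny deductive example. The sketch below stores a hypothetical two-step taxonomy as triples. A plain lookup only returns what is stored, while following `subClassOf` transitively derives a fact that appears nowhere in the raw data (that HPLC is an analytical method). The class names are illustrative assumptions.

```python
def retrieve(facts, subject, predicate):
    """Plain retrieval: return only objects that are explicitly stored."""
    return {o for (s, p, o) in facts if s == subject and p == predicate}


def infer_superclasses(facts, cls):
    """Deductive step: follow subClassOf transitively to find implied superclasses."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for parent in retrieve(facts, current, "subClassOf"):
            if parent not in found:  # the visited set also guards against cycles
                found.add(parent)
                frontier.append(parent)
    return found


# Hypothetical taxonomy stored as (subject, predicate, object) triples.
FACTS = {
    ("HPLC", "subClassOf", "Chromatography"),
    ("Chromatography", "subClassOf", "AnalyticalMethod"),
}

if __name__ == "__main__":
    print(retrieve(FACTS, "HPLC", "subClassOf"))
    print(infer_superclasses(FACTS, "HPLC"))
```

Running any number of `retrieve` calls in sequence or in parallel never yields `AnalyticalMethod` for `HPLC`; only the inference step does.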
  • 9. Types of Reasoning One Can Use. Logical reasoning (which does not always assume set theory) and mathematical reasoning (which is logical reasoning, but assumes set theory as its basis).
  • 11. Types of Semantic Inference (Forward and Backward Chaining). Both use modus ponens. Backward chaining starts from a consequent (the goal) and confirms a related antecedent, verifying the connection. Forward chaining finds a true antecedent and affirms the related consequent, producing new knowledge.
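Both directions of inference can be sketched over simple propositional rules, each written as a set of antecedents plus one consequent. Forward chaining applies modus ponens to known facts until nothing new appears; backward chaining starts from a goal and tries to verify antecedents that support it. The rule and fact names are hypothetical, and real semantic reasoners work over richer rule languages than this.

```python
# Each rule is (antecedents, consequent): if all antecedents hold, the consequent holds.


def forward_chain(facts, rules):
    """Repeatedly apply modus ponens to known facts until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and antecedents <= known:
                known.add(consequent)  # new knowledge
                changed = True
    return known


def backward_chain(goal, facts, rules, _visiting=None):
    """Start from a goal (consequent) and try to verify antecedents that support it."""
    if goal in facts:
        return True
    visiting = set() if _visiting is None else _visiting
    if goal in visiting:  # avoid infinite recursion on cyclic rules
        return False
    visiting.add(goal)
    for antecedents, consequent in rules:
        if consequent == goal and all(
            backward_chain(a, facts, rules, visiting) for a in antecedents
        ):
            visiting.discard(goal)
            return True  # connection verified
    visiting.discard(goal)
    return False


# Hypothetical rules and facts for a lab-data scenario.
RULES = [
    (frozenset({"reading_out_of_range", "device_calibrated"}), "sample_anomalous"),
    (frozenset({"sample_anomalous"}), "flag_for_review"),
]
FACTS = {"reading_out_of_range", "device_calibrated"}

if __name__ == "__main__":
    print(sorted(forward_chain(FACTS, RULES)))
    print(backward_chain("flag_for_review", FACTS, RULES))
```

Forward chaining materializes every derivable fact up front, while backward chaining does work only for the goal asked about; that trade-off matters when the fact base is large.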
  • 12. Ontology Layering Is Important for Scale. [Figure: layered ontology stack. Layers: Data Source Models; Multi- & Single-Source Data Integration Models; Domain Models (Objects, Attributes, Processes & Relations); System-Level Models (Rules); Upper-Level Models. Other labels: Data Traceability (Provenance); User-Driven Ontologies; Meta-data Levels (Human Concepts); Data-centric Levels (Machine Language).] Metaphysics, not just data models. Data sources are connected directly to higher classifications. Federation allows for improved scale.
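One way to read "data sources connected directly to higher classifications" plus "federation" is sketched below: source-level classes point to domain-level classes, which point to upper-level classes, and a query posed at a higher layer fans out to every source whose class falls under it, without copying the data. All class names, layers and records are hypothetical, and a real system would issue remote queries rather than read in-memory lists.

```python
# Hypothetical layered ontology: each class points to its parent in the layer above.
PARENT = {
    # data-source layer -> domain layer
    "labA:TempReading": "Measurement",
    "labB:Temp": "Measurement",
    "erp:Batch": "Material",
    # domain layer -> upper layer
    "Measurement": "Process",
    "Material": "Object",
}

# Hypothetical federated sources: the instance data stays in each source.
SOURCES = {
    "labA:TempReading": [{"id": "a1", "value": 21.5}],
    "labB:Temp": [{"id": "b7", "value": 22.1}],
    "erp:Batch": [{"id": "batch-9"}],
}


def ancestors(cls):
    """Walk up the layers from a class to the top of the hierarchy."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def federated_query(concept):
    """Answer a query posed at any layer by fanning out to the matching sources."""
    results = []
    for source_class, records in SOURCES.items():
        if concept == source_class or concept in ancestors(source_class):
            results.extend((source_class, r["id"]) for r in records)
    return sorted(results)


if __name__ == "__main__":
    print(federated_query("Measurement"))
    print(federated_query("Object"))
```

Only the small class hierarchy is held as a graph; the instance records never leave their sources.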
  • 13. Combining Semantics and NoSQL. Get your semantics experts and your big data scientists on the same page. Utilize tables where possible and avoid multi-node graph hops: use graphs for metadata and leave instance data in place when possible; avoid large graphs; lots of columns and rows are fine, but joins across tables are not; break graph information into other formats wherever possible. Pre-compute phases are important: pre-compute multi-table joins based on SME input, known semantic patterns, business rules/logic, etc., and use statistical methods to cluster data (e.g., normalcy calculations). Use the technology that is right for the job.
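The pre-compute advice can be sketched with Python's built-in `sqlite3`: a multi-table join is materialized once into a wide table, so later analytics read many columns from one table instead of joining at query time. A simple z-score check stands in for a "normalcy calculation". The schema, names and the 2.0 threshold are hypothetical choices for illustration, and z-scores are only one of many possible statistical methods.

```python
import sqlite3
import statistics


def build_wide_table(conn):
    """Pre-compute a multi-table join once so later analytics read a single wide table."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS reading_wide;
        CREATE TABLE reading_wide AS
        SELECT r.reading_id, r.value, d.device_id, d.model, l.lab_name
        FROM readings r
        JOIN devices d ON d.device_id = r.device_id
        JOIN labs l ON l.lab_id = d.lab_id;
        """
    )


def flag_abnormal(values, threshold=2.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return [False] * len(values)
    return [abs(v - mean) / stdev > threshold for v in values]


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical normalized schema: labs -> devices -> readings.
    conn.executescript(
        """
        CREATE TABLE labs (lab_id INTEGER PRIMARY KEY, lab_name TEXT);
        CREATE TABLE devices (device_id TEXT PRIMARY KEY, model TEXT, lab_id INTEGER);
        CREATE TABLE readings (reading_id INTEGER PRIMARY KEY, device_id TEXT, value REAL);
        INSERT INTO labs VALUES (1, 'Lab North');
        INSERT INTO devices VALUES ('dev-1', 'pH-200', 1);
        INSERT INTO readings VALUES (1, 'dev-1', 7.0), (2, 'dev-1', 7.1), (3, 'dev-1', 9.9);
        """
    )
    build_wide_table(conn)
    rows = conn.execute(
        "SELECT reading_id, value, model, lab_name FROM reading_wide ORDER BY reading_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
    print(flag_abnormal([10.0] * 9 + [20.0]))
```

The join cost is paid once, during the pre-compute phase; every later query scans one denormalized table.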
  • 14. One Example of Using RDF in Cloud-Scalable Applications. This is one current approach; there are others. It can scale across multiple cloud nodes, where triple stores (TSs) have issues. Triples are indexed items.
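"Triples are indexed items" can be illustrated with a pattern commonly used when RDF is stored in sorted key-value systems: each triple is written under several orderings (here SPO, POS and OSP), so any triple pattern with some bound terms becomes a single prefix range scan. This is a minimal single-process sketch under stated assumptions; the slide does not name its system, and a sorted Python list merely stands in for a distributed sorted store. The NUL separator is a hypothetical encoding choice.

```python
import bisect

SEP = "\x00"  # assumption: RDF terms in this sketch never contain NUL

# Three orderings cover every pattern of bound/unbound subject, predicate, object.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


def _key(name, terms):
    return SEP.join([name, *terms]) + SEP


class TripleIndex:
    """Store each triple under SPO, POS and OSP orderings in one sorted key space."""

    def __init__(self):
        # A sorted Python list stands in for a distributed, sorted key-value store.
        self._keys = []

    def add(self, s, p, o):
        triple = (s, p, o)
        for name, order in ORDERS.items():
            key = _key(name, [triple[i] for i in order])
            i = bisect.bisect_left(self._keys, key)
            if i == len(self._keys) or self._keys[i] != key:  # skip duplicates
                self._keys.insert(i, key)

    def match(self, s=None, p=None, o=None):
        """Answer a triple pattern (None = wildcard) with one prefix range scan."""
        bound = (s, p, o)
        for name, order in ORDERS.items():
            terms = [bound[i] for i in order]
            n = 0
            while n < 3 and terms[n] is not None:
                n += 1
            # Use the first ordering whose bound terms form a prefix.
            if all(t is None for t in terms[n:]):
                break
        prefix = _key(name, terms[:n])
        results = []
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            stored = self._keys[i][:-1].split(SEP)[1:]
            triple = [None] * 3
            for k, idx in enumerate(order):  # undo the permutation
                triple[idx] = stored[k]
            results.append(tuple(triple))
            i += 1
        return sorted(results)


if __name__ == "__main__":
    index = TripleIndex()
    index.add("dev1", "locatedIn", "lab1")
    index.add("dev2", "locatedIn", "lab1")
    print(index.match(p="locatedIn", o="lab1"))
```

Writing each triple three times trades storage for the guarantee that no query pattern needs a full scan, which is what lets the keys be partitioned across cloud nodes.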
  • 15. THANK YOU – QUESTIONS? Eric Little, PhD VP Data Science OSTHUS, Inc. eric.little@osthus.com (M) 321-480-4818 www.linkedin.com/pub/eric-little