healthDB: A Primer
                                  Parag Patel, Shahid Shah



                                        Overview

healthDB is an incrementally scalable, fault-tolerant, ACID-compliant, key/value,
document-based database designed to hold huge amounts of data with high-throughput
reads and writes and high availability. It is based on an open source project
called couchDB. It is designed to be a data warehouse for the disparate systems that
might be part of a healthcare practice or hospital. Because its data storage is
semi-structured, it can hold data of any type. The end user need not worry about
structuring the data in the data warehouse; the data will be stored in the warehouse for
future extraction and structuring as the user sees fit. Future versions of healthDB will
help the end user structure data from the semi-structured state it is in. Conceptually,
one can think of lazy evaluation as available in Scheme, Lisp, or Haskell: structure is
imposed only when it is needed. Once the user knows the structure they want to put the
data in, it will be straightforward to implement that structure in healthDB.

The design of the database follows a “just works” philosophy: the database should
work as advertised. The end user should only have to worry about building their
application or service, not about the storage of their data or its
performance. Most of the work traditionally done by a DBA will be done by
healthDB. All the end user has to do is start it up initially and add servers as
healthDB directs in order to scale. healthDB will have a connector engine that
connects to common interfaces such as HL7, JMS, ODBC, and various delimited file formats,
and that allows custom connectors to be developed for unusual interfaces.
In the future, healthDB will support a health query language (HQL), as an external or
internal component (to be determined), which will allow users to search all their
structured and semi-structured data to find the knowledge they seek in a health domain.
healthDB will come with some sample applications to show end users the power it holds.

                                      Architecture

healthDB uses couchDB primarily to take care of the low-level storage. It
communicates with couchDB using encrypted REST (couchDB might need to be modified
for encryption). The diagram below shows the basic outline of healthDB.

                                    healthDB

                                 healthDB engine
                                     couchDB
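The link between the healthDB engine and couchDB described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "encrypted REST" means HTTPS and that documents are written with couchDB's standard `PUT /{db}/{docid}` call. The host name, database name, and helper name `build_store_request` are hypothetical. The request is only built, not sent, so the sketch runs without a server.

```python
import json
import urllib.parse
import urllib.request


def build_store_request(base_url: str, db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTPS request that stores one document in couchDB."""
    # couchDB addresses a document as /{db}/{docid}; quote the id so that
    # characters such as "/" cannot change the path.
    url = f"{base_url}/{db}/{urllib.parse.quote(doc_id, safe='')}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and database, for illustration only.
req = build_store_request(
    "https://couch.example.org:6984", "healthdb", "lab_42",
    {"source": "lab", "value": 7.1},
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose; authentication and certificate handling would depend on how the security component is eventually designed.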




The healthDB engine is the main control unit of healthDB. Its job is to ensure that
the user can store data seamlessly. It takes care of tasks such as automatic
partitioning, replication, encryption of the data, automatic load balancing, automatic
system backup, and error logging.



                                   healthDB engine

The healthDB engine is made up of various components: the partitioner,
replicator, connector engine, healthCPU, security, and healthDB API. (healthSearch will
be an additional component; it is undetermined whether it should sit in the healthDB
engine or in couchDB.) We shall look briefly at each component of the engine. Note:
additional components may be added, and components may be merged or deleted.

healthDB API

Provides the healthDB interface to the outside world. It will be the only way to
communicate with the database. Multiple APIs should be developed, such as Python,
Ruby, Java, C#, and REST.

connector engine

The connector engine converts data from a variety of formats into
a format that healthDB can understand while preserving its integrity.
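As an illustration of what a connector might do, the sketch below turns a delimited file into document-shaped records. The primer does not specify healthDB's internal format, so the document layout (`source_system`, `row_number`, `data`), the pipe delimiter, and the function name are all assumptions made for this example.

```python
import csv
import io


def delimited_to_documents(text: str, source_system: str, delimiter: str = "|") -> list[dict]:
    """Convert delimited text (first line = header) into one document per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    documents = []
    for row_number, row in enumerate(reader, start=1):
        documents.append({
            "source_system": source_system,  # where the record came from
            "row_number": row_number,        # position in the source file
            "data": dict(row),               # values kept verbatim as strings
        })
    return documents


sample = "patient_id|test|result\n17|glucose|5.4\n18|glucose|6.1\n"
docs = delimited_to_documents(sample, "lab")
print(docs[0])
```

Values are kept as the strings found in the file rather than being converted to numbers or dates, which is one simple way to avoid altering source data during conversion.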

healthCPU

This is the brain of the healthDB database. It controls when healthDB should
replicate data and when it should partition data. It looks up data in
the datastore (couchDB) and formats, structures, and semi-structures data that will be
stored in the datastore. It ensures that data is HIPAA compliant by having the security
component encrypt it. healthCPU also tracks which nodes are alive and what their
status is, and it performs load balancing. It filters out data based on the user's
permissions.
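The node tracking and load balancing duties could look something like the sketch below. The primer does not say how liveness or load is measured; the heartbeat timestamps, the 10-second timeout, and the "fewest open requests" rule are assumptions chosen only to make the idea concrete, and `NodeStatus` and `pick_node` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    last_heartbeat: float  # seconds on any shared clock
    open_requests: int     # current load on the node


def pick_node(nodes: list[NodeStatus], now: float, timeout: float = 10.0) -> NodeStatus:
    """Return the least-loaded node whose heartbeat is recent enough."""
    alive = [n for n in nodes if now - n.last_heartbeat <= timeout]
    if not alive:
        raise RuntimeError("no live nodes available")
    return min(alive, key=lambda n: n.open_requests)


nodes = [
    NodeStatus("a", last_heartbeat=95.0, open_requests=3),
    NodeStatus("b", last_heartbeat=80.0, open_requests=0),  # stale heartbeat
    NodeStatus("c", last_heartbeat=99.0, open_requests=1),
]
chosen = pick_node(nodes, now=100.0)
print(chosen.name)
```

Node "b" has the lowest load but is skipped because its last heartbeat is older than the timeout.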

security

This performs encryption and authentication, and tells the healthCPU whether the user
has permission to access certain data.
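The division of labor between security and the healthCPU might be sketched like this: security answers a yes/no permission question, and the healthCPU uses that answer to filter results. The grant model (a set of permitted source systems per user) and the function names are assumptions; the primer does not define how permissions are represented.

```python
def has_permission(grants: dict[str, set[str]], user: str, source_system: str) -> bool:
    """Security's role: say whether a user may see data from a source system."""
    return source_system in grants.get(user, set())


def filter_for_user(records: list[dict], grants: dict[str, set[str]], user: str) -> list[dict]:
    """healthCPU's role: drop every record the user is not permitted to see."""
    return [r for r in records if has_permission(grants, user, r["source_system"])]


grants = {"nurse_1": {"lab"}, "admin_1": {"lab", "billing"}}
records = [
    {"source_system": "lab", "data": {"test": "glucose"}},
    {"source_system": "billing", "data": {"amount": "120.00"}},
]
visible = filter_for_user(records, grants, "nurse_1")
print(visible)
```

Unknown users receive an empty result because they have no grants.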

replicator

Creates a new database replication based on what the healthCPU tells it.
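Since healthDB builds on couchDB, one plausible way for the replicator to act on the healthCPU's instruction is couchDB's `POST /_replicate` endpoint. The sketch below only builds such a request; the host names and the helper name are hypothetical, and whether healthDB would use this endpoint at all is an assumption.

```python
import json
import urllib.request


def build_replication_request(base_url: str, source_url: str, target_url: str,
                              continuous: bool = True) -> urllib.request.Request:
    """Build (but do not send) a couchDB replication request."""
    payload = {
        "source": source_url,
        "target": target_url,
        "create_target": True,     # create the target database if missing
        "continuous": continuous,  # keep replicating as new changes arrive
    }
    return urllib.request.Request(
        f"{base_url}/_replicate",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical hosts, for illustration only.
rep = build_replication_request(
    "https://couch-a.example.org:6984",
    "https://couch-a.example.org:6984/healthdb",
    "https://couch-b.example.org:6984/healthdb",
)
print(rep.get_method(), rep.full_url)
```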

partitioner

Creates new partitions on the data and places the data on server(s) the healthCPU
specifies.
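One simple way to place data on the servers the healthCPU specifies is to hash the document name and take the result modulo the number of servers. The primer does not name a partitioning scheme, so this hash-based placement, the use of SHA-256, and the function name `assign_server` are assumptions made for illustration.

```python
import hashlib


def assign_server(doc_id: str, servers: list[str]) -> str:
    """Deterministically map a document id to one of the given servers."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["node-1", "node-2", "node-3"]
placement = assign_server("document_lab_42", servers)
print(placement)
```

The same document name always maps to the same server while the server list is unchanged. With plain modulo placement, adding a server moves many documents; a scheme such as consistent hashing would reduce that movement if incremental scaling is a priority.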

Diagram of the healthDB engine:

                           healthDB engine

                             healthDB API
                           connector engine
                              healthCPU
                               security
              replicator                      partitioner

Storage Structure

The healthCPU will store unstructured data as follows. It will have a series of
documents that keep track of data from various sources. Each source will have its own
document(s). Each such document will contain (key, value) pairs of the form
(hash(document_sourcesystem_objectID), document_sourcesystem_objectID). A record
from a source system will be stored in its own separate document, which will hold
system values such as the last modified date along with the actual data itself. The
record will be called a DBobject. The document name will be used to identify the
DBobject.
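The layout described above can be sketched with in-memory dictionaries standing in for couchDB documents. This is only one reading of the scheme: it assumes the document name is the literal string "document" joined with the source system and object ID, and it uses SHA-256 as the hash, which the primer does not specify. The names `dbobject_name` and `store_record` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def dbobject_name(source_system: str, object_id: str) -> str:
    """Name of the document holding one record, e.g. document_lab_42."""
    return f"document_{source_system}_{object_id}"


def store_record(store: dict, index: dict, source_system: str,
                 object_id: str, data: dict) -> str:
    """Store one DBobject and register it in its source's tracking document."""
    name = dbobject_name(source_system, object_id)
    key = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Per-source tracking document: hash(name) -> name.
    index.setdefault(source_system, {})[key] = name
    # The DBobject itself: system values plus the actual data.
    store[name] = {
        "last_modified": datetime.now(timezone.utc).isoformat(),
        "data": data,
    }
    return name


store, index = {}, {}
name = store_record(store, index, "lab", "42", {"test": "glucose", "result": "5.4"})
print(name, list(index["lab"].values()))
```

Looking up a record is then a matter of reading the source's tracking document to find the document name, and fetching the DBobject by that name.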

Other entities like DBobject can be created. We might have a person entity, which
would be identified by document_person_personID. This is very similar to the DBobject
concept, in which a series of documents contains references or indexes to the actual
records.
