Visual Tools for Queries and
Display of Quantitative Information
  in a Cancer Research Database

     JESSE STEWART and JERZY W. JAROMCZYK
         Department of Computer Science
        University of Kentucky, Lexington KY
The Kentucky Cancer Registry

• The Markey Cancer Center has the singular mission to eliminate the
  morbidity and mortality of cancer
• Since its founding, the Markey Cancer Center and the UK
  Chandler Hospital have served 2000-2200 new patients a
  year and are among the few institutions nationwide that
  address both clinical care and cancer research.
• The KCR’s case count exceeded 30,000 annually as of 2009
• The KCR houses a wealth of historical data for hundreds of
  cancer variants, associated treatments, and their relative
  success across the state of Kentucky.
Data Collection


Patient Events -> Abstracting (CPDMS.NET) -> Internet (HTTPS) -> Registry DB (MySQL)
Cancer Abstracts
• A cancer abstract contains up to 240 different
  elements ranging from patient demographics
  to staging information to therapy history
• KCR alone stores tens of thousands of unique
  abstracts
• Each abstract is created by a registrar, a
  professional trained to understand cancer
  data standards, formats and coding rules
Accelerating Cancer Research



Develop Queries -> Visualize Data Sets -> Discover Important Correlations
Registry Databases and Research
                 Valuable Information
                 •Survival Trends
                 •Incidence Rates
                 •Behavioral and
                 Geographical Correlation

    Challenges in Research
    •Coded Data
    •SQL
    •Complex DB Schemas
    •Access Control
    •Visualization
Software Solutions

• Define Queries (Data Sets)
  –   Intuitive: no programming required
  –   Flexible: allow any data set to be explored
  –   Accessible: Visual cross-browser application
  –   Re-use: Save, modify and combine Data Sets
• Data Analysis and Visualization:
  – Context-specific diagrams
  – Compare data sets singly or side by side
  – Customizable appearance
The Query Builder
• Presents a high-level abstraction of the
  Registry Database
• Patient, Case, Therapy data variables are
  easily recognizable and categorized
• Separates the user from the actual database
  structure and coded information
  – Example: Treatment is encoded as:
     • No Treatment=0, Treatment=1, Surveillance=2
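A minimal sketch of how coded values might be hidden from the user. Only the three treatment codes come from the slide; the table name `TREATMENT_LABELS` and the helpers `decode` and `encode` are hypothetical.

```python
# Hypothetical lookup table; only these three codes appear on the slide.
TREATMENT_LABELS = {0: "No Treatment", 1: "Treatment", 2: "Surveillance"}


def decode(code, labels):
    """Return the readable label for a stored code, flagging unknown codes."""
    return labels.get(code, f"Unknown code ({code})")


def encode(label, labels):
    """Return the stored code for a label picked in the interface."""
    for code, text in labels.items():
        if text == label:
            return code
    raise ValueError(f"no code for label {label!r}")


print(decode(2, TREATMENT_LABELS))
print(encode("Treatment", TREATMENT_LABELS))
```

The interface would show only the labels, while queries and stored data keep using the numeric codes.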
The Query Builder

• Translates a question about cancer data into
  SQL (Structured Query Language) which can be
  understood by the computer system
• Parses and stores the query for modification and
  reuse later
Example Query
• Patients diagnosed between Jan 1, 2005 and Dec
  31, 2008
• Patients diagnosed in Kentucky
• Patients treated with immunotherapy

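• The resulting SQL may be complex:

-- select list is not shown on the slide; the key columns are assumed here
select distinct case_data.hospkey, case_data.patkey
   from case_data, case_tx
   where case_tx.hospkey = case_data.hospkey and case_tx.patkey = case_data.patkey
   and case_data.incomplete = 0
   and case_data.diagdate >= 20050101 and case_data.diagdate <= 20081231
   and case_data.diagstate = 'KY' and case_tx.txtype = 'I';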
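The example query can be tried against a toy database. The sketch below uses Python's built-in `sqlite3` as a stand-in for the registry's MySQL server. The table layouts, the sample rows, and the select list are assumptions; only the column names used in the conditions come from the slide.

```python
import sqlite3

# In-memory stand-in for the registry database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE case_data (hospkey INTEGER, patkey INTEGER, diagdate INTEGER,
                            diagstate TEXT, incomplete INTEGER);
    CREATE TABLE case_tx (hospkey INTEGER, patkey INTEGER, txtype TEXT);
    INSERT INTO case_data VALUES (1, 100, 20060315, 'KY', 0);  -- matches
    INSERT INTO case_data VALUES (1, 101, 20040101, 'KY', 0);  -- diagnosed too early
    INSERT INTO case_data VALUES (2, 102, 20070704, 'OH', 0);  -- wrong state
    INSERT INTO case_tx VALUES (1, 100, 'I');
    INSERT INTO case_tx VALUES (1, 101, 'I');
    INSERT INTO case_tx VALUES (2, 102, 'I');
""")

# Same conditions as the slide, with an explicit join and bound parameters.
QUERY = """
    SELECT DISTINCT case_data.hospkey, case_data.patkey
    FROM case_data
    JOIN case_tx ON case_tx.hospkey = case_data.hospkey
                AND case_tx.patkey = case_data.patkey
    WHERE case_data.incomplete = 0
      AND case_data.diagdate BETWEEN ? AND ?
      AND case_data.diagstate = ?
      AND case_tx.txtype = ?
"""
rows = conn.execute(QUERY, (20050101, 20081231, "KY", "I")).fetchall()
print(rows)
```

Binding the values as parameters, rather than pasting them into the SQL text, is a design choice of this sketch and is not described in the slides.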
Interface Design
• To make writing a query like the previous
  example simple, the Query Builder must
  provide intuitive controls permitting a user to
  define each query component
• Variable names and coded values should be
  descriptive and easy to locate
• Conditions should be combined in a natural
  way with Boolean operators
• A tree-like layout was chosen to represent queries
Query Builder in Action
Custom UI Controls
• For each variable, DB schema information is
  used to display a customized UI control, e.g.:
  – Dates: date fields or ranges
  – Discrete variables: drop-down list or multiselect
  – Variables with many values: autofill field
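One way the control selection above could work, as a rough sketch. The `Variable` record, the `kind` categories, the control names, and the `AUTOFILL_THRESHOLD` cut-off are all hypothetical; the slides do not say how the schema information is represented.

```python
from dataclasses import dataclass, field

AUTOFILL_THRESHOLD = 25  # assumed cut-off for "many values", not from the slides


@dataclass
class Variable:
    name: str
    kind: str                                   # "date" or "discrete" (assumed)
    values: list = field(default_factory=list)  # allowed coded values, if any


def pick_control(var):
    """Choose a UI control type from a variable's schema information."""
    if var.kind == "date":
        return "date-range"
    if var.kind == "discrete":
        # Long value lists are easier to search than to scroll through.
        if len(var.values) > AUTOFILL_THRESHOLD:
            return "autofill"
        return "drop-down"
    return "text"  # fallback for anything the schema does not classify


print(pick_control(Variable("diagdate", "date")))
print(pick_control(Variable("treatment", "discrete", [0, 1, 2])))
print(pick_control(Variable("county", "discrete", list(range(120)))))
```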
Syntax Tree
Internal Representation
• Program maintains an abstract syntax tree for
  the query as it is created
• Captures the essential structure of the query
  but omits SQL-specific syntax
• This data structure serves as an intermediary
  between the interface and the database
  system
• Permits two code-generation targets: JSON
  and SQL
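The two code-generation targets can be illustrated with a small tree of plain dictionaries. This is a sketch under assumptions: the node shapes (`cond`, `group`), the operator whitelist, and the Tennessee condition are invented for illustration, and the original tool builds its tree in JavaScript rather than Python.

```python
import json

ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def cond(field, op, value):
    """Leaf node: one comparison on one variable."""
    return {"type": "cond", "field": field, "op": op, "value": value}


def group(bool_op, *children):
    """Inner node: children combined with a Boolean operator."""
    return {"type": "group", "bool": bool_op, "children": list(children)}


def to_sql(node, params):
    """Render a node as a SQL fragment with placeholders; values go to params."""
    if node["type"] == "cond":
        if node["op"] not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator {node['op']!r}")
        params.append(node["value"])
        return f"{node['field']} {node['op']} ?"
    if node["bool"] not in ("and", "or"):
        raise ValueError(f"unsupported Boolean operator {node['bool']!r}")
    joiner = f" {node['bool'].upper()} "
    return "(" + joiner.join(to_sql(child, params) for child in node["children"]) + ")"


def to_json(node):
    """Second target: the same tree serialized for storage."""
    return json.dumps(node, sort_keys=True)


tree = group(
    "and",
    cond("case_data.diagdate", ">=", 20050101),
    cond("case_data.diagdate", "<=", 20081231),
    group("or",
          cond("case_data.diagstate", "=", "KY"),
          cond("case_data.diagstate", "=", "TN")),
)
params = []
sql = to_sql(tree, params)
print(sql)
print(params)
print(to_json(tree))
```

Field names are interpolated directly here; a real system would check them against the schema before generating SQL.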
Serialization and Storage
• Once created by the user, each query may be
  saved for future analysis or manipulation
• The program stores the AST for the query as a
  JavaScript object, which is then serialized
  into JSON (JavaScript Object Notation) and
  stored.
• Deserialization and conversion to SQL are
  performed later for analysis
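The save-and-reload cycle might look like the sketch below. The node layout (`type`, `field`, `op`, `value`, `bool`, `children`) is an assumption, as is the validation step on load; the slides only state that the tree is serialized to JSON and deserialized later.

```python
import json


def validate(node):
    """Reject trees that do not have the expected (assumed) node shapes."""
    if not isinstance(node, dict):
        raise ValueError("node must be an object")
    if node.get("type") == "cond":
        missing = {"field", "op", "value"} - set(node)
        if missing:
            raise ValueError(f"condition is missing {sorted(missing)}")
    elif node.get("type") == "group":
        if node.get("bool") not in ("and", "or"):
            raise ValueError("group needs bool = 'and' or 'or'")
        for child in node.get("children", []):
            validate(child)
    else:
        raise ValueError(f"unknown node type {node.get('type')!r}")


def serialize(tree):
    """Turn the in-memory tree into compact JSON text for storage."""
    return json.dumps(tree, separators=(",", ":"))


def deserialize(text):
    """Parse stored JSON and check its structure before it is used."""
    tree = json.loads(text)
    validate(tree)
    return tree


tree = {
    "type": "group",
    "bool": "and",
    "children": [
        {"type": "cond", "field": "case_data.diagstate", "op": "=", "value": "KY"},
        {"type": "cond", "field": "case_tx.txtype", "op": "=", "value": "I"},
    ],
}
stored = serialize(tree)
restored = deserialize(stored)
print(restored == tree)
```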
Query Management
Query Storage
• Queries are often referred to as ‘study groups’
  by researchers
• The serialized queries and associated
  metadata are stored in a database table

   study_groups: id | Name | Query | User | LastModified | LastUsed


• A MySQL database was chosen for convenience, since the
  registry data is already stored in MySQL
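A sketch of the `study_groups` table, using Python's built-in `sqlite3` as a stand-in for MySQL. The column names come from the slide; the column types, the timestamp format, the sample name and user, and the helpers `save_study_group` and `load_study_group` are assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE study_groups (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        "Name" TEXT NOT NULL,
        "Query" TEXT NOT NULL,       -- serialized query tree (JSON)
        "User" TEXT NOT NULL,
        "LastModified" TEXT NOT NULL,
        "LastUsed" TEXT
    )
""")


def save_study_group(conn, name, user, tree):
    """Store a serialized query and return its new id."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        'INSERT INTO study_groups ("Name", "Query", "User", "LastModified") '
        "VALUES (?, ?, ?, ?)",
        (name, json.dumps(tree), user, now),
    )
    return cur.lastrowid


def load_study_group(conn, group_id):
    """Fetch a stored query, record the access time, and return the tree."""
    row = conn.execute(
        'SELECT "Query" FROM study_groups WHERE id = ?', (group_id,)
    ).fetchone()
    if row is None:
        raise KeyError(group_id)
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        'UPDATE study_groups SET "LastUsed" = ? WHERE id = ?', (now, group_id)
    )
    return json.loads(row[0])


tree = {"type": "cond", "field": "case_data.diagstate", "op": "=", "value": "KY"}
group_id = save_study_group(conn, "Kentucky cases", "example_user", tree)
print(load_study_group(conn, group_id) == tree)
```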
Visualization Tools
– Scaled Venn Diagrams
   • User can quickly ascertain relative size of data sets and
     their relationship to one another
– Bar and Histogram Charts
   • Flexible view of variable distribution for different sets
– Survival Trends
   • View and compare survival rates over time
– Statistics
   • Common descriptive statistics
   • Comparison with chi-square, log-rank, t-, and z-tests
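As an illustration of the chi-square comparison, the sketch below computes Pearson's statistic (without continuity correction) for a contingency table of counts. The function name and the sample counts are made up; the slides do not say how the tool implements the test. Every row and column must have a nonzero total.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up counts: two data sets (rows) against two outcomes (columns).
stat, dof = chi_square([[10, 20], [20, 10]])
print(round(stat, 4), dof)
```

Turning the statistic into a p-value needs the chi-square distribution, which is left out of this sketch.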
Venn Diagrams
• Venn diagrams show logical relationships
  between a number of sets
• A special case of Euler diagrams: every possible
  intersection of the sets must be displayed
• Can quickly convey how data sets overlap and
  relate to one another
Area Proportionality
• Area-proportional Venn diagrams show the
  relative size of data sets and their intersections
• Very useful for rapid exploration of data sets such
  as cancer data
• Although typical Venn diagrams often display 3
  sets, area-proportional diagrams cannot always
  be drawn with circles for more than 2 sets [1]
• The vast majority of research needs involve
  comparison of two data sets
Drawing To Scale




                            Circle-intersection problem
Triangle(C1,C2,A) = Triangle(C1,C2,B)
Triangle(C1,C2,A) + Triangle(C1,C2,B) + Lens = Sector(C1,A,B) + Sector(C2,A,B)
Lens = Sector(C1,A,B) + Sector(C2,A,B) - 2*Triangle(C1,C2,A)
Drawing to Scale
         Lens = Sector(C1,A,B) + Sector(C2,A,B) - 2*Triangle(C1,C2,A)

• By applying formulas for the area of a circular sector and triangle, we
  arrive at this result for the distance between the circles’ centers:




• The value must be approximated numerically; the implementation uses the
  bisection root-finding method.
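The sketch below shows one way to find the center distance by bisection. It assumes each circle's area equals the size of its data set, and it uses the standard lens-area formula for two circles, which follows from the sector-minus-triangles identity above. The function names and the fixed iteration count are choices made for this sketch, not details from the original implementation.

```python
import math


def lens_area(r1, r2, d):
    """Area of overlap of two circles with radii r1, r2 and center distance d."""
    if d >= r1 + r2:
        return 0.0                               # circles do not overlap
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2        # smaller circle fully inside
    # Clamp to guard against rounding just outside acos's domain.
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    sectors = r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2)
    # The two triangles (C1,C2,A) and (C1,C2,B) together, via Heron's formula.
    kite = 0.5 * math.sqrt(max(0.0, (-d + r1 + r2) * (d + r1 - r2)
                                    * (d - r1 + r2) * (d + r1 + r2)))
    return sectors - kite


def center_distance(size_a, size_b, size_both, iterations=100):
    """Distance between centers so that the overlap area equals size_both."""
    if not 0 <= size_both <= min(size_a, size_b):
        raise ValueError("intersection cannot exceed the smaller set")
    r1 = math.sqrt(size_a / math.pi)
    r2 = math.sqrt(size_b / math.pi)
    lo, hi = abs(r1 - r2), r1 + r2               # overlap shrinks as d grows
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > size_both:
            lo = mid                             # too much overlap: move apart
        else:
            hi = mid
    return (lo + hi) / 2


d = center_distance(100, 50, 20)
print(d)
```

Bisection works here because the overlap area decreases steadily as the centers move apart, so the target area is reached at exactly one distance.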
Visualization: Venn Diagrams
Visualization: Venn Diagrams
Reports
• Several customizable reports were
  implemented to further leverage the query
  builder’s utility.
• Each is implemented in PHP, and produces an
  SQL query using the saved criteria and the
  settings selected by the user for the report
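How a report might combine saved criteria with its own settings, sketched in Python although the original reports are written in PHP. Everything here is hypothetical: the `GROUPABLE` whitelist, the `crosstab_sql` helper, and the shape of the generated query are illustrations of the idea only.

```python
# Hypothetical whitelist of columns a report may group by.
GROUPABLE = {"case_data.diagstate", "case_tx.txtype"}


def crosstab_sql(from_sql, where_sql, row_var, col_var):
    """Wrap a saved study-group condition in a counting query for a cross-tab."""
    for var in (row_var, col_var):
        if var not in GROUPABLE:
            raise ValueError(f"cannot group by {var!r}")
    return (
        f"SELECT {row_var}, {col_var}, COUNT(*) AS n "
        f"FROM {from_sql} WHERE {where_sql} "
        f"GROUP BY {row_var}, {col_var} "
        f"ORDER BY {row_var}, {col_var}"
    )


sql = crosstab_sql(
    "case_data JOIN case_tx ON case_tx.hospkey = case_data.hospkey "
    "AND case_tx.patkey = case_data.patkey",
    "case_data.incomplete = 0",      # stands in for the saved criteria
    "case_data.diagstate",           # report setting: row variable
    "case_tx.txtype",                # report setting: column variable
)
print(sql)
```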
Data List Tool
Cross-Tab Analysis
Graph Settings Interface
Visualization: Histogram
Visualization: Survival Trends
Chi-square Analysis
Censored Life Table
Success
• The Visual Query Builder and Data Analysis tools have
  become an integral part of CPDMS.NET – the online
  abstracting system developed at the KCR.
• Over 5000 study groups have been created by users of
  the system.
• Features have been added and improved in response
  to feedback from researchers and registrars
  (cancer data professionals).
• Future developments may include:
   – Wider array of statistical tests
   – Functions to analyze more than two data sets at once
References
•   The Kentucky Cancer Registry – A History
    http://www.kcr.uky.edu/about.php
•   F. Ruskey and M. Weston – A Survey of Venn Diagrams
    http://www.combinatorics.org/Surveys/ds5/VennEJC.html
•   S. Chow and F. Ruskey – Drawing Area-Proportional Venn and Euler Diagrams
•   Circle-Circle Intersection Problem
    http://mathworld.wolfram.com/Circle-CircleIntersection.html
Acknowledgements


  Eric Durbin, Kentucky Cancer Registry

Dr. Jerzy Jaromczyk, UK Computer Science
Visual tools for databade queries and analysis

  • 1. Visual Tools for Queries and Display of Quantitative Information in a Cancer Research Database JESSE STEWART and JERZY W. JAROMCZYK Department of Computer Science University of Kentucky, Lexington KY
  • 2. The Kentucky Cancer Registry • The Markey Cancer has the singular mission to eliminate the morbidity and mortality of cancer • Since its founding, the Markey Cancer Center and the UK Chandler hospital have served 2000-2200 new patients a year and is one of the few institutions nationwide that address both clinical care as well as cancer research. • The KCR’s case count exceeded 30,000 annually as of 2009 • The KCR houses a wealth of historical data for hundreds of cancer variants, associated treatments, and their relative success across the state of Kentucky.
  • 3. Data Collection Patient Abstracting Internet Registry DB Events CPDMS.NET HTTPS MySQL
  • 4. Cancer Abstracts • A cancer abstract contains up to 240 different elements ranging from patient demographics to staging information to therapy history • KCR alone stores tens of thousands of unique abstracts • Each abstract is created by a registrar, a professional trained to understand cancer data standards, formats and coding rules
  • 5. Accelerating Cancer Research • Develop Queries → Visualize Data Sets → Discover Important Correlations
  • 6. Registry Databases and Research Valuable Information •Survival Trends •Incidence Rates •Behavioral and Geographical Correlation Challenges in Research •Coded Data •SQL •Complex DB Schemas •Access Control •Visualization
  • 7. Software Solutions • Define Queries (Data Sets) – Intuitive: no programming required – Flexible: allow any data set to be explored – Accessible: Visual cross-browser application – Re-use: Save, modify and combine Data Sets • Data Analysis and Visualization: – Context-specific diagrams – Compare data sets singly or side-by-side – Customizable appearance
  • 8. The Query Builder • Presents a high-level abstraction of the Registry Database • Patient, Case, Therapy data variables are easily recognizable and categorized • Separates the user from the actual database structure and coded information – Example: Treatment is encoded as: • No Treatment=0, Treatment=1, Surveillance=2
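
The translation between stored codes and readable labels can be illustrated with a small lookup. This is only a sketch: the code table comes from the slide's Treatment example, while the helper name `describeTreatment` and the fallback text are assumptions made for illustration.

```javascript
// Code table from the slide: how the Treatment variable is stored.
const TREATMENT_LABELS = { 0: "No Treatment", 1: "Treatment", 2: "Surveillance" };

// Hypothetical helper: translate a stored code into the label shown to the user.
function describeTreatment(code) {
  return Object.prototype.hasOwnProperty.call(TREATMENT_LABELS, code)
    ? TREATMENT_LABELS[code]
    : "Unknown code " + code; // assumed fallback for codes missing from the table
}
```

With a table like this, the interface can show "Surveillance" while the generated SQL still compares against the stored value 2.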
  • 9. The Query Builder • Translates a question about cancer data into SQL (Structured Query Language) which can be understood by the computer system • Parses and stores the query for modification and reuse later
  • 10. Example Query • Patients diagnosed between Jan 1, 2005 and Dec 31, 2008 • Patients diagnosed in Kentucky • Patients treated with immunotherapy • SQL may be complex: from case_data, case_tx where case_tx.hospkey = case_data.hospkey and case_tx.patkey = case_data.patkey and case_data.incomplete = 0 and case_data.diagdate >= 20050101 and case_data.diagdate <= 20081231 and case_tx.txtype = 'I' and case_data.diagstate = 'KY';
  • 11. Interface Design • To make writing a query like the previous example simple, the Query Builder must provide intuitive controls permitting a user to define each query component • Variable names and coded values should be descriptive and easy to locate • Conditions should be combined in a natural way with Boolean operators • Tree-like layout chosen to represent queries
  • 13. Custom UI Controls • For each variable, DB schema information is used to display a customized UI control, e.g.: – Dates: date fields or ranges – Discrete variables: drop-down list or multiselect – Variables with many values: autofill field
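
One way to picture the control selection described on this slide is a function that inspects a variable's schema metadata. The metadata shape (`name`, `type`, `values`), the control names, and the cutoff of 50 values are all assumptions for this sketch, not details taken from the presentation.

```javascript
// Assumed cutoff above which a plain drop-down list becomes unwieldy.
const AUTOFILL_THRESHOLD = 50;

// Hypothetical schema metadata: { name, type, values }.
// Returns the name of the UI control to render for the variable.
function controlFor(variable) {
  if (variable.type === "date") return "date-range";
  if (Array.isArray(variable.values)) {
    // Discrete variable: choose by the number of allowed values.
    return variable.values.length > AUTOFILL_THRESHOLD ? "autofill" : "dropdown";
  }
  return "text"; // assumed default for free-form values
}
```

For example, `controlFor({ name: "treatment", type: "code", values: [0, 1, 2] })` selects a drop-down, while a variable with hundreds of allowed values would get an autofill field.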
  • 15. Internal Representation • Program maintains an abstract syntax tree for the query as it is created • Captures the essential structure of the query but omits SQL-specific syntax • This data structure serves as an intermediary between the interface and the database system • Permits two code-generation targets: JSON and SQL
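
A minimal sketch of such a syntax tree and its SQL target follows. The node shapes (`op`/`children` for Boolean nodes, `field`/`cmp`/`value` for conditions) and the function `toSQL` are assumptions for illustration; only the field names and values are taken from the example query on slide 10. The sketch covers just AND and OR, and a production system would more likely bind values as query parameters than build SQL strings.

```javascript
// Hypothetical AST node shapes (not taken from the slides):
//   Boolean node:   { op: "AND" | "OR", children: [node, ...] }
//   Condition node: { field, cmp, value }
function toSQL(node) {
  if (node.op) {
    // Boolean node: join the children's SQL with the operator.
    return "(" + node.children.map(toSQL).join(" " + node.op + " ") + ")";
  }
  // Condition node: numbers are emitted as-is, strings are quoted and escaped.
  const value = typeof node.value === "number"
    ? String(node.value)
    : "'" + String(node.value).replace(/'/g, "''") + "'";
  return node.field + " " + node.cmp + " " + value;
}

// The conditions of the example query, expressed as a tree.
const exampleQuery = {
  op: "AND",
  children: [
    { field: "case_data.diagdate", cmp: ">=", value: 20050101 },
    { field: "case_data.diagdate", cmp: "<=", value: 20081231 },
    { field: "case_tx.txtype", cmp: "=", value: "I" },
    { field: "case_data.diagstate", cmp: "=", value: "KY" },
  ],
};

const whereClause = toSQL(exampleQuery);
```

Because the tree holds no SQL syntax of its own, the same object can be handed to `JSON.stringify` for the second target.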
  • 16. Serialization and Storage • Once created by the user, each query may be saved for future analysis or manipulation • The program stores the AST for the query as a JavaScript object, which can then be serialized into JSON (JavaScript Object Notation) and stored. • Deserialization and conversion to SQL are performed later for analysis
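
The save-and-restore round trip can be sketched with the standard `JSON.stringify` and `JSON.parse` functions. The shape of the tree below is an assumption made for illustration; the slides do not show the actual object layout.

```javascript
// Hypothetical query tree (the node shape is assumed, not taken from the slides).
const queryTree = {
  op: "AND",
  children: [
    { field: "case_tx.txtype", cmp: "=", value: "I" },
    { field: "case_data.diagstate", cmp: "=", value: "KY" },
  ],
};

// Serialize for storage, e.g. in a text column of a database table.
const storedQuery = JSON.stringify(queryTree);

// Later: restore the tree so it can be edited or converted to SQL.
const restoredTree = JSON.parse(storedQuery);
```

The round trip is lossless here because the tree contains only plain objects, arrays, strings and numbers, all of which JSON can represent.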
  • 18. Query Storage • Queries are often referred to as ‘study groups’ by researchers • The serialized queries and associated metadata are stored in a database table study_groups with columns: id | Name | Query | User | LastModified | LastUsed • A MySQL database was chosen for convenience since registry data is stored using this system
  • 19. Visualization Tools – Scaled Venn Diagrams • User can quickly ascertain relative size of data sets and their relationship to one another – Bar and Histogram Charts • Flexible view of variable distribution for different sets – Survival Trends • View and compare survival rates over time – Statistics • Common descriptive statistics • Comparison with Chi-square, Log rank, T-, Z-tests
  • 20. Venn Diagrams • Venn diagrams show logical relationships between a number of sets • They are a special case of Euler diagrams in which every possible intersection of the sets must be displayed • Can quickly convey how data sets overlap and relate to one another
  • 21. Area Proportionality • Area-proportional Venn diagrams show the relative size of data sets and their intersections • Very useful for rapid exploration of data sets such as cancer data • Although typical Venn diagrams often display 3 sets, area-proportional diagrams cannot always be drawn with circles for more than 2 sets [1] • The vast majority of research needs involve comparison of two data sets
  • 22. Drawing To Scale • Circle-intersection problem • Triangle(C1,C2,A) = Triangle(C1,C2,B) • Triangle(C1,C2,A) + Triangle(C1,C2,B) + Lens = Sector(C1,A,B) + Sector(C2,A,B) • Lens = Sector(C1,A,B) + Sector(C2,A,B) - 2*Triangle(C1,C2,A)
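
The decomposition on this slide can be written out with the standard area formulas for a circular sector and a triangle. The symbols are introduced here for illustration and do not appear on the slides: $r_1$ and $r_2$ are the radii of the circles centered at $C_1$ and $C_2$, $d$ is the distance between the centers, and $\alpha$ and $\beta$ are the angles $\angle A C_1 C_2$ and $\angle A C_2 C_1$. Each sector spans twice its half-angle, and the cosines follow from the law of cosines in the triangle $C_1 C_2 A$.

```latex
\begin{align*}
\text{Sector}(C_1,A,B) &= r_1^2\,\alpha, &
\cos\alpha &= \frac{d^2 + r_1^2 - r_2^2}{2\,d\,r_1},\\
\text{Sector}(C_2,A,B) &= r_2^2\,\beta, &
\cos\beta &= \frac{d^2 + r_2^2 - r_1^2}{2\,d\,r_2},\\
\text{Triangle}(C_1,C_2,A) &= \tfrac{1}{2}\,d\,r_1\sin\alpha,\\
\text{Lens} &= r_1^2\,\alpha + r_2^2\,\beta - d\,r_1\sin\alpha .
\end{align*}
```

For fixed radii the lens area depends only on $d$, which is what makes it possible to search for the distance that produces a required overlap.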
  • 23. Drawing to Scale • Lens = Sector(C1,A,B) + Sector(C2,A,B) - 2*Triangle(C1,C2,A) • By applying formulas for the area of a circular sector and triangle, we arrive at an equation relating the lens area to the distance between the circles’ centers • The value must be approximated; the bisection root-finding method was used in the implementation.
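
The bisection step can be sketched as follows. This is not the authors' code: the function names, the tolerance, and the handling of the disjoint and nested cases are assumptions, and the lens area uses the standard circle-circle intersection formula. The search works because, for fixed radii, the overlap shrinks steadily as the centers move apart.

```javascript
// Overlap (lens) area of two circles with radii r1 and r2 whose centers are d apart.
function lensArea(r1, r2, d) {
  if (d >= r1 + r2) return 0;                 // disjoint or touching: no overlap
  if (d <= Math.abs(r1 - r2)) {               // one circle lies inside the other
    const r = Math.min(r1, r2);
    return Math.PI * r * r;
  }
  const clamp = (x) => Math.max(-1, Math.min(1, x)); // guard acos against rounding
  const alpha = Math.acos(clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1)));
  const beta = Math.acos(clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2)));
  // Two sectors minus the two triangles that make up the kite C1-A-C2-B.
  return r1 * r1 * alpha + r2 * r2 * beta - d * r1 * Math.sin(alpha);
}

// Bisection: find the center distance whose overlap equals targetArea.
function centerDistance(r1, r2, targetArea, tol = 1e-9) {
  let lo = Math.abs(r1 - r2); // maximum overlap
  let hi = r1 + r2;           // zero overlap
  while (hi - lo > tol) {
    const mid = (lo + hi) / 2;
    if (lensArea(r1, r2, mid) > targetArea) {
      lo = mid;               // too much overlap: move the centers apart
    } else {
      hi = mid;               // too little overlap: move the centers together
    }
  }
  return (lo + hi) / 2;
}
```

To draw a scaled diagram, each radius would be chosen so that the circle's area is proportional to the size of its data set (for example `Math.sqrt(count / Math.PI)`), and `targetArea` would be the size of the intersection on the same scale.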
  • 26. Reports • Several customizable reports were implemented to further leverage the query builder’s utility. • Each is implemented in PHP, and produces an SQL query using the saved criteria and the settings selected by the user for the report
  • 34. Success • The Visual Query Builder and Data Analysis tools have become an integral part of CPDMS.NET, the online abstracting system developed at the KCR. • Over 5000 study groups have been created by users of the system. • Features have been added and improved in response to feedback from researchers and registrars (cancer data professionals). • Future developments may include: – Wider array of statistical tests – Functions to analyze more than two data sets at once
  • 35. References • The Kentucky Cancer Registry – A History • http://www.kcr.uky.edu/about.php • F. Ruskey and M. Weston – A Survey of Venn Diagrams • http://www.combinatorics.org/Surveys/ds5/VennEJC.html • S. Chow and F. Ruskey, Drawing Area-Proportional Venn and Euler Diagrams • Circle-Circle Intersection Problem • http://mathworld.wolfram.com/Circle-CircleIntersection.html
  • 36. Acknowledgements Eric Durbin, Kentucky Cancer Registry Dr. Jerzy Jaromczyk, UK Computer Science

Editor's Notes

  1. Patient data is collected by medical facilities across the state of KY. Abstractors read paper/electronic records and code the data as a cancer abstract according to standards. Abstracting is performed using the KCR’s custom CPDMS.NET reporting system. The abstract is transmitted across the internet and stored in the registry database.
  2. Turn KCR’s data into something a computer can process and analyze quickly. Create the tools for analysis. Develop useful ways to present the results of analysis. Present the information in a user-friendly manner.
  3. Many valuable statistics and trends are hidden in the registry database. Retrieving this information is an arduous task, especially for those without knowledge of SQL.
  4. When this information can be analyzed and visualized, life-saving discoveries may be uncovered by research experts, advancing the understanding of cancer and the development of new models and modes of intervention in malignant processes. The aim is to take this old mine of information and simplify it visually and numerically. It is hoped that this may help advance the understanding of cancer, and in turn help science fight one of its biggest battles: to better treat and prevent disease.
  5. The Query Builder tool aims to solve the aforementioned problems by providing a visual interface for constructing database queries without the need to understand the underlying structure of the database or write formal SQL expressions. 1) Provide access to important registry database objects including Patient, Case, and Therapy information. 2) Provide a list of important attributes/fields associated with each object. 3) Allow search criteria to be entered with minimal effort and no knowledge of the SQL language. 4) Show descriptive database field values where appropriate, in addition to or in lieu of coded values. a. Display an appropriate input field for different data types like dates, numbers, and lists. 5) Allow the user to construct arbitrarily complex searches by adding as many criteria as needed to the query. 6) Support a set of Boolean operators (AND, OR, XOR, NOT) so search criteria can be joined in various ways. 7) Allow searches to be saved for later use.
  6. Direct interaction with the database system involves the use of a structured query language (SQL) used by most relational database systems. This includes operations like reading, adding, removing, and modifying data stored by the system. Although this language is readable by humans, special understanding of the syntax and structure of an SQL statement is required for a user to “talk” to the database system and find what he or she is looking for. This can be cumbersome at the very least, or nearly impossible for those without much experience with programming languages or similar, especially when one tries to describe a very specific data set. There are several factors that contribute to the disparity between a database language like SQL and a natural language such as English, each related to the way a computer stores and processes information in digital form. Encoding of Each Attribute: helps reduce the database storage space required and increase performance. Unfortunately, the trade-off is that any SQL statement describing such a record must use the coded version of the attribute data rather than a natural textual description. For example, a person’s assigned treatment could be encoded as No Treatment=0, Treatment=1, Surveillance=2. Normalizing the Data: to avoid duplicating information and wasting storage space, records are often split up into multiple tables and associated with one another.
  7. Each condition of the query can be entered with several mouse clicks. The conditions may be joined with Boolean operators AND, OR, etc. Encoded values are shown with descriptive translations. The Query Builder shows a data-type-sensitive input for each variable. Separates researchers from data encoding.
  8. A syntax tree is generated from the query and stored in serialized form for later use. Once the user is satisfied with the query, it can be given a title and saved for analysis.
  9. Queries are saved indefinitely for later use under each user account. Metadata showing the last-modified and last-used times is displayed. Study groups can be copied, deleted, edited or created from this interface.
  10. Log-rank test: compares the survival distributions of two samples. It is a nonparametric test, used with data that is censored, and is used frequently in clinical trial applications.