Evaluating a row-store data model for full-content DICOM management

Alexandre Savaris, Theo Härder, Aldo von Wangenheim
University of Kaiserslautern – Dept. of Computer Science – Kaiserslautern – Germany
Federal University of Paraná (UFPR) – Dept. of Informatics – Curitiba – PR – Brazil
National Institute for Digital Convergence (INCoD) – Florianópolis – SC – Brazil


DICOM content is:
• Structured at tag level
– Group/element ordered pair
– VR (Value Representation)
– VM (Value Multiplicity)
Example: Modality
– Tag: (0008,0060)
– VR: CS (Code String), 16 bytes maximum, accepting
uppercase characters, “0”-“9”, the SPACE character,
and underscore (“_”)
– VM: 1 (a single value per tag)
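
The tag-level structure above can be checked mechanically. The following is a minimal sketch, assuming only the CS rules listed on this slide (character set and 16-character limit); the function names are hypothetical and not taken from any DICOM library.

```python
import re

# One CS (Code String) value as described above: uppercase letters,
# digits, SPACE and underscore, at most 16 characters.
CS_PATTERN = re.compile(r"[A-Z0-9 _]{0,16}")


def is_valid_cs(value: str) -> bool:
    """Return True if value fits the CS character set and length limit."""
    return CS_PATTERN.fullmatch(value) is not None


def format_tag(group: int, element: int) -> str:
    """Render a group/element ordered pair in (gggg,eeee) hexadecimal form."""
    return f"({group:04X},{element:04X})"


modality_tag = format_tag(0x0008, 0x0060)
print(modality_tag, is_valid_cs("CT"), is_valid_cs("ct"))
```

The VM of 1 means the Modality tag holds a single such value, so no multi-value splitting is sketched here.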

DICOM content is:
• Semi-structured at image level
– Tags become known at evaluation (parsing) time
– The number/combination of tags varies according to:
  – the data available at examination time
  – the examination modality
  – the equipment manufacturer
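
To illustrate what "semi-structured" means here, the sketch below compares the tag sets of two images. The two images and all their values are hypothetical; only the idea that tag combinations differ per image comes from the slide.

```python
# Two hypothetical images, each reduced to a {tag: value} mapping.
# Their tag sets differ, as they might across modalities or manufacturers.
image_a = {"(0008,0060)": "CT", "(0010,0020)": "P001", "(0018,0060)": "120"}
image_b = {"(0008,0060)": "MR", "(0010,0020)": "P002", "(0018,0087)": "1.5"}


def common_tags(*images):
    """Tags present in every image."""
    return set.intersection(*(set(img) for img in images))


def all_tags(*images):
    """Tags present in at least one image."""
    return set.union(*(set(img) for img in images))


shared = common_tags(image_a, image_b)
optional = all_tags(image_a, image_b) - shared
print(sorted(shared))
print(sorted(optional))
```

A fixed schema covers the shared tags easily; the optional ones are what make the content semi-structured.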

DICOM content is:
Metadata + image
[Figure: DICOM hierarchy with metadata attached. Patient > Study > Series > Image; a patient may have several studies, a study several series, and a series several images.]

Storage in File Systems
(0010,0020) PatientID
  (0020,000D) StudyInstanceUID
    (0020,000E) SeriesInstanceUID
      (0008,0018) SOPInstanceUID

+ Easy to organize and deploy
+ Easy to distribute over the network
  – Mount points using NFS, for example
- Restrictive for query/retrieval
  – Only the hierarchical level IDs are known without
    evaluating file content
  – No indexes built over tag values
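
A minimal sketch of the file-system layout, assuming one directory level per identifier and a `.dcm` file named after the SOPInstanceUID. The root path, file extension, and function names are assumptions for illustration, not details from the presentation.

```python
from pathlib import PurePosixPath


def dicom_path(root, patient_id, study_uid, series_uid, sop_uid):
    """Build the storage path from the four hierarchical identifiers."""
    return (PurePosixPath(root) / patient_id / study_uid / series_uid
            / f"{sop_uid}.dcm")


def ids_from_path(path, root):
    """Recover the identifiers from a path without opening the file."""
    rel = PurePosixPath(path).relative_to(root)
    patient_id, study_uid, series_uid, filename = rel.parts
    return {
        "PatientID": patient_id,
        "StudyInstanceUID": study_uid,
        "SeriesInstanceUID": series_uid,
        # stem drops only the final suffix, so dotted UIDs survive intact
        "SOPInstanceUID": PurePosixPath(filename).stem,
    }


path = dicom_path("/archive", "P001", "1.2.3", "1.2.3.4", "1.2.3.4.5")
print(path)
print(ids_from_path(path, "/archive"))
```

Any other tag, such as Modality, is not recoverable this way: answering a query on it means opening and parsing every candidate file.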

Storage in RDBMSs

+ Easy to map the DICOM hierarchy into a set
of relations/relationships
+ Use of SQL for maintenance
+ Performance boost through indexes
- Requires a predefined DB schema
  – Usually comprising a restricted number of tags
- Scalability is “unnatural”
  – Works well for single-node instances
  – Multi-node instances are possible, but demand
    considerable administrative effort
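
The mapping of the DICOM hierarchy into relations can be sketched as follows. This uses `sqlite3` only because it ships with Python; the presentation does not name this engine, and the table names, column names, and sample values are all hypothetical.

```python
import sqlite3

# One relation per hierarchy level, linked by foreign keys.
# Only a restricted set of tags (here: modality) gets its own column.
SCHEMA = """
CREATE TABLE patient (patient_id TEXT PRIMARY KEY);
CREATE TABLE study (
    study_uid  TEXT PRIMARY KEY,
    patient_id TEXT NOT NULL REFERENCES patient(patient_id));
CREATE TABLE series (
    series_uid TEXT PRIMARY KEY,
    study_uid  TEXT NOT NULL REFERENCES study(study_uid),
    modality   TEXT);
CREATE TABLE image (
    sop_uid    TEXT PRIMARY KEY,
    series_uid TEXT NOT NULL REFERENCES series(series_uid));
CREATE INDEX idx_series_modality ON series(modality);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

conn.execute("INSERT INTO patient VALUES (?)", ("P001",))
conn.execute("INSERT INTO study VALUES (?, ?)", ("1.2.3", "P001"))
conn.execute("INSERT INTO series VALUES (?, ?, ?)", ("1.2.3.4", "1.2.3", "CT"))
conn.execute("INSERT INTO image VALUES (?, ?)", ("1.2.3.4.5", "1.2.3.4"))

# Query by tag value, served by the index on series.modality.
rows = conn.execute(
    "SELECT i.sop_uid FROM image i "
    "JOIN series s ON s.series_uid = i.series_uid "
    "WHERE s.modality = ?",
    ("CT",),
).fetchall()
print(rows)
```

Any tag without a column in this schema cannot be queried, which is the "predefined schema" limitation listed above.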

What about NoSQL?

• Native scalability
• Configurable partitioning/replication
• Loose constraints when compared to the
relational model (e.g., schemas, foreign
keys, referential integrity)
• Designed to work at the “huge” scale
– Huge volumes of data, huge numbers of users, …

Two questions to be answered
1. Is it possible to manage full-content DICOM
images at tag level, using a data model built
over a row-store, NoSQL database?
2. Despite its close relationship with big
volumes of data, does a row-store, NoSQL
database perform well in scenarios of small
datasets when compared to known
approaches, i.e., relational databases?

NoSQL: partitioned row-stores

Experimental data model

Experimental setup and datasets

Setup         Processor                        Memory       Storage       Operating System
Stand-alone   Intel® Core™ i7 - 2.7 GHz        4 GB DDR3    500 GB SATA   OS X 10.8.3
Cluster       Intel® Xeon® X3440 - 2.53 GHz    4 GB DDR3    859 GB SATA   Ubuntu 10.04.1
(Nodes 1-5)   (x8) (shared through             (per node)   (per node)
              virtualization)

Examination modality                 Tags per file   Average size per file (bytes)   Size on disk
                                     (average)       Metadata tags    Image tags     (MB)
Computed Radiography (CR)                  80              802         2278594            14
X-Ray Angiography (XA)                    120             5662         1442097            83
Secondary Capture (SC)                     64              932          168897           151
Positron Emission Tomography (PET)        161             3085           16211           111
Magnetic Resonance (MR)                   159             2704           72006           363
Computed Tomography (CT)                  132             3888          109054          3272

Results – Storage

• Results include the time needed to parse/extract
individual tags from image files
• Storage time is derived from a combination of two
characteristics:
– The dataset size
– The file content complexity
• SA (stand-alone) is 89.8% faster than CL (cluster) (in cumulative query time)
– Communication and replication issues
• In CL, parallel writes improve performance
– Speedup of 77.9% when compared to a single writer
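
The idea of parallel writers can be sketched as below. The `InMemoryStore` class is a hypothetical stand-in for a row-store client, not a real driver, and the number of writers is an arbitrary assumption; the sketch shows only the write pattern, not the measured speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for a row-store client (not a real driver)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def write(self, key, tags):
        with self._lock:
            self._rows[key] = dict(tags)

    def count(self):
        with self._lock:
            return len(self._rows)


def store_parallel(store, images, writers=4):
    """Write (key, tags) pairs using several concurrent writers."""
    with ThreadPoolExecutor(max_workers=writers) as pool:
        # list() waits for all writes and re-raises any writer exception
        list(pool.map(lambda item: store.write(*item), images))


store = InMemoryStore()
images = [(f"1.2.3.{i}", {"(0008,0060)": "CT"}) for i in range(100)]
store_parallel(store, images)
print(store.count())
```

Against a real cluster, each writer would hold its own connection, so network and replication latency of one write overlaps with the others.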

Results – Query

• Queries are executed by hierarchical level,
selecting values from tags related to each
level
• The row-store performs better when:
– There is high selectivity (the image level)
– The number of selected tags is minimal (the series
level)
• In general, the RDBMS outperforms the row-store
– 8.9% faster than SA
– 19.2% faster than CL

Results – Retrieval

• Retrieval operations are executed by
hierarchical level, returning sets of full-content
(metadata + pixel data) images
• Retrieval time decreases as selectivity increases
– The CL setup for the row-store performs better than the SA setup
– Partitioning by patientid helps route retrieval
operations to single nodes
• In general, the RDBMS outperforms the row-store
– 81.7% faster than SA
– 83.2% faster than CL
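
Why partitioning by patientid routes a retrieval to a single node can be sketched as follows. This is a simplified hash-modulo scheme chosen for illustration; the partitioner actually used in the experiments is not described here, and the node names and row keys are hypothetical.

```python
import hashlib

# Hypothetical names for the five cluster nodes.
NODES = ["node1", "node2", "node3", "node4", "node5"]


def node_for(patient_id, nodes=NODES):
    """Map a partition key to a node by hashing it (simple modulo scheme)."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def nodes_for_retrieval(row_keys):
    """Nodes that must be contacted for rows keyed by (patient_id, sop_uid)."""
    return {node_for(patient_id) for patient_id, _ in row_keys}


one_patient = [("P001", "1.2.3.4.5"), ("P001", "1.2.3.4.6")]
print(nodes_for_retrieval(one_patient))
```

Because the node depends only on the patient identifier, every image of a patient lands in the same partition, and a patient-, study-, series- or image-level retrieval for that patient touches one node.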

Conclusions
1. Is it possible to manage full-content DICOM images at tag
level, using a data model built over a row-store, NoSQL
database?
– Yes. Row-stores are flexible enough to manage combinations of
DICOM tags in a consistent way.
2. Despite its close relationship with big volumes of data,
does a row-store, NoSQL database perform well in
scenarios of small datasets when compared to known
approaches, i.e., relational databases?
– According to the experiments performed in this work, no. The
row-store setups were outperformed by the RDBMS in the
overall evaluation.
– Other data models, however, may be better suited to the task.
