인공지능은 의료를 어떻게 혁신하는가 (2019년 7월) (하)

l m
q v
l m m
q z 2 S ee V
q
q
l x m% w m
q RURae gV VRc X
q t
q RURae gV VRc X ' RUgVcdRc R ReeRT
Issues

qB B m !KVURdjd" (),
q $
q $
q ()+ ><9t $ $ • (),
q )')( ! ((( gd )-(% (( "
q ‘ z
q 2 oB Bt

l
q 2 t
l t
q
qm t7n
qm t t7n
qm t y 7n
l

NCCN Guidelines Version 4.2014
Non-Small Cell Lung Cancer
NCCN Guidelines Index
NSCLC Table of Contents
Discussion
Version 4.2014, 06/05/14 © National Comprehensive Cancer Network, Inc. 2014, All rights reserved. The NCCN Guidelines®
and this illustration may not be reproduced in any form without the express written permission of NCCN®
.
Note: All recommendations are category 2A unless otherwise indicated.
Clinical Trials: NCCN believes that the best management of any cancer patient is in a clinical trial. Participation in clinical trials is especially encouraged.
NSCL-2
dT3, N0 related to size or satellite nodules.
fTesting is not listed in order of priority and is dependent upon clinical
circumstances, institutional processes, and judicious use of resources.
gMethods for evaluation include mediastinoscopy, mediastinotomy, EBUS, EUS,
and CT-guided biopsy.
hPositive PET/CT scan findings for distant disease need pathologic or other
radiologic confirmation. If PET/CT scan is positive in the mediastinum, lymph
node status needs pathologic confirmation.
iSee Principles of Surgical Therapy (NSCL-B).
jSee Principles of Radiation Therapy (NSCL-C).
kSee Chemotherapy Regimens for Neoadjuvant and Adjuvant Therapy (NSCL-D).
lExamples of high-risk factors may include poorly differentiated tumors (including
lung neuroendocrine tumors [excluding well-differentiated neuroendocrine tumors]),
vascular invasion, wedge resection, tumors >4 cm, visceral pleural involvement,
and incomplete lymph node sampling (Nx). These factors independently may not
be an indication and may be considered when determining treatment with adjuvant
chemotherapy.
mSee Chemotherapy Regimens Used with Radiation Therapy (NSCL-E).
CLINICAL ASSESSMENT PRETREATMENT EVALUATIONf INITIAL TREATMENT
Stage IA
(peripheral T1ab, N0)
Stage IB
(peripheral T2a, N0)
Stage I
(central T1ab–T2a, N0)
Stage II
(T1ab–2ab, N1; T2b, N0)
Stage IIB
(T3, N0)d
• PFTs (if not previously
done)
• Bronchoscopy
(intraoperative
preferred)
• Pathologic mediastinal
lymph node evaluationg
(category 2B)
• PET/CT scanh (if not
previously done)
• PFTs (if not previously
done)
• Bronchoscopy
• Pathologic mediastinal
lymph node evaluationg
• PET/CT scanh (if not
previously done)
• Brain MRI (Stage II,
Stage IB [category 2B])
Negative
mediastinal
nodes
Positive
mediastinal
nodes
Operable
Medically
inoperable
Negative
mediastinal
nodes
Positive
mediastinal
nodes
Operable
Medically
inoperable
Surgical exploration and
resectioni + mediastinal lymph
node dissection or systematic
lymph node sampling
Definitive RT including stereotactic
ablative radiotherapyj (SABR)
See Stage IIIA (NSCL-8) or Stage IIIB (NSCL-11)
Surgical exploration and
resectioni + mediastinal lymph
node dissection or systematic
lymph node sampling
N0
N1
See Stage IIIA (NSCL-8) or Stage IIIB (NSCL-11)
Definitive RT
including SABRj
Definitive chemoradiationj,m
See Adjuvant
Treatment (NSCL-3)
See Adjuvant
Treatment (NSCL-3)
Consider adjuvant
chemotherapyk
(category 2B) for
high-risk stages IB-IIl
Printed by yoon sup choi on 6/19/2014 8:23:15 PM. For personal use only. Not approved for distribution. Copyright © 2014 National Comprehensive Cancer Network, Inc., All Rights Reserved.

NCCN Guidelines Version 4.2014
Non-Small Cell Lung Cancer
NCCN Guidelines Index
NSCLC Table of Contents
Discussion
Version 4.2014, 06/05/14 © National Comprehensive Cancer Network, Inc. 2014, All rights reserved. The NCCN Guidelines®
and this illustration may not be reproduced in any form without the express written permission of NCCN®
.
Note: All recommendations are category 2A unless otherwise indicated.
Clinical Trials: NCCN believes that the best management of any cancer patient is in a clinical trial. Participation in clinical trials is especially encouraged.
NSCL-8
hPositive PET/CT scan findings for distant disease need pathologic or other
radiologic confirmation. If PET/CT scan is positive in the mediastinum, lymph
node status needs pathologic confirmation.
iSee Principles of Surgical Therapy (NSCL-B).
jSee Principles of Radiation Therapy (NSCL-C).
kSee Chemotherapy Regimens for Neoadjuvant and Adjuvant Therapy (NSCL-D).
mSee Chemotherapy Regimens Used with Radiation Therapy (NSCL-E).
nR0 = no residual tumor, R1 = microscopic residual tumor, R2 = macroscopic
residual tumor.
sPatients likely to receive adjuvant chemotherapy may be treated with induction
chemotherapy as an alternative.
MEDIASTINAL BIOPSY
FINDINGS
INITIAL TREATMENT ADJUVANT TREATMENT
T1-3, N0-1
(including T3
with multiple
nodules in
same lobe)
Surgeryi,s
Resectable
Medically
inoperable
Surgical resectioni
+ mediastinal lymph
node dissection or
systematic lymph
node sampling
See Treatment
according to clinical
stage (NSCL-2)
N0–1
N2
See NSCL-3
Margins
negative (R0)n
Sequential chemotherapyk
(category 1) + RTj
Margins
positiven
Surveillance
(NSCL-14)
R1n
R2n
Chemoradiationj
(sequentialk or concurrentm)
Surveillance
(NSCL-14)
Concurrent
chemoradiationj,m
Surveillance
(NSCL-14)
T1-2,
T3 (≥7 cm),
N2 nodes
positivei
• Brain MRI
• PET/CT
scan,h
if not
previously
done
Negative for
M1 disease
Positive
Definitive concurrent
chemoradiationj,m
(category 1)
or
Induction
chemotherapyk ± RTj
See Treatment for Metastasis
solitary site (NSCL-13) or
distant disease (NSCL-15)
No apparent
progression
Progression
Surgeryi ± chemotherapyk (category 2B)
± RTj (if not given)
RTj (if not given)
± chemotherapykLocal
Systemic
T3
(invasion),
N2 nodes
positive
• Brain MRI
• PET/CT
scan,h
if not
previously
done
Negative for
M1 disease
Positive
Definitive concurrent
chemoradiationj,m
Printed by yoon sup choi on 6/19/2014 8:23:15 PM. For personal use only. Not approved for distribution. Copyright © 2014 National Comprehensive Cancer Network, Inc., All Rights Reserved.

l %q
l  
l o o
q@f R e fTY
q $ w$ TRcV r  
l %
q
q s

Over the course of a career, an oncologist may impart bad news an average of 20,000 times,
but most practicing oncologists have never received any formal training to help them
prepare for such conversations.

High levels of empathy in primary care physicians correlate with  
better clinical outcomes for their patients with diabetes

We identiﬁed 384 empathic opportunities and found that physicians  
had responded empathically to 39 (10%) of them

In 398 conversations, the total empathic opportunities was 292.
When they occurred, oncologists responded with continuers 22% of the time.

h ‘ i p 3
q t )2 w &
q t 2 w &
q t +2 t &

h ‘ i p 3
• "시험 성적만 우수하면 의대에 들어오고 시험만 통과하면 전문
의 자격증을 취득할 수 있는 의사양성체계는 문제가 있다.”
• "의사를 업으로 삼으려면 서비스 마인드와 공감능력 및 의사소
통 분야에 뛰어나야 하는데 기질적으로 이와 맞지 않는 학생이
입학하게 되면 방황할 수 밖에 없다”
• "사실 환자와 많이 접촉하는 전공의 시절에 환자안전, 환자와
의 의사소통 등 의학 이외의 것에 대한 교육이 절실하지만 당
장 눈앞에 환자를 진료하기에 급급해 교육을 받지 못하고 있다"

l p ) y
q
q $ $
l p
)
0 p p
Hojat M et al. Acad Med. 2009
differences in empathy scores between
the two groups varied from a low of 0.05
(in year 0) to a maximum of 0.75 (in year
3). The effect size of the decline in empathy
from year 0 to year 3 was more than double
for those who chose technology-oriented
specialties (d ϭ 1.01) compared with their
counterparts in people-oriented specialties
(d ϭ 0.44).
Discussion
The results of this study showed a
significant decline in mean empathy
scores in th
The pattern
men and wo
pursued the
people-orie
specialties.
findings, ou
obtained a h
than men,11
people-orie
their counte
specialties.1
It is interest
magnitude
the effect si
men compa
those who p
careers com
in people-or
aforementio
with lower e
of medical s
interested in
specialties) l
medical sch
higher empa
pattern of fi
“at-risk” me
vulnerable t
What happ
medical sch
heart by wh
generates a
Figure 2 Changes in mean Jefferson Scale of Physician Empathy (JSPE) scores in different years of
medical school for 56 men and 65 women who identified themselves at all five administrations of
the JSPE (“matched cohort”) at Jefferson Medical College, Philadelphia, Pennsylvania, 2002–2008.
Empathy
matriculants entering in 2002. This is also
reflected in the total matched cohort.
Figure 1 shows a graphical presentation
of the changes in mean empathy scores
for the matched and unmatched cohorts.
As shown in the figure, the patterns of
changes are very similar in the matched
and unmatched cohorts.
Gender differences
We compared changes in empathy scores
during medical school for men (n ϭ 56)
and women (n ϭ 65) in the matched
cohort. Results are depicted in Figure 2.
As shown in the figure, women
consistently outscored men in every year
of medical school. Gender differences in
all of the test administrations were
statistically significant (P Ͻ .05, by t test).
As shown in Figure 2, although the
pattern of change in empathy scores for
women paralleled that of men, the effect
size estimates of these changes varied
from a low of 0.37 (in year 2) to a high of
0.79 (in year 3). The effect size of the
decline in empathy between year 0 and
year 3 was much larger for men (d ϭ
0.79) than for women (d ϭ 0.56).
Differences across specialties
Changes in empathy scores were
compared for 85 graduates in the
matched cohort who pursued their
residency training in “people-oriented”
specialties (e.g., family medicine,
internal medicine, pediatrics,
emergency medicine, psychiatry,
obstetrics–gynecology) and 36
who pursued their training in
“technology-oriented” specialties (e.g.,
anesthesiology, pathology, radiology,
surgery, orthopedic surgery, etc.).
Results appear in Figure 3. As shown in
the figure, those who pursued
people-oriented specialties consistently
scored higher in all years of medical
school than did their counterparts who
pursued technology-oriented
specialties. However, the difference in
empathy scores between the two groups
became statistically significant starting
from year 2 of medical school (P Ͻ .05,
by t test). The effect size estimates of
Figure 1 Changes in mean Jefferson Scale of Physician Empathy (JSPE) scores in different years of
medical school for the matched cohort (n ϭ 121), who identified themselves at all five
administrations of the JSPE, and the unmatched cohort (n ϭ 335) at Jefferson Medical College,
Philadelphia, Pennsylvania, 2002–2008.
medical school.
†
F(4,296) ϭ 14.4; P Ͻ .001.
‡
F(4,179) ϭ 11.7; P Ͻ .001.
¶
F(4,479) ϭ 25.5; P Ͻ .001.
Academic Medicine, Vol. 84, No. 9 / September 2009 1187

l
q t
q
q s  
l o ” w m
l x w

40
50
60
70
80
9 :
40
50
60
70
80
9   :  
69.5%
63%
49.5%
72.5%
57.5%
!
"
! "
b !
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
q 2 ((
q 92 !-(( "
q :2 ! ( "
q 2 !)0 $ , "
q 2 NMFG
o
Digital Healthcare Institute
Director,Yoon Sup Choi, PhD
yoonsup.choi@gmail.com

v ! "
0
50
100
150
200
w/o AI w/ AI
0
50
100
150
200
w/o AI w/ AI
188m
154m
180m
108m
saving 40%
of time
saving 18%
of time
4 5

o p m
q 2 ((
q 92 !-(( "
q :2 ! ( "
q 2 !)0 $ , "
q 2 NMFG

https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-cancer-diagnosis/

p ! o
Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
l p s /($/ ..$+
q80>H2 >H 0z $ w
q>JG;2 >H )',$ )' $ )$ $ ,$ 0z w
q $ >H $ w t
l o p -) t &&
q v • •
q $ $ w z t

modeled separately. For micrometastases, sensitivity was signiﬁcantly higher with
Negative
(Specificity)
Micromet
(Sensitivity)
Macromet
(Sensitivity)
0.7
0.5
0.6
0.8
0.9
1.0
p=0.02
A B
Performance
Unassisted
Assisted
FIGURE 3. Improved metastasis detection with algorithm assistance. A, Data represents performance across all images by image
category and assistance modality. Error bars indicate SE. The performance metric corresponds to corresponds to specificity for
negative cases and sensitivity for micrometastases (micromet) and macrometastases (macromet). B, Operating point of individual
pathologists with and without assistance for micrometastases and negative cases, overlayed on the receiver operating characteristic
curve of the algorithm. AUC indicates area under the curve.
Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved. www.ajsp.com | 5
• 민감도
• 인공지능을 사용한 경우 Micromet의 경우에 유의미하게 상승
• Negative와 Macromet은 유의미하지 않음
• AUC
• 병리학 전문의 혼자 or 인공지능 혼자보다,
• 병리학 전문의+인공지능이 조금 더 높음

isolated diagnostic tasks. Underlying these exciting advances,
however, is the important notion that these algorithms do not
replace the breadth and contextual knowledge of human
pathologists and that even the best algorithms would need to
from 83% to 91% and resulted in higher overall diagnostic
accuracy than that of either unassisted pathologist inter-
pretation or the computer algorithm alone. Although deep
learning algorithms have been credited with comparable
Unassisted Assisted
TimeofReview(seconds)
Timeofreviewperimage(seconds)
Negative ITC Micromet Macromet
A B
p=0.002
p=0.02
Unassisted
Assisted
Micrometastases
FIGURE 5. Average review time per image decreases with assistance. A, Average review time per image across all pathologists
analyzed by category. Black circles are average times with assistance, gray triangles represent average times without assistance.
Error bars indicate 95% confidence interval. B, Micrometastasis time of review decreases for nearly all images with assistance.
Circles represent average review time for each individual micrometastasis image, averaged across the 6 pathologists by assistance
modality. The dashed lines connect the points corresponding to the same image with and without assistance. The 2 images that
were not reviewed faster on average with assistance are represented with red dot-dash lines. Vertical lines of the box represent
quartiles, and the diamond indicates the average of review time for micrometastases in that modality. Micromet indicates mi-
crometastasis; macromet, macrometastasis.
8 | www.ajsp.com Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.
• 판독 시간 (per image)
• 인공지능의 보조를 받으면, Negative와 Micromet 은 유의미하게 감소
• 특히, Micromet은 약 2분에서 1분으로 절반 가량 감소
• ITC(Isolated Tumor Cell)와 Negative는 유의미하지 않음

Deep Learning Automatic Detection Algorithm for Malignant Pulmonary Nodules
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test
Observer
Test 1
DLAD versus Test 1
(P Value) Test 2
Test 1 versus Test 2 (P
Value)
Radiograph
Classification
(AUROC)
Nodule
Detection
(JAFROC FOM)
Radiograph
Classification
Nodule
Detection
Radiograph
Classification
(AUROC)
Nodule
Detection
(JAFROC
FOM)
Radiograph
Classification
Nodule
Detection
Nonradiology
physicians
Observer 1 0.77 0.716 ,.001 ,.001 0.91 0.853 ,.001 ,.001
Observer 2 0.78 0.657 ,.001 ,.001 0.90 0.846 ,.001 ,.001
Observer 3 0.80 0.700 ,.001 ,.001 0.88 0.783 ,.001 ,.001
Group 0.691 ,.001* 0.828 ,.001*
Radiology residents
Observer 4 0.78 0.767 ,.001 ,.001 0.80 0.785 .02 .03
Observer 5 0.86 0.772 .001 ,.001 0.91 0.837 .02 ,.001
Observer 6 0.86 0.789 .05 .002 0.86 0.799 .08 .54
Observer 7 0.84 0.807 .01 .003 0.91 0.843 .003 .02
Observer 8 0.87 0.797 .10 .003 0.90 0.845 .03 .001
Observer 9 0.90 0.847 .52 .12 0.92 0.867 .04 .03
Group 0.790 ,.001* 0.867 ,.001*
Board-certified
radiologists
Observer 10 0.87 0.836 .05 .01 0.90 0.865 .004 .002
Observer 11 0.83 0.804 ,.001 ,.001 0.84 0.817 .03 .04
Observer 12 0.88 0.817 .18 .005 0.91 0.841 .01 .01
Observer 13 0.91 0.824 ..99 .02 0.92 0.836 .51 .24
Observer 14 0.88 0.834 .14 .03 0.88 0.840 .87 .23
Group 0.821 .02* 0.840 .01*
Thoracic radiologists
Observer 15 0.94 0.856 .15 .21 0.96 0.878 .08 .03
Observer 16 0.92 0.854 .60 .17 0.93 0.872 .34 .02
Observer 17 0.86 0.820 .02 .01 0.88 0.838 .14 .12
Observer 18 0.84 0.800 ,.001 ,.001 0.87 0.827 .02 .02
Group 0.833 .08* 0.854 ,.001*
Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers
10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13
years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecolo-
의사
인공지능 vs. 의사만
(p value) 의사+인공지능
의사 vs. 의사+인공지능
(p value)
영상의학과 1년차 전공의
산부인과 4년차 전공의
정형외과 4년차 전공의
내과 4년차 전공의
영상의학과 전문의
7년 경력
8년 경력
영상의학과 전문의 (흉부)
26년 경력
13년 경력
9년 경력
영상의학과 전공의
비영상의학과 의사
•인공지능을 second reader로 활용하면 정확도가 개선
•classification: 17 of 18 명이 개선 (15 of 18, P<0.05)
•nodule detection: 18 of 18 명이 개선 (14 of 18, P<0.05)

Observer
Test 1
DLAD versus Test 1
(P Value) Test 2
Value)
Radiograph
Classification
(AUROC)
Nodule
Detection
(JAFROC FOM)
Radiograph
Classification
Nodule
Detection
Radiograph
Classification
(AUROC)
Nodule
Detection
(JAFROC
FOM)
Radiograph
Classification
Nodule
Detection
Nonradiology
physicians
Observer 1 0.77 0.716 ,.001 ,.001 0.91 0.853 ,.001 ,.001
Observer 2 0.78 0.657 ,.001 ,.001 0.90 0.846 ,.001 ,.001
Observer 3 0.80 0.700 ,.001 ,.001 0.88 0.783 ,.001 ,.001
Group 0.691 ,.001* 0.828 ,.001*
Radiology residents
Observer 4 0.78 0.767 ,.001 ,.001 0.80 0.785 .02 .03
Observer 5 0.86 0.772 .001 ,.001 0.91 0.837 .02 ,.001
Observer 6 0.86 0.789 .05 .002 0.86 0.799 .08 .54
Observer 7 0.84 0.807 .01 .003 0.91 0.843 .003 .02
Observer 8 0.87 0.797 .10 .003 0.90 0.845 .03 .001
Observer 9 0.90 0.847 .52 .12 0.92 0.867 .04 .03
Group 0.790 ,.001* 0.867 ,.001*
Board-certified
radiologists
Observer 10 0.87 0.836 .05 .01 0.90 0.865 .004 .002
Observer 11 0.83 0.804 ,.001 ,.001 0.84 0.817 .03 .04
Observer 12 0.88 0.817 .18 .005 0.91 0.841 .01 .01
Observer 13 0.91 0.824 ..99 .02 0.92 0.836 .51 .24
Observer 14 0.88 0.834 .14 .03 0.88 0.840 .87 .23
Group 0.821 .02* 0.840 .01*
Observer 15 0.94 0.856 .15 .21 0.96 0.878 .08 .03
Observer 16 0.92 0.854 .60 .17 0.93 0.872 .34 .02
Observer 17 0.86 0.820 .02 .01 0.88 0.838 .14 .12
Observer 18 0.84 0.800 ,.001 ,.001 0.87 0.827 .02 .02
Group 0.833 .08* 0.854 ,.001*
의사
인공지능 vs. 의사만
(p value) 의사+인공지능
의사 vs. 의사+인공지능
(p value)
산부인과 4년차 전공의
정형외과 4년차 전공의
내과 4년차 전공의
영상의학과 전문의
7년 경력
8년 경력
영상의학과 전문의 (흉부)
26년 경력
13년 경력
9년 경력
영상의학과 전공의
비영상의학과 의사
인공지능 0.91 0.885
•“인공지능 혼자” 한 것이 “영상의학과 전문의+인공지능”보다 대부분 더 정확
•classification: 9명 중 6명보다 나음
•nodule detection: 9명 전원보다 나음

/ & /+& /,& /.&
l
l (
l
l
l
l
l (
l
l
l
l (
l
l
l (

j m m w $
u o $k
James Albaugh, Boeing, 2011

!U d X W eYV ;cVh"
w $
q.. “
q /+/
q
o

!U d X W eYV ;cVh"
w $
q.. “
q /+/
q
o
‘
o

http://dimg.donga.com/wps/NEWS/IMAGE/2017/06/19/84945511.1.edit.jpg
(출처: 동아일보)

l m
q v
l m m
q z 2 S ee V
q
q
l x m% w m
q t
q t $ $ t
q RURae gV VRc X ' RUgVcdRc R ReeRT
Issues

m m
qt w $ t
q t v t &
q5 Y ?UZQ0 o m $ 
q 7
q t t
q t

qt w $ t
q t v t &
q5 Y ?UZQ0 o m $ 
q 7
q t t
q t
m m

http://www.mobihealthnews.com/content/fda-issues-three-guidances-including-long-awaited-cds-guidelines
q;<K PQS QQ R TaYMZ UZb bQYQZ
q m 67F $ t

l(& . 974 7E m
l<7d 7E0 G O Z AJ &&
l s 7E
l m
l YU P 7E PQ QO U Z x m
l( YU P 7E w r (s v  
l
l &s /&&
l p m nn .-$ ./$+ =4 4
l974m PQ Z b QYM WQ QbUQc M TcMe

l $
q 2 ' ' ! ' ' "
q ' 2 ' c
q v z
qacV%dTcVV X2 9A$ eYV Yf R U Te c
qU fS V cVRU X2 9A Yf R U Te c
qU fS V TYVT !dVT U a "2 Yf R U Te c$
eYV 9A

Assisting Pathologists in Detecting Cancer
with Deep Learning
윤곽선, 색상, heat map 등으로 표시 질병의 유무, 질병의 중증도 등을 제시

THEBLACKBOX
2 0 | N A T U R E | V O L 5 3 8 | 6 O C T O B E R 2 0 1 6
THEBLACKBOX OFAI

Animal Intelligence: Clever Hans

Animal Intelligence: Clever Hans
• Clever Hans was an horse that was claimed to have been able to perform arithmetic.
• After a formal investigation in 1907, psychologist Oskar Pfungst demonstrated that the horse was  
 
not actually performing these mental tasks, but was watching the reactions of his human observers.
• The trainer was entirely unaware that he was providing such cues.

https://namkugkim.wordpress.com/2017/05/15/
“실제로 서울아산병원 흉부 X-ray를 학습시켰을때, 다른 질환과
다른게 Cardiomegaly (심장 비대)의 경우 학습결과는 좋았으나 전
혀 다른 것을 학습한다는 것을 이것을 통해 알게 되었다.
Cardiomegaly는 심장이 커지는 것을 X-ray로 진단하는 것인데, 딥
러닝이 실제로 심장의 크기를 보는 것이 아니라, 이런 질환을 가진
환자의 X-ray에 있는 특징인 수술자국을 보고 있다는 것을 CAM으
로 확인하였다."
- 서울아산병원 김남국 교수님 블로그

l $
q 2 ' ' ! ' ' "
q ' 2 ' c
q v z
q Q O QQZUZS2 9A s eYV Yf R U Te c
qP aN Q QMPUZS2 9A Yf R U Te c
qP aN Q OTQOW2 Yf R U Te c s 9A

The new engl and jour nal of medicine
original article
Single Reading with Computer-Aided
Detection for Screening Mammography
Fiona J. Gilbert, F.R.C.R., Susan M. Astley, Ph.D., Maureen G.C. Gillan, Ph.D.,
Olorunsola F. Agbaje, Ph.D., Matthew G. Wallis, F.R.C.R.,
Jonathan James, F.R.C.R., Caroline R.M. Boggis, F.R.C.R.,
and Stephen W. Duffy, M.Sc., for the CADET II Group*
From the Aberdeen Biomedical Imaging
Centre, University of Aberdeen, Aberdeen
(F.J.G., M.G.C.G.); the Department of Im-
aging Science and Biomedical Engineer-
ing,UniversityofManchester,Manchester
(S.M.A.); the Department of Epidemiolo-
gy, Mathematics, and Statistics, Wolfson
Institute of Preventive Medicine, London
(O.F.A., S.W.D.); the Cambridge Breast
Unit, Addenbrookes Hospital, Cambridge
(M.G.W.); the Nottingham Breast Insti-
tute, Nottingham City Hospital, Notting-
ham (J.J.); and the Nightingale Breast
Screening Unit, Wythenshawe Hospital,
Manchester (C.R.M.B.) — all in the Unit-
ed Kingdom. Address reprint requests to
Dr. Gilbert at the Aberdeen Biomedical
Imaging Centre, University of Aberdeen,
Lilian Sutton Bldg., Foresterhill, Aberdeen
AB25 2ZD, Scotland, United Kingdom, or
at f.j.gilbert@abdn.ac.uk.
*The members of the Computer-Aided
Detection Evaluation Trial II (CADET II)
group are listed in the Appendix.
This article (10.1056/NEJMoa0803545)
was published at www.nejm.org on Oc-
tober 1, 2008.
N Engl J Med 2008;359:1675-84.
Copyright © 2008 Massachusetts Medical Society.
ABSTR ACT
Background
The sensitivity of screening mammography for the detection of small breast can-
cers is higher when the mammogram is read by two readers rather than by a single
reader. We conducted a trial to determine whether the performance of a single reader
using a computer-aided detection system would match the performance achieved by
two readers.
Methods
The trial was designed as an equivalence trial, with matched-pair comparisons be-
tween the cancer-detection rates achieved by single reading with computer-aided de-
tection and those achieved by double reading. We randomly assigned 31,057 women
undergoing routine screening by film mammography at three centers in England to
double reading, single reading with computer-aided detection, or both double read-
ing and single reading with computer-aided detection, at a ratio of 1:1:28. The pri-
mary outcome measures were the proportion of cancers detected according to regi-
men and the recall rates within the group receiving both reading regimens.
Results
The proportion of cancers detected was 199 of 227 (87.7%) for double reading and
198 of 227 (87.2%) for single reading with computer-aided detection (P=0.89). The
overall recall rates were 3.4% for double reading and 3.9% for single reading with
computer-aided detection; the difference between the rates was small but significant
(P<0.001). The estimated sensitivity, specificity, and positive predictive value for single
reading with computer-aided detection were 87.2%, 96.9%, and 18.0%, respectively.
The corresponding values for double reading were 87.7%, 97.4%, and 21.1%. There
were no significant differences between the pathological attributes of tumors de-
tected by single reading with computer-aided detection alone and those of tumors
detected by double reading alone.
Conclusions
Single reading with computer-aided detection could be an alternative to double read-
ing and could improve the rate of detection of cancer from screening mammograms
read by a single reader. (ClinicalTrials.gov number, NCT00450359.)
Mammography
• single reading+CAD vs. double reading
• Outcome: Cancer detection rate / Recall rate

The new engl and jour nal of medicine
original article
Single Reading with Computer-Aided
Detection for Screening Mammography
Fiona J. Gilbert, F.R.C.R., Susan M. Astley, Ph.D., Maureen G.C. Gillan, Ph.D.,
Olorunsola F. Agbaje, Ph.D., Matthew G. Wallis, F.R.C.R.,
Jonathan James, F.R.C.R., Caroline R.M. Boggis, F.R.C.R.,
and Stephen W. Duffy, M.Sc., for the CADET II Group*
From the Aberdeen Biomedical Imaging
Centre, University of Aberdeen, Aberdeen
(F.J.G., M.G.C.G.); the Department of Im-
aging Science and Biomedical Engineer-
ing,UniversityofManchester,Manchester
(S.M.A.); the Department of Epidemiolo-
gy, Mathematics, and Statistics, Wolfson
Institute of Preventive Medicine, London
(O.F.A., S.W.D.); the Cambridge Breast
Unit, Addenbrookes Hospital, Cambridge
(M.G.W.); the Nottingham Breast Insti-
tute, Nottingham City Hospital, Notting-
ham (J.J.); and the Nightingale Breast
Screening Unit, Wythenshawe Hospital,
Manchester (C.R.M.B.) — all in the Unit-
ed Kingdom. Address reprint requests to
Dr. Gilbert at the Aberdeen Biomedical
Imaging Centre, University of Aberdeen,
Lilian Sutton Bldg., Foresterhill, Aberdeen
AB25 2ZD, Scotland, United Kingdom, or
at f.j.gilbert@abdn.ac.uk.
*The members of the Computer-Aided
Detection Evaluation Trial II (CADET II)
group are listed in the Appendix.
This article (10.1056/NEJMoa0803545)
was published at www.nejm.org on Oc-
tober 1, 2008.
N Engl J Med 2008;359:1675-84.
Copyright © 2008 Massachusetts Medical Society.
ABSTR ACT
Background
The sensitivity of screening mammography for the detection of small breast can-
cers is higher when the mammogram is read by two readers rather than by a single
reader. We conducted a trial to determine whether the performance of a single reader
using a computer-aided detection system would match the performance achieved by
two readers.
Methods
The trial was designed as an equivalence trial, with matched-pair comparisons be-
tween the cancer-detection rates achieved by single reading with computer-aided de-
tection and those achieved by double reading. We randomly assigned 31,057 women
undergoing routine screening by film mammography at three centers in England to
double reading, single reading with computer-aided detection, or both double read-
ing and single reading with computer-aided detection, at a ratio of 1:1:28. The pri-
mary outcome measures were the proportion of cancers detected according to regi-
men and the recall rates within the group receiving both reading regimens.
Results
The proportion of cancers detected was 199 of 227 (87.7%) for double reading and
198 of 227 (87.2%) for single reading with computer-aided detection (P=0.89). The
overall recall rates were 3.4% for double reading and 3.9% for single reading with
computer-aided detection; the difference between the rates was small but significant
(P<0.001). The estimated sensitivity, specificity, and positive predictive value for single
reading with computer-aided detection were 87.2%, 96.9%, and 18.0%, respectively.
The corresponding values for double reading were 87.7%, 97.4%, and 21.1%. There
were no significant differences between the pathological attributes of tumors de-
tected by single reading with computer-aided detection alone and those of tumors
detected by double reading alone.
Conclusions
Single reading with computer-aided detection could be an alternative to double read-
ing and could improve the rate of detection of cancer from screening mammograms
read by a single reader. (ClinicalTrials.gov number, NCT00450359.)
Table 1
double reading single reading & CAD
proportion of cancers detected 87.7% 87.2%
overall recall rates 3.4% 3.9%
sensitivity 87.2% 87.8%
speciﬁcity 96.9% 97.7%
positive predicted value 18.0% 21.1%
Conclusion: Single reading with computer-aided detection could be an
alternative to double reading and could improve the rate of
detection of cancer from screening mammograms read by a single
reader.

Copyright 2015 American Medical Association. All rights reserved.
Diagnostic Accuracy of Digital Screening Mammography
With and Without Computer-Aided Detection
Constance D. Lehman, MD, PhD; Robert D. Wellman, MS; Diana S. M. Buist, PhD; Karla Kerlikowske, MD;
Anna N. A. Tosteson, ScD; Diana L. Miglioretti, PhD; for the Breast Cancer Surveillance Consortium
IMPORTANCE After the US Food and Drug Administration (FDA) approved computer-aided
detection (CAD) for mammography in 1998, and the Centers for Medicare and Medicaid
Services (CMS) provided increased payment in 2002, CAD technology disseminated rapidly.
Despite sparse evidence that CAD improves accuracy of mammographic interpretations and
costs over $400 million a year, CAD is currently used for most screening mammograms in the
United States.
OBJECTIVE To measure performance of digital screening mammography with and without
CAD in US community practice.
DESIGN, SETTING, AND PARTICIPANTS We compared the accuracy of digital screening
mammography interpreted with (n = 495 818) vs without (n = 129 807) CAD from 2003
through 2009 in 323 973 women. Mammograms were interpreted by 271 radiologists from
66 facilities in the Breast Cancer Surveillance Consortium. Linkage with tumor registries
identified 3159 breast cancers in 323 973 women within 1 year of the screening.
MAIN OUTCOMES AND MEASURES Mammography performance (sensitivity, specificity, and
screen-detected and interval cancers per 1000 women) was modeled using logistic
regression with radiologist-specific random effects to account for correlation among
examinations interpreted by the same radiologist, adjusting for patient age, race/ethnicity,
time since prior mammogram, examination year, and registry. Conditional logistic regression
was used to compare performance among 107 radiologists who interpreted mammograms
both with and without CAD.
RESULTS Screening performance was not improved with CAD on any metric assessed.
Mammography sensitivity was 85.3% (95% CI, 83.6%-86.9%) with and 87.3% (95% CI,
84.5%-89.7%) without CAD. Specificity was 91.6% (95% CI, 91.0%-92.2%) with and 91.4%
(95% CI, 90.6%-92.0%) without CAD. There was no difference in cancer detection rate (4.1 in
1000 women screened with and without CAD). Computer-aided detection did not improve
intraradiologist performance. Sensitivity was significantly decreased for mammograms
interpreted with vs without CAD in the subset of radiologists who interpreted both with and
without CAD (odds ratio, 0.53; 95% CI, 0.29-0.97).
CONCLUSIONS AND RELEVANCE Computer-aided detection does not improve diagnostic
accuracy of mammography. These results suggest that insurers pay more for CAD with no
established benefit to women.
JAMA Intern Med. 2015;175(11):1828-1837. doi:10.1001/jamainternmed.2015.5231
Published online September 28, 2015.
Invited Commentary
page 1837
Author Affiliations: Department of
Radiology, Massachusetts General
Hospital, Boston (Lehman); Group
Health Research Institute, Seattle,
Washington (Wellman, Buist,
Miglioretti); Departments of
Medicine and Epidemiology and
Biostatistics, University of California,
San Francisco, San Francisco
(Kerlikowske); Norris Cotton Cancer
Center, Geisel School of Medicine at
Dartmouth, Dartmouth College,
Lebanon, New Hampshire (Tosteson);
Department of Public Health
Sciences, School of Medicine,
University of California, Davis
(Miglioretti).
Corresponding Author: Constance
D. Lehman, MD, PhD, Department of
Radiology, Massachusetts General
Hospital, Avon Comprehensive Breast
Evaluation Center, 55 Fruit St, WAC
240, Boston, MA 02114 (clehman
@mgh.harvard.edu).
Research
Original Investigation | LESS IS MORE
1828 (Reprinted) jamainternalmedicine.com
Copyright 2015 American Medical Association. All rights reserved.
647 R MYY S M Te UZ HF
q )110 ><9
q (( ;V eVcd W c EVU TRcV R U EVU TR U KVcg TVd !;EK"
q v ,((

q (( ;V eVcd W c EVU TRcV R U EVU TR U KVcg TVd !;EK"
q () R XcR 0+
t ;9<cer diagnosis within the follow-up period. True-positive
examination results were defined as those with a positive
examination assessment and breast cancer diagnosis. False-
positive examination results were examinations with a posi-
Mammography performan
using logistic regression, includ
diologist-specific random effect
tion among examinations read b
dom effects were allowed to vary
the reading. Performance measu
dian of the random effects distrib
specific relative performance wa
(OR) with 95% CIs comparing C
for patient age at diagnosis, time
year of examination, and the BC
Receiver operating characte
mated from 135 radiologists wh
mogram associated with a cance
cal logistic regression model tha
accuracy parameters to depend o
ing examination interpretation.
racy among radiologists for exa
the same condition (with or wi
threshold for recall to vary acro
mally distributed, radiologist-spe
ied by whether the radiologist us
We estimated the normalized
mary ROC curves across the obs
rates from this model.26
We plo
the false-positive rate and supe
curves.
Two separate main sensitiv
in subsets of total examinations
Figure 1. Screening Mammography Patterns From 2000 to 2012
in US Community Practices in the Breast Cancer Surveillance Consortium
(BCSC)
100
80
60
40
20
0
TypeofMammography,%
Year
2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
Film Digital with CAD
Type
Digital without CAD
Data are provided from the larger BCSC population including all screening
mammograms (5.2 million mammograms) for the indicated time period.
Research Original Investigation Digital Screening Mammog
CMS 보험 혜택
5%
83%
74%

Diagnostic accuracy was not improved with CAD
on any performance metric assessed
w/ CAD w/o CAD
sensitivity 85.3% 87.3%
sensitivity for invasive cancer 82.1% 85.0%
sensitivity for DCIS 93.2% 94.3%
speciﬁcity 91.6% 91.4%
Detection Rate (Overall) 4.1 per 1000 4.1 per 1000
Detection Rate in DCIS 1.2 per 1000 0.9 per 1000
<
<
<
>

From the ROC analysis, the accuracy of mammographic
interpretations with CAD was significantly lower than for
those without CAD (P = .002). The normalized partial area
under the summary ROC curve was 0.84 for interpretations
with CAD and 0.88 for interpretations without CAD
(Figure 2). In this subset of 135 radi
at least 1 mammogram associated
sensitivity of mammography was
86.9%) with and 89.3%% (95% CI
CAD. Specificity of mammograp
90.4%-91.8%) with and 91.3% (95%
out CAD.
Differences by Age, Breast Density, Men
and Time Since Last Mammogram
We found no differences in diagnos
graphic interpretations with and w
subgroups assessed, including pat
menopausal status, and time si
(Table 3).
Intraradiologist Performance Measures f
With and Without CAD
Among 107 radiologists who interpr
with and without CAD, intraradiolog
improved with CAD, and CAD was a
sensitivity. Sensitivity of mammogr
81.0%-85.6%) with and 89.6% (95%
out CAD. Specificity of mammogra
89.8%-91.7%) with and 89.6% (95%
out CAD. The OR for specificity b
interpreted with CAD and those inte
the same radiologist was 1.02 (95% C
was significantly decreased for ma
Figure 2. Receiver Operating Characteristic Curves for Digital Screening
Mammography With and Without the Use of CAD, Estimated
From 135 Radiologists Who Interpreted at Least 1 Examination
Associated With Cancer
100
80
60
40
20
0
0 403020
True-PositiveRate,%
False-Positive Rate, %
10
No CAD use (PAUC, 0.88)
CAD use (PAUC, 0.84)
Each circle represents the true-positive or false-positive rate for a single
radiologist, for examinations interpreted with (orange) or without (blue)
computer-aided detection (CAD). Circle size is proportional to the number of
mammograms associated with cancer interpreted by that radiologist with or
without CAD. PAUC indicates partial area under the curve.
DCIS, ductal carcinoma in situ; exam, examination.
a
Odds ratio for CAD vs No CAD adjusted for site, age group, race/ethnicity, time
since prior mammogram, and calendar year of the examination using
with CAD use.
b
The 95% CIs for sensitivity and specificity are
The accuracy of mammographic interpretations with CAD
was signiﬁcantly lower than for those without CAD (P = .002)

SPECIAL CONTRIBUTION
J Korean Med Assoc 2018 December; 61(12):765-775
pISSN 1975-8456 / eISSN 2093-5951
https://doi.org/10.5124/jkma.2018.61.12.765
첨단 디지털 의료기기 765
첨단 디지털 헬스케어 의료기기를 진료에 도입할 때
평가원칙
박 성 호1
・도 경 현1
・최 준 일2
・심 정 석3
・양 달 모4
・어 홍5
・우 현 식6
・이 정 민7
・정 승 은2
・오 주 형8
| 1
울산대학교 의과대학 서
울아산병원 영상의학과, 2
가톨릭대학교 의과대학 서울성모병원 영상의학과, 3
위드심의원, 4
강동경희대학교병원 영상의학과, 5
삼성서울병원
영상의학과, 6
서울대학교 의과대학 서울특별시보라매병원 영상의학과, 7
서울대학교 의과대학 서울대학교병원 영상의학과, 8
경희대학교 의
과대학 경희의료원 영상의학과
Principles for evaluating the clinical
implementation of novel digital healthcare devices
Seong Ho Park, MD1
· Kyung-Hyun Do, MD1
· Joon-Il Choi, MD2
· Jung Suk Sim, MD3
· Dal Mo Yang, MD4
· Hong Eo, MD5
· Hyunsik Woo,
MD6
· Jeong Min Lee, MD7
· Seung Eun Jung, MD2
· Joo Hyeong Oh, MD8
1
Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul;
2
Department of Radiology, Seoul St. Mary's Hospital, The Catholic University of Korea College of Medicine, Seoul; 3
Withsim Clinic,
Seongnam; 4
Department of Radiology, Kyung Hee University Hospital at Gangdong, Seoul; 5
Department of Radiology and Center
for Imaging Science, Samsung Medical Center, Seoul; 6
Department of Radiology, SMG-SNU Boramae Medical Center, Seoul National
University College of Medicine, Seoul; 7
Department of Radiology, Seoul National University Hospital, Seoul National University College
of Medicine, Seoul; 8
Department of Radiology, Kyung Hee University Hospital, Kyung Hee University College of Medicine, Seoul,
Korea
With growing interest in novel digital healthcare devices, such as artificial intelligence (AI) software for medical
diagnosis and prediction, and their potential impacts on healthcare, discussions have taken place regarding
the regulatory approval, coverage, and clinical implementation of these devices. Despite their potential, ‘digital
exceptionalism’ (i.e., skipping the rigorous clinical validation of such digital tools) is creating significant concerns
for patients and healthcare stakeholders. This white paper presents the positions of the Korean Society of
Radiology, a leader in medical imaging and digital medicine, on the clinical validation, regulatory approval,
coverage decisions, and clinical implementation of novel digital healthcare devices, especially AI software
for medical diagnosis and prediction, and explains the scientific principles underlying those positions. Mere
regulatory approval by the Food and Drug Administration of Korea, the United States, or other countries should
be distinguished from coverage decisions and widespread clinical implementation, as regulatory approval only
indicates that a digital tool is allowed for use in patients, not that the device is beneficial or recommended
for patient care. Coverage or widespread clinical adoption of AI software tools should require a thorough
clinical validation of safety, high accuracy proven by robust external validation, documented benefits for patient
outcomes, and cost-effectiveness. The Korean Society of Radiology puts patients first when considering novel
digital healthcare tools, and as an impartial professional organization that follows scientific principles and
evidence, strives to provide correct information to the public, make reasonable policy suggestions, and build
collaborative partnerships with industry and government for the good of our patients.
Key Words:Software validation; Device approval; Insurance coverage; Artificial intelligence
REVIEWSANDCOMMENTARYnREVIEW
Radiology: Volume 000: Number 0—᭿ ᭿ ᭿ 2018 n radiology.rsna.org 1
1
From the Department of Radiology and Research Institute
of Radiology, University of Ulsan College of Medicine, Asan
Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul
05505, South Korea (S.H.P.); and Department of Radiology,
Research Institute of Radiological Science, Yonsei
University College of Medicine, Seoul, South Korea (K.H.).
Received August 11, 2017; revision requested October 2;
revision received October 12; accepted October 24; final
version accepted November 3. Address correspondence
to S.H.P. (e-mail: parksh.radiology@gmail.com).
S.H.P. supported by the Industrial Strategic Technology
Development Program (grant 10072064) funded by the
Ministry of Trade, Industry and Energy.
q
RSNA, 2018
The use of artificial intelligence in medicine is currently an
issue of great interest, especially with regard to the diag-
nostic or predictive analysis of medical images. Adoption
of an artificial intelligence tool in clinical practice requires
careful confirmation of its clinical utility. Herein, the au-
thors explain key methodology points involved in a clinical
evaluation of artificial intelligence technology for use in
medicine, especially high-dimensional or overparameter-
ized diagnostic or predictive models in which artificial
deep neural networks are used, mainly from the stand-
points of clinical epidemiology and biostatistics. First, sta-
tistical methods for assessing the discrimination and cali-
bration performances of a diagnostic or predictive model
are summarized. Next, the effects of disease manifesta-
tion spectrum and disease prevalence on the performance
results are explained, followed by a discussion of the dif-
ference between evaluating the performance with use of
internal and external datasets, the importance of using
an adequate external dataset obtained from a well-de-
fined clinical cohort to avoid overestimating the clinical
performance as a result of overfitting in high-dimensional
or overparameterized classification model and spectrum
bias, and the essentials for achieving a more robust clini-
cal evaluation. Finally, the authors review the role of clin-
ical trials and observational outcome studies for ultimate
clinical verification of diagnostic or predictive artificial in-
telligence tools through patient outcomes, beyond perfor-
mance metrics, and how to design such studies.
q
RSNA, 2018
Seong Ho Park, MD, PhD
Kyunghwa Han, PhD
Methodologic Guide for
Evaluating Clinical Performance
and Effect of Artificial Intelligence
Technology for Medical Diagnosis
and Prediction1
This copy is for personal use only. To order printed copies, contact reprints@rsna.org

l % m
q t ! ()/ ) "
q t7 t t7
q !A:E ORed W c G T Xj ' ?V Td"
luq % %
q t 2 t7 t7
q t t7
q t t7
x w m% m x

AI기반 의료기술(영상의학 분야)의
급여여부 평가 가이드라인 마련 연구
건강보험심사평가원
대 한 영 상 의 학 회
l
l
l m m
l x • $$

- 20 -
· ·
·
·
·
l % v x •
l 4< v m •
l u h mi t
l uq m

l?QbQ
l m % o p
l v
l
l =?
l m m m
- 22 -

l?QbQ (
l 5UM
l 4< •
l 4< % v t
l m w ?QbQ )
- 22 -

l?QbQ )
l 5UM
l 4< “ m ?QbQ )
l ! m ?QbQ
- 22 -

4<
•검사 행위 별 가산료
• 기존 수가에 가산료를 지급하는 방안
• ‘영상의학과 전문의가 판독하는 경우 vs 다른 의사가 판독하는 경우’와 유사 
•간접 보상
• 의료기관 전체에 대한 간접적인 보상(의료기관 인증, 요양급여적정성 평가...)
• 일부 환자는 이득, 일부 환자는 피해를 보는 경우 (ex. Brain CT의 판독 prioritization) 
•별도 행위 신설
• 기존 검사에서 제공하지 않던 완전히 새로운 진단 정보를 제공하는 경우
• 대응되는 기존의 급여/비급여 검사가 있는 경우 or 없는 경우  
•의사 업무량의 ‘일부’에 해당하는 수가 신설
• 기존의 ‘판독료’는 영상의학과 의사의 ‘전체 판독 프로세스’의 일부에 불과
• AI가 일부 업무만을 수행하는 경우, 그에 맞는 적절한 보상 규모 산정

FM 7 F R cM Q M M QPUOM 7QbUOQ
l t h i
q ! "
q $ JO< $ $
q $ KRE<
lC Q 6Q 0 m

FM 7 F R cM Q M M QPUOM 7QbUOQ
의료기기
엑스레이 기기

혈압계
디지털 헬스케어
핏빗

런키퍼

슬립싸이클
디지털
치료제
페어 알킬리
엠페티카
얼라이브코어
프로테우스
인공지능
*SaMD: Software as a Medical Device
SaMD*
복잡한
데이터
의료  
영상 시그널
영상의학
병리
안과
피부
3 m30 Z Ne

lh i0 4PM UbQ ?QM ZUZS
q KRE< m ! T VU"n t
q 9A !Vi& v "
q ORed W c G T Xj2 '
q 2 ecfV R Rc ' WR dV R Rc v
q 2 ' ' eV UVU W c fdV
x w m

•Initial premarket review 시에, 향후 수정에 대한 관리 계획을 제출
•SaMD Pre-Specification (SPS)
• 제조사가 예상/계획하는 출시 이후 performance / input / intended use 상의 변화
• 최초 기기에서 향후 변화할 범주 (region of potential changes)를 정의
•Algorithm Change Protocol (ACP)
• SPS에서 정의된 변화에 대한 risk 를 컨트롤하기 위한 구체적인 방법
• 변화 이후에 safety/effectiveness 유지 되도록 data/procedure 에 대해 단계별 기술
C QPQ Q YUZQP OTMZSQ O Z MZ

46C!9 X c eY ;YR XV Hc e T "general overview of the components of an ACP. This is "how" the algorithm will learn and change while
remaining safe and effective.
Scope and limitations for establishing SPS and ACP: The FDA acknowledges that the types of changes
that could be pre-specified in a SPS and managed through an ACP may necessitate individual
Figure 4: Algorithm Change Protocol components
Y h eYV R X c eY h VRc R U TYR XV hY V cV R X dRWV R U VWWVTe gV&

m FCF 46Cm x m3
modifications guidance results in either 1) submission of a new 510(k) for premarket review or 2)
documentation of the modification and the analysis in the risk management and 510(k) files. If, for
AI/ML SaMD with an approved SPS and ACP, modifications are within the bounds of the SPS and the
ACP, this proposed framework suggests that manufacturers would document the change in their change
history and other appropriate records, and file for reference, similar to the “document” approach
outlined in the software modifications guidance.
Figure 5: Approach to modifications to previously approved SaMD with SPS and ACP. This flowchart should only be
considered in conjunction with the accompanying text in this white paper.

l y 4PbQ M UM 4 MOW m
q $ t
q ! v "
q ! " u $
q J Sfde Vdd gd& HVcW c R TV2
h yi m
Science 2019

l y 4PbQ M UM 4 MOW m
q $ t
q ! v "
q ! " u $
q J Sfde Vdd gd& HVcW c R TV2
h yi m
Science 2019
INSIGHTS | POLICY FORUM
case of structured data such as billing codes,
adversarial techniques could be used to au-
tomate the discovery of code combinations
that maximize reimbursement or minimize
the probability of claims rejection.
Because adversarial attacks have been
demonstrated for virtually every class of ma-
chine-learning algorithms ever studied, from
simple and readily interpretable methods
such as logistic regression to more compli-
cated methods such as deep neural networks
(1), this is not a problem specific to medicine,
and every domain of machine-learning ap-
plication will need to contend with it. Re-
searchers have sought to develop algorithms
that are resilient to adversarial attacks, such
as by training algorithms with exposure to
adversarial examples or using clever data
processing to mitigate potential tampering
(1). Early efforts in this area are promising,
and we hope that the pursuit of fully robust
machine-learning models will catalyze the
development of algorithms that learn to
make decisions for consistently explainable
and appropriate reasons. Nevertheless, cur-
rent general-use defensive techniques come
at a material degeneration of accuracy, even
if sometimes at improved explainability (10).
Thus, the models that are both highly accu-
rate and robust to adversarial examples re-
main an open problem in computer science.
These challenges are compounded in the
medical context. Medical information tech-
nology (IT) systems are notoriously difficult
to update, so any new defenses could be diffi-
cult to roll out. In addition, the ground truth
in medical diagnoses is often ambiguous,
meaning that for many cases no individual
human can definitively assign the true label At the extreme of this tactical shaping of mends billing for codes corresponding to
Benign
Malignant
Model confidence
Benign
Malignant
Model confidence
The patient has a history of
back pain and chronic alcohol
abuse and more recently has
been seen in several...
277.7 Metabolic syndrome
429.9 Heart disease, unspecified
278.00 Obesity, unspecified
401.0 Benign essential hypertension
272.0 Hypercholesterolemia
272.2 Hyperglyceridemia
429.9 Heart disease,unspecified
278.00 Obesity,unspecified
The patient has a history of
lumbago and chronic alcohol
dependence and more recently
has been seen in several...
Dermatoscopic image of a benign
melanocytic nevus, along with the
diagnostic probability computed
by a deep neural network.
Perturbation computed
by a common adversarial
attack technique.
See (7) for details.
Combined image of nevus and
attack perturbation and the
diagnostic probabilities from
the same deep neural network.
Original image
Diagnosis: Benign Diagnosis: Malignant
Opioid abuse risk: High
Reimbursement: Denied Reimbursement: Approved
Opioid abuse risk: Low
Adversarial noise
Adversarial
rotation (8)
Adversarial example
Adversarial
text substitution (9)
Adversarial
coding (13)
+ 0.04 =
The anatomy of an adversarial attack
Demonstration of how adversarial attacks against various medical AI systems might be
executed without requiring any overtly fraudulent misrepresentation of the data.
onMarhttp://science.sciencemag.org/Downloadedfrom

Sep 2018, Health 2.0 @Santa Clara

March 2019, the Future of Individual Medicine @San Diego

40
50
60
70
80
9 :
40
50
60
70
80
9   :  
69.5%
63%
49.5%
72.5%
57.5%
!

!
b !
q 2 ((
q 92 !-((
q :2 ! (
q 2 !)0 $ ,
q 2 NMFG
o
s

Observer
Test 1
DLAD versus Test 1
(P Value) Test 2
Value)
Radiograph
Classification
(AUROC)
Nodule
Detection
(JAFROC FOM)
Radiograph
Classification
Nodule
Detection
Radiograph
Classification
(AUROC)
Nodule
Detection
(JAFROC
FOM)
Radiograph
Classification
Nodule
Detection
Nonradiology
physicians
Observer 1 0.77 0.716 ,.001 ,.001 0.91 0.853 ,.001 ,.001
Observer 2 0.78 0.657 ,.001 ,.001 0.90 0.846 ,.001 ,.001
Observer 3 0.80 0.700 ,.001 ,.001 0.88 0.783 ,.001 ,.001
Group 0.691 ,.001* 0.828 ,.001*
Radiology residents
Observer 4 0.78 0.767 ,.001 ,.001 0.80 0.785 .02 .03
Observer 5 0.86 0.772 .001 ,.001 0.91 0.837 .02 ,.001
Observer 6 0.86 0.789 .05 .002 0.86 0.799 .08 .54
Observer 7 0.84 0.807 .01 .003 0.91 0.843 .003 .02
Observer 8 0.87 0.797 .10 .003 0.90 0.845 .03 .001
Observer 9 0.90 0.847 .52 .12 0.92 0.867 .04 .03
Group 0.790 ,.001* 0.867 ,.001*
Board-certified
radiologists
Observer 10 0.87 0.836 .05 .01 0.90 0.865 .004 .002
Observer 11 0.83 0.804 ,.001 ,.001 0.84 0.817 .03 .04
Observer 12 0.88 0.817 .18 .005 0.91 0.841 .01 .01
Observer 13 0.91 0.824 ..99 .02 0.92 0.836 .51 .24
Observer 14 0.88 0.834 .14 .03 0.88 0.840 .87 .23
Group 0.821 .02* 0.840 .01*
Observer 15 0.94 0.856 .15 .21 0.96 0.878 .08 .03
Observer 16 0.92 0.854 .60 .17 0.93 0.872 .34 .02
Observer 17 0.86 0.820 .02 .01 0.88 0.838 .14 .12
Observer 18 0.84 0.800 ,.001 ,.001 0.87 0.827 .02 .02
Group 0.833 .08* 0.854 ,.001*
b $
bM aQ !
b $ !
bM aQ
)
+
,
,
,
/
0
.
)+
1
s

https://blogs.nvidia.com/blog/2016/09/19/deep-learning-breast-cancer-diagnosis/
s

modeled separately. For micrometastases, sensitivity was signiﬁcantly higher with
Negative
(Specificity)
Micromet
(Sensitivity)
Macromet
(Sensitivity)
0.7
0.5
0.6
0.8
0.9
1.0
p=0.02
A B
Performance
Unassisted
Assisted
FIGURE 3. Improved metastasis detection with algorithm assistance. A, Data represents performance across all images by image
category and assistance modality. Error bars indicate SE. The performance metric corresponds to corresponds to specificity for
negative cases and sensitivity for micrometastases (micromet) and macrometastases (macromet). B, Operating point of individual
pathologists with and without assistance for micrometastases and negative cases, overlayed on the receiver operating characteristic
curve of the algorithm. AUC indicates area under the curve.
Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved. www.ajsp.com | 5
l p
l UO YQ x
l AQSM UbQ MO YQ •
l 4H6
l
l !
s

1Wang P, et al. Gut 2019;0:1–7. doi:10.1136/gutjnl-2018-317500
Endoscopy
ORIGINAL ARTICLE
Real-time automatic detection system increases
colonoscopic polyp and adenoma detection rates: a
prospective randomised controlled study
Pu Wang, 1
Tyler M Berzin, 2
Jeremy Romek Glissen Brown, 2
Shishira Bharadwaj,2
Aymeric Becq,2
Xun Xiao,1
Peixi Liu,1
Liangping Li,1
Yan Song,1
Di Zhang,1
Yi Li,1
Guangre Xu,1
Mengtian Tu,1
Xiaogang Liu 1
To cite: Wang P, Berzin TM,
Glissen Brown JR, et al. Gut
Epub ahead of print: [please
include Day Month Year].
doi:10.1136/
gutjnl-2018-317500
► Additional material is
published online only.To view
please visit the journal online
(http://dx.doi.org/10.1136/
gutjnl-2018-317500).
1
Department of
Gastroenterology, Sichuan
Academy of Medical Sciences
Sichuan Provincial People’s
Hospital, Chengdu, China
2
Center for Advanced
Endoscopy, Beth Israel
Deaconess Medical Center and
Harvard Medical School, Boston,
Massachusetts, USA
Correspondence to
Xiaogang Liu, Department
of Gastroenterology Sichuan
Academy of Medical Sciences
and Sichuan Provincial People’s
Hospital, Chengdu, China;
Gary.samsph@gmail.com
Received 30 August 2018
Revised 4 February 2019
Accepted 13 February 2019
© Author(s) (or their
employer(s)) 2019. Re-use
permitted under CC BY-NC. No
commercial re-use. See rights
and permissions. Published
by BMJ.
ABSTRACT
Objective The effect of colonoscopy on colorectal
cancer mortality is limited by several factors, among them
a certain miss rate, leading to limited adenoma detection
rates (ADRs).We investigated the effect of an automatic
polyp detection system based on deep learning on polyp
detection rate and ADR.
Design In an open, non-blinded trial, consecutive
patients were prospectively randomised to undergo
diagnostic colonoscopy with or without assistance of a
real-time automatic polyp detection system providing
a simultaneous visual notice and sound alarm on polyp
detection.The primary outcome was ADR.
Results Of 1058 patients included, 536 were
randomised to standard colonoscopy, and 522 were
randomised to colonoscopy with computer-aided
diagnosis.The artificial intelligence (AI) system
significantly increased ADR (29.1%vs20.3%, p0.001)
and the mean number of adenomas per patient
(0.53vs0.31, p0.001).This was due to a higher number
of diminutive adenomas found (185vs102; p0.001),
while there was no statistical difference in larger
adenomas (77vs58, p=0.075). In addition, the number
of hyperplastic polyps was also significantly increased
(114vs52, p0.001).
Conclusions In a low prevalent ADR population, an
automatic polyp detection system during colonoscopy
resulted in a significant increase in the number of
diminutive adenomas detected, as well as an increase in
the rate of hyperplastic polyps.The cost–benefit ratio of
such effects has to be determined further.
Trial registration number ChiCTR-DDD-17012221;
Results.
INTRODUCTION
Colorectal cancer (CRC) is the second and third-
leading causes of cancer-related deaths in men and
women respectively.1
Colonoscopy is the gold stan-
dard for screening CRC.2 3
Screening colonoscopy
has allowed for a reduction in the incidence and
mortality of CRC via the detection and removal
of adenomatous polyps.4–8
Additionally, there is
evidence that with each 1.0% increase in adenoma
detection rate (ADR), there is an associated 3.0%
decrease in the risk of interval CRC.9 10
However,
polyps can be missed, with reported miss rates of
up to 27% due to both polyp and operator charac-
teristics.11 12
Unrecognised polyps within the visual field is
an important problem to address.11
Several studies
have shown that assistance by a second observer
increases the polyp detection rate (PDR), but such a
strategy remains controversial in terms of increasing
the ADR.13–15
Ideally, a real-time automatic polyp detec-
tion system, with performance close to that of
expert endoscopists, could assist the endosco-
pist in detecting lesions that might correspond to
adenomas in a more consistent and reliable way
Significance of this study
What is already known on this subject?
► Colorectal adenoma detection rate (ADR)
is regarded as a main quality indicator of
(screening) colonoscopy and has been shown
to correlate with interval cancers. Reducing
adenoma miss rates by increasing ADR has
been a goal of many studies focused on
imaging techniques and mechanical methods.
► Artificial intelligence has been recently
introduced for polyp and adenoma detection
as well as differentiation and has shown
promising results in preliminary studies.
What are the new findings?
► This represents the first prospective randomised
controlled trial examining an automatic polyp
detection during colonoscopy and shows an
increase of ADR by 50%, from 20% to 30%.
► This effect was mainly due to a higher rate of
small adenomas found.
► The detection rate of hyperplastic polyps was
also significantly increased.
How might it impact on clinical practice in the
foreseeable future?
► Automatic polyp and adenoma detection could
be the future of diagnostic colonoscopy in order
to achieve stable high adenoma detection rates.
► However, the effect on ultimate outcome is
still unclear, and further improvements such as
polyp differentiation have to be implemented.
on17March2019byguest.Protectedbycopyright.http://gut.bmj.com/Gut:firstpublishedas10.1136/gutjnl-2018-317500on27February2019.Downloadedfrom
l e PQ QO U Z MPQZ YM PQ QO U Z ‘
q Hc daVTe gV J;L ! 5)(-03 deR URcU5-+.$ ;95-
q
q MPQZ YM PQ QO U Z M Q m2 1)
gd (+
!a4((()
q PQ QO U Z MPQZ YM m2 (-+ gd (+) !a4((()
q PUYUZa UbQ MPQZ YM 2 )0- gd )( !a4((()
q Te Q M UO e m2 )), gd - !a4((()
Endoscopy
0.5cm and polyps in all segments of the colon with the excep-
tion of the caecum and the ascending colon (table 3).
Outcomes in excellent bowel preparation (BBPS ≥7)
In the situation of excellent bowel preparation, ADR in the
CADe group showed a trend of 6% increase superior to that of
the routine group. However, due to the inadequate sample size
of the subgroup analysis, it failed to show a statistically signifi-
cant difference. Other outcomes, including the mean number of
detected adenomas, mean number of detected polyps and PDR
were all significantly increased in the CADe group (table 4).
can still be missed. Studies have also reported that some polyps
are missed by the endoscopist despite being within the visual
field.31 32
Several hypotheses have been proposed to explain
the mechanism by which polyps may be missed. These include
differences in endoscopist skill level, differences in endoscopist
tracking patterns, ‘inattentional blindness’, wherein an observer
fails to process an image on the screen due to distraction, and
‘change blindness’, wherein changes are missed during inter-
ruptions in visual scanning or during eye movements13.33–37
Distraction caused by fatigue or emotional factors may also
contribute. A second party such as a nurse or a trainee observing
may improve PDR. While several studies have shown that this
increases PDR, controversy remains regarding ADR.13–15
It is
Table 3 Polyp and adenoma detection
Routine colonoscopy
(n=536)
CADe colonoscopy
(n=522) P value* FC/OR 95%CI
PDR 0.291 0.4502 0.001 1.995† 1.532 to 2.544
ADR 0.2034 0.2912 0.001 1.61† 1.213 to 2.135
Mean number of detected polyp 0.5019 0.954 0.001 1.89‡ 1.63 to 2.192
Mean number of detected adenoma 0.3097 0.5326 0.001 1.72‡ 1.419 to 2.084
*P value from χ2
test (or Fisher’s exact test, as appropriate) or t-test.
†OR.
‡FC.
ADR, adenoma detection rate; FC, fold change; PDR, polyp detection rate.
s

1Wu L, et al. Gut 2019;0:1–9. doi:10.1136/gutjnl-2018-317366
Endoscopy
ORIGINAL ARTICLE
Randomised controlled trial of WISENSE, a real-time
quality improving system for monitoring blind spots
during esophagogastroduodenoscopy
Lianlian Wu,1,2,3
Jun Zhang,1,2,3
Wei Zhou,1,2,3
Ping An,1,2,3
Lei Shen,1,2,3
Jun Liu,1,3
Xiaoda Jiang,1,2,3
Xu Huang,1,2,3
Ganggang Mu,1,2,3
Xinyue Wan, 1,2,3
Xiaoguang Lv,1,2,3
Juan Gao,1,3
Ning Cui,1,2,3
Shan Hu,4
Yiyun Chen,4
Xiao Hu,4
Jiangjie Li,4
Di Chen,1,2,3
Dexin Gong,1,2,3
Xinqi He,1,2,3
Qianshan Ding,1,2,3
Xiaoyun Zhu,1,2,3
Suqin Li,1,2,3
Xiao Wei,1,2,3
Xia Li,1,2,3
Xuemei Wang,1,2,3
Jie Zhou,1,2,3
Mengjiao Zhang,1,2,3
Hong Gang Yu 1,2,3
To cite: Wu L, Zhang J,
Zhou W, et al. Gut Epub
ahead of print: [please
include Day Month Year].
doi:10.1136/
gutjnl-2018-317366
► Additional material is
published online only.To view
please visit the journal online
(http://dx.doi.org/10.1136/
gutjnl-2018-317366).
For numbered affiliations see
end of article.
Correspondence to
Professor Hong
Gang Yu, Department of
Gastroenterology, Renmin
Hospital of Wuhan University,
Wuhan 430060, China;
yuhonggang1968@163.com
LW, JZ and WZ contributed
equally.
Received 10 August 2018
Revised 28 January 2019
Accepted 17 February 2019
© Author(s) (or their
employer(s)) 2019. Re-use
permitted under CC BY-NC. No
commercial re-use. See rights
and permissions. Published
by BMJ.
ABSTRACT
Objective Esophagogastroduodenoscopy (EGD)
is the pivotal procedure in the diagnosis of upper
gastrointestinal lesions. However, there are significant
variations in EGD performance among endoscopists,
impairing the discovery rate of gastric cancers and
precursor lesions.The aim of this study was to construct a
real-time quality improving system,WISENSE, to monitor
blind spots, time the procedure and automatically
generate photodocumentation during EGD and thus raise
the quality of everyday endoscopy.
Design WISENSE system was developed using the
methods of deep convolutional neural networks and
deep reinforcement learning. Patients referred because
of health examination, symptoms, surveillance were
recruited from Renmin hospital of Wuhan University.
Enrolled patients were randomly assigned to groups
that underwent EGD with or without the assistance
of WISENSE.The primary end point was to ascertain if
there was a difference in the rate of blind spots between
WISENSE-assisted group and the control group.
Results WISENSE monitored blind spots with an accuracy
of 90.40% in real EGD videos.A total of 324 patients were
recruited and randomised. 153 and 150 patients were
analysed in the WISENSE and control group, respectively.
Blind spot rate was lower in WISENSE group compared
with the control (5.86% vs 22.46%, p0.001), and
the mean difference was −15.39% (95% CI −19.23 to
−11.54).There was no significant adverse event.
Conclusions WISENSE significantly reduced blind spot
rate of EGD procedure and could be used to improve the
quality of everyday endoscopy.
Trial registration number ChiCTR1800014809;
Results.
INTRODUCTION
Esophagogastroduodenoscopy (EGD) is the pivotal
procedure in the diagnosis of upper gastrointestinal
lesions.1
High-quality endoscopy delivers better health
outcomes.2
However, there are significant variations in
EGD performance among endoscopists, impairing the
discovery rate of gastric cancers (GC) and precursor
lesions.3
The diagnosis rate of early GC in China is still
Significance of this study
What is already known on this subject?
► The past decades have seen remarkable progress
of deep convolutional neural network (DCNN)
in the field of endoscopy. Recent studies have
successfully used DCNN to achieve accurate
prediction of early gastric cancer in endoscopic
images and real-time histological classification
of colon polyps in unprocessed videos. However,
it has yet not been investigated whether DCNN
could be used in monitoring quality of everyday
endoscopy.
What are the new findings?
► In the present study,WISENSE, a real-time
quality improving system based on the DCNN
and deep reinforcement learning (DRL) for
monitoring blind spots, timing the procedure
and generating photodocumentation during
esophagogastroduodenoscopy (EGD) was
developed.The performance ofWISENSE was
verified in EGD videos.A single-centre randomised
controlled trial was conducted to evaluate the
hypothesis thatWISENSE would reduce the rate
of blind spots during EGD.To the best of our
knowledge, this is the first study using deep
learning in the field of assuring endoscopy
completeness and using DRL in making medical
decisions in human body environment and also
the first study validating the efficiency of a deep
learning system in a randomised controlled
trial.
How might it impact on clinical practice in the
foreseeable future?
► WISENSE greatly reduced blind spot rate,
increased inspection time and improved the
completeness of photodocumentation of EGD
in the randomised controlled trial. It could be
a powerful assistant tool for mitigating skill
variations among endoscopists and improving the
quality of everyday endoscopy.
on26March2019byguest.Protectedbycopyright.http://gut.bmj.com/Gut:firstpublishedas10.1136/gutjnl-2018-317366on11March2019.Downloadedfrom
8:7 s

puted as new data became avail
when his or her score crossed t
dation set, the AUC obtained f
0.81 to 0.85) (Fig. 2). At a spec
of 0.33], TREWScore achieved a s
a median of 28.2 hours (IQR, 10
Identification of patients b
A critical event in the developme
related organ dysfunction (seve
been shown to increase after th
more than two-thirds (68.8%) o
were identified before any sepsi
tients were identified a median
(Fig. 3B).
Comparison of TREWScore
Weevaluatedtheperformanceof
methods for the purpose of provid
use of TREWScore. We first com
to MEWS, a general metric used
of catastrophic deterioration (17
oped for tracking sepsis, MEWS
tion of patients at risk for severe
Fig. 2. ROC for detection of septic shock before onset in the validation
set. The ROC curve for TREWScore is shown in blue, with the ROC curve for
MEWS in red. The sensitivity and specificity performance of the routine
screening criteria is indicated by the purple dot. Normal 95% CIs are shown
for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
R E S E A R C H A R T I C L E
A targeted real-time early warning score (TREWScore)
for septic shock
AUC=0.83
At a speciﬁcity of 0.67,TREWScore achieved a sensitivity of 0.85  
and identiﬁed patients a median of 28.2 hours before onset.

q • t !9$ :
q92 =OK ++(
$ E=OK (+

q:2 =OK , /
$ E=OK ,(

(source: VUNO)
APPH(Alarms Per Patients Per Hour)
(source: VUNO)
Less False Alarm
’

ARTICLE OPEN
Scalable and accurate deep learning with electronic health
records
Alvin Rajkomar 1,2
, Eyal Oren1
, Kai Chen1
, Andrew M. Dai1
, Nissan Hajaj1
, Michaela Hardt1
, Peter J. Liu1
, Xiaobing Liu1
, Jake Marcus1
,
Mimi Sun1
, Patrik Sundberg1
, Hector Yee1
, Kun Zhang1
, Yi Zhang1
, Gerardo Flores1
, Gavin E. Duggan1
, Jamie Irvine1
, Quoc Le1
,
Kurt Litsch1
, Alexander Mossin1
, Justin Tansuwan1
, De Wang1
, James Wexler1
, Jimbo Wilson1
, Dana Ludwig2
, Samuel L. Volchenboum3
,
Katherine Chou1
, Michael Pearson1
, Srinivasan Madabushi1
, Nigam H. Shah4
, Atul J. Butte2
, Michael D. Howell1
, Claire Cui1
,
Greg S. Corrado1
and Jeffrey Dean1
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare
quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR
data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation
of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that
deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple
centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic
medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR
data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for
tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day
unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge
diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases.
We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case
study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the
patient’s chart.
npj Digital Medicine (2018)1:18 ; doi:10.1038/s41746-018-0029-1
INTRODUCTION
The promise of digital medicine stems in part from the hope that,
by digitizing health data, we might more easily leverage computer
information systems to understand and improve care. In fact,
routinely collected patient healthcare data are now approaching
the genomic scale in volume and complexity.1
Unfortunately,
most of this information is not yet used in the sorts of predictive
statistical models clinicians might use to improve care delivery. It
is widely suspected that use of such efforts, if successful, could
provide major benefits not only for patient safety and quality but
also in reducing healthcare costs.2–6
In spite of the richness and potential of available data, scaling
the development of predictive models is difficult because, for
traditional predictive modeling techniques, each outcome to be
predicted requires the creation of a custom dataset with specific
variables.7
It is widely held that 80% of the effort in an analytic
model is preprocessing, merging, customizing, and cleaning
nurses, and other providers are included. Traditional modeling
approaches have dealt with this complexity simply by choosing a
very limited number of commonly collected variables to consider.7
This is problematic because the resulting models may produce
imprecise predictions: false-positive predictions can overwhelm
physicians, nurses, and other providers with false alarms and
concomitant alert fatigue,10
which the Joint Commission identified
as a national patient safety priority in 2014.11
False-negative
predictions can miss significant numbers of clinically important
events, leading to poor clinical outcomes.11,12
Incorporating the
entire EHR, including clinicians’ free-text notes, offers some hope
of overcoming these shortcomings but is unwieldy for most
predictive modeling techniques.
Recent developments in deep learning and artificial neural
networks may allow us to address many of these challenges and
unlock the information in the EHR. Deep learning emerged as the
preferred machine learning approach in machine perception
www.nature.com/npjdigitalmed

Table 3 List of variants identified as actionable by 3 different platforms
Gene Variant
Identified variant Identified associated drugs
NYGC WGA FO NYGC WGA FO
CDKN2A Deletion Yes Yes Yes Palbociclib, LY2835219
LEE001
Palbociclib LY2835219 Clinical trial
CDKN2B Deletion Yes Yes Yes Palbociclib, LY2835219
LEE002
Palbociclib LY2835219 Clinical trial
EGFR Gain (whole arm) Yes — — Cetuximab — —
ERG Missense P114Q Yes Yes — RI-EIP RI-EIP —
FGFR3 Missense L49V Yes VUS — TK-1258 — —
MET Amplification Yes Yes Yes INC280 Crizotinib, cabozantinib Crizotinib, cabozantinib
MET Frame shift R755fs Yes — — INC280 — —
MET Exon skipping Yes — — INC280 — —
NF1 Deletion Yes — — MEK162 — —
NF1 Nonsense R461* Yes Yes Yes MEK162 MEK162, cobimetinib,
trametinib, GDC-0994
Everolimus, temsirolimus,
trametinib
PIK3R1 Insertion
R562_M563insI
Yes Yes — BKM120 BKM120, LY3023414 —
PTEN Loss (whole arm) Yes — — Everolimus, AZD2014 — —
STAG2 Frame shift R1012 fs Yes Yes Yes Veliparib, clinical trial Olaparib —
DNMT3A Splice site 2083-1G.C — — Yes — — —
TERT Promoter-146C.T Yes — Yes — — —
ABL2 Missense D716N Germline NA VUS
mTOR Missense H1687R Germline NA VUS
NPM1 Missense E169D Germline NA VUS
NTRK1 Missense G18E Germline NA VUS
PTCH1 Missense P1250R Germline NA VUS
TSC1 Missense G1035S Germline NA VUS
Abbreviations: FO 5 FoundationOne; NYGC 5 New York Genome Center; RNA-seq 5 RNA sequencing; WGA 5 Watson Genomic Analytics; WGS 5 whole-
genome sequencing.
Genes, variant description, and, where appropriate, candidate clinically relevant drugs are listed. Variants identified by the FO as variants of uncertain
significance (VUS) were identified by the NYGC as germline variants.

v !
0
50
100
150
200
w/o AI w/ AI
0
50
100
150
200
w/o AI w/ AI
188m
154m
180m
108m
saving 40%
of time
saving 18%
of time
4 5

o p m
q 2 ((
q 92 !-((
q :2 ! (
q 2 !)0 $ ,
q 2 NMFG

isolated diagnostic tasks. Underlying these exciting advances,
however, is the important notion that these algorithms do not
replace the breadth and contextual knowledge of human
pathologists and that even the best algorithms would need to
from 83% to 91% and resulted in higher overall diagnostic
accuracy than that of either unassisted pathologist inter-
pretation or the computer algorithm alone. Although deep
learning algorithms have been credited with comparable
Unassisted Assisted
TimeofReview(seconds)
Timeofreviewperimage(seconds)
Negative ITC Micromet Macromet
A B
p=0.002
p=0.02
Unassisted
Assisted
Micrometastases
FIGURE 5. Average review time per image decreases with assistance. A, Average review time per image across all pathologists
analyzed by category. Black circles are average times with assistance, gray triangles represent average times without assistance.
Error bars indicate 95% confidence interval. B, Micrometastasis time of review decreases for nearly all images with assistance.
Circles represent average review time for each individual micrometastasis image, averaged across the 6 pathologists by assistance
modality. The dashed lines connect the points corresponding to the same image with and without assistance. The 2 images that
were not reviewed faster on average with assistance are represented with red dot-dash lines. Vertical lines of the box represent
quartiles, and the diamond indicates the average of review time for micrometastases in that modality. Micromet indicates mi-
crometastasis; macromet, macrometastasis.
8 | www.ajsp.com Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved.
l o Q UYMSQ
l AQSM UbQ UO YQ x p
l UO YQ ( m p
l G6 M QP GaY 6Q AQSM UbQ •

referred from Dr. Eric Topol on Twitter

Feedback/Questions
• Email: yoonsup.choi@gmail.com
• Blog: http://www.yoonsupchoi.com
• Facebook: Yoon Sup Choi

Supervised autonomous robotic soft tissue surgery
;Y UcV nd FRe R @VR eY
KjdeV z
K Rce L ddfV 9fe fd
J S e !KL9J
Azad Shademan et al. Sci Transl Med 2016

Supervised autonomous robotic soft tissue surgery

l8d bUb UZ bUb QZP QZP FG4E
q t !GH=F
q…x ! RaRc dT aj$ D9H
q J9K
l
q v !daRT X
q t …x t ! VR acVddfcV
q ! f SVc W deR Vd
q v !T a Ve e V
q x ! f V cVUfTe
suture placement compared to other techniques (table S1). Moreover,
leak pressure reflects the functional quality of suturing. The linear
closure from STAR was able to withstand a higher average leak pressure
than all other techniques (Fig. 2B).
suturing tool maneuvers before piercing. Using the NIRF markers as
reference points, the plan interpolated intermediate suture placements
on the bowel and adjusted placement of each suture, knot, and corner
slidetoaccommodatedeformationsandinducedscenerotations(Fig.1F).
Fig. 2. Ex vivo linear suturing under deformations. The experiment con-
sisted of closing a longitudinal cut along pig intestine, whereas the tissue
was deformed by pulling on stay sutures. Five samples were tested per tech-
nique (OPEN, LAP, RAS, and STAR). (A) Suture spacing. Central mark is the
median; box edges are the 25th and 75th percentiles, error bars are the range
excluding outliers, and red dots are outliers. The whiskers represent the range
not including outliers. There is a different N number for each boxplot because
eachsurgeonuseda different number of sutures [OPEN (n =174), LAP(n= 128),
RAS (n = 176), and STAR (n = 206)]. These data are presented numerically in
table S2, including the SDs. P values determined by ANOVA with post hoc
Games-Howell. (B and C) Leak pressures and number of mistakes (reposi-
tioned stitches or robot reboot). Data are from individual tissue samples
(n = 5) with averages marked by a horizontal line. P values determined by
independent samples t test. (D) Completion times separated into knot-tying
and suturing, and other time was spent restaging or changing sutures. Data
are averages (n = 5). P values determined by independent samples t test.
www.ScienceTranslationalMedicine.org 4 May 2016 Vol 8 Issue 337 337ra64 3
nMay7,2016

maining circumference (Fig. 3
ing that different levels of auto
be used effectively for differ
Overall, 57.8% of the procedu
fully autonomously with no a
Alternatively, in the current s
autonomous mode without an
teraction would require suture
in 42.2% of sutures placed, m
corners. The completion tim
also included supervisory ac
surgeon, which accounted f
the total time (7% for suture
justment, 3.3% for confirmati
location, and 2.6% for mistake
In vivo end-to-end anast
Finally, we performed in vivo
autonomous surgery in pig in
cessed through a laparotomy
(n = 4) and compared these a
an OPEN control (n = 1) (fig
used the same suture algorith
ex vivo trials (Fig. 1, G and
OPEN control, the surgeon us
surgical hand tools to open th
exposed the intestine, and sutu
a transverse incision. The av
STAR procedure time was 50.0
where 77.4% was anastomos
22.6% was restaging time be
and front walls, which inclu
2.16 min for marking the tis
B and E, and Table 1). Al
OPEN timewasonly 8min,th
was comparable to the averag
laparoscopic anastomoses that
30 min for vesicourethral (25
for aortic (26), to 90 min for
constructions (27).
No complications were obs
Fig. 3. End-to-end anastomosis ex vivo. The experiment
consisted of closing a transverse cut in pig intestine. Five
samples were tested per technique (OPEN, LAP, RAS, and STAR).
(A) Suture spacing. Central mark is the median; box edges are
the 25th and 75th percentiles; and red dots are outliers. The
whiskers represent the range not including outliers. There is a
different N number for each boxplot because each surgeon
used a different number of sutures [OPEN (n = 138), LAP (n =
98), RAS (n = 132), and STAR (n = 180)]. The average spacing
betweenconsecutive sutures was calculated and compared be-
tween STAR and other modalities. The variance of suture
spacing is presented numerically in table S2, including the SD.
P values determined by ANOVA with post hoc Games-Howell. (B) Exvivo end-to-end anastomosis leak pressures.
Dataareindividualtissuesamples,withmeansdisplayedashorizontallines(n=4to5).Onesample was sutured
closed and thus could not be tested for leak pressure. P values determined by independent samples t test.
(C) The leak pressure as a function of maximum suture spacing. Data are individual tissue samples that were fit
to a rational function (y = 0.854/x) (n = 4 to 5). (D) Number of mistakes (repositioned stitches or robot reboot).
Data are individual tissue samples with means displayed as horizontal lines (n = 5). P values determined by
independent samples t test. (E) Ex vivo end-to-end anastomosis completion times. Average times for n = 5
tissue samples per procedure are divided into subtasks of knots and running sutures. “Other” time was spent
restaging and changing sutures. Pvalues determined byindependent samplesttest.(F) Percentreductionin
l8d bUb UZ bUb QZP QZP FG4E
q t !GH=F
q…x ! RaRc dT aj$ D9H
q J9K
l
q v !daRT X
q t …x t ! VR acVddfcV
q ! f SVc W deR Vd
q v !T a Ve e V
q x ! f V cVUfTe

BeyondVerbal: Reading emotions from voices

http://www.wsj.com/articles/SB10001424052702303824204579421242295627138

BeyondVerbal
q t w 7
q 2 ' ' w
q t
q 9Ve R ()
q .
q

• linguistic
• identiﬁcation and extraction of
word instances (unigrams) and
word-pair instances (bi-grams)
from the transcriptions
• acoustic
• vocal dynamics
• voice quality
• vocal tract resonance frequencies
• pause lengths
A Machine Learning Approach to Identifying the
Thought Markers of Suicidal Subjects:A
Prospective Multicenter Trial
• “Do you have hope?”
• “Do you have any fear?”
• “Do you have any secrets?”
• “Are you angry?”
• “Does it hurt emotionally?”
Pestian, Suicide and Life-Threatening Behavior, 2016

A Machine Learning Approach to Identifying the
Thought Markers of Suicidal Subjects:A
Prospective Multicenter Trial
SensitivitySensitivity
1.00.00.20.40.60.81.00.00.20.40.60.81.0
1.0 0.8 0.6 0.4 0.2 0.0
Specificity
1.0 0.8 0.6 0.4 0.2 0.0
Specificity
SUICIDE THOUGHT MARKERS
SensitivitySensitivitySensitivity
0.00.20.40.60.81.00.00.20.40.60.81.00.00.20.40.60.81.0
1.0 0.8 0.6 0.4 0.2 0.0
Specificity
1.0 0.8 0.6 0.4 0.2 0.0
Specificity
1.0 0.8 0.6 0.4 0.2 0.0
Specificity
Figure 1. Receiver operator curve (ROC): suicide versus control (upper), suicide versus mentally ill (middle), and
SensitivitySensitivity
0.00.20.40.60.81.00.00.20.40.60.81.0
1.0 0.8 0.6 0.4 0.2 0.0
Specificity
1.0 0.8 0.6 0.4 0.2 0.0
Specificity
Figure 1. Receiver operator curve (ROC): suicide versus control (upper), suicide versus mentally
suicide versus mentally ill with control. The ROC curves for adolescents (blue), adults (red), and a
generated where the nonsuicidal population is controls (top), mentally ill (middle), and mentally
using linguistic and acoustic features. The gray line is the AROC curve for a baseline (random) cla
TABLE 2
The AROC for the Machine Learning Algorithm. The Nonsuicidal Group Comprises of Either Mentally Ill and Control Subjects. Classiﬁcation
Performances are Shown for Adolescents, Adults, and the Combined Adolescent and Adult Cohorts
Suicidal versus Controls Suicidal versus Mentally Ill Suicidal versus Mentally Ill and Controls
Adolescents
ROC (SD)
Adults
ROC (SD)
Adolescents +
Adults
ROC (SD)
Adolescents
ROC (SD)
Adults
ROC (SD)
Adolescents +
Adults
ROC (SD)
Adolescents
ROC (SD)
Adults
ROC (SD)
Adolescents +
Adults
ROC (SD)
Linguistics 0.87 (0.04) 0.91 (0.02) 0.93 (0.02) 0.82 (0.05) 0.77 (0.04) 0.79 (0.03) 0.82 (0.04) 0.84 (0.03) 0.87 (0.02)
Acoustics 0.74 (0.05) 0.82 (0.03) 0.79 (0.03) 0.69 (0.06) 0.74 (0.04) 0.76 (0.03) 0.74 (0.05) 0.80 (0.03) 0.76 (0.03)
Linguistics +
Acoustics
0.83 (0.05) 0.93 (0.02) 0.92 (0.02) 0.80 (0.05) 0.77 (0.04) 0.82 (0.03) 0.81 (0.04) 0.84 (0.03) 0.87 (0.02)
PESTIANETAL.
Suicidal vs. Control Suicidal vs. Mentally Ill Suicidal vs. Mentally Ill and Controls
adolescents
adults
Pestian, Suicide and Life-Threatening Behavior, 2016

l
q $ $ w
q aY X TR WVRefcV
qY XY V VcXj h cU2 x ' t 
q ,(.
q )  
l/ QOU U Z m

Detection of the Prodromal Phase of Bipolar Disorder from
Psychological and Phonological Aspects in Social Media
Yen-Hao Huang
National Tsing Hua University
Hsinchu, Taiwan
yenhao0218@gmail.com
Lin-Hung Wei
Hsinchu, Taiwan
adeline80916@gmail.com
Yi-Shin Chen
Hsinchu, Taiwan
yishin@gmail.com
ABSTRACT
Seven out of ten people with bipolar disorder are initially
misdiagnosed and thirty percent of individuals with bipolar
disorder will commit suicide. Identifying the early phases of
the disorder is one of the key components for reducing the
full development of the disorder. In this study, we aim at
leveraging the data from social media to design predictive
models, which utilize the psychological and phonological fea-
tures, to determine the onset period of bipolar disorder and
provide insights on its prodrome. This study makes these dis-
coveries possible by employing a novel data collection process,
coined as Time-specific Subconscious Crowdsourcing, which
helps collect a reliable dataset that supplements diagnosis
information from people suffering from bipolar disorder. Our
experimental results demonstrate that the proposed models
could greatly contribute to the regular assessments of people
with bipolar disorder, which is important in the primary care
setting.
KEYWORDS
Bipolar Disorder Detection, Mental Disorder, Prodromal
Phrase, Emotion Analysis, Sentiment Analysis, Phonology,
Social Media
1 INTRODUCTION
Bipolar disorder (BD) is a common mental illness charac-
terized by recurrent episodes of mania/hypomania and de-
pression, which is found among all ages, races, ethnic groups
and social classes. The regular assessment of people with
BD is an important part of its treatment, though it may be
very time-consuming [21]. There are many beneficial treat-
ments for the patients, particularly for delaying relapses. The
identification of early symptoms is significant for allowing
early intervention and reducing the multiple adverse conse-
quences of a full-blown episode. Despite the importance of
the detection of prodromal symptoms, there are very few
studies that have actually examined the ability of relatives to
detect these symptoms in BD patients. [20] For the purpose
of early treatment, the challenge leads to: how to identify
the prodrome period of BD. Current studies are thus
aimed at detecting prodromes and analyzing the prodromal
symptoms of manic recurrence in clinics.
With regards to the symptom of social isolation, people
are increasingly turning to popular social media, such as
Facebook and Twitter, to share their illness experiences or
seek advice from others with similar mental health conditions.
As the information is being shared in public, people are
subconsciously providing rich contents about their states
of mind. In this paper, we refer to this sharing and data
collection as time-specific subconscious crowdsourcing.
In this study, we carefully look at patients who have been
diagnosed with BD and who explicitly indicate the diagnosis
and time of diagnosis on Twitter. Our goal is to both predict
whether BD rises on a given period of time, and to discover
the prodromal period for BD. It’s important to clarify that
our goal doesn’t seek to offer a diagnosis but rather to make
a prediction of which users are likely to be suffering from the
BD. The main contributions of our work are:
• Introducing the concept of time-specific subconscious
crowdsourcing, which can aid in locating the social
network behavior data of BD patients with the corre-
sponding time of diagnosis.
• A BD assessment mechanism that differentiates be-
tween prodromal symptoms and acute symptoms.
• Introducing the phonological features into the assess-
ment mechanism, which allows for the possibility to
assess patients through text only.
• An automatic recognition approach that detects the
possible prodromal period for BD.
2 RELATED WORK
Social media resources have been widely utilized by researchers
to study mental health issues. The following literature em-
phasizes on data collection and feature engineering, including
subject recruitment, manual data collection, data collection
applications, keyword matching, and combined approaches.
The clinical approach for mental disorders and prodrome
studies are also discussed in this section.
Subject recruitment: Based on customized question-
naires and contact with subjects, Park et al. [15] recruited
participants for the Center for Epidemiologic Studies Depres-
sion scale(CES-D) [17] and provided their Twitter data. By
analyzing the information contained in tweets, participants
were divided into normal and depressive groups based on
their scores on CES-D. An approach like this one requires ex-
pensive costs to acquire data and conduct the questionnaire.
Manual and automatic data collecting: Moreno et
al. [14] collected data via the Facebook profiles of college stu-
dents reviewed by two investigators. They aimed at revealing
the relationship between demographic factors and depression.
Similarly, in our work, we invest on manual efforts to collect
and properly annotate our dataset. In addition, there are
many applications built on top of social networks that provide
free services where users may need to input their credentials
arXiv:1712.09183v1[cs.IR]26Dec2017

인공지능은 의료를 어떻게 혁신하는가 (2019년 7월) (하)

인공지능은 의료를 어떻게 혁신하는가 (2019년 7월) (하)

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Semelhante a 인공지능은 의료를 어떻게 혁신하는가 (2019년 7월) (하)

Semelhante a 인공지능은 의료를 어떻게 혁신하는가 (2019년 7월) (하) (20)

Mais de Yoon Sup Choi

Mais de Yoon Sup Choi (20)

Último

Último (20)

인공지능은 의료를 어떻게 혁신하는가 (2019년 7월) (하)