18. Over the course of a career, an oncologist may impart bad news an average of 20,000 times,
but most practicing oncologists have never received any formal training to help them
prepare for such conversations.
19. High levels of empathy in primary care physicians correlate with
better clinical outcomes for their patients with diabetes
24.
• "There is a problem with a physician-training system in which a student enters medical school simply by excelling on exams and obtains specialist certification simply by passing exams."
• "To make medicine one's profession, one must excel in service mindset, empathy, and communication, yet students whose temperament does not fit these demands inevitably flounder once admitted."
• "In fact, during residency, when contact with patients is greatest, education in areas beyond medicine itself, such as patient safety and communication with patients, is urgently needed, but residents are too pressed treating the patients in front of them to receive it."
25.
Hojat M et al. Acad Med. 2009
differences in empathy scores between the two groups varied from a low of 0.05 (in year 0) to a maximum of 0.75 (in year 3). The effect size of the decline in empathy from year 0 to year 3 was more than double for those who chose technology-oriented specialties (d = 1.01) compared with their counterparts in people-oriented specialties (d = 0.44).
Discussion
The results of this study showed a significant decline in mean empathy scores over the course of medical school.
Figure 2 Changes in mean Jefferson Scale of Physician Empathy (JSPE) scores in different years of
medical school for 56 men and 65 women who identified themselves at all five administrations of
the JSPE (“matched cohort”) at Jefferson Medical College, Philadelphia, Pennsylvania, 2002–2008.
matriculants entering in 2002. This is also
reflected in the total matched cohort.
Figure 1 shows a graphical presentation
of the changes in mean empathy scores
for the matched and unmatched cohorts.
As shown in the figure, the patterns of
changes are very similar in the matched
and unmatched cohorts.
Gender differences
We compared changes in empathy scores during medical school for men (n = 56) and women (n = 65) in the matched cohort. Results are depicted in Figure 2. As shown in the figure, women consistently outscored men in every year of medical school. Gender differences in all of the test administrations were statistically significant (P < .05, by t test). As shown in Figure 2, although the pattern of change in empathy scores for women paralleled that of men, the effect size estimates of these changes varied from a low of 0.37 (in year 2) to a high of 0.79 (in year 3). The effect size of the decline in empathy between year 0 and year 3 was much larger for men (d = 0.79) than for women (d = 0.56).
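The effect sizes (d) quoted here are Cohen's d values. A minimal sketch of how such a standardized mean difference can be computed, assuming the usual pooled-standard-deviation formulation and made-up JSPE-like scores (not the study's data):

```python
import math

def cohens_d(sample_a, sample_b):
    """Cohen's d: standardized mean difference using the pooled standard deviation."""
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd

# hypothetical JSPE-like empathy scores at year 0 vs. year 3
year0 = [115, 118, 120, 112, 117, 119]
year3 = [110, 112, 114, 108, 111, 113]
print(round(cohens_d(year0, year3), 2))
```

A positive d means the first group scored higher; by convention, d around 0.5 is a "medium" and d around 0.8 a "large" effect.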
Differences across specialties
Changes in empathy scores were
compared for 85 graduates in the
matched cohort who pursued their
residency training in “people-oriented”
specialties (e.g., family medicine,
internal medicine, pediatrics,
emergency medicine, psychiatry,
obstetrics–gynecology) and 36
who pursued their training in
“technology-oriented” specialties (e.g.,
anesthesiology, pathology, radiology,
surgery, orthopedic surgery, etc.).
Results appear in Figure 3. As shown in
the figure, those who pursued
people-oriented specialties consistently
scored higher in all years of medical
school than did their counterparts who
pursued technology-oriented
specialties. However, the difference in empathy scores between the two groups became statistically significant starting from year 2 of medical school (P < .05, by t test). The effect size estimates of
Figure 1 Changes in mean Jefferson Scale of Physician Empathy (JSPE) scores in different years of medical school for the matched cohort (n = 121), who identified themselves at all five administrations of the JSPE, and the unmatched cohort (n = 335) at Jefferson Medical College, Philadelphia, Pennsylvania, 2002–2008.
† F(4,296) = 14.4; P < .001. ‡ F(4,179) = 11.7; P < .001. ¶ F(4,479) = 25.5; P < .001.
Academic Medicine, Vol. 84, No. 9 / September 2009
34. (bar charts comparing reading time w/o AI vs. w/ AI)
• 188 min w/o AI → 154 min w/ AI (saving 18% of time)
• 180 min w/o AI → 108 min w/ AI (saving 40% of time)
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
Digital Healthcare Institute
Director, Yoon Sup Choi, PhD
yoonsup.choi@gmail.com
36. Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
70.
71.
• pre-screening: the AI first, then the human doctor
• double reading: the AI and the human doctor
• double check (second opinion): the human doctor first, then the AI
72.
73.
74. Assisting Pathologists in Detecting Cancer
with Deep Learning
Displayed with contours, colors, heat maps, etc.; presents the presence or absence of disease, the severity of disease, etc.
75.
• pre-screening: the AI first, then the human doctor
• double reading: the AI and the human doctor
• double check (second opinion): the human doctor first, then the AI
80. Animal Intelligence: Clever Hans
• Clever Hans was a horse that was claimed to be able to perform arithmetic.
• After a formal investigation in 1907, psychologist Oskar Pfungst demonstrated that the horse was
not actually performing these mental tasks, but was watching the reactions of his human observers.
• The trainer was entirely unaware that he was providing such cues.
81. https://namkugkim.wordpress.com/2017/05/15/
"In fact, when we trained on chest X-rays from Asan Medical Center, Cardiomegaly (an enlarged heart), unlike the other conditions, trained well, but through this we learned that the model was learning something entirely different. Cardiomegaly is diagnosed on an X-ray by the heart being enlarged, yet we confirmed with CAM that the deep learning model was not actually looking at the size of the heart; it was looking at surgical scars, a feature present in the X-rays of patients with this condition."
- from the blog of Professor Namkug Kim, Asan Medical Center
82.
• pre-screening: the AI first, then the human doctor
• double reading: the AI and the human doctor
• double check: the human doctor first, then the AI
83. Issues
• adaptive learning and adversarial attacks
89. Copyright 2015 American Medical Association. All rights reserved.
Diagnostic Accuracy of Digital Screening Mammography
With and Without Computer-Aided Detection
Constance D. Lehman, MD, PhD; Robert D. Wellman, MS; Diana S. M. Buist, PhD; Karla Kerlikowske, MD;
Anna N. A. Tosteson, ScD; Diana L. Miglioretti, PhD; for the Breast Cancer Surveillance Consortium
IMPORTANCE After the US Food and Drug Administration (FDA) approved computer-aided
detection (CAD) for mammography in 1998, and the Centers for Medicare and Medicaid
Services (CMS) provided increased payment in 2002, CAD technology disseminated rapidly.
Despite sparse evidence that CAD improves accuracy of mammographic interpretations and
costs over $400 million a year, CAD is currently used for most screening mammograms in the
United States.
OBJECTIVE To measure performance of digital screening mammography with and without
CAD in US community practice.
DESIGN, SETTING, AND PARTICIPANTS We compared the accuracy of digital screening
mammography interpreted with (n = 495 818) vs without (n = 129 807) CAD from 2003
through 2009 in 323 973 women. Mammograms were interpreted by 271 radiologists from
66 facilities in the Breast Cancer Surveillance Consortium. Linkage with tumor registries
identified 3159 breast cancers in 323 973 women within 1 year of the screening.
MAIN OUTCOMES AND MEASURES Mammography performance (sensitivity, specificity, and
screen-detected and interval cancers per 1000 women) was modeled using logistic
regression with radiologist-specific random effects to account for correlation among
examinations interpreted by the same radiologist, adjusting for patient age, race/ethnicity,
time since prior mammogram, examination year, and registry. Conditional logistic regression
was used to compare performance among 107 radiologists who interpreted mammograms
both with and without CAD.
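The sensitivity and specificity figures reported in the abstract can be reproduced from a 2x2 screening table. A minimal sketch with hypothetical counts (the study's actual modeling also adjusted for covariates and radiologist-level random effects, which this omits):

```python
import math

def rate_with_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation 95% confidence interval."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, (p - z * se, p + z * se)

# hypothetical counts: true positives, false negatives, true negatives, false positives
tp, fn, tn, fp = 853, 147, 9160, 840

sensitivity, sens_ci = rate_with_ci(tp, tp + fn)   # TP / (TP + FN)
specificity, spec_ci = rate_with_ci(tn, tn + fp)   # TN / (TN + FP)

print(f"sensitivity {sensitivity:.1%} (95% CI {sens_ci[0]:.1%}-{sens_ci[1]:.1%})")
print(f"specificity {specificity:.1%} (95% CI {spec_ci[0]:.1%}-{spec_ci[1]:.1%})")
```

With radiologist-level clustering, as in the paper, the intervals would be wider than this naive calculation suggests.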
RESULTS Screening performance was not improved with CAD on any metric assessed.
Mammography sensitivity was 85.3% (95% CI, 83.6%-86.9%) with and 87.3% (95% CI,
84.5%-89.7%) without CAD. Specificity was 91.6% (95% CI, 91.0%-92.2%) with and 91.4%
(95% CI, 90.6%-92.0%) without CAD. There was no difference in cancer detection rate (4.1 in
1000 women screened with and without CAD). Computer-aided detection did not improve
intraradiologist performance. Sensitivity was significantly decreased for mammograms
interpreted with vs without CAD in the subset of radiologists who interpreted both with and
without CAD (odds ratio, 0.53; 95% CI, 0.29-0.97).
CONCLUSIONS AND RELEVANCE Computer-aided detection does not improve diagnostic
accuracy of mammography. These results suggest that insurers pay more for CAD with no
established benefit to women.
JAMA Intern Med. 2015;175(11):1828-1837. doi:10.1001/jamainternmed.2015.5231
Published online September 28, 2015.
Author Affiliations: Department of
Radiology, Massachusetts General
Hospital, Boston (Lehman); Group
Health Research Institute, Seattle,
Washington (Wellman, Buist,
Miglioretti); Departments of
Medicine and Epidemiology and
Biostatistics, University of California,
San Francisco, San Francisco
(Kerlikowske); Norris Cotton Cancer
Center, Geisel School of Medicine at
Dartmouth, Dartmouth College,
Lebanon, New Hampshire (Tosteson);
Department of Public Health
Sciences, School of Medicine,
University of California, Davis
(Miglioretti).
Corresponding Author: Constance
D. Lehman, MD, PhD, Department of
Radiology, Massachusetts General
Hospital, Avon Comprehensive Breast
Evaluation Center, 55 Fruit St, WAC
240, Boston, MA 02114 (clehman
@mgh.harvard.edu).
CAD for mammography in the US
• 1998: FDA approval
• 2002: increased payment from the Centers for Medicare and Medicaid Services (CMS)
• costs over $400 million a year
90. • 2002: Centers for Medicare and Medicaid Services (CMS)
• by 2012, 83% of screening mammograms used digital with CAD
…cancer diagnosis within the follow-up period. True-positive examination results were defined as those with a positive examination assessment and breast cancer diagnosis. False-positive examination results were examinations with a positive assessment and no cancer diagnosis. Mammography performance was modeled using logistic regression, including radiologist-specific random effects to account for correlation among examinations read by the same radiologist. Receiver operating characteristic curves were estimated from 135 radiologists who interpreted at least 1 mammogram associated with a cancer, using a hierarchical logistic regression model that allowed the threshold for recall to vary across radiologists and by whether the radiologist used CAD. We estimated the normalized partial areas under the summary ROC curves from this model and plotted them against the false-positive rate with superimposed summary curves. Two separate main sensitivity analyses were performed in subsets of total examinations.
Figure 1. Screening Mammography Patterns From 2000 to 2012 in US Community Practices in the Breast Cancer Surveillance Consortium (BCSC). Chart: type of mammography (%) by year, 2000–2012; series: film, digital with CAD, digital without CAD. Data are provided from the larger BCSC population including all screening mammograms (5.2 million mammograms) for the indicated time period. Slide annotations: CMS insurance coverage; 5%; 83%; 74%.
91. Diagnostic accuracy was not improved with CAD on any performance metric assessed
                                   w/ CAD          w/o CAD
sensitivity                        85.3%           87.3%
sensitivity for invasive cancer    82.1%           85.0%
sensitivity for DCIS               93.2%           94.3%
specificity                        91.6%           91.4%
Detection Rate (Overall)           4.1 per 1000    4.1 per 1000
Detection Rate in DCIS             1.2 per 1000    0.9 per 1000
92. From the ROC analysis, the accuracy of mammographic
interpretations with CAD was significantly lower than for
those without CAD (P = .002). The normalized partial area
under the summary ROC curve was 0.84 for interpretations
with CAD and 0.88 for interpretations without CAD
(Figure 2). In this subset of 135 radiologists who interpreted at least 1 mammogram associated with cancer, sensitivity of mammography was lower with CAD than without, and specificity was similar with and without CAD.
Differences by Age, Breast Density, Menopausal Status, and Time Since Last Mammogram
We found no differences in diagnostic accuracy between mammographic interpretations with and without CAD in the subgroups assessed, including patient age, breast density, menopausal status, and time since last mammogram (Table 3).
Intraradiologist Performance Measures for Radiologists Who Interpreted Both With and Without CAD
Among 107 radiologists who interpreted mammograms both with and without CAD, intraradiologist performance was not improved with CAD, and CAD was associated with decreased sensitivity. The OR for specificity between examinations interpreted with CAD and those interpreted without CAD by the same radiologist was 1.02, and sensitivity was significantly decreased for mammograms interpreted with CAD.
Figure 2. Receiver Operating Characteristic Curves for Digital Screening Mammography With and Without the Use of CAD, Estimated From 135 Radiologists Who Interpreted at Least 1 Examination Associated With Cancer. No CAD use (PAUC, 0.88); CAD use (PAUC, 0.84). Each circle represents the true-positive or false-positive rate for a single radiologist, for examinations interpreted with (orange) or without (blue) computer-aided detection (CAD). Circle size is proportional to the number of mammograms associated with cancer interpreted by that radiologist with or without CAD. PAUC indicates partial area under the curve.
The accuracy of mammographic interpretations with CAD
was significantly lower than for those without CAD (P = .002)
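A normalized partial AUC restricts the comparison to the false-positive range actually observed. A simplified sketch of a trapezoidal partial AUC normalized by the FPR cutoff, with made-up operating points (the paper's summary-ROC estimation uses a hierarchical model, not this direct calculation):

```python
import numpy as np

def partial_auc(fpr, tpr, max_fpr=0.4):
    """Trapezoidal area under the ROC curve for fpr <= max_fpr,
    normalized so a perfect test scores 1.0 over that range."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    tpr_cut = np.interp(max_fpr, fpr, tpr)  # value of the curve at the cutoff
    keep = fpr < max_fpr
    x = np.concatenate([fpr[keep], [max_fpr]])
    y = np.concatenate([tpr[keep], [tpr_cut]])
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)  # trapezoid rule
    return float(area) / max_fpr

# two illustrative ROC curves (made-up operating points, not the study's data)
fpr = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0]
without_cad = [0.0, 0.75, 0.85, 0.90, 0.93, 1.0]
with_cad = [0.0, 0.65, 0.78, 0.85, 0.89, 1.0]
print(partial_auc(fpr, without_cad), partial_auc(fpr, with_cad))
```

Normalizing by the FPR range makes partial areas from different cutoffs comparable on a common 0-to-1 scale.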
93. SPECIAL CONTRIBUTION
J Korean Med Assoc 2018 December; 61(12):765-775
pISSN 1975-8456 / eISSN 2093-5951
https://doi.org/10.5124/jkma.2018.61.12.765
Principles for evaluating the clinical
implementation of novel digital healthcare devices
Seong Ho Park, MD1 · Kyung-Hyun Do, MD1 · Joon-Il Choi, MD2 · Jung Suk Sim, MD3 · Dal Mo Yang, MD4 · Hong Eo, MD5 · Hyunsik Woo, MD6 · Jeong Min Lee, MD7 · Seung Eun Jung, MD2 · Joo Hyeong Oh, MD8
1 Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul; 2 Department of Radiology, Seoul St. Mary's Hospital, The Catholic University of Korea College of Medicine, Seoul; 3 Withsim Clinic, Seongnam; 4 Department of Radiology, Kyung Hee University Hospital at Gangdong, Seoul; 5 Department of Radiology and Center for Imaging Science, Samsung Medical Center, Seoul; 6 Department of Radiology, SMG-SNU Boramae Medical Center, Seoul National University College of Medicine, Seoul; 7 Department of Radiology, Seoul National University Hospital, Seoul National University College of Medicine, Seoul; 8 Department of Radiology, Kyung Hee University Hospital, Kyung Hee University College of Medicine, Seoul, Korea
With growing interest in novel digital healthcare devices, such as artificial intelligence (AI) software for medical
diagnosis and prediction, and their potential impacts on healthcare, discussions have taken place regarding
the regulatory approval, coverage, and clinical implementation of these devices. Despite their potential, ‘digital
exceptionalism’ (i.e., skipping the rigorous clinical validation of such digital tools) is creating significant concerns
for patients and healthcare stakeholders. This white paper presents the positions of the Korean Society of
Radiology, a leader in medical imaging and digital medicine, on the clinical validation, regulatory approval,
coverage decisions, and clinical implementation of novel digital healthcare devices, especially AI software
for medical diagnosis and prediction, and explains the scientific principles underlying those positions. Mere
regulatory approval by the Food and Drug Administration of Korea, the United States, or other countries should
be distinguished from coverage decisions and widespread clinical implementation, as regulatory approval only
indicates that a digital tool is allowed for use in patients, not that the device is beneficial or recommended
for patient care. Coverage or widespread clinical adoption of AI software tools should require a thorough
clinical validation of safety, high accuracy proven by robust external validation, documented benefits for patient
outcomes, and cost-effectiveness. The Korean Society of Radiology puts patients first when considering novel
digital healthcare tools, and as an impartial professional organization that follows scientific principles and
evidence, strives to provide correct information to the public, make reasonable policy suggestions, and build
collaborative partnerships with industry and government for the good of our patients.
Key Words: Software validation; Device approval; Insurance coverage; Artificial intelligence
REVIEWS AND COMMENTARY • REVIEW
Radiology: Volume 000: Number 0, 2018 • radiology.rsna.org
From the Department of Radiology and Research Institute
of Radiology, University of Ulsan College of Medicine, Asan
Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul
05505, South Korea (S.H.P.); and Department of Radiology,
Research Institute of Radiological Science, Yonsei
University College of Medicine, Seoul, South Korea (K.H.).
Received August 11, 2017; revision requested October 2;
revision received October 12; accepted October 24; final
version accepted November 3. Address correspondence
to S.H.P. (e-mail: parksh.radiology@gmail.com).
S.H.P. supported by the Industrial Strategic Technology
Development Program (grant 10072064) funded by the
Ministry of Trade, Industry and Energy.
© RSNA, 2018
The use of artificial intelligence in medicine is currently an
issue of great interest, especially with regard to the diag-
nostic or predictive analysis of medical images. Adoption
of an artificial intelligence tool in clinical practice requires
careful confirmation of its clinical utility. Herein, the au-
thors explain key methodology points involved in a clinical
evaluation of artificial intelligence technology for use in
medicine, especially high-dimensional or overparameter-
ized diagnostic or predictive models in which artificial
deep neural networks are used, mainly from the stand-
points of clinical epidemiology and biostatistics. First, sta-
tistical methods for assessing the discrimination and cali-
bration performances of a diagnostic or predictive model
are summarized. Next, the effects of disease manifesta-
tion spectrum and disease prevalence on the performance
results are explained, followed by a discussion of the dif-
ference between evaluating the performance with use of
internal and external datasets, the importance of using
an adequate external dataset obtained from a well-de-
fined clinical cohort to avoid overestimating the clinical
performance as a result of overfitting in high-dimensional
or overparameterized classification model and spectrum
bias, and the essentials for achieving a more robust clini-
cal evaluation. Finally, the authors review the role of clin-
ical trials and observational outcome studies for ultimate
clinical verification of diagnostic or predictive artificial in-
telligence tools through patient outcomes, beyond perfor-
mance metrics, and how to design such studies.
© RSNA, 2018
Seong Ho Park, MD, PhD; Kyunghwa Han, PhD
Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction
94. (IBM Watson for Oncology)
100. Reimbursement
•Add-on fee per examination
• Paying an add-on on top of the existing fee schedule
• Analogous to 'interpretation by a radiology specialist vs. interpretation by another physician'
•Indirect compensation
• Indirect rewards to the medical institution as a whole (institutional accreditation, appropriateness-of-care assessment for reimbursement, ...)
• Cases where some patients benefit and others are harmed (e.g., prioritization of brain CT reads)
•Creating a new billable service
• When an examination provides entirely new diagnostic information it did not previously provide
• When a corresponding existing covered or non-covered examination exists, or when none exists
•A new fee corresponding to a 'portion' of the physician's work
• The existing 'interpretation fee' covers only part of the radiologist's overall interpretation process
• When the AI performs only part of the work, set a compensation level appropriate to that part
101. SaMD: Software as a Medical Device
102. SaMD: Software as a Medical Device
Diagram: medical devices (X-ray machines, blood pressure monitors) / digital healthcare (Fitbit, Runkeeper, Sleep Cycle) / digital therapeutics (Pear, Akili, Empatica) / AliveCor, Proteus / artificial intelligence / SaMD*: complex data; medical images, signals; radiology, pathology, ophthalmology, dermatology
*SaMD: Software as a Medical Device
103. 'Adaptive Learning'
• SaMD ... (locked)
• AI (ex. ...)
• Watson for Oncology: ...
• true alarm vs. false alarm ...
• ... intended for use ...
104. •At the initial premarket review, submit a plan for managing future modifications
•SaMD Pre-Specification (SPS)
• The changes in performance / input / intended use that the manufacturer anticipates or plans after release
• Defines the region of potential changes from the initial device
•Algorithm Change Protocol (ACP)
• Concrete methods for controlling the risks of the changes defined in the SPS
• Step-by-step description of data/procedures so that safety/effectiveness is maintained after a change
Predetermined change control plan
105. ACP (Algorithm Change Protocol): a general overview of the components of an ACP. This is "how" the algorithm will learn and change while remaining safe and effective.
Figure 4: Algorithm Change Protocol components
106. Modifications to a SaMD with an approved SPS and ACP
modifications guidance results in either 1) submission of a new 510(k) for premarket review or 2)
documentation of the modification and the analysis in the risk management and 510(k) files. If, for
AI/ML SaMD with an approved SPS and ACP, modifications are within the bounds of the SPS and the
ACP, this proposed framework suggests that manufacturers would document the change in their change
history and other appropriate records, and file for reference, similar to the “document” approach
outlined in the software modifications guidance.
Figure 5: Approach to modifications to previously approved SaMD with SPS and ACP. This flowchart should only be
considered in conjunction with the accompanying text in this white paper.
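The flowchart's core decision can be sketched as a small function; the data model and helper names here are hypothetical, but the branch logic follows the text: modifications within the bounds of the approved SPS and ACP are documented, while others go back for premarket review.

```python
def handle_samd_modification(change, sps_bounds, acp_followed):
    """Sketch of the proposed decision for a modification to an approved
    AI/ML SaMD with an SPS and ACP (hypothetical data model)."""
    within_sps = all(change.get(field) in allowed
                     for field, allowed in sps_bounds.items())
    if within_sps and acp_followed:
        # inside the pre-specified region of potential changes:
        # record it, similar to the "document" approach in the guidance
        return "document change history; file for reference"
    # outside the agreed SPS/ACP bounds: back to premarket review
    return "submit new 510(k) for premarket review"

# hypothetical SPS: allowed input sources and intended uses
sps_bounds = {"input": {"vendor A scanner", "vendor B scanner"},
              "intended_use": {"triage"}}
print(handle_samd_modification(
    {"input": "vendor B scanner", "intended_use": "triage"},
    sps_bounds, acp_followed=True))
```

A change of intended use not listed in the SPS (e.g., from triage to primary diagnosis) would fall through to the new-510(k) branch.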
107. Adversarial Attacks
• Robustness vs. Performance
Science 2019
108. Adversarial Attacks
• Robustness vs. Performance
Science 2019
INSIGHTS | POLICY FORUM
case of structured data such as billing codes,
adversarial techniques could be used to au-
tomate the discovery of code combinations
that maximize reimbursement or minimize
the probability of claims rejection.
Because adversarial attacks have been
demonstrated for virtually every class of ma-
chine-learning algorithms ever studied, from
simple and readily interpretable methods
such as logistic regression to more compli-
cated methods such as deep neural networks
(1), this is not a problem specific to medicine,
and every domain of machine-learning ap-
plication will need to contend with it. Re-
searchers have sought to develop algorithms
that are resilient to adversarial attacks, such
as by training algorithms with exposure to
adversarial examples or using clever data
processing to mitigate potential tampering
(1). Early efforts in this area are promising,
and we hope that the pursuit of fully robust
machine-learning models will catalyze the
development of algorithms that learn to
make decisions for consistently explainable
and appropriate reasons. Nevertheless, cur-
rent general-use defensive techniques come
at a material degeneration of accuracy, even
if sometimes at improved explainability (10).
Thus, the models that are both highly accu-
rate and robust to adversarial examples re-
main an open problem in computer science.
These challenges are compounded in the
medical context. Medical information tech-
nology (IT) systems are notoriously difficult
to update, so any new defenses could be diffi-
cult to roll out. In addition, the ground truth
in medical diagnoses is often ambiguous,
meaning that for many cases no individual
human can definitively assign the true label.
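The attacks described work even against simple, readily interpretable models such as logistic regression. A minimal numpy sketch of the gradient-sign idea on a toy logistic "diagnostic" model (weights and features are random stand-ins, not a real classifier):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a toy diagnostic model: p(malignant | x) = sigmoid(w . x + b)
rng = np.random.default_rng(0)
w = rng.normal(size=50)      # model weights (random stand-in)
b = -0.5
x = rng.normal(size=50)      # stand-in for image features

p_before = sigmoid(w @ x + b)

# fast-gradient-sign-style perturbation: for a logistic model the
# gradient of the logit with respect to the input is just w, so a
# small step eps * sign(w) raises the score as fast as possible
# per-coordinate while staying nearly imperceptible
eps = 0.1
x_adv = x + eps * np.sign(w)
p_after = sigmoid(w @ x_adv + b)

print(p_before, p_after)  # the perturbed input scores as "more malignant"
```

For a deep network the same recipe applies, with the input gradient obtained by backpropagation rather than read directly off the weights.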
The anatomy of an adversarial attack (figure): demonstration of how adversarial attacks against various medical AI systems might be executed without requiring any overtly fraudulent misrepresentation of the data. Panels: a dermatoscopic image of a benign melanocytic nevus, with the diagnostic probability computed by a deep neural network, is combined with a perturbation computed by a common adversarial attack technique (see (7) for details; adversarial rotation (8) is another route), flipping the diagnosis from benign to malignant; adversarial text substitution (9) in a clinical note ("back pain and chronic alcohol abuse" to "lumbago and chronic alcohol dependence") flips an opioid-abuse risk model from high to low; adversarial coding (13) among billing codes (e.g., 401.0 benign essential hypertension, 272.0 hypercholesterolemia, 278.00 obesity, 429.9 heart disease) flips reimbursement from denied to approved.
123. The score was recomputed as new data became available, and a patient was identified when his or her score crossed the detection threshold. In the validation set, the AUC obtained for TREWScore was 0.83 (95% CI, 0.81 to 0.85) (Fig. 2). At a specificity of 0.67 (false-positive rate of 0.33), TREWScore achieved a sensitivity of 0.85 and identified patients a median of 28.2 hours before onset.
Identification of patients before sepsis-related organ dysfunction: a critical event in the development of septic shock is sepsis-related organ dysfunction (severe sepsis), after which mortality has been shown to increase. More than two-thirds (68.8%) of patients were identified before any sepsis-related organ dysfunction (Fig. 3B).
Comparison of TREWScore with existing methods: we evaluated the performance of existing methods for the purpose of providing context for the use of TREWScore. We first compared TREWScore to MEWS, a general metric used to identify patients at risk of catastrophic deterioration. Not developed for tracking sepsis, MEWS performed worse for identification of patients at risk for severe sepsis.
Fig. 2. ROC for detection of septic shock before onset in the validation
set. The ROC curve for TREWScore is shown in blue, with the ROC curve for
MEWS in red. The sensitivity and specificity performance of the routine
screening criteria is indicated by the purple dot. Normal 95% CIs are shown
for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
RESEARCH ARTICLE
A targeted real-time early warning score (TREWScore) for septic shock
AUC=0.83
At a specificity of 0.67,TREWScore achieved a sensitivity of 0.85
and identified patients a median of 28.2 hours before onset.
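Reading a sensitivity off a fixed specificity, as in the TREWScore result above, amounts to thresholding the score at the corresponding quantile of the negative class. A sketch on synthetic scores (the distributions are made up; only the mechanics match):

```python
import numpy as np

def sensitivity_at_specificity(scores, labels, target_specificity):
    """Choose the threshold giving the target specificity on negatives,
    then report sensitivity on positives."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    negatives = scores[~labels]
    # the threshold is the target quantile of the negative-class scores
    threshold = np.quantile(negatives, target_specificity)
    sensitivity = float(np.mean(scores[labels] > threshold))
    return sensitivity, float(threshold)

# synthetic risk scores: septic-shock patients score higher on average
rng = np.random.default_rng(42)
neg = rng.normal(0.0, 1.0, 5000)   # no septic shock
pos = rng.normal(1.8, 1.0, 500)    # septic shock
scores = np.concatenate([neg, pos])
labels = np.concatenate([np.zeros(5000, bool), np.ones(500, bool)])

sens, thr = sensitivity_at_specificity(scores, labels, 0.67)
print(round(sens, 2))
```

Sweeping the target specificity from 0 to 1 and collecting the resulting sensitivities traces out the ROC curve shown in Fig. 2.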
124. APPH (Alarms Per Patient Per Hour): the deep-learning early warning score produced fewer false alarms than MEWS at both hospitals (A, B) (source: VUNO)
Less False Alarm
125. ARTICLE OPEN
Scalable and accurate deep learning with electronic health
records
Alvin Rajkomar1,2, Eyal Oren1, Kai Chen1, Andrew M. Dai1, Nissan Hajaj1, Michaela Hardt1, Peter J. Liu1, Xiaobing Liu1, Jake Marcus1, Mimi Sun1, Patrik Sundberg1, Hector Yee1, Kun Zhang1, Yi Zhang1, Gerardo Flores1, Gavin E. Duggan1, Jamie Irvine1, Quoc Le1, Kurt Litsch1, Alexander Mossin1, Justin Tansuwan1, De Wang1, James Wexler1, Jimbo Wilson1, Dana Ludwig2, Samuel L. Volchenboum3, Katherine Chou1, Michael Pearson1, Srinivasan Madabushi1, Nigam H. Shah4, Atul J. Butte2, Michael D. Howell1, Claire Cui1, Greg S. Corrado1 and Jeffrey Dean1
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare
quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR
data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation
of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that
deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple
centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic
medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR
data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for
tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day
unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge
diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases.
We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case
study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the
patient’s chart.
npj Digital Medicine (2018)1:18 ; doi:10.1038/s41746-018-0029-1
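The "sequential format" the abstract describes, raw records unrolled into time-ordered events rather than hand-curated variables, can be sketched with a made-up FHIR-like event type (the field names are illustrative, not the paper's actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One timestamped EHR entry, FHIR-resource-like (illustrative fields)."""
    time: int        # e.g., minutes since admission
    resource: str    # "Observation", "MedicationOrder", "Note", ...
    token: str       # coded value or note token

def unroll(record):
    """Order a patient's raw events in time, keeping every data point
    rather than extracting a fixed set of curated variables."""
    return sorted(record, key=lambda e: e.time)

record = [
    Event(30, "MedicationOrder", "vancomycin"),
    Event(5, "Observation", "heart_rate=112"),
    Event(12, "Note", "suspected sepsis"),
]
sequence = unroll(record)
print([e.token for e in sequence])
# -> ['heart_rate=112', 'suspected sepsis', 'vancomycin']
```

A sequence model then consumes these ordered tokens directly, which is what lets one pipeline serve many prediction tasks without per-task feature engineering.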
INTRODUCTION
The promise of digital medicine stems in part from the hope that,
by digitizing health data, we might more easily leverage computer
information systems to understand and improve care. In fact,
routinely collected patient healthcare data are now approaching
the genomic scale in volume and complexity.1
Unfortunately,
most of this information is not yet used in the sorts of predictive
statistical models clinicians might use to improve care delivery. It
is widely suspected that use of such efforts, if successful, could
provide major benefits not only for patient safety and quality but
also in reducing healthcare costs.2–6
In spite of the richness and potential of available data, scaling
the development of predictive models is difficult because, for
traditional predictive modeling techniques, each outcome to be
predicted requires the creation of a custom dataset with specific
variables.7
It is widely held that 80% of the effort in an analytic
model is preprocessing, merging, customizing, and cleaning datasets. Another challenge is that the number of potential predictor variables in the EHR may easily number in the thousands, particularly if free-text notes from doctors, nurses, and other providers are included.
approaches have dealt with this complexity simply by choosing a
very limited number of commonly collected variables to consider.7
This is problematic because the resulting models may produce
imprecise predictions: false-positive predictions can overwhelm
physicians, nurses, and other providers with false alarms and
concomitant alert fatigue,10
which the Joint Commission identified
as a national patient safety priority in 2014.11
False-negative
predictions can miss significant numbers of clinically important
events, leading to poor clinical outcomes.11,12
Incorporating the
entire EHR, including clinicians’ free-text notes, offers some hope
of overcoming these shortcomings but is unwieldy for most
predictive modeling techniques.
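The tradeoff above between false alarms and missed events is what the abstract's AUROC figure summarizes: the probability that the model ranks a random positive case above a random negative one. A minimal sketch of computing AUROC directly from predicted risks via the rank (Mann-Whitney) formulation; the scores below are invented, not the paper's outputs:

```python
def auroc(labels, scores):
    """AUROC via the rank (Mann-Whitney) formulation: the probability
    that a random positive case is scored above a random negative case.
    Ties receive half credit."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented risk scores for eight patients (1 = event occurred)
y_true  = [1, 0, 1, 1, 0, 0, 0, 1]
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.65, 0.8]
print(auroc(y_true, y_score))  # 0.9375
```

A frequency-weighted AUROC, as reported in the abstract, would average such per-diagnosis values weighted by how often each diagnosis occurs.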
Recent developments in deep learning and artificial neural
networks may allow us to address many of these challenges and
unlock the information in the EHR. Deep learning emerged as the
preferred machine learning approach in machine perception
128. Table 3. Variants identified as actionable by three different platforms

Gene | Variant | Identified: NYGC / WGA / FO | Associated drugs: NYGC / WGA / FO
CDKN2A | Deletion | Yes / Yes / Yes | Palbociclib, LY2835219, LEE011 / Palbociclib, LY2835219 / Clinical trial
CDKN2B | Deletion | Yes / Yes / Yes | Palbociclib, LY2835219, LEE011 / Palbociclib, LY2835219 / Clinical trial
EGFR | Gain (whole arm) | Yes / — / — | Cetuximab / — / —
ERG | Missense P114Q | Yes / Yes / — | RI-EIP / RI-EIP / —
FGFR3 | Missense L49V | Yes / VUS / — | TKI-258 / — / —
MET | Amplification | Yes / Yes / Yes | INC280 / Crizotinib, cabozantinib / Crizotinib, cabozantinib
MET | Frame shift R755fs | Yes / — / — | INC280 / — / —
MET | Exon skipping | Yes / — / — | INC280 / — / —
NF1 | Deletion | Yes / — / — | MEK162 / — / —
NF1 | Nonsense R461* | Yes / Yes / Yes | MEK162 / MEK162, cobimetinib, trametinib, GDC-0994 / Everolimus, temsirolimus, trametinib
PIK3R1 | Insertion R562_M563insI | Yes / Yes / — | BKM120 / BKM120, LY3023414 / —
PTEN | Loss (whole arm) | Yes / — / — | Everolimus, AZD2014 / — / —
STAG2 | Frame shift R1012fs | Yes / Yes / Yes | Veliparib, clinical trial / Olaparib / —
DNMT3A | Splice site 2083-1G>C | — / — / Yes | — / — / —
TERT | Promoter -146C>T | Yes / — / Yes | — / — / —
ABL2 | Missense D716N | Germline / NA / VUS | —
mTOR | Missense H1687R | Germline / NA / VUS | —
NPM1 | Missense E169D | Germline / NA / VUS | —
NTRK1 | Missense G18E | Germline / NA / VUS | —
PTCH1 | Missense P1250R | Germline / NA / VUS | —
TSC1 | Missense G1035S | Germline / NA / VUS | —

Abbreviations: FO = FoundationOne; NYGC = New York Genome Center; RNA-seq = RNA sequencing; WGA = Watson Genomic Analytics; WGS = whole-genome sequencing.
Genes, variant descriptions, and, where appropriate, candidate clinically relevant drugs are listed. Variants identified by FO as variants of uncertain significance (VUS) were identified by the NYGC as germline variants.
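The concordance across the three platforms in Table 3 can be summarized with plain set operations. A sketch with the "identified variant" calls transcribed from the table (gene/variant keys shortened for readability; only "Yes" calls are counted, so WGA's VUS call for FGFR3 is excluded):

```python
# Actionable-variant calls per platform, transcribed from Table 3
nygc = {"CDKN2A del", "CDKN2B del", "EGFR gain", "ERG P114Q", "FGFR3 L49V",
        "MET amp", "MET R755fs", "MET exon-skip", "NF1 del", "NF1 R461*",
        "PIK3R1 ins", "PTEN loss", "STAG2 R1012fs", "TERT -146C>T"}
wga  = {"CDKN2A del", "CDKN2B del", "ERG P114Q", "MET amp", "NF1 R461*",
        "PIK3R1 ins", "STAG2 R1012fs"}
fo   = {"CDKN2A del", "CDKN2B del", "MET amp", "NF1 R461*", "STAG2 R1012fs",
        "DNMT3A splice", "TERT -146C>T"}

consensus = nygc & wga & fo   # called by all three platforms
nygc_only = nygc - wga - fo   # seen only by the WGS-based analysis
print(len(consensus), len(nygc_only))  # 5 6
```

The five consensus calls (CDKN2A, CDKN2B, MET amplification, NF1 R461*, STAG2) match the rows marked "Yes" in all three columns above.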
129. Time required without vs. with AI assistance (bar charts): 188m → 154m, an 18% time saving, and 180m → 108m, a 40% time saving.
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
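The percentage savings on the slide follow from a one-line calculation (a sketch; the 188→154 and 180→108 figures are the slide's own):

```python
def pct_saved(before, after):
    # Percentage of time eliminated by AI assistance
    return round(100 * (1 - after / before))

print(pct_saved(188, 154))  # 18
print(pct_saved(180, 108))  # 40
```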
Digital Healthcare Institute
Director, Yoon Sup Choi, PhD
yoonsup.choi@gmail.com
137. Supervised autonomous robotic soft tissue surgery
Children's National Health System
Smart Tissue Autonomous Robot (STAR)
Azad Shademan et al. Sci Transl Med 2016
138. Azad Shademan et al. Sci Transl Med 2016
Supervised autonomous robotic soft tissue surgery
139. Azad Shademan et al. Sci Transl Med 2016
• Techniques compared: open surgery (OPEN), laparoscopy (LAP), robot-assisted surgery (RAS), and STAR
• Outcome metrics:
  • suture spacing
  • leak pressure
  • number of mistakes
  • completion time
  • lumen reduction
suture placement compared to other techniques (table S1). Moreover,
leak pressure reflects the functional quality of suturing. The linear
closure from STAR was able to withstand a higher average leak pressure
than all other techniques (Fig. 2B).
suturing tool maneuvers before piercing. Using the NIRF markers as
reference points, the plan interpolated intermediate suture placements
on the bowel and adjusted placement of each suture, knot, and corner
slide to accommodate deformations and induced scene rotations (Fig. 1F).
Fig. 2. Ex vivo linear suturing under deformations. The experiment consisted of closing a longitudinal cut along pig intestine while the tissue was deformed by pulling on stay sutures. Five samples were tested per technique (OPEN, LAP, RAS, and STAR). (A) Suture spacing. Central mark is the median; box edges are the 25th and 75th percentiles; whiskers are the range excluding outliers; and red dots are outliers. The N differs between boxplots because each surgeon used a different number of sutures [OPEN (n = 174), LAP (n = 128), RAS (n = 176), and STAR (n = 206)]. These data are presented numerically in table S2, including the SDs. P values determined by ANOVA with post hoc Games-Howell. (B and C) Leak pressures and number of mistakes (repositioned stitches or robot reboot). Data are from individual tissue samples (n = 5) with averages marked by a horizontal line. P values determined by independent samples t test. (D) Completion times separated into knot-tying and suturing; other time was spent restaging or changing sutures. Data are averages (n = 5). P values determined by independent samples t test.
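The caption repeatedly cites independent-samples t tests. A minimal sketch of the test statistic (Welch's form, which allows unequal variances) on hypothetical leak-pressure values; the numbers are illustrative, not the paper's data, and obtaining a p-value would additionally require the t distribution:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal
    variances allowed); statistics.variance uses the n-1 denominator."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

# Hypothetical leak pressures, n = 5 per arm as in the experiment
star = [55, 60, 58, 62, 57]
lap  = [35, 40, 33, 38, 36]
print(round(welch_t(star, lap), 2))  # 12.87
```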
www.ScienceTranslationalMedicine.org 4 May 2016 Vol 8 Issue 337 337ra64 3
140. Azad Shademan et al. Sci Transl Med 2016
the remaining circumference (Fig. 3), suggesting that different levels of autonomy can be used effectively for different tasks. Overall, 57.8% of the procedure was completed fully autonomously with no adjustments; in the current setup, autonomous mode without any human interaction would have required suture readjustment in 42.2% of sutures placed, mostly at the corners. The completion time for STAR also included supervisory actions by the surgeon, which together accounted for 12.9% of the total time (7% for suture tension adjustment, 3.3% for confirmation of suture location, and 2.6% for mistake correction).
In vivo end-to-end anastomosis
Finally, we performed in vivo supervised autonomous surgery in pig intestine accessed through a laparotomy (n = 4) and compared these anastomoses against an OPEN control (n = 1), using the same suture algorithm as in the ex vivo trials (Fig. 1, G and H). In the OPEN control, the surgeon used standard surgical hand tools to open the abdomen, exposed the intestine, and sutured closed a transverse incision. The average STAR procedure time was 50.0 min, where 77.4% was anastomosis time and 22.6% was restaging time between the back and front walls, which included 2.16 min for marking the tissue (Fig. 3, B and E, and Table 1). Although the OPEN time was only 8 min, the STAR time was comparable to average laparoscopic anastomosis times, which range from roughly 30 min for vesicourethral (25) and aortic (26) procedures to 90 min for colorectal reconstructions (27).
No complications were observed.
No complications were obs
Fig. 3. End-to-end anastomosis ex vivo. The experiment consisted of closing a transverse cut in pig intestine. Five samples were tested per technique (OPEN, LAP, RAS, and STAR). (A) Suture spacing. Central mark is the median; box edges are the 25th and 75th percentiles; whiskers are the range excluding outliers; and red dots are outliers. The N differs between boxplots because each surgeon used a different number of sutures [OPEN (n = 138), LAP (n = 98), RAS (n = 132), and STAR (n = 180)]. The average spacing between consecutive sutures was calculated and compared between STAR and the other modalities. The variance of suture spacing is presented numerically in table S2, including the SD. P values determined by ANOVA with post hoc Games-Howell. (B) Ex vivo end-to-end anastomosis leak pressures. Data are individual tissue samples, with means displayed as horizontal lines (n = 4 to 5). One sample was sutured closed and thus could not be tested for leak pressure. P values determined by independent samples t test. (C) The leak pressure as a function of maximum suture spacing. Data are individual tissue samples that were fit to a rational function (y = 0.854/x) (n = 4 to 5). (D) Number of mistakes (repositioned stitches or robot reboot). Data are individual tissue samples with means displayed as horizontal lines (n = 5). P values determined by independent samples t test. (E) Ex vivo end-to-end anastomosis completion times. Average times for n = 5 tissue samples per procedure are divided into subtasks of knots and running sutures. "Other" time was spent restaging and changing sutures. P values determined by independent samples t test. (F) Percent reduction in lumen.
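Panel (C) fits leak pressure to a rational function of maximum suture spacing, y = a/x. That one-parameter fit has a closed-form least-squares solution, a = Σ(yᵢ/xᵢ) / Σ(1/xᵢ²). A sketch on invented points that lie exactly on y = 2/x (the paper's fit to its own data gave a = 0.854):

```python
def fit_inverse(xs, ys):
    """Closed-form least squares for y = a/x:
    setting d/da sum((y - a/x)^2) = 0 gives a = sum(y/x) / sum(1/x^2)."""
    return sum(y / x for x, y in zip(xs, ys)) / sum(1.0 / x**2 for x in xs)

# Invented (max suture spacing, leak pressure) points lying on y = 2/x
xs = [0.5, 1.0, 2.0, 4.0]
ys = [4.0, 2.0, 1.0, 0.5]
print(fit_inverse(xs, ys))  # 2.0
```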
146. • linguistic
• identification and extraction of
word instances (unigrams) and
word-pair instances (bi-grams)
from the transcriptions
• acoustic
• vocal dynamics
• voice quality
• vocal tract resonance frequencies
• pause lengths
A Machine Learning Approach to Identifying the
Thought Markers of Suicidal Subjects: A
Prospective Multicenter Trial
• “Do you have hope?”
• “Do you have any fear?”
• “Do you have any secrets?”
• “Are you angry?”
• “Does it hurt emotionally?”
Pestian, Suicide and Life-Threatening Behavior, 2016
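The linguistic features above (word unigrams and word-pair bigrams from the transcriptions) can be sketched in a few lines; the tokenizer here is a simplification, not the study's pipeline:

```python
from collections import Counter
import re

def ngram_features(transcript):
    """Count word unigrams and word-pair bigrams in a transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

uni, bi = ngram_features("I have no hope. I have fear.")
print(uni["have"], bi[("i", "have")])  # 2 2
```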
147. A Machine Learning Approach to Identifying the
Thought Markers of Suicidal Subjects: A
Prospective Multicenter Trial
Figure 1. Receiver operating characteristic (ROC) curves: suicide versus control (top), suicide versus mentally ill (middle), and suicide versus mentally ill with control (bottom). Curves were generated for adolescents (blue), adults (red), and the combined cohort, with the nonsuicidal population being controls, mentally ill, or mentally ill and controls, using linguistic and acoustic features. The gray line is the ROC curve of a baseline (random) classifier. (Axes: sensitivity versus specificity.)
TABLE 2. The AROC for the machine learning algorithm. The nonsuicidal group comprises either mentally ill or control subjects. Classification performance is shown for adolescents, adults, and the combined adolescent and adult cohort; cells are AROC (SD).

Features | Suicidal vs. Controls: Adolescents / Adults / Combined | Suicidal vs. Mentally Ill: Adolescents / Adults / Combined | Suicidal vs. Mentally Ill and Controls: Adolescents / Adults / Combined
Linguistics | 0.87 (0.04) / 0.91 (0.02) / 0.93 (0.02) | 0.82 (0.05) / 0.77 (0.04) / 0.79 (0.03) | 0.82 (0.04) / 0.84 (0.03) / 0.87 (0.02)
Acoustics | 0.74 (0.05) / 0.82 (0.03) / 0.79 (0.03) | 0.69 (0.06) / 0.74 (0.04) / 0.76 (0.03) | 0.74 (0.05) / 0.80 (0.03) / 0.76 (0.03)
Linguistics + Acoustics | 0.83 (0.05) / 0.93 (0.02) / 0.92 (0.02) | 0.80 (0.05) / 0.77 (0.04) / 0.82 (0.03) | 0.81 (0.04) / 0.84 (0.03) / 0.87 (0.02)
Pestian, Suicide and Life-Threatening Behavior, 2016
148. • phonological features
• high-energy words
149. Detection of the Prodromal Phase of Bipolar Disorder from
Psychological and Phonological Aspects in Social Media
Yen-Hao Huang
National Tsing Hua University
Hsinchu, Taiwan
yenhao0218@gmail.com
Lin-Hung Wei
National Tsing Hua University
Hsinchu, Taiwan
adeline80916@gmail.com
Yi-Shin Chen
National Tsing Hua University
Hsinchu, Taiwan
yishin@gmail.com
ABSTRACT
Seven out of ten people with bipolar disorder are initially
misdiagnosed and thirty percent of individuals with bipolar
disorder will commit suicide. Identifying the early phases of
the disorder is one of the key components for reducing the
full development of the disorder. In this study, we aim to
leverage data from social media to design predictive
models, which utilize psychological and phonological features,
to determine the onset period of bipolar disorder and
provide insights on its prodrome. This study makes these
discoveries possible by employing a novel data collection process,
coined as Time-specific Subconscious Crowdsourcing, which
helps collect a reliable dataset that supplements diagnosis
information from people suffering from bipolar disorder. Our
experimental results demonstrate that the proposed models
could greatly contribute to the regular assessments of people
with bipolar disorder, which is important in the primary care
setting.
KEYWORDS
Bipolar Disorder Detection, Mental Disorder, Prodromal
Phase, Emotion Analysis, Sentiment Analysis, Phonology,
Social Media
1 INTRODUCTION
Bipolar disorder (BD) is a common mental illness charac-
terized by recurrent episodes of mania/hypomania and de-
pression, which is found among all ages, races, ethnic groups
and social classes. The regular assessment of people with
BD is an important part of its treatment, though it may be
very time-consuming [21]. There are many beneficial treat-
ments for the patients, particularly for delaying relapses. The
identification of early symptoms is significant for allowing
early intervention and reducing the multiple adverse conse-
quences of a full-blown episode. Despite the importance of
the detection of prodromal symptoms, there are very few
studies that have actually examined the ability of relatives to
detect these symptoms in BD patients [20]. For the purpose
of early treatment, the challenge becomes how to identify
the prodromal period of BD. Current studies are thus
aimed at detecting prodromes and analyzing the prodromal
symptoms of manic recurrence in clinics.
With regard to the symptom of social isolation, people
are increasingly turning to popular social media, such as
Facebook and Twitter, to share their illness experiences or
seek advice from others with similar mental health conditions.
As the information is being shared in public, people are
subconsciously providing rich contents about their states
of mind. In this paper, we refer to this sharing and data
collection as time-specific subconscious crowdsourcing.
In this study, we carefully look at patients who have been
diagnosed with BD and who explicitly indicate the diagnosis
and time of diagnosis on Twitter. Our goal is both to predict
whether BD arises in a given period of time and to discover
the prodromal period for BD. It is important to clarify that
we do not seek to offer a diagnosis, but rather to predict
which users are likely to be suffering from BD.
The main contributions of our work are:
• Introducing the concept of time-specific subconscious
crowdsourcing, which can aid in locating the social
network behavior data of BD patients with the corre-
sponding time of diagnosis.
• A BD assessment mechanism that differentiates be-
tween prodromal symptoms and acute symptoms.
• Introducing the phonological features into the assess-
ment mechanism, which allows for the possibility to
assess patients through text only.
• An automatic recognition approach that detects the
possible prodromal period for BD.
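The contributions above depend on anchoring each user's timeline to the self-reported diagnosis date and carving out a candidate prodromal window. A sketch of that labeling step, where the 90-day window and the posts themselves are assumptions for illustration (the study determines the actual window empirically):

```python
from datetime import date, timedelta

def label_posts(posts, diagnosis, window_days=90):
    """Tag each (date, text) post relative to the diagnosis date:
    'prodromal' if within window_days before diagnosis,
    'pre' if earlier, 'post' if on or after diagnosis."""
    start = diagnosis - timedelta(days=window_days)
    labels = []
    for day, text in posts:
        if day >= diagnosis:
            labels.append(("post", text))
        elif day >= start:
            labels.append(("prodromal", text))
        else:
            labels.append(("pre", text))
    return labels

posts = [(date(2017, 1, 5), "can't sleep again"),
         (date(2017, 3, 20), "so much energy lately"),
         (date(2017, 5, 1), "got my diagnosis today")]
print(label_posts(posts, diagnosis=date(2017, 4, 30)))
```

Posts labeled "prodromal" would then feed the psychological and phonological feature extraction described in the abstract.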
2 RELATED WORK
Social media resources have been widely utilized by researchers
to study mental health issues. The following literature emphasizes
data collection and feature engineering, including
subject recruitment, manual data collection, data collection
applications, keyword matching, and combined approaches.
The clinical approach for mental disorders and prodrome
studies are also discussed in this section.
Subject recruitment: Based on customized question-
naires and contact with subjects, Park et al. [15] recruited
participants who completed the Center for Epidemiologic Studies
Depression Scale (CES-D) [17] and provided their Twitter data. By
analyzing the information contained in tweets, participants
were divided into normal and depressive groups based on
their scores on the CES-D. An approach like this incurs
high costs to acquire data and administer the questionnaire.
Manual and automatic data collecting: Moreno et
al. [14] collected data via the Facebook profiles of college stu-
dents reviewed by two investigators. They aimed at revealing
the relationship between demographic factors and depression.
Similarly, in our work, we invest in manual effort to collect
and properly annotate our dataset. In addition, there are
many applications built on top of social networks that provide
free services where users may need to input their credentials
arXiv:1712.09183v1 [cs.IR] 26 Dec 2017