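5. By Yoon Sup Choi (최윤섭)
Medical Artificial Intelligence (의료 인공지능)
Cover design: 최승협
Yoon Sup Choi is a convergence life scientist, futurist of medicine, entrepreneur, angel investor, and evangelist whose guiding question is how to create innovation and social value in digital healthcare by combining computer engineering, life science, and medicine. He is a leading expert on digital healthcare in Korea and, through active research, writing, and lectures, the person who first introduced the field to the country.
He double-majored in computer engineering and life science at POSTECH (Pohang University of Science and Technology) and earned a PhD in computational biology from the School of Interdisciplinary Bioscience and Bioengineering (시스템생명공학부) at the same university. He has served as a visiting researcher at Stanford University, a research assistant professor at the Cancer Research Institute of Seoul National University College of Medicine, a team leader at the Convergence Laboratory of the KT Advanced Institute of Technology, and a research assistant professor at the Biomedical Research Institute of Seoul National University Hospital. He has published more than ten papers in leading scientific journals, including Science.
He founded and directs the Yoon Sup Choi Digital Healthcare Institute, the first institute in Korea dedicated to digital healthcare research. He is also co-founder and managing partner of Digital Healthcare Partners, the only accelerator in Korea specializing in healthcare startups, where he discovers, invests in, and nurtures innovative healthcare startups together with medical experts. He also serves as a visiting professor in the Department of Digital Health at Sungkyunkwan University.
He invests in and advises healthcare startups including VUNO, Zikto, 3billion, Surgical Mind, Doctor Diary, VRAD, Medihere, Souling, and Mobile Doctor, working to bring healthcare innovation to Korea. He writes actively on Yoon Sup Choi's Healthcare Innovation (최윤섭의 헬스케어 이노베이션), Korea's first blog devoted to digital healthcare, and contributes a column to Maeil Business Newspaper (매일경제). His books include Healthcare Innovation: The Future Has Already Begun and And So I Became a Company Myself.
•Blog: http://www.yoonsupchoi.com/
•Facebook: https://www.facebook.com/yoonsup.choi
•Email: yoonsup.choi@gmail.com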
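By Yoon Sup Choi
Medical Artificial Intelligence
The present and future of medical AI, presented by futurist of medicine Dr. Yoon Sup Choi
Where medical deep learning and IBM Watson stand today
Will AI replace doctors?
Price: 20,000 won
ISBN 979-11-86269-99-2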
6. Medical Artificial Intelligence
•Part 1: The Second Machine Age and Medical AI
•Part 2: The Past and Present of Medical AI
•Part 3: How Should We Prepare for the Future?
19. • Associated Press (AP): robots write news articles in place of humans
• Able to produce 2,000 articles per second
• Earnings coverage expanded from 300 companies to 3,000 companies
20. • 1978
• As part of the obscure task of “discovery” —
providing documents relevant to a lawsuit — the
studios examined six million documents at a
cost of more than $2.2 million, much of it to pay
for a platoon of lawyers and paralegals who
worked for months at high hourly rates.
• 2011
• Now, thanks to advances in artificial intelligence,
“e-discovery” software can analyze documents
in a fraction of the time for a fraction of the
cost.
• In January, for example, Blackstone Discovery of
Palo Alto, Calif., helped analyze 1.5 million
documents for less than $100,000.
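The two cases can be put on a common footing by computing cost per document. The following is a minimal sketch using only the figures quoted on this slide; because the 1978 cost is given as "more than" and the 2011 cost as "less than", the ratio is a lower bound, and the comparison is not adjusted for inflation. The function name is illustrative.

```python
def cost_per_document(total_cost_usd: float, documents: int) -> float:
    """Average review cost per document in US dollars."""
    return total_cost_usd / documents


# Figures quoted on the slide (1978: "more than", 2011: "less than").
manual_1978 = cost_per_document(2_200_000, 6_000_000)
software_2011 = cost_per_document(100_000, 1_500_000)
ratio = manual_1978 / software_2011

print(f"1978 manual review: at least ${manual_1978:.3f} per document")
print(f"2011 e-discovery:   at most  ${software_2011:.3f} per document")
print(f"Manual review cost at least {ratio:.1f}x more per document")
```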
21. “At its height back in 2000, the U.S. cash equities trading desk at
Goldman Sachs’s New York headquarters employed 600 traders,
buying and selling stock on the orders of the investment bank’s
large clients. Today there are just two equity traders left”
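22. • Fukoku Mutual Life Insurance in Japan decided to lay off more than 30 claims assessors and hand the work to IBM Watson Explorer
• Watson judges whether a claim should be paid based on medical records
• Replacing staff with AI raises productivity by 30%
• Return on investment expected within two years
• Year 1: 140m yen
• Year 2: 200m yen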
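27. • Weak AI (Artificial Narrow Intelligence)
• AI that performs well in one specific domain
• Chess, quizzes, mail filtering, product recommendation, autonomous driving
• Strong AI (Artificial General Intelligence)
• Human-level AI across all domains
• Reasoning, planning, problem solving, abstraction, learning complex concepts
• Super AI (Artificial Super Intelligence)
• AI that surpasses humans in every area, including science and technology and social skills
• "Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke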
29. When will machines attain human-level intelligence?
[Figure: cumulative share of respondents by year, 2010-2100, with curves labeled 10%, 50%, and 90%]
Survey groups:
• PT-AI: Philosophy and Theory of AI (2011)
• AGI: Artificial General Intelligence (2012)
• EETN: Greek Association for Artificial Intelligence
• TOP100: survey of the 100 most frequently cited authors (2013)
• Combined
Source: Superintelligence, Nick Bostrom (2014)
30. Superintelligence: Science or fiction?
Panelists: Elon Musk (Tesla, SpaceX), Bart Selman (Cornell), Ray Kurzweil (Google),
David Chalmers (NYU), Nick Bostrom (FHI), Demis Hassabis (DeepMind), Stuart
Russell (Berkeley), Sam Harris, and Jaan Tallinn (CSER/FLI)
January 6-8, 2017, Asilomar, CA
https://brunch.co.kr/@kakao-it/49
https://www.youtube.com/watch?v=h0962biiZa4
31. Superintelligence: Science or fiction? (panel responses)
January 6-8, 2017, Asilomar, CA
Q: Is superintelligence attainable at all?
• YES from all nine panelists: Elon Musk, Stuart Russell, Bart Selman, Ray Kurzweil, David Chalmers, Nick Bostrom, Demis Hassabis, Sam Harris, Jaan Tallinn
Q: Do you think an entity with superintelligence could emerge?
• YES from all nine panelists
Q: Do you hope that superintelligence will be realized?
• YES: Ray Kurzweil, Nick Bostrom, Demis Hassabis
• Complicated: Elon Musk, Stuart Russell, Bart Selman, David Chalmers, Sam Harris, Jaan Tallinn
https://brunch.co.kr/@kakao-it/49
https://www.youtube.com/watch?v=h0962biiZa4
40. Medical Artificial Intelligence
•Part 1: The Second Machine Age and Medical AI
•Part 2: The Past and Present of Medical AI
•Part 3: How Should We Prepare for the Future?
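41. Three types of medical AI
•Analyzing complex medical data and deriving insights
•Analyzing and reading medical imaging and pathology data
•Monitoring continuous data for prevention and prediction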
44. IBM Watson on Medicine
Watson learned...
600,000 pieces of medical evidence
2 million pages of text from 42 medical journals and clinical trials
69 guidelines, 61,540 clinical trials
+
1,500 lung cancer cases
physician notes, lab results and clinical research
+
14,700 hours of hands-on training
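48. IBM Watson Health Chronicle
[Timeline figure, 2011-2018, with two tracks: academia/healthcare and industry. The events below are listed in the order they were extracted; their placement by year is not recoverable from the text.]
• Mayo Clinic collaboration (clinical trial matching)
• Chonnam National University Hospital adoption
• Manipal Hospital (India) adopts WFO
• MFDS (Korean Ministry of Food and Drug Safety) AI guideline draft
• Blood glucose management app demonstrated with Medtronic
• Memorial Sloan Kettering Cancer Center (New York) collaboration (lung cancer)
• MD Anderson collaboration (leukemia)
• MD Anderson pilot results presented at ASCO
• Watson Fund invests in Welltok
• New York Genome Center collaboration (glioblastoma analysis)
• GeneMD wins the Watson Mobile Developer Challenge
• Cleveland Clinic collaboration (cancer genome analysis)
• IBM Korea establishes a Watson business unit
• Watson Health launched
• Phytel and Explorys acquired
• Collaboration with J&J, Apple, and Medtronic
• Partnership with Epic Systems and Mayo Clinic (EHR analysis)
• University of Tokyo adoption (WFO)
• Watson Fund invests in Modernizing Medicine
• Pathway Genomics OME closed alpha service begins
• Truven Health acquired
• Sleep study via Apple ResearchKit begins
• Gachon University Gil Medical Center adoption
• Medtronic launches Sugar.IQ
• Partnership with pharmaceutical company Teva
• Bumrungrad International Hospital (Thailand) adopts WFO
• Merge Healthcare acquired
• Under Armour partnership
• Broad Institute collaboration announced (genome analysis, anticancer drug resistance)
• Manipal Hospital presents WFO accuracy results
• Daegu Catholic University Hospital and Daegu Dongsan Hospital adoption
• Pusan National University Hospital adoption
• Watson Fund invests in Pathway Genomics
• Jeopardy! win
• Chosun University Hospital adoption
• Korean Watson consortium launched
• Jupiter Medical Center adoption
• MFDS AI guideline
• Mayo Clinic clinical trial matching results presented
• Konyang University Hospital adoption
• First WFO paper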
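51. Annals of Oncology (2016) 27 (suppl_9): ix179-ix180. 10.1093/annonc/mdw601
Validation study to assess performance of IBM cognitive computing system Watson for oncology with Manipal multidisciplinary tumour board for 1000 consecutive cases: An Indian experience
•Compared the concordance between physicians' decisions and WFO's recommendations for 1,000 cancer patients at Manipal Hospital, India
•Breast cancer 638, colon cancer 126, rectal cancer 124, lung cancer 112
•Physician-Watson concordance
•Recommended (50%), for consideration (28%), not recommended (17%)
•5% of the physicians' treatment plans were not offered among Watson's recommendations
•Concordance varied by cancer type
•Colon cancer (85%), lung cancer (17.8%)
•Triple-negative breast cancer (67.9%), HER2-negative breast cancer (35%)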
52. San Antonio Breast Cancer Symposium, December 6-10, 2016
Concordance of WFO (@T2) and MMDT (@T1* vs. T2**) (N = 638 breast cancer cases)
Time point | REC, n (%) | REC + FC, n (%)
T1* | 296 (46) | 463 (73)
T2** | 381 (60) | 574 (90)
* T1: time of the original treatment decision by the MMDT in the past (last 1-3 years)
** T2: time (2016) of WFO's treatment advice and of the MMDT's treatment decision upon blinded re-review of non-concordant cases
53. WFO in ASCO 2017: Performance of WFO in India
• Early experience with IBM WFO cognitive computing system for lung and colorectal cancer treatment (Manipal Hospital)
• Cases over the past three years: lung cancer (112), colon cancer (126), rectal cancer (124)
• Lung cancer: localized 88.9%, metastatic 97.9%
• Colon cancer: localized 85.5%, metastatic 76.6%
• Rectal cancer: localized 96.8%, metastatic 80.6%
2017 ASCO Annual Meeting, J Clin Oncol 35, 2017 (suppl; abstr 8527)
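54. WFO in ASCO 2017
•Results of applying Watson to colon and gastric cancer patients at Gachon University Gil Medical Center
• 340 colon cancer patients (stage II-IV)
• 185 advanced gastric cancer patients (retrospective)
• Concordance with physicians
• Colon cancer patients: 73%
• 250 patients who received adjuvant chemotherapy: 85%
• 90 metastatic patients: 40%
• Gastric cancer patients: 49%
• Trastuzumab/FOLFOX is not reimbursed by Korea's National Health Insurance
• S-1 (tegafur, gimeracil and oteracil) + cisplatin: very routine in Korea, not used in the United States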
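The overall colon cancer figure on this slide is consistent with a patient-weighted average of the two subgroups. A minimal sketch of that check, assuming the overall rate is simply pooled across the adjuvant and metastatic groups (the slide does not state how it was computed; the function name is illustrative):

```python
def pooled_concordance(groups):
    """Patient-weighted concordance over (n_patients, concordance_rate) pairs."""
    total_patients = sum(n for n, _ in groups)
    concordant = sum(n * rate for n, rate in groups)
    return concordant / total_patients


# Subgroup figures from the slide: adjuvant (250 patients, 85%) and
# metastatic (90 patients, 40%).
colon_groups = [(250, 0.85), (90, 0.40)]
overall = pooled_concordance(colon_groups)

print(f"Pooled colon cancer concordance: {overall:.1%}")  # prints 73.1%
```

The pooled value rounds to the 73% reported for all 340 colon cancer patients, which shows how strongly the overall rate depends on the mix of adjuvant and metastatic cases.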
55. ORIGINAL ARTICLE
Watson for Oncology and breast cancer treatment
recommendations: agreement with an expert
multidisciplinary tumor board
S. P. Somashekhar (1*), M.-J. Sepúlveda (2), S. Puglielli (3), A. D. Norden (3), E. H. Shortliffe (4), C. Rohit Kumar (1), A. Rauthan (1), N. Arun Kumar (1), P. Patil (1), K. Rhee (3) & Y. Ramya (1)
(1) Manipal Comprehensive Cancer Centre, Manipal Hospital, Bangalore, India; (2) IBM Research (Retired), Yorktown Heights; (3) Watson Health, IBM Corporation, Cambridge; (4) Department of Surgical Oncology, College of Health Solutions, Arizona State University, Phoenix, USA
*Correspondence to: Prof. Sampige Prasannakumar Somashekhar, Manipal Comprehensive Cancer Centre, Manipal Hospital, Old Airport Road, Bangalore 560017, Karnataka, India. Tel: +91-9845712012; Fax: +91-80-2502-3759; E-mail: somashekhar.sp@manipalhospitals.com
Background: Breast cancer oncologists are challenged to personalize care with rapidly changing scientific evidence, drug
approvals, and treatment guidelines. Artificial intelligence (AI) clinical decision-support systems (CDSSs) have the potential to
help address this challenge. We report here the results of examining the level of agreement (concordance) between treatment
recommendations made by the AI CDSS Watson for Oncology (WFO) and a multidisciplinary tumor board for breast cancer.
Patients and methods: Treatment recommendations were provided for 638 breast cancers between 2014 and 2016 at the
Manipal Comprehensive Cancer Center, Bengaluru, India. WFO provided treatment recommendations for the identical cases in
2016. A blinded second review was carried out by the center’s tumor board in 2016 for all cases in which there was not
agreement, to account for treatments and guidelines not available before 2016. Treatment recommendations were considered
concordant if the tumor board recommendations were designated ‘recommended’ or ‘for consideration’ by WFO.
Results: Treatment concordance between WFO and the multidisciplinary tumor board occurred in 93% of breast cancer cases.
Subgroup analysis found that patients with stage I or IV disease were less likely to be concordant than patients with stage II or III
disease. Increasing age was found to have a major impact on concordance. Concordance declined significantly (P ≤ 0.02;
P < 0.001) in all age groups compared with patients <45 years of age, except for the age group 55–64 years. Receptor status
was not found to affect concordance.
Conclusion: Treatment recommendations made by WFO and the tumor board were highly concordant for breast cancer cases
examined. Breast cancer stage and patient age had significant influence on concordance, while receptor status alone did not.
This study demonstrates that the AI clinical decision-support system WFO may be a helpful tool for breast cancer treatment
decision making, especially at centers where expert breast cancer resources are limited.
Key words: Watson for Oncology, artificial intelligence, cognitive clinical decision-support systems, breast cancer,
concordance, multidisciplinary tumor board
Introduction
Oncologists who treat breast cancer are challenged by a large and
rapidly expanding knowledge base [1, 2]. As of October 2017, for
example, there were 69 FDA-approved drugs for the treatment of
breast cancer, not including combination treatment regimens
[3]. The growth of massive genetic and clinical databases, along
with computing systems to exploit them, will accelerate the speed
of breast cancer treatment advances and shorten the cycle time
for changes to breast cancer treatment guidelines [4, 5]. In addition,
these information management challenges in cancer care
are occurring in a practice environment where there is little time
available for tracking and accessing relevant information at the
point of care [6]. For example, a study that surveyed 1117 oncologists
reported that on average 4.6 h per week were spent keeping
© The Author(s) 2018. Published by Oxford University Press on behalf of the European Society for Medical Oncology.
All rights reserved. For permissions, please email: journals.permissions@oup.com.
Annals of Oncology 29: 418–423, 2018
doi:10.1093/annonc/mdx781
Published online 9 January 2018
•Annals of Oncology, January 2018
•The first and only paper on WFO accuracy published in a peer-reviewed journal
•Authors include Dr. Kyu Rhee, IBM's chief health officer
56. Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board (Annals of Oncology 29: 418–423, 2018; doi:10.1093/annonc/mdx781)
Table 2. MMDT and WFO recommendations after the initial and blinded second reviews (breast cancer cases, N = 638; values are n (%))
Review | Concordant: recommended | Concordant: for consideration | Concordant: total | Non-concordant: not recommended | Non-concordant: not available | Non-concordant: total
Initial review (T1MMDT versus T2WFO) | 296 (46) | 167 (26) | 463 (73) | 137 (21) | 38 (6) | 175 (27)
Second review (T2MMDT versus T2WFO) | 397 (62) | 194 (30) | 591 (93) | 36 (5) | 11 (2) | 47 (7)
T1MMDT, original MMDT recommendation from 2014 to 2016; T2WFO, WFO advisor treatment recommendation in 2016; T2MMDT, MMDT treatment recommendation in 2016; MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.
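The paper counts a case as concordant when the tumor board's choice was rated either "recommended" or "for consideration" by WFO. A minimal sketch that recomputes the concordance rates from the Table 2 counts under that definition (the dictionary layout and function name are illustrative, not from the paper):

```python
# Case counts from Table 2 (N = 638 breast cancer cases).
TABLE_2 = {
    "initial review (T1 MMDT vs T2 WFO)": {
        "recommended": 296, "for_consideration": 167,
        "not_recommended": 137, "not_available": 38,
    },
    "second review (T2 MMDT vs T2 WFO)": {
        "recommended": 397, "for_consideration": 194,
        "not_recommended": 36, "not_available": 11,
    },
}


def concordance(row):
    """Return (concordant cases, concordance rate) for one review row."""
    concordant = row["recommended"] + row["for_consideration"]
    total = sum(row.values())
    return concordant, concordant / total


for review, row in TABLE_2.items():
    cases, rate = concordance(row)
    print(f"{review}: {cases}/{sum(row.values())} = {rate:.0%}")
```

The rise from the initial to the second review comes entirely from the blinded re-review of the non-concordant cases, so the two rates measure different things: agreement with past decisions versus agreement with 2016 decisions.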
Figure 1. Treatment concordance between WFO and the MMDT overall and by stage. MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.
[Stacked bar chart; segments: not available, not recommended, for consideration, recommended]
Concordance: overall (n=638) 93%; stage I (n=61) 80%; stage II (n=262) 97%; stage III (n=191) 95%; stage IV (n=124) 86%
Figure 2. Treatment concordance between WFO and the MMDT by stage and receptor status. HER2/neu, human epidermal growth factor receptor 2; HR, hormone receptor; MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.
[Stacked bar chart; segments: not applicable, not recommended, for consideration, recommended]
Concordance, in the order the labels appear: HR(+) non-metastatic 95%, HR(+) metastatic 75%; HER2/neu(+) non-metastatic 94%, HER2/neu(+) metastatic 98%; triple(–) non-metastatic 94%, triple(–) metastatic 85%
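57. Tentative conclusions
•Concordance between Watson for Oncology and physicians:
•differs by cancer type.
•differs by stage within the same cancer type.
•differs by hospital and by country for the same cancer type.
•may change over time.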
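58. Principles are needed
•For which patients should Watson be consulted?
•How much should Watson be trusted (for each cancer type)?
•Should Watson's opinion be disclosed to the patient?
•What should be done when Watson and the medical team disagree?
•Can the use of Watson be reimbursed by insurance?
These criteria can affect the quality of care and treatment outcomes, yet at present each hospital applies its own.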
59. Empowering the Oncology Community for Cancer Care
Genomics · Oncology · Clinical Trial Matching
Watson Health’s oncology clients span more than 35 hospital systems
Andrew Norden, KOTRA Conference, March 2017, “The Future of Health is Cognitive”
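60. Three types of medical AI
• Analyzing complex medical data and deriving insights
• Analyzing and reading medical imaging and pathology data
• Monitoring continuous data for prevention and prediction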
63. Facebook's DeepFace
Taigman, Y. et al. (2014). DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR'14.
Figure 2. Outline of the DeepFace architecture. A front-end of a single convolution-pooling-convolution filtering on the rectified input, followed by three locally-connected layers and two fully-connected layers. Colors illustrate feature maps produced at each layer. The net includes more than 120 million parameters, where more than 95% come from the local and fully connected layers.
Excerpts from the paper:
[...] very few parameters. These layers merely expand the input into a set of simple local features.
The subsequent layers (L4, L5 and L6) are instead locally connected [13, 16], like a convolutional layer they apply a filter bank, but every location in the feature map learns a different set of filters. Since different regions of an aligned image have different local statistics, the spatial stationarity [...]
The goal of training is to maximize the probability of the correct class (face id). We achieve this by minimizing the cross-entropy loss for each training sample. If k is the index of the true label for a given input, the loss is: $L = -\log p_k$. The loss is minimized over the parameters by computing the gradient of L w.r.t. the parameters and [...]
Human: 95% vs. DeepFace (Facebook): 97.35%
Recognition accuracy on the Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
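The training objective quoted in the excerpt, cross-entropy on the probability of the true class, can be illustrated with a few lines of plain Python. This is a minimal sketch of the loss and its gradient with respect to the class scores only; it is not DeepFace's implementation, and the softmax over raw scores, the function names, and the example values are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, k):
    """Loss L = -log p_k, where k is the index of the true class."""
    return -math.log(softmax(logits)[k])


def grad_wrt_logits(logits, k):
    """Gradient of L w.r.t. the logits: p_i - 1 for i == k, else p_i."""
    probs = softmax(logits)
    return [p - (1.0 if i == k else 0.0) for i, p in enumerate(probs)]


# Hypothetical scores for three identities; identity 0 is the true label.
scores = [2.0, 0.5, -1.0]
print("loss:", cross_entropy(scores, 0))
print("gradient:", grad_wrt_logits(scores, 0))
```

Raising the score of the true class lowers the loss, which is what gradient descent on this objective does at the output layer before the gradient is propagated back through the network.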
64. Google's FaceNet
Schroff, F. et al. (2015). FaceNet: A Unified Embedding for Face Recognition and Clustering
Human: 95% vs. FaceNet (Google): 99.63%
Recognition accuracy on the Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
Excerpts from the paper:
[Figure 6 image: false accept and false reject pairs]
Figure 6. LFW errors. This shows all pairs of images that were incorrectly classified on LFW. Only eight of the 13 errors shown here are actual errors; the other four are mislabeled in LFW.
5.7. Performance on Youtube Faces DB
We use the average similarity of all pairs of the first one hundred frames that our face detector detects in each video. This gives us a classification accuracy of 95.12%±0.39. Using the first one thousand frames results in 95.18%. Compared to [17] 91.4% who also evaluate one hundred frames per video we reduce the error rate by almost half. DeepId2+ [15] achieved 93.2% and our method reduces this error by 30%, comparable to our improvement on LFW.
5.8. Face Clustering
Our compact embedding lends itself to be used in order to cluster a user's personal photos into groups of people with the same identity. The constraints in assignment imposed by clustering faces, compared to the pure verification task, lead to truly amazing results. Figure 7 shows one cluster in a user's personal photo collection, generated using agglomerative clustering. It is a clear showcase of the incredible invariance to occlusion, lighting, pose and even age.
Figure 7. Face Clustering. Shown is an exemplar cluster for one user. All these images in the user's personal photo collection were clustered together.
6. Summary
We provide a method to directly learn an embedding into an Euclidean space for face verification. This sets it apart from other methods [15, 17] who use the CNN bottleneck layer, or require additional post-processing such as concatenation of multiple models and PCA, as well as SVM classification. Our end-to-end training both simplifies the setup and shows that directly optimizing a loss relevant to the task at hand improves performance.
Another strength of our model is that it only requires [...]
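The "almost half" and "30%" figures in the excerpt on YouTube Faces are relative reductions in error rate, not differences in accuracy. A minimal sketch of that arithmetic, using only the accuracies quoted in the excerpt (the function name is illustrative):

```python
def relative_error_reduction(baseline_accuracy_pct, new_accuracy_pct):
    """Fraction of the baseline's error rate removed by the new method."""
    baseline_error = 100.0 - baseline_accuracy_pct
    new_error = 100.0 - new_accuracy_pct
    return (baseline_error - new_error) / baseline_error


FACENET_YTF = 95.12  # FaceNet accuracy on YouTube Faces, 100 frames per video

vs_ref_17 = relative_error_reduction(91.4, FACENET_YTF)
vs_deepid2_plus = relative_error_reduction(93.2, FACENET_YTF)

print(f"vs. [17] (91.4%):     {vs_ref_17:.0%} of the error removed")
print(f"vs. DeepId2+ (93.2%): {vs_deepid2_plus:.0%} of the error removed")
```

A gain of under four percentage points in accuracy removes a large share of the remaining errors, which is why papers near the accuracy ceiling report improvements this way.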
65. Baidu's face recognition AI
Jingtuo Liu (2015) Targeting Ultimate Accuracy: Face Recognition via Deep Embedding
Human: 95% vs. Baidu: 99.77%
Recognition accuracy on the Labeled Faces in the Wild (LFW) dataset (13,233 images, 5,749 people)
Excerpts from the paper:
Although several algorithms have achieved nearly perfect accuracy in the 6000-pair verification task, a more practical [...] can achieve 95.8% identification rate, relatively reducing the error rate by about 77%.
[Figure: misclassified pairs with their scores: pair #113 (-0.060), #202 (-0.022), #656 (-0.034), #1230 (-0.031), #1862 (-0.073), #2499 (-0.091), #2551 (-0.024), #2552 (-0.036), #2610 (-0.089)]
TABLE 3. COMPARISONS WITH OTHER METHODS ON SEVERAL EVALUATION TASKS
Method | Pair-wise accuracy (%) | Rank-1 (%) | DIR (%) @ FAR = 1% | Verification (%) @ FAR = 0.1% | Open-set identification (%) @ Rank = 1, FAR = 0.1%
IDL Ensemble Model | 99.77 | 98.03 | 95.8 | 99.41 | 92.09
IDL Single Model | 99.68 | 97.60 | 94.12 | 99.11 | 89.08
FaceNet[12] | 99.63 | NA | NA | NA | NA
DeepID3[9] | 99.53 | 96.00 | 81.40 | NA | NA
Face++[2] | 99.50 | NA | NA | NA | NA
Facebook[15] | 98.37 | 82.5 | 61.9 | NA | NA
Learning from Scratch[4] | 97.73 | NA | NA | 80.26 | 28.90
HighDimLBP[10] | 95.17 | NA | NA | 41.66 (reported in [4]) | 18.07 (reported in [4])
• Of the 6,000 pairs of face photos, Baidu's AI misjudged only 14 pairs
• Of those 14 pairs, 5 turned out to have errors in the ground-truth labels, so the AI was in fact correct (marked with red boxes in the figure)
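The 99.77% headline figure follows directly from the pair counts on this slide. A minimal sketch of the arithmetic; the "label-corrected" value is an illustration that assumes the 5 mislabeled pairs are simply counted as correct, which is not a number reported in the paper.

```python
LFW_PAIRS = 6000  # pairs in the LFW verification task, as stated on the slide


def pairwise_accuracy(errors, pairs=LFW_PAIRS):
    """Share of pairs judged correctly."""
    return 1 - errors / pairs


reported = pairwise_accuracy(14)               # 14 pairs judged wrong
label_corrected = pairwise_accuracy(14 - 5)    # assumption: 5 were label errors

print(f"Reported accuracy:        {reported:.2%}")         # prints 99.77%
print(f"Label-corrected accuracy: {label_corrected:.2%}")  # prints 99.85%
```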
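67. •AI that reads a hand X-ray and calculates the patient's bone age
• Until now, physicians have read bone age by comparing the X-ray against standard images, using methods such as Greulich-Pyle
• The AI finds sex- and age-specific patterns in the reference standard images, displays the similarity as probabilities, and retrieves the matching standard images
•It can help physicians diagnose precocious puberty or growth delay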
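68. Press release (Ministry of Food and Drug Safety)
First approval of an AI-based medical device developed in Korea
- AI technology used to read bone age -
The Ministry of Food and Drug Safety (Minister Ryu Young-jin) announced that on […] it approved VUNO Med-BoneAge (뷰노메드 본에이지), medical image analysis software incorporating AI technology developed by the Korean medical device company VUNO Inc.
VUNO Med-BoneAge is software in which AI analyzes an X-ray image and presents the patient's bone age, helping physicians diagnose precocious puberty or growth delay using the information presented.
It automates what physicians previously did by hand, comparing an X-ray of the patient's left hand against reference standard images to read bone age, and thereby shortens reading time.
The product was selected in […] as a subject of the Guideline for Approval and Review of Medical Devices Using Big Data and AI Technology, and received tailored support from clinical trial design through approval.
VUNO Med-BoneAge was approved for the purpose of analyzing an X-ray of the patient's left hand to help clinicians judge the patient's bone age.
In the analysis, the AI recognizes patterns in the acquired X-ray image, finds sex- and age-specific patterns in the bone age model reference standard images, which are classified by sex (male: […], female: […]), and displays the similarity as probabilities. The physician then combines these probabilities with information such as hormone levels to diagnose precocious puberty or growth delay.
In a clinical trial evaluating the product's accuracy (performance), its results differed from the physician-determined bone age by an average of […] months. The product is designed so that the manufacturer periodically updates the image data, allowing the AI to keep learning and narrow the gap with physicians.
Including VUNO Med-BoneAge, […] clinical trial plans for AI-based medical devices have been approved to date.
The other approved trials cover software that classifies cerebral infarction types from magnetic resonance images ([…]) and software that assists diagnosis of lung nodules from X-ray images ([…]).
For reference, the ministry runs programs such as the Next-Generation […] Project and the New Medical Device Approval Helper, which provide tailored support across the whole process from research and development through clinical trials and approval, to speed the development of medical devices related to AI, virtual reality, […] printing, and other […] industry technologies.
The ministry said the approval should help analyze and determine an individual's bone age quickly, and that it will continue to actively support the development of advanced medical devices.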
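The press release describes two quantities: similarity to reference images shown as probabilities, and an average difference in months between the software and physicians. The sketch below illustrates both under loudly stated assumptions. It is not VUNO's algorithm; the data are hypothetical, the function names are invented for illustration, and the use of a mean absolute difference is an assumption, since the release does not say how the average was computed.

```python
def top_candidates(probabilities, k=3):
    """Return the k most probable bone ages as (age_in_months, probability).

    `probabilities` maps a candidate bone age in months to the model's
    probability for it (hypothetical output format).
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def mean_abs_difference_months(ai_ages, physician_ages):
    """Average absolute gap, in months, between AI and physician readings."""
    if len(ai_ages) != len(physician_ages):
        raise ValueError("both lists must have one reading per patient")
    gaps = [abs(a - p) for a, p in zip(ai_ages, physician_ages)]
    return sum(gaps) / len(gaps)


# Hypothetical probabilities for one patient (ages in months).
example_output = {96: 0.10, 108: 0.55, 120: 0.30, 132: 0.05}
print(top_candidates(example_output, k=2))

# Hypothetical readings for four patients (months).
print(mean_abs_difference_months([108, 120, 96, 150], [114, 120, 90, 144]))
```

Showing the top few candidates with their probabilities, rather than a single number, matches the release's description of the physician combining the output with other information such as hormone levels.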
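71. AI vs. physicians
Synergy between human physicians and AI in bone age reading
[Bar chart: accuracy (%)]
• AI: 69.5%
• Physician A (radiology fellow, pediatric imaging subspecialty): 63%
• Physician B (second-year radiology resident): 49.5%
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
• Total number of patients: 200
• Physician A: radiologist subspecialized in pediatric imaging (more than 500 bone age readings)
• Physician B: second-year radiology resident (one day of training in the reading method plus 20 readings)
• Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
• AI: VUNO's deep learning model for bone age reading
Digital Healthcare Institute
Director, Yoon Sup Choi, PhD
yoonsup.choi@gmail.com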
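72. AI vs. physicians, and AI + physicians
Synergy between human physicians and AI in bone age reading
[Bar charts: accuracy (%)]
• AI: 69.5%
• Physician A (radiology fellow, pediatric imaging subspecialty): 63%
• Physician B (second-year radiology resident): 49.5%
• Physician A + AI: 72.5%
• Physician B + AI: 57.5%
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
• Total number of patients: 200
• Physician A: radiologist subspecialized in pediatric imaging (more than 500 bone age readings)
• Physician B: second-year radiology resident (one day of training in the reading method plus 20 readings)
• Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
• AI: VUNO's deep learning model for bone age reading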
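With 200 patients, each accuracy on this slide corresponds to a whole number of correctly read cases, which makes the size of the AI-assisted gain easier to see. A minimal sketch using only the slide's figures (the variable and function names are illustrative):

```python
N_PATIENTS = 200

# Accuracy (%) against the reference standard, as reported on the slide.
ACCURACY_PCT = {
    "AI alone": 69.5,
    "Physician A": 63.0,
    "Physician B": 49.5,
    "Physician A + AI": 72.5,
    "Physician B + AI": 57.5,
}


def correct_cases(accuracy_pct, n=N_PATIENTS):
    """Number of patients read correctly at the given accuracy."""
    return round(accuracy_pct / 100 * n)


for reader, pct in ACCURACY_PCT.items():
    print(f"{reader}: {correct_cases(pct)}/{N_PATIENTS} correct")

gain_a = ACCURACY_PCT["Physician A + AI"] - ACCURACY_PCT["Physician A"]
gain_b = ACCURACY_PCT["Physician B + AI"] - ACCURACY_PCT["Physician B"]
print(f"Gain with AI: physician A +{gain_a} points, physician B +{gain_b} points")
```

Physician A with AI outperforms the AI alone, while physician B with AI does not, so the benefit of the combination depends on the reader's own expertise.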
73. Using AI in bone age reading can also reduce reading time
[Bar charts: total reading time (minutes), without AI vs. with AI]
• Physician A: 188 min without AI, 154 min with AI (saving 18% of time)
• Physician B: 180 min without AI, 108 min with AI (saving 40% of time)
• Total number of patients: 200
• Physician A: radiologist subspecialized in pediatric imaging (more than 500 bone age readings)
• Physician B: second-year radiology resident (one day of training in the reading method plus 20 readings)
• Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
• AI: VUNO's deep learning model for bone age reading
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
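The time savings on this slide can be reproduced from the total reading times. A minimal sketch using the slide's figures for 200 patients (the function name is illustrative):

```python
def time_saved(without_ai_min, with_ai_min):
    """Fraction of total reading time saved by reading with AI support."""
    return (without_ai_min - with_ai_min) / without_ai_min


N_PATIENTS = 200
readers = {"Physician A": (188, 154), "Physician B": (180, 108)}

for name, (without_ai, with_ai) in readers.items():
    saved = time_saved(without_ai, with_ai)
    per_case_sec = (without_ai - with_ai) * 60 / N_PATIENTS
    print(f"{name}: {saved:.0%} saved, about {per_case_sec:.0f} s per case")
```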
77. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs
Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD;
Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB;
Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD
IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to
program itself by learning from a large set of examples that demonstrate the desired
behavior, removing the need to specify rules explicitly. Application of these methods to
medical imaging requires further assessment and validation.
OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic
retinopathy and diabetic macular edema in retinal fundus photographs.
DESIGN AND SETTING A specific type of neural network optimized for image classification
called a deep convolutional neural network was trained using a retrospective development
data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy,
diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists
and ophthalmology senior residents between May and December 2015. The resultant
algorithm was validated in January and February 2016 using 2 separate data sets, both
graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.
EXPOSURE Deep learning–trained algorithm.
MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting
referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy,
referable diabetic macular edema, or both, were generated based on the reference standard
of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2
operating points selected from the development set, one selected for high specificity and
another for high sensitivity.
RESULTS The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4
years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the
Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women;
prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm
had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and
0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high
specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity
was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI,
81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point
with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and
specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.
CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults
with diabetes, an algorithm based on deep machine learning had high sensitivity and
specificity for detecting referable diabetic retinopathy. Further research is necessary to
determine the feasibility of applying this algorithm in the clinical setting and to determine
whether use of the algorithm could lead to improved care and outcomes compared with
current ophthalmologic assessment.
JAMA. doi:10.1001/jama.2016.17216
Published online November 29, 2016.
Author Affiliations: Google Inc,
Mountain View, California (Gulshan,
Peng, Coram, Stumpe, Wu,
Narayanaswamy, Venugopalan,
Widner, Madams, Nelson, Webster);
Department of Computer Science,
University of Texas, Austin
(Venugopalan); EyePACS LLC,
San Jose, California (Cuadros); School
of Optometry, Vision Science
Graduate Group, University of
California, Berkeley (Cuadros);
Aravind Medical Research
Foundation, Aravind Eye Care
System, Madurai, India (Kim); Shri
Bhagwan Mahavir Vitreoretinal
Services, Sankara Nethralaya,
Chennai, Tamil Nadu, India (Raman);
Verily Life Sciences, Mountain View,
California (Mega); Cardiovascular
Division, Department of Medicine,
Brigham and Women’s Hospital and
Harvard Medical School, Boston,
Massachusetts (Mega).
Corresponding Author: Lily Peng,
MD, PhD, Google Research, 1600
Amphitheatre Way, Mountain View,
CA 94043 (lhpeng@google.com).
Research
JAMA | Original Investigation | INNOVATIONS IN HEALTH CARE DELIVERY
Published in one of the world's leading medical journals
78. Development of a fundus-reading AI
• A CNN was trained retrospectively on 128,175 fundus images
• The training data were graded 3 to 7 times each by 54 US ophthalmologists
• The AI's readings were compared with those of 7 to 8 highly qualified ophthalmologists
• Validation sets: EyePACS-1 (9,963 images) and Messidor-2 (1,748 images)
eFigure 2. Screenshot of the Second Screen of the Grading Tool, Which Asks Graders to Assess the
Image for DR, DME and Other Notable Conditions or Findings
79. • AUC = 0.991 on EyePACS-1 and 0.990 on Messidor-2
• Sensitivity and specificity on par with the 7 to 8 ophthalmologists
• F-score: 0.95 (vs. 0.91 for the human physicians)
Figure 2. Validation Set Performance for Referable Diabetic Retinopathy
[ROC curves; x-axis: 1 – Specificity, %; y-axis: Sensitivity, %; the high-sensitivity and high-specificity operating points are marked on each curve]
A, EyePACS-1: AUC, 99.1%; 95% CI, 98.8%-99.3%
B, Messidor-2: AUC, 99.0%; 95% CI, 98.6%-99.5%
Performance of the algorithm (black curve) and ophthalmologists (colored
circles) for the presence of referable diabetic retinopathy (moderate or worse
diabetic retinopathy or referable diabetic macular edema) on A, EyePACS-1
(8788 fully gradable images) and B, Messidor-2 (1745 fully gradable images).
The black diamonds on the graph correspond to the sensitivity and specificity of
the algorithm at the high-sensitivity and high-specificity operating points.
In A, for the high-sensitivity operating point, specificity was 93.4% (95% CI,
92.8%-94.0%) and sensitivity was 97.5% (95% CI, 95.8%-98.7%); for the
high-specificity operating point, specificity was 98.1% (95% CI, 97.8%-98.5%)
and sensitivity was 90.3% (95% CI, 87.5%-92.7%). In B, for the high-sensitivity
operating point, specificity was 93.9% (95% CI, 92.4%-95.3%) and sensitivity
was 96.1% (95% CI, 92.4%-98.3%); for the high-specificity operating point,
specificity was 98.5% (95% CI, 97.7%-99.1%) and sensitivity was 87.0% (95%
CI, 81.1%-91.0%). There were 8 ophthalmologists who graded EyePACS-1 and 7
ophthalmologists who graded Messidor-2. AUC indicates area under the
receiver operating characteristic curve.
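The abstract states that the two operating points were selected on the development set and then evaluated on the validation sets. A minimal sketch of that idea follows, assuming the algorithm outputs one score per image and that a threshold is chosen as the lowest score cutoff reaching a target specificity. The function names, the toy scores, and the targets (0.8 and 1.0) are hypothetical and are not taken from the paper.

```python
def sensitivity(scores, labels, threshold):
    """Fraction of diseased cases (label 1) scored at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def specificity(scores, labels, threshold):
    """Fraction of healthy cases (label 0) scored below the threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s < threshold for s in negatives) / len(negatives)


def pick_threshold(scores, labels, target_specificity):
    """Lowest threshold whose specificity on this set reaches the target.

    Specificity never decreases as the threshold rises, so the lowest
    qualifying threshold also keeps sensitivity as high as possible.
    """
    for t in sorted(set(scores)):
        if specificity(scores, labels, t) >= target_specificity:
            return t
    return float("inf")  # no cutoff reaches the target: call everything negative


# Hypothetical toy data (not from the paper): 1 = referable disease, 0 = not.
dev_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90, 0.95]
dev_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
val_scores = [0.10, 0.40, 0.50, 0.65, 0.75, 0.85]
val_labels = [0, 0, 1, 0, 1, 1]

# Operating points are fixed on the development set only.
high_sensitivity_t = pick_threshold(dev_scores, dev_labels, 0.8)
high_specificity_t = pick_threshold(dev_scores, dev_labels, 1.0)

# They are then applied unchanged to the validation set.
for name, t in [("high-sensitivity", high_sensitivity_t),
                ("high-specificity", high_specificity_t)]:
    print(name, t,
          sensitivity(val_scores, val_labels, t),
          specificity(val_scores, val_labels, t))
```

Fixing the threshold before looking at the validation data is what makes the reported validation sensitivity and specificity an honest estimate rather than a tuned one.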
Research Original Investigation Accuracy of a Deep Learning Algorithm for Detection of Diabetic Retinopathy
Accuracy of the fundus-reading AI
83. LETTER
his task, the CNN achieves 72.1±0.9% (mean±s.d.) overall
he average of individual inference class accuracies) and two
gists attain 65.56% and 66.0% accuracy on a subset of the
set. Second, we validate the algorithm using a nine-class
rtition—the second-level nodes—so that the diseases of
have similar medical treatment plans. The CNN achieves
two trials, one using standard images and the other using
images, which reflect the two steps that a dermatologist m
to obtain a clinical impression. The same CNN is used for a
Figure 2b shows a few example images, demonstrating th
distinguishing between malignant and benign lesions, whic
visual features. Our comparison metrics are sensitivity an
[Figure: Skin lesion image → Deep convolutional neural network (Inception v3) → Training classes (757) → Inference classes (varies by task)]
Example training classes: Acral-lentiginous melanoma, Amelanotic melanoma, Lentigo melanoma, …, Blue nevus, Halo nevus, Mongolian spot, …
Example output: 92% malignant melanocytic lesion, 8% benign melanocytic lesion
Layer legend: Convolution, AvgPool, MaxPool, Concat, Dropout, Fully connected, Softmax
Deep CNN layout. Our classification technique is a
Data flow is from left to right: an image of a skin lesion
e, melanoma) is sequentially warped into a probability
over clinical classes of skin disease using Google Inception
hitecture pretrained on the ImageNet dataset (1.28 million
1,000 generic object classes) and fine-tuned on our own
29,450 skin lesions comprising 2,032 different diseases.
ning classes are defined using a novel taxonomy of skin disease
oning algorithm that maps diseases into training classes
(for example, acrolentiginous melanoma, amelanotic melano
melanoma). Inference classes are more general and are comp
or more training classes (for example, malignant melanocytic
class of melanomas). The probability of an inference class is c
summing the probabilities of the training classes according to
structure (see Methods). Inception v3 CNN architecture repr
from https://research.googleblog.com/2016/03/train-your-ow
classifier-with.html
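The caption says that the probability of an inference class is obtained by summing the probabilities of the training classes that belong to it. A minimal sketch of that step is below. The class names and the 92% / 8% split come from the figure, while the individual training-class probabilities and the flat dictionary mapping are hypothetical simplifications of the paper's taxonomy.

```python
from collections import defaultdict

# Mapping from fine-grained training classes to coarser inference classes.
# Only the six example classes shown in the figure are included here.
TRAINING_TO_INFERENCE = {
    "acral-lentiginous melanoma": "malignant melanocytic lesion",
    "amelanotic melanoma": "malignant melanocytic lesion",
    "lentigo melanoma": "malignant melanocytic lesion",
    "blue nevus": "benign melanocytic lesion",
    "halo nevus": "benign melanocytic lesion",
    "mongolian spot": "benign melanocytic lesion",
}


def inference_probabilities(training_probs):
    """Sum training-class probabilities into their parent inference classes."""
    totals = defaultdict(float)
    for training_class, p in training_probs.items():
        totals[TRAINING_TO_INFERENCE[training_class]] += p
    return dict(totals)


# Hypothetical softmax output over the training classes (sums to 1).
softmax_output = {
    "acral-lentiginous melanoma": 0.50,
    "amelanotic melanoma": 0.30,
    "lentigo melanoma": 0.12,
    "blue nevus": 0.05,
    "halo nevus": 0.02,
    "mongolian spot": 0.01,
}

inference = inference_probabilities(softmax_output)
print(inference)
```

Training on fine-grained classes and reporting on coarse ones lets the same network serve several clinical tasks without retraining.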
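• The authors built their own dataset of 129,450 skin lesion images
• 18 US dermatologists curated the data
• The images were learned with a CNN (Inception v3)
• The AI's readings were compared with those of 21 dermatologists
• Keratinocyte carcinoma vs. benign seborrheic keratosis
• Malignant melanoma vs. benign lesions (standard images)
• Malignant melanoma vs. benign lesions (dermoscopy images)
Development of a skin cancer reading AI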
84. Skin cancer classification accuracy of deep learning vs. dermatologists
[Figure, panels a and b: sensitivity–specificity curves for the algorithm, with individual dermatologists and the average dermatologist plotted as points]
Panel titles: Carcinoma: 135 images; Melanoma: 130 images; Melanoma: 111 dermoscopy images; Carcinoma: 707 images; Melanoma: 225 images; Melanoma: 1,010 dermoscopy images
Legends: Algorithm: AUC = 0.96, Dermatologists (25); Algorithm: AUC = 0.94, Dermatologists (22); Algorithm: AUC = 0.91, Dermatologists (21)
Of the 21 dermatologists, a considerable number were less accurate than the AI
The dermatologists' average performance also fell short of the AI's
85. Skin Cancer Image Classification (TensorFlow Dev Summit 2017)
Skin cancer classification performance of
the CNN and dermatologists.
https://www.youtube.com/watch?v=toK1OSLep3s&t=419s
87. A / B / C / D
Benign without atypia / Atypia / DCIS (ductal carcinoma in situ) / Invasive carcinoma
Interpretation?
Elmore et al. JAMA 2015
Diagnostic Concordance Among Pathologists
Reading breast cancer pathology data
88. Figure 4. Participating Pathologists’ Interpretations of Each of the 240 Breast Biopsy Test Cases
[Figure: one horizontal bar per case; x-axis: Interpretations, % (0 to 100); y-axis: Case; bar segments show the share of pathologist interpretations in each category: Benign without atypia, Atypia, DCIS, Invasive carcinoma]
A, Benign without atypia: 72 Cases, 2070 Total interpretations
B, Atypia: 72 Cases, 2070 Total interpretations
C, DCIS: 73 Cases, 2097 Total interpretations
D, Invasive carcinoma: 23 Cases, 663 Total interpretations
DCIS indicates ductal carcinoma in situ.
Diagnostic Concordance in Interpreting Breast Biopsies Original Investigation Research
Elmore et al. JAMA 2015
Disagreement among pathologists in reading breast cancer biopsies
89. Elmore et al. JAMA 2015
•Accuracy: 75.3%
(The reference diagnoses were set by consensus among three experienced pathologists)
spent on this activity was 16 (95% CI, 15-17); 43 participants were
awarded the maximum 20 hours.
Pathologists’ Diagnoses Compared With Consensus-Derived
Reference Diagnoses
The 115 participants each interpreted 60 cases, providing 6900
total individual interpretations for comparison with the con-
sensus-derived reference diagnoses (Figure 3). Participants
agreed with the consensus-derived reference diagnosis for
75.3% of the interpretations (95% CI, 73.4%-77.0%). Partici-
pants (n = 94) who completed the CME activity reported that
Patient and Pathologist Characteristics Associated With
Overinterpretation and Underinterpretation
The association of breast density with overall pathologists’
concordance (as well as both overinterpretation and under-
interpretation rates) was statistically significant, as shown
in Table 3 when comparing mammographic density grouped
into 2 categories (low density vs high density). The overall
concordance estimates also decreased consistently with
increasing breast density across all 4 Breast Imaging-
Reporting and Data System (BI-RADS) density categories:
BI-RADS A, 81% (95% CI, 75%-86%); BI-RADS B, 77% (95%
Figure 3. Comparison of 115 Participating Pathologists’ Interpretations vs the Consensus-Derived Reference
Diagnosis for 6900 Total Case Interpretations(a)
Rows: consensus reference diagnosis(b). Columns: participating pathologists’ interpretation.
Reference diagnosis      Benign without atypia   Atypia   DCIS   Invasive carcinoma   Total
Benign without atypia    1803                    200      46     21                   2070
Atypia                   719                     990      353    8                    2070
DCIS                     133                     146      1764   54                   2097
Invasive carcinoma       3                       0        23     637                  663
Total                    2658                    1336     2186   720                  6900
DCIS indicates ductal carcinoma in situ.
(a) Concordance noted in 5194 of 6900 case interpretations or 75.3%.
(b) Reference diagnosis was obtained from consensus of 3 experienced breast pathologists.
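The 75.3% concordance can be reproduced directly from the counts in Figure 3. The sketch below sums the diagonal of the table and, assuming the four categories are ordered from least to most severe, also counts interpretations above the diagonal (more severe than the reference) and below it (less severe). The variable names are illustrative.

```python
CATEGORIES = ["Benign without atypia", "Atypia", "DCIS", "Invasive carcinoma"]

# Rows: consensus reference diagnosis. Columns: participating pathologists'
# interpretation. Counts copied from Figure 3.
COUNTS = [
    [1803, 200, 46, 21],
    [719, 990, 353, 8],
    [133, 146, 1764, 54],
    [3, 0, 23, 637],
]

n = len(CATEGORIES)
total = sum(sum(row) for row in COUNTS)

# Agreement with the reference lies on the diagonal.
concordant = sum(COUNTS[i][i] for i in range(n))

# With categories ordered by severity, cells right of the diagonal are
# interpretations more severe than the reference, cells left are less severe.
over_interpreted = sum(COUNTS[i][j] for i in range(n) for j in range(n) if j > i)
under_interpreted = sum(COUNTS[i][j] for i in range(n) for j in range(n) if j < i)

overall_concordance = concordant / total
per_category = {
    CATEGORIES[i]: COUNTS[i][i] / sum(COUNTS[i]) for i in range(n)
}

print(f"overall concordance: {overall_concordance:.1%}")
for name, rate in per_category.items():
    print(f"{name}: {rate:.1%}")
print(f"over-interpreted: {over_interpreted}, under-interpreted: {under_interpreted}")
```

The per-category breakdown shows where the disagreement concentrates, which the single overall figure hides.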
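For a total of 240 pathology samples,
the 6,900 interpretations made by 115 pathologists were compared against the reference diagnoses
Disagreement among pathologists in reading breast cancer biopsies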
95. Clinical study on ISBI dataset
Setting                                              Error Rate
Pathologist in competition setting                   3.5%
Pathologists in clinical practice (n = 12)           13% - 26%
Pathologists on micro-metastasis (small tumors)      23% - 42%
Beck Lab Deep Learning Model                         0.65%
Beck Lab’s deep learning model now outperforms pathologists
Andrew Beck, Machine Learning for Healthcare, MIT 2017
96. Google's breast pathology reading AI
• The localization score (FROC) for the algorithm reached 89%, which significantly
exceeded the score of 73% for a pathologist with no time constraint.
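97. The AI's sensitivity + the human's specificity
Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
• Google's AI showed a large improvement in sensitivity (92.9%, 88.5%)
•@8FP: the sensitivity achievable when up to 8 false positives (FPs) are tolerated
•FROC: the mean of the sensitivities when 1/4, 1/2, 1, 2, 4, and 8 FPs per slide are allowed
•In other words, if a few FPs are tolerated, the AI can reach very high sensitivity
• The human pathologist reached a sensitivity of 73% but a specificity of nearly 100%
•Human pathologists and the AI pathologist are good at different things
•If the two work together, gains can be expected in reading efficiency, consistency, and sensitivity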
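The FROC definition above (mean sensitivity at 1/4, 1/2, 1, 2, 4, and 8 false positives per slide) can be sketched as follows. This is a simplified illustration, not the evaluation code used in the paper: it assumes each detection carries a distinct confidence score and a flag saying whether it hits a tumor, and that each true-positive detection hits a different tumor. The toy detections are hypothetical.

```python
FP_RATES = [0.25, 0.5, 1, 2, 4, 8]  # allowed false positives per slide


def sensitivity_at_fp_rate(detections, n_slides, n_lesions, fp_per_slide):
    """Sensitivity reached before the false-positive budget is exceeded.

    detections: list of (score, is_true_positive) pooled over all slides.
    """
    max_fp = fp_per_slide * n_slides
    tp = fp = 0
    best = 0.0
    # Walk down the detections from most to least confident.
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break  # accepting this detection would exceed the budget
        best = tp / n_lesions
    return best


def froc_score(detections, n_slides, n_lesions):
    """Mean sensitivity over the six false-positive rates."""
    values = [sensitivity_at_fp_rate(detections, n_slides, n_lesions, r)
              for r in FP_RATES]
    return sum(values) / len(values)


# Hypothetical detections pooled over 2 slides containing 4 tumors in total.
toy_detections = [
    (0.9, True), (0.8, False), (0.7, True), (0.6, False),
    (0.5, True), (0.4, False), (0.3, True),
]

print(froc_score(toy_detections, n_slides=2, n_lesions=4))
print(sensitivity_at_fp_rate(toy_detections, 2, 4, fp_per_slide=8))  # the "@8FP" value
```

Averaging over several false-positive budgets rewards a detector that is sensitive both when almost no false alarms are tolerated and when many are.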
101. Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographi
sensor embedded into clothing [4], stress with an electodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an
accelerometer in a watch [6,7]. In addition, a female’s most fertile period can be identified with detailed body temperature tracking [8], while levels of me
attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9]. Levels of social interaction (also known to a
PLOS Medicine 2016
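102. • Analysis of complex medical data and derivation of insights
• Analysis and reading of medical imaging and pathology data
• Monitoring of continuous data for prevention and prediction
Three types of medical AI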
107. SEPSIS
A targeted real-time early warning score (TREWScore)
for septic shock
Katharine E. Henry,1 David N. Hager,2 Peter J. Pronovost,3,4,5 Suchi Saria1,3,5,6*
Sepsis is a leading cause of death in the United States, with mortality highest among patients who develop septic
shock. Early aggressive treatment decreases morbidity and mortality. Although automated screening tools can detect
patients currently experiencing severe sepsis and septic shock, none predict those at greatest risk of developing
shock. We analyzed routinely available physiological and laboratory data from intensive care unit patients and
developed “TREWScore,” a targeted real-time early warning score that predicts which patients will develop septic shock.
TREWScore identified patients before the onset of septic shock with an area under the ROC (receiver operating
characteristic) curve (AUC) of 0.83 [95% confidence interval (CI), 0.81 to 0.85]. At a specificity of 0.67, TREWScore
achieved a sensitivity of 0.85 and identified patients a median of 28.2 [interquartile range (IQR), 10.6 to 94.2] hours
before onset. Of those identified, two-thirds were identified before any sepsis-related organ dysfunction. In
comparison, the Modified Early Warning Score, which has been used clinically for septic shock prediction, achieved a lower
AUC of 0.73 (95% CI, 0.71 to 0.76). A routine screening protocol based on the presence of two of the systemic
inflammatory response syndrome criteria, suspicion of infection, and either hypotension or hyperlactatemia achieved a
lower sensitivity of 0.74 at a comparable specificity of 0.64. Continuous sampling of data from the electronic health
records and calculation of TREWScore may allow clinicians to identify patients at risk for septic shock and provide
earlier interventions that would prevent or mitigate the associated morbidity and mortality.
INTRODUCTION
Seven hundred fifty thousand patients develop severe sepsis and septic
shock in the United States each year. More than half of them are
admitted to an intensive care unit (ICU), accounting for 10% of all
ICU admissions, 20 to 30% of hospital deaths, and $15.4 billion in
annual health care costs (1–3). Several studies have demonstrated that
morbidity, mortality, and length of stay are decreased when severe
sepsis and septic shock are identified and treated early (4–8). In particular,
one study showed that mortality from septic shock increased by 7.6%
with every hour that treatment was delayed after the onset of
hypotension (9).
More recent studies comparing protocolized care, usual care, and
early goal-directed therapy (EGDT) for patients with septic shock
suggest that usual care is as effective as EGDT (10–12). Some have
interpreted this to mean that usual care has improved over time and reflects
important aspects of EGDT, such as early antibiotics and early
aggressive fluid resuscitation (13). It is likely that continued early
identification and treatment will further improve outcomes. However, the
Acute Physiology Score (SAPS II), Sequential Organ Failure Assessment
(SOFA) scores, Modified Early Warning Score (MEWS), and Simple
Clinical Score (SCS) have been validated to assess illness severity and
risk of death among septic patients (14–17). Although these scores
are useful for predicting general deterioration or mortality, they
typically cannot distinguish with high sensitivity and specificity which patients
are at highest risk of developing a specific acute condition.
The increased use of electronic health records (EHRs), which can be
queried in real time, has generated interest in automating tools that
identify patients at risk for septic shock (18–20). A number of “early
warning systems,” “track and trigger” initiatives, “listening
applications,” and “sniffers” have been implemented to improve detection
and timeliness of therapy for patients with severe sepsis and septic shock
(18, 20–23). Although these tools have been successful at detecting
patients currently experiencing severe sepsis or septic shock, none predict
which patients are at highest risk of developing septic shock.
The adoption of the Affordable Care Act has added to the growing
excitement around predictive models derived from electronic health
RESEARCH ARTICLE
108. puted as new data became avail
when his or her score crossed t
dation set, the AUC obtained f
0.81 to 0.85) (Fig. 2). At a spec
of 0.33], TREWScore achieved a s
a median of 28.2 hours (IQR, 10
Identification of patients b
A critical event in the developme
related organ dysfunction (seve
been shown to increase after th
more than two-thirds (68.8%) o
were identified before any sepsi
tients were identified a median
(Fig. 3B).
Comparison of TREWScore
We evaluated the performance of
methods for the purpose of provid
use of TREWScore. We first com
to MEWS, a general metric used
of catastrophic deterioration (17)
oped for tracking sepsis, MEWS
tion of patients at risk for severe
Fig. 2. ROC for detection of septic shock before onset in the validation
set. The ROC curve for TREWScore is shown in blue, with the ROC curve for
MEWS in red. The sensitivity and specificity performance of the routine
screening criteria is indicated by the purple dot. Normal 95% CIs are shown
for TREWScore and MEWS. TPR, true-positive rate; FPR, false-positive rate.
A targeted real-time early warning score (TREWScore)
for septic shock
AUC=0.83
At a specificity of 0.67, TREWScore achieved a sensitivity of 0.85
and identified patients a median of 28.2 hours before onset.
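The "median of 28.2 hours before onset" figure is a lead time: the gap between the first moment a patient's score crosses the alert threshold and the onset of septic shock. A minimal sketch of that calculation is shown below. It does not reproduce TREWScore itself; the threshold, the score series, and the onset times are hypothetical.

```python
from statistics import median


def first_alert_time(times, scores, threshold):
    """Earliest time (in hours) at which the risk score reaches the threshold."""
    for t, s in zip(times, scores):
        if s >= threshold:
            return t
    return None


def lead_time_hours(times, scores, threshold, onset_time):
    """Hours between the first alert and onset, or None if not flagged in time."""
    alert = first_alert_time(times, scores, threshold)
    if alert is None or alert >= onset_time:
        return None
    return onset_time - alert


# Hypothetical patients who all went on to develop septic shock.
# "times" are hours since admission; "onset" is the hour shock began.
patients = [
    {"times": [0, 6, 12, 18, 24], "scores": [0.1, 0.2, 0.5, 0.7, 0.9], "onset": 30},
    {"times": [0, 4, 8, 12], "scores": [0.6, 0.6, 0.7, 0.8], "onset": 40},
    {"times": [0, 10, 20], "scores": [0.1, 0.2, 0.3], "onset": 25},
]
THRESHOLD = 0.5  # hypothetical alert threshold

lead_times = [
    lead_time_hours(p["times"], p["scores"], THRESHOLD, p["onset"])
    for p in patients
]
identified = [lt for lt in lead_times if lt is not None]

sensitivity = len(identified) / len(patients)
median_lead_time = median(identified)
print(sensitivity, median_lead_time)
```

Raising the threshold would lower the false-alarm rate but shorten the lead time and miss more patients, which is the trade-off the ROC curve summarizes.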
109.
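110. Sugar.IQ
Based on past records of the user's food intake,
the resulting blood glucose changes, insulin doses, and so on,
Watson predicts how the user's blood glucose
will change after a meal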
111. ADA 2017, San Diego, Courtesy of Taeho Kim (Seoul Medical Center)
112. ADA 2017, San Diego, Courtesy of Taeho Kim (Seoul Medical Center)
113. ADA 2017, San Diego, Courtesy of Taeho Kim (Seoul Medical Center)
114. ADA 2017, San Diego, Courtesy of Taeho Kim (Seoul Medical Center)
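115. •Released in the US as an iPhone app
•The key question is how cumbersome it is to use
•How long must it be used to be effective: two weeks? A lifetime?
•How will food logging and similar tasks be handled?
•The pricing model also appears not to have been disclosed yet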
118. An Algorithm Based on Deep Learning for Predicting In-Hospital
Cardiac Arrest
Joon-myoung Kwon, MD;* Youngnam Lee, MS;* Yeha Lee, PhD; Seungwoo Lee, BS; Jinsik Park, MD, PhD
Background: In-hospital cardiac arrest is a major burden to public health, which affects patient safety. Although traditional
track-and-trigger systems are used to predict cardiac arrest early, they have limitations, with low sensitivity and high false-alarm rates.
We propose a deep learning–based early warning system that shows higher performance than the existing track-and-trigger
systems.
Methods and Results: This retrospective cohort study reviewed patients who were admitted to 2 hospitals from June 2010 to July
2017. A total of 52 131 patients were included. Specifically, a recurrent neural network was trained using data from June 2010 to
January 2017. The result was tested using the data from February to July 2017. The primary outcome was cardiac arrest, and the
secondary outcome was death without attempted resuscitation. As comparative measures, we used the area under the receiver
operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), and the net reclassification index.
Furthermore, we evaluated sensitivity while varying the number of alarms. The deep learning–based early warning system (AUROC:
0.850; AUPRC: 0.044) significantly outperformed a modified early warning score (AUROC: 0.603; AUPRC: 0.003), a random forest
algorithm (AUROC: 0.780; AUPRC: 0.014), and logistic regression (AUROC: 0.613; AUPRC: 0.007). Furthermore, the
deep learning–based early warning system reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the modified early warning
system, random forest, and logistic regression, respectively, at the same sensitivity.
Conclusions: An algorithm based on deep learning had high sensitivity and a low false-alarm rate for detection of patients with
cardiac arrest in the multicenter study. (J Am Heart Assoc. 2018;7:e008678. DOI: 10.1161/JAHA.118.008678.)
Key Words: artificial intelligence • cardiac arrest • deep learning • machine learning • rapid response system • resuscitation
In-hospital cardiac arrest is a major burden to public health, which affects patient safety.1–3 More than a half of
cardiac arrests result from respiratory failure or hypovolemic shock, and 80% of patients with cardiac arrest show
signs of deterioration in the 8 hours before cardiac arrest.4–9 However, 209 000 in-hospital cardiac arrests occur in
the United States each year, and the survival discharge rate for patients with cardiac arrest is <20% worldwide.10,11
Rapid response systems (RRSs) have been introduced in many hospitals to detect cardiac arrest using the
track-and-trigger system (TTS).12,13
Two types of TTS are used in RRSs. For the single-parameter TTS (SPTTS), cardiac arrest is predicted if any single
vital sign (eg, heart rate [HR], blood pressure) is out of the normal range.14 The aggregated weighted TTS calculates
a weighted score for each vital sign and then finds patients with cardiac arrest based on the sum of these scores.15
The modified early warning score (MEWS) is one of the most widely used approaches among all aggregated weighted
TTSs (Table 1)16; however, traditional TTSs including MEWS have limitations, with low sensitivity or high
false-alarm rates.14,15,17 Sensitivity and false-alarm rate interact: Increased sensitivity creates higher
false-alarm rates and vice versa.
Current RRSs suffer from low sensitivity or a high false-alarm rate. An RRS was used for only 30% of patients before
unplanned intensive care unit admission and was not used for 22.8% of patients, even if they met the criteria.18,19
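The difference between the two track-and-trigger types described above can be sketched in a few lines. Every number below is hypothetical and chosen only to illustrate the logic: the ranges and point bands are not the MEWS table (Table 1 is not reproduced here) and are not clinical thresholds.

```python
# Hypothetical "normal" ranges for the single-parameter rule (illustration only).
NORMAL_RANGES = {
    "heart_rate": (40, 130),
    "systolic_bp": (80, 200),
    "resp_rate": (8, 30),
}

# Hypothetical point bands for the aggregated rule:
# (lower bound inclusive, upper bound exclusive, points).
SCORE_BANDS = {
    "heart_rate": [(0, 50, 2), (50, 100, 0), (100, 130, 1), (130, float("inf"), 3)],
    "systolic_bp": [(0, 80, 3), (80, 100, 1), (100, 200, 0), (200, float("inf"), 2)],
    "resp_rate": [(0, 9, 2), (9, 20, 0), (20, 30, 2), (30, float("inf"), 3)],
}


def single_parameter_alarm(vitals):
    """Single-parameter TTS: alarm if any one vital sign leaves its range."""
    return any(
        not (low <= vitals[name] <= high)
        for name, (low, high) in NORMAL_RANGES.items()
    )


def aggregated_score(vitals):
    """Aggregated weighted TTS: sum the points assigned to each vital sign."""
    total = 0
    for name, bands in SCORE_BANDS.items():
        value = vitals[name]
        for low, high, points in bands:
            if low <= value < high:
                total += points
                break
    return total


def aggregated_alarm(vitals, threshold=4):
    """Alarm when the summed score reaches the (hypothetical) threshold."""
    return aggregated_score(vitals) >= threshold


# A patient who is mildly abnormal on several vital signs at once.
patient = {"heart_rate": 105, "systolic_bp": 95, "resp_rate": 22}
print(single_parameter_alarm(patient), aggregated_score(patient), aggregated_alarm(patient))
```

In this toy case no single vital sign is out of range, yet the summed score triggers an alarm, which is the kind of pattern an aggregated score is meant to catch. Both rules still rely on fixed, hand-set cutoffs.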
From the Departments of Emergency Medicine (J.-m.K.) and Cardiology (J.P.), Mediplex Sejong Hospital, Incheon, Korea; VUNO, Seoul, Korea (Youngnam L., Yeha L.,
S.L.).
*Dr Kwon and Mr Youngnam Lee contributed equally to this study.
Correspondence to: Joon-myoung Kwon, MD, Department of Emergency medicine, Mediplex Sejong Hospital, 20, Gyeyangmunhwa-ro, Gyeyang-gu, Incheon 21080,
Korea. E-mail: kwonjm@sejongh.co.kr
Received January 18, 2018; accepted May 31, 2018.
© 2018 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley. This is an open access article under the terms of the Creative Commons
Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for
commercial purposes.
DOI: 10.1161/JAHA.118.008678 Journal of the American Heart Association 1
ORIGINAL RESEARCH
120. • The accuracy gap is larger at the alarm counts that a university hospital rapid response team can actually handle (points A and B)
• A: DEWS 33.0%, MEWS 0.3%
• B: DEWS 42.7%, MEWS 4.0%
(source: VUNO)
APPH (Alarms Per Patients Per Hour)
Less False Alarm
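The paper's abstract mentions evaluating sensitivity while varying the number of alarms. One way to read the comparison at points A and B is as sensitivity under a fixed alarm budget: only the highest-scoring cases can be alarmed, so the question is how many true events fall among them. The sketch below assumes that reading, which may differ from the exact metric on the slide. Both score lists are hypothetical and do not represent DEWS or MEWS output.

```python
def sensitivity_at_alarm_budget(scores, labels, max_alarms):
    """Share of true events caught when only the top-scoring cases are alarmed."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    alarmed = ranked[:max_alarms]
    detected = sum(label for _, label in alarmed)
    return detected / sum(labels)


# Hypothetical data: 1 = cardiac arrest occurred, 0 = it did not.
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
model_a_scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.1, 0.2, 0.4, 0.6, 0.1]
model_b_scores = [0.5, 0.6, 0.7, 0.2, 0.4, 0.3, 0.1, 0.8, 0.2, 0.1]

BUDGET = 3  # the response team can act on at most 3 alarms

sens_a = sensitivity_at_alarm_budget(model_a_scores, labels, BUDGET)
sens_b = sensitivity_at_alarm_budget(model_b_scores, labels, BUDGET)
print(sens_a, sens_b)
```

Comparing models at a budget the team can realistically staff is more informative than comparing them at thresholds that would generate more alarms than anyone can answer.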
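131. • J&J launched 'Sedasys', a sedation-inducing anesthesia robot, in 2014
• A medical anesthesia robot that injects propofol to sedate patients during colonoscopy and endoscopy
• It adjusts the dose according to the patient's vital signs, such as blood oxygen level and heart rate
• Approved by the FDA in 2013 and supplied to hospitals in the US, Australia, Canada, and elsewhere from 2014
• It cut the cost of sedated endoscopy to about one tenth (US$2,000 vs. US$150-200)
• Anesthesiologist associations and others ran a large-scale opposition campaign and lobbied politicians for regulation
• The Wall Street Journal: "J&J lost its fight with anesthesiologists, who faced the threat of shrinking income"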
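135. If machines take over all the mechanical work,
what, then, is the role of humans?
What roles do physicians play today?
136. If machines take over all the mechanical work,
what, then, is the role of humans?
What roles do physicians play today?
• Roles that will disappear
• Roles that will remain
• New roles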
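137. •Judgments based on evidence and logic
•Anything that can be drawn as a flowchart
•Roles based on visual perception
•Anything for which the answer to the questions below is YES
•'Can you logically explain why you made that decision?'
•'Would other physicians make a similar decision?'
•'Would I make the same decision if I looked at it again a month from now?'
Roles that will disappear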
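141. •The final medical decision
•Humane work that only humans can do
•Human touch
•Communication and empathy
•Roles other than examining and treating patients
•Basic research
•Creating new data and new standards
Roles that will remain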
142.
143. Over the course of a career, an oncologist may impart bad news an average of 20,000 times,
but most practicing oncologists have never received any formal training to help them
prepare for such conversations.
144. High levels of empathy in primary care physicians correlate with
better clinical outcomes for their patients with diabetes
145. the three levels of physicians’ empathy
was highly significant (χ²(4) = 22.04, P < .001).
The likelihood of good control (A1c < 7.0%) was significantly greater in
the patients of physicians with high empathy scores than in the patients of
physicians with low scores (56% and 40%, respectively; z = 4.0, P < .01).
Conversely, the likelihood of poor control (A1c > 9) was significantly lower
in the patients of physicians with high empathy scores than it was in the patients
of physicians in the low-scoring group
Statistical control for gender, age, and
type of insurance
Logistic regression was used to examine
the unique contribution of levels of
physicians’ empathy in predicting
optimal clinical outcomes after
controlling for physicians’ and patients’
gender and age, and patients’ health
insurance. In the first logistic model, the
outcomes of the hemoglobin A1c test
were dichotomized according to
whether they had achieved good
scoring category of physic
the odds of good control o
by 50%. Physicians’ gende
was associated with good
patients’ A1c outcome), p
(younger age was associat
control of patients’ A1c),
type of insurance (Medica
associated with good cont
contributed significantly t
Patients’ gender and age d
contribute. The Hosmer–
goodness-of-fit test showe
model was mathematically
(χ²(8) = 7.03, P = .53). Th
indicated that the physicia
empathy was a unique and
contributor to the predict
control of hemoglobin A1
patients, beyond the contr
gender and age of the phy
patients, and type of patie
insurance.
In another logistic regress
classified the results of the
into two categories in whi
test result of less than 100
good control. The same p
in the previous model wer
the independent variables
results of this analysis are
Table 3.
The odds ratios for physic
Table 2
Frequency and Percent Distributions of the Hemoglobin A1c and LDL-C Test
Results for 891 Diabetic Patients, Treated Between July 2006 and June 2009, by
Levels of Their Physicians’ Empathy*
No. (%) of patients by levels of physicians’ empathy
Patient outcome        High (n = 205)   Moderate (n = 282)   Low (n = 404)
Hemoglobin A1c†
<7.0%                  115 (56)         139 (49)             163 (40)
≥7.0% and ≤9.0%        59 (29)          99 (35)              135 (34)
>9.0%                  31 (15)          44 (16)              106 (26)
LDL-C‡
<100                   121 (59)         149 (53)             180 (44)
≥100 and ≤130          56 (27)          86 (30)              128 (32)
>130                   28 (14)          47 (17)              96 (24)
* From a study of physicians’ empathy and patients’ outcomes, Jefferson Medical College.
† χ²(4) = 22.04, P < .001.
‡ χ²(4) = 15.55, P < .001.
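•A 2011 study of 891 patients with diabetes
•Patients treated by physicians with high empathy
•had better blood glucose control (hemoglobin A1c), and
•lower levels of bad cholesterol (LDL-C).
•The association with physician empathy remained after controlling for other variables (physician gender, patient gender, insurance type, etc.)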
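The chi-square statistics quoted under Table 2 can be recomputed from the patient counts in the table. The sketch below applies the standard Pearson chi-square test of independence to each 3 × 3 block, with rows as outcome categories and columns as high, moderate, and low physician empathy. Only the function and variable names are introduced here; the counts are those printed in Table 2.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if outcome and empathy level were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Columns: high, moderate, low physician empathy (counts from Table 2).
A1C_COUNTS = [
    [115, 139, 163],  # <7.0%
    [59, 99, 135],    # >=7.0% and <=9.0%
    [31, 44, 106],    # >9.0%
]
LDL_COUNTS = [
    [121, 149, 180],  # <100
    [56, 86, 128],    # >=100 and <=130
    [28, 47, 96],     # >130
]

a1c_stat, a1c_dof = chi_square(A1C_COUNTS)
ldl_stat, ldl_dof = chi_square(LDL_COUNTS)
print(f"A1c:   chi-square({a1c_dof}) = {a1c_stat:.2f}")
print(f"LDL-C: chi-square({ldl_dof}) = {ldl_stat:.2f}")
```

Both results agree with the footnotes to two decimal places, which confirms that the reconstructed table matches the published counts.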
146. High levels of empathy in primary care physicians correlate with
better clinical outcomes for their patients with diabetes