The document discusses text mining of disease outbreak reports from online news and other unstructured sources for early detection of public health threats. It describes the BioCaster system, which analyzes over 9,000 news reports daily, using natural language processing and a multilingual ontology to extract structured event data on outbreaks reported in multiple languages. The system aims to supplement traditional surveillance and support timely response by public health experts.
High throughput analysis and alerting of disease outbreaks from the grey literature
1. High throughput analysis and alerting of disease outbreaks from the grey literature Nigel Collier Associate Professor National Institute of Informatics, Tokyo [email_address] http://research.nii.ac.jp/~collier
2.
3.
4. Projects
Timeline, 1998-2010:
GENIA: support for database curation & resource building (PI: Tsujii)
PIA: linking biomedical annotations to Semantic Web (SW) ontologies
ZAISA: locating biomedical results in full research papers
BioCaster: early detection and alerting of public health (PH) threats
[1] Collier, N. et al. (1999), "The GENIA project: corpus-based knowledge acquisition and information extraction from genome research papers", in Proc. of the Annual Meeting of the European Chapter of the Association for Computational Linguistics (EACL-99), pp. 271-272, Norway.
[2] Mizuta, Y. et al. (2006), "Zone analysis in biology articles as a basis for information extraction", International Journal of Medical Informatics, Elsevier, vol. 75, issue 6, pp. 468-487.
[3] Collier, N. et al. (2007), "Detecting Web rumours with a multilingual ontology supported text classification system", Advances in Disease Surveillance, vol. 4, pp. 242.
7. Alerting real world events (example: Cholera, 2007, Iraq)
Step 1: Real world event
Step 2: Grey literature response
Step 3: Text mining on unstructured news
Step 4: Detecting unusual events
Step 5: Issue alert
[Figure: news volume and alert level plotted against time]
8. Globalization: a problem in one location is everybody’s headache
Plague, 2005, DRC; Nipah, 1998, Malaysia; Anthrax, 2001, USA; SARS, 2003, HK; Cholera, 2007, Iraq; H5N1 flu, 2003-; Ebola, 2007, DRC; Foot & mouth, 2001, UK … plus many hundreds more each year
9.
10. Typical operational flow
[Diagram labels: Local field workers; GP reports; EMS visits; Over the counter sales; Grocery sales; School absentees; Reference labs; Local labs; Chief medical officer/Cabinet; Timeliness; Rumours; WHO; Disease Control Centre]
See also: [4] Mandl, K. D. et al. (2004), “Implementing syndromic surveillance: a practical guide informed by early experience”, Journal of the American Medical Informatics Association, vol. 11, no. 2, Mar/Apr 2004, pp. 141-150.
11.
12.
13.
14. Typical operational flow [2]
[Diagram labels: Local field workers; GP reports; EMS visits; Over the counter sales; Grocery sales; School absentees; Reference labs; Local labs; Chief medical officer/Cabinet; Timeliness; Rumours; WHO; Disease Control Centre; Grey literature analysis]
15.
16. Existing systems (* non-governmental systems)
System | Location | Automatic? | Language coverage | Access | Open source?
GPHIN | Canada | Automatic & Manual | UN official languages | Closed | No
ProMed* | USA | Manual volunteers | All? | Open | NA
Argus | USA | Automatic & Manual | ~36? | Closed | No
MedISys | EU | Automatic | EU official languages | Closed | No
BioCaster* | Japan | Automatic | Asia-Pacific (en,jp,th,vn,sp,zh…) | Mixed | Yes
17. THEME 1: TEXT MINING SYSTEM
Funding source:
[A] JSPS grant in aid for scientific research on priority areas (18049071), PI, 4/2006-3/2007
[B] JSPS grant in aid for scientific research: Young researcher award category A (18680015), PI, 4/2006-3/2008
20. The nature of the task: ambiguity
Confusion: example headlines
South Sudan hit by Ebola-like fever
Zika virus in Micronesia (Yap)
Vista attacked by 13-year old virus
Obama fever builds as black Americans await a new era
Undiagnosed disease in Java
Bird flu outbreak drill spooks Manitoba town
Boredom causing outbreaks of petrol sniffing
21.
22.
23.
24.
25. Event capture using a regular expression language (SRL) [5] McCrae, J., Conway, M. and Collier, N. (2009), “Simple Rule Language editor and handbook”, available from http://code.google.com/p/srl-editor/
26. Single slot relations [1]
Rule: disease(D);species(“human”) :- PERSON(P,matches(@victim)) skipwords(2) DISEASE(D)
Example output 1: species = human; disease = measles
Example output 2: species = human; disease = chickenpox
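To make the rule on this slide concrete, here is a minimal Python sketch of the same pattern. It is not SRL and not the BioCaster implementation. It assumes tokens have already been tagged with entity types, that `skipwords(2)` means "up to two untagged words may intervene", and that `@victim` is a word list. The list `VICTIM_WORDS` and the function `match_victim_disease` are hypothetical names introduced only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the @victim word list referenced by the rule.
VICTIM_WORDS = {"victim", "patient", "child", "children", "man", "woman"}

# A token is (text, entity_tag); entity_tag is None for untagged words.
Token = Tuple[str, Optional[str]]


def match_victim_disease(tokens: List[Token], max_skip: int = 2) -> List[dict]:
    """Emit disease/species slots for PERSON(@victim) ... DISEASE patterns."""
    results = []
    for i, (text, tag) in enumerate(tokens):
        # The rule head requires a PERSON entity whose text is a victim word.
        if tag != "PERSON" or text.lower() not in VICTIM_WORDS:
            continue
        # Inspect the next token, allowing up to max_skip untagged words first.
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            next_text, next_tag = tokens[j]
            if next_tag == "DISEASE":
                results.append({"disease": next_text, "species": "human"})
                break
            if next_tag is not None:
                break  # assumption: only untagged words may be skipped
    return results


tokens = [("The", None), ("patient", "PERSON"),
          ("contracted", None), ("measles", "DISEASE")]
print(match_victim_disease(tokens))
# prints [{'disease': 'measles', 'species': 'human'}]
```

A declarative rule language keeps such patterns editable by domain experts; the sketch only shows what one rule does once it is applied to tagged text.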
27.
28.
29. Multi-slot challenge
Is the disease outbreak ongoing or historical? Is there an outbreak ongoing now in Switzerland?
Example article: San Diego measles outbreak
In San Diego
Measles, one of the most contagious diseases, has infected 11 children… Over the past month officials have tracked … The latest victim is an 8-year-old who may have spread the virus during a visit to Whole Foods Market in Hillcrest and later to a Cirque du Soleil performance… This is the most measles cases in the city in 17 years … The outbreak is believed to have started with a child who caught measles in Switzerland, then returned to the United States.
30. Multi slot relations
Single-slot inputs: disease = measles; city = San Diego; country = United States; species = human
Simple discourse rules combine these into one event frame:
OUTBREAK
has_disease: measles
has_location_country: United States
has_location_province: California
has_agent: rubeola virus
has_species: human
international_travel: true
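The slide does not spell out the discourse rules, so the following Python sketch is only one plausible reading, under stated assumptions: the most frequently mentioned disease wins, the first city found in a gazetteer fixes the province and country, and any other country mentioned in the article sets the travel flag. The tables `DISEASE_AGENT` and `CITY_INFO` and the function `build_outbreak` are hypothetical stand-ins for ontology and gazetteer lookups.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical lookup tables standing in for ontology and gazetteer knowledge.
DISEASE_AGENT = {"measles": "rubeola virus"}
CITY_INFO = {"San Diego": ("California", "United States")}


def build_outbreak(slots: List[Tuple[str, str]]) -> Dict[str, object]:
    """Merge (slot, value) pairs from one article into one OUTBREAK frame."""
    diseases = Counter(v for s, v in slots if s == "disease")
    cities = [v for s, v in slots if s == "city"]
    countries = [v for s, v in slots if s == "country"]
    species = [v for s, v in slots if s == "species"]

    event: Dict[str, object] = {}
    if diseases:
        # Assumed rule: the most frequently mentioned disease is the event's.
        disease = diseases.most_common(1)[0][0]
        event["has_disease"] = disease
        if disease in DISEASE_AGENT:
            event["has_agent"] = DISEASE_AGENT[disease]

    # Assumed rule: the first city known to the gazetteer fixes the location.
    for city in cities:
        if city in CITY_INFO:
            province, country = CITY_INFO[city]
            event["has_location_province"] = province
            event["has_location_country"] = country
            break
    else:
        if countries:
            event["has_location_country"] = countries[0]

    if species:
        event["has_species"] = species[0]

    # Assumed rule: a second country in the text suggests international travel.
    home = event.get("has_location_country")
    event["international_travel"] = home is not None and any(
        c != home for c in countries
    )
    return event


slots = [("disease", "measles"), ("city", "San Diego"),
         ("disease", "measles"), ("country", "Switzerland"),
         ("country", "United States"), ("species", "human")]
event = build_outbreak(slots)
```

With the San Diego article's slots, this sketch yields the same frame as the slide, including `international_travel: true` because Switzerland is mentioned alongside the United States. Real discourse rules would also need to handle tense and historical mentions, which the sketch ignores.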
31.
32.
33.
34.
35. Global health monitor
Key figures: >1900 news sources; >9000 news reports analysed/day
http://www.biocaster.org and http://born.nii.ac.jp
36. THEME 2: MULTILINGUAL ONTOLOGY
Funding source:
[C] Transdisciplinary integration center project fund from ROIS, PI, 4/2006-3/2008
[D] JST Sakigake grant in aid for scientific research, PI, 10/2008-9/2011
37.
38. A closer look
[9] Kawazoe, A., Chanlekha, H., Shigematsu, M. and Collier, N. (2008), “Structuring an event ontology for disease outbreak detection”, in BMC Bioinformatics (in press).
[10] Collier, N., Kawazoe, A., Jin, L., Shigematsu, M., Dien, D., Barrero, R., Takeuchi, K. and Kawtrakul, A. (2007), “A multilingual ontology for infectious disease surveillance: rationale, design and challenges”, Language Resources and Evaluation, Springer, DOI: 10.1007/s10579-007-9019-7.
39. Example BCO classes*
Language: ar, en, fr, id, ja, ko, ma, ru, sp, th, vi, zh
Link: ICD10 (232), ICD9 (185), LOINC (316), MeSH (1119), MedDRA (218), NCIMetaThesaurus (338), OIE (18), PHAC (53), Pathport (34), SNOMED CT (485), Wikipedia (685)
Term: arabicTerm (968), englishTerm (4113), frenchTerm (1281), indonesianTerm (1081), japaneseTerm (2077), koreanTerm (1176), malaysianTerm (1001), russianTerm (1187), spanishTerm (1171), thaiTerm (1485), vietnameseTerm (1297), chineseTerm (1142)
Diseases: AvianDisease (22), BeeDisease (6), BovineDisease (24), CanineDisease (4), CaprineDisease (14), CervineDisease (2), EquineDisease (17), FelineDisease (4), FishDisease (2), HumanDisease (216), LaomorphDisease (2), Non-humanPrimateDisease (16), OtherDisease (2), RodentDisease (8), SwineDisease (12)
* Figures in brackets indicate current number of individuals
40.
41.
42.
43. THEME 3: EARLY ALERTING
Funding source:
[D] JST Sakigake grant in aid for scientific research, PI, 10/2008-9/2011
44.
45. Basic approach
Daily counts for each disease-country pair are compared against the preceding 7-day history.
An abnormal trend signals a possible alert.
Possible alerts are compared against human gold standard alerts to calculate accuracy.
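The slide does not say which statistic defines an "abnormal trend", so the Python sketch below swaps in a simple, commonly used stand-in: flag a day when its count exceeds the mean of the previous 7 days by more than `k` standard deviations. The threshold `k`, the function names `possible_alerts` and `accuracy`, and the sample counts are all assumptions for illustration, not the system's actual detector or data.

```python
from statistics import mean, stdev
from typing import List, Sequence


def possible_alerts(counts: Sequence[int], window: int = 7,
                    k: float = 2.0) -> List[bool]:
    """Flag days whose count exceeds mean + k * stdev of the prior window."""
    flags = []
    for day, today in enumerate(counts):
        history = counts[max(0, day - window):day]
        if len(history) < window:
            flags.append(False)  # not enough history to judge this day
            continue
        threshold = mean(history) + k * stdev(history)
        flags.append(today > threshold)
    return flags


def accuracy(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of days on which predicted alerts agree with gold alerts."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty, equal length")
    agree = sum(p == g for p, g in zip(predicted, gold))
    return agree / len(gold)


# Hypothetical daily report counts for one disease-country pair.
counts = [1, 0, 2, 1, 1, 0, 1, 9, 1]
flags = possible_alerts(counts)
gold = [False] * 7 + [True, False]  # hypothetical human-labelled alerts
print(accuracy(flags, gold))
# prints 1.0
```

In this toy series only the eighth day (count 9) is flagged. Note that the spike then inflates the history for the following day, one reason practical detectors often add a guard band or use more robust statistics.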