Helping Haiti - a semantic
  web approach to crisis
information management
 a different translational informatics project

            Simon Twigger, Ph.D.
Questions of interest
Has anyone done any expression studies using congenic rats?
What tissue is this gene expressed in?
What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
Are any of these genes associated with my phenotype?
Has this gene been seen in the brain?
What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the breast, breast carcinoma...)?
Data hidden in plain sight
GEO + GMiner + OBA
[Pipeline diagram]
GEO Records
-> Create Annotation Jobs & Queue Up
-> Q-Out (RabbitMQ)
-> 1..n Annot. Workers: Index text at OBA, Parse Results
-> Put results into queue for save
-> Q-In (RabbitMQ)
-> Results saved to GMiner database
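
The queue-based flow in this diagram can be sketched in a few lines. This is only an illustration under stated assumptions: Python's in-process `queue` module stands in for RabbitMQ, the `annotate` function with its tiny `TERMS` table is a hypothetical stand-in for the OBA service, and the record IDs in the usage note are made up.

```python
import queue
import threading

# Hypothetical stand-in for the OBA annotator: a tiny term list, not the real service.
TERMS = {"kidney": "tissue", "brain": "tissue", "sprague dawley": "strain"}


def annotate(text):
    """Return sorted (term, category) pairs found in the text."""
    lowered = text.lower()
    return sorted((term, cat) for term, cat in TERMS.items() if term in lowered)


def worker(q_out, q_in):
    """Take jobs from Q-Out, annotate them, and put the results on Q-In."""
    while True:
        job = q_out.get()
        if job is None:  # sentinel: no more jobs for this worker
            break
        record_id, text = job
        q_in.put((record_id, annotate(text)))


def run_pipeline(records, n_workers=2):
    """Queue up one job per record, run the workers, then save the results."""
    q_out, q_in = queue.Queue(), queue.Queue()
    for item in records.items():
        q_out.put(item)
    threads = [threading.Thread(target=worker, args=(q_out, q_in))
               for _ in range(n_workers)]
    for _ in threads:
        q_out.put(None)  # one sentinel per worker, queued after all jobs
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    database = {}  # stands in for the GMiner database
    while not q_in.empty():
        record_id, annotations = q_in.get()
        database[record_id] = annotations
    return database
```

Calling `run_pipeline({"GSE1": "Kidney expression in Sprague Dawley rats"})` returns a dictionary mapping each record ID to the annotations found in its text. Because the workers only talk to the two queues, more of them can be added without changing the rest of the flow.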
Browse/Review Results
http://chicagofree.info/2010/01/public-service-haiti-pictures-via-twitter/
http://www.nytimes.com/interactive/2010/01/18/world/americas/0118-haiti-assess-maps.html
URGENT Christopher Frecynet is still alive under
his house. 64 Rue Nord Alexis.(RUELLE NAZON,
AVENUE POUPELARD
Mirna Nazaire lives in P-A-P at Bizoton 6#12.
Entire neighborhood without food. People are
dying.
French hospital is now open and ready to receive
the wounded at the french lycee in rue
marcadieux bourdon
Questions of interest
Which hospitals are open?
Who is in trouble?
Does anyone have any tents?
Where are the open roads?
Any information on Person ABC?
What help is needed?
Who needs this info?
• Aid Agencies, Non-Governmental Organizations
   Red Cross, UN, etc.
• Military & other relief suppliers
• Individuals in Haiti
• Donors - matching needs to offers
• etc.
Structured data in biology
http://epic.cs.colorado.edu/helping_haiti_tweak_the_twe.html
Main Hashtags
Data Tags
Keywords
Mirna Nazaire lives in P-A-P at Bizoton 6#12.
Entire neighborhood without food. People are
dying.



#haiti #need food #name Mirna Nazaire
lives in #loc PAP at Bizoton 6
#12 #info neighborhood w/o food. People dying
French hospital is now open and ready to receive
the wounded at the french lycee in rue
marcadieux bourdon



#haiti #offering hospital rooms #loc french
lycee in rue marcadieux bourdon #num 30+
#info French hospital is open and ready 2
receive
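
A tagged tweet like the two above can be split into fields with very little code. The sketch below is an assumption about how such parsing might work, not the project's actual parser. The `TAGS` set only lists tags that appear on these slides rather than the full TweakTheTweet vocabulary, and `parse_ttt` is a hypothetical name.

```python
# Tags seen on the slides; an assumption, not the complete TtT tag list.
TAGS = {"haiti", "need", "offering", "name", "loc", "num", "info",
        "contact", "trapped"}


def parse_ttt(tweet):
    """Split a tagged tweet into {tag: text that follows the tag}."""
    fields = {}
    current = None
    for token in tweet.split():
        tag = token[1:].lower() if token.startswith("#") else None
        if tag in TAGS:
            # A known tag starts a new field.
            current = tag
            fields.setdefault(current, [])
        elif current is not None:
            # Anything else (including '#12', a house number) is field text.
            fields[current].append(token)
    return {tag: " ".join(words) for tag, words in fields.items()}
```

Checking tokens against a known tag list, rather than treating every `#` as a tag, keeps address fragments such as `#12` inside the `loc` field.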
GMiner Ontologies
OWL Ontologies

 Classes - categories of things you care
 about


 Properties - attributes of the things
 you care about
‘Triples’ of data
  c1            p2        “Value”

subject      predicate    object


simont     hasHairColor   brown
simont    inOfficeNumber   H8808
simont hasPhoneNumber 456-1234
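
The three example triples can be held in a plain data structure and written out in N-Triples form. This is a minimal sketch: the `http://example.org/` base URI and the helper names `objects` and `to_ntriples` are assumptions made for illustration.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

# The example triples from the slide.
triples = [
    Triple("simont", "hasHairColor", "brown"),
    Triple("simont", "inOfficeNumber", "H8808"),
    Triple("simont", "hasPhoneNumber", "456-1234"),
]

BASE = "http://example.org/"  # hypothetical namespace, not from the talk


def objects(data, subject, predicate):
    """Return every object stored for a (subject, predicate) pair."""
    return [t.object for t in data
            if t.subject == subject and t.predicate == predicate]


def to_ntriples(t):
    """Serialize one triple as an N-Triples line with a literal object."""
    return f'<{BASE}{t.subject}> <{BASE}{t.predicate}> "{t.object}" .'
```

For example, `objects(triples, "simont", "inOfficeNumber")` looks up the office number, and `to_ntriples(triples[0])` produces a line that a triple store could load.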
TtT Classes
TtT Properties
Inference
The ontology asserts that any thing that has a ‘has_trapped’ property is a member of the ‘Trapped’ class (the domain of has_trapped is trapped).

Tweet:123 #haiti #trapped 5 people #loc Pap
Tweet 123 has_trapped “5 people”
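
The domain rule described here can be imitated without a reasoner. The sketch below is a hand-rolled illustration of the idea, not how an OWL reasoner is implemented. The `DOMAINS` table, the `has_loc` property name, and the `infer_types` function are assumptions.

```python
# Property -> class it implies for the subject (the property's domain).
DOMAINS = {"has_trapped": "Trapped"}


def infer_types(triples, domains):
    """Return the class-membership triples implied by property domains."""
    inferred = set()
    for subject, predicate, _obj in triples:
        if predicate in domains:
            inferred.add((subject, "rdf:type", domains[predicate]))
    return inferred


# Facts extracted from: Tweet:123 #haiti #trapped 5 people #loc Pap
facts = [
    ("tweet:123", "has_trapped", "5 people"),
    ("tweet:123", "has_loc", "Pap"),  # 'has_loc' is a hypothetical property name
]
```

Running `infer_types(facts, DOMAINS)` adds the statement that `tweet:123` is a member of `Trapped`, even though the dataset never says so directly.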
tweetneed.org
RDF Graph
tweet:8550350793  has contact   maison des anges
tweet:8550350793  has need      Insulin
tweet:8550350793  has location  18.588724,-72.275065
tweet:8550350793  twitter id    8550350793
18.588724,-72.275065  has latitude   18.588724
18.588724,-72.275065  has longitude  -72.275065
RDF Graph
tweet:7953197721  twitter id    7953197721
tweet:7953197721  has location  Delmas
tweet:7953197721  has need      Insulin
tweet:7953197721  has need      medication
[Figure: the graphs for tweet:7953197721 and tweet:8550350793 shown side by side, then joined at their shared “Insulin” node.]
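
Joining two tweet graphs through a shared node amounts to a set union of their triples. The sketch below illustrates this with plain Python sets. The underscore property names and the `subjects_by_need` helper are assumptions, and only some of the edges from the two graphs are included.

```python
from collections import defaultdict

graph_a = {
    ("tweet:8550350793", "has_contact", "maison des anges"),
    ("tweet:8550350793", "has_need", "Insulin"),
    ("tweet:8550350793", "has_location", "18.588724,-72.275065"),
}
graph_b = {
    ("tweet:7953197721", "has_location", "Delmas"),
    ("tweet:7953197721", "has_need", "Insulin"),
    ("tweet:7953197721", "has_need", "medication"),
}

# Identical node names refer to the same node, so a union joins the graphs.
merged = graph_a | graph_b


def subjects_by_need(graph):
    """Index tweets by the need they mention."""
    index = defaultdict(set)
    for subject, predicate, obj in graph:
        if predicate == "has_need":
            index[obj].add(subject)
    return index
```

After the union, `subjects_by_need(merged)["Insulin"]` contains both tweets, which is the "self-assembly" effect: nothing had to link the two reports explicitly.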
Duplication
[Figure: one tweet first seen at 1.21pm and still being repeated at 9.29pm, 31x in total]
Using MD5 Hashes
simon twigger -> f6f12de7192d1a5d903c016ecb5b3a0c
haiti loc info -> 26e7c844f0c80a8860d6835591117639

rt @baybe_doll: #haiti #need help #name mr. bernard jean louis #loc lumiere evangelical chapel at rue midway 22 in carrefour #contact phone # 3778-2506
-> 3d609759195d03a059baca1e063be4eb [3]
contact haiti loc name need
-> b767eeb9c16e74bfb22ee6ec0998a670 [13]
help bernard jean louis lumiere evangelical chapel rue midway carrefour ...
-> d4dc5272669dee93721b4c005307cfc7 [4]
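
Fingerprinting tweets with MD5 to spot retweets can be sketched with Python's standard `hashlib`. The normalization steps here (lowercasing, stripping a leading `rt @user:` prefix, sorting the tags) are assumptions and may not match the rules used at tweetneed.org, so the hashes this code produces are not claimed to equal the values on the slide.

```python
import hashlib
import re


def md5_hex(text):
    """Return the 32-character hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def strip_retweet_prefix(tweet):
    """Lowercase the tweet and drop any leading 'rt @user:' prefixes."""
    return re.sub(r"^(rt\s+@\w+:?\s*)+", "", tweet.lower().strip())


def tag_fingerprint(tweet):
    """Hash the sorted set of hashtags, ignoring everything else."""
    tags = sorted(set(re.findall(r"#([a-z]\w*)", tweet.lower())))
    return md5_hex(" ".join(tags))


def dedupe(tweets):
    """Keep the first tweet seen for each normalized-text hash."""
    seen = {}
    for tweet in tweets:
        seen.setdefault(md5_hex(strip_retweet_prefix(tweet)), tweet)
    return list(seen.values())
```

Hashing the full text catches exact retweets, while a looser fingerprint such as `tag_fingerprint` groups tweets that merely share a tag pattern. The two can be combined depending on how aggressive the filtering should be.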
Who is using tweet data?
http://haiti.ushahidi.com
http://haiti.sahanafoundation.org/
http://swift.ushahidi.com/
http://haiti.managingnews.com/
How to integrate?
[Figure: a growing number of separate “Topic” nodes]
Timeline of incident reports at haiti.ushahidi.com
       January 12th - February 4th 2010
Crisis Commons


Editor's Notes

  1. We build databases that help researchers get access to and use rat data. Here’s a selection of problems that many rat researchers face: trying to answer questions based on masses of data that are too voluminous to read, hard to get to, inconsistently organized and hard to integrate.
  2. NCBI’s Gene Expression Omnibus (GEO) has a lot of relevant data, either as text or raw data. Can we start to capture some of this information in an informatically tractable fashion, using ontologies and the OBA tools at the National Center for Biomedical Ontology in an annotation pipeline? The red boxes highlight some concepts of interest - rat strains and tissues being used in this experiment. A human can read these and know what’s going on, but what about a computer?
  3. We built a pipeline to take snippets of text from GEO records, fire them off into a queue and have them annotated by various ontologies at NCBO. The results are returned to another queue and loaded into the database. We then do a manual review of the automated annotations (not shown here) using a customized curation interface.
  4. Initial results focusing on GEO rat datasets have provided a lot of great information and allowed us to create some handy navigational interfaces to the data, enabling queries that were not possible on any other site. Want to find expression data for the SS rat kidney? Click the terms and the datasets appear.
  6. Here’s a different area of need - the Haiti earthquake of mid-January.
  7. This is the type of information that started flowing across Twitter very soon after the earthquake hit.
  8. This has valuable information but again, there is a lot of it, it’s unstructured and hence hard for a computer to pull out actionable data.
  9. Here are the questions facing organizations and individuals in Haiti and around the world providing support.
  10. Lots of people need the information, but pulling it out of plain-text tweets is hard.
  11. We already have somewhat structured information in biological databases, etc. There is still a lot of free text, but at least we know what’s being talked about, which makes interpretation somewhat easier. Nothing like this existed in the twitterverse until...
  12. The University of Colorado Boulder team came up with TweakTheTweet (TtT) as a way to structure tweets to get more information out of them.
  13. Based on the tags, we can pull out information - but it is still relatively unstructured. What is a ‘need’? Does something tagged as ‘Need’ on this site mean the same as ‘Need’ on another site? Is the Loc a lat/long, a house address, etc.? Can we use ontologies, as we did for biological data in GMiner, to add structure and facilitate interpretation? Do these ontologies exist? No.
  14. Here are two of the ontologies we are using in GMiner - they list concepts of interest related to inbred rat strains and mouse anatomy. These form the controlled vocabularies of relevant facts that we use to go looking in the plain text of a GEO record. Ontologies provide a more structured format for the concepts of interest and go beyond keyword lists as a way to organize and analyze annotated data.
  15. OWL ontologies have two main types of thing - Classes (things you care about) and Properties (attributes of the things you care about).
  16. The data is expressed in triples - a subject (the thing we are talking about), a predicate (what type of info we are talking about, the property it possesses) and an object (the information we know about the subject). Here are some examples relating to me...
  17. Created a TweakTheTweet ontology using Protege and RDF/OWL.
  18. tag_terms are used to store the potential text matches in tweets - English and French (and more to come?)
  20. The ontology describes logical structures for the data. If Tweet 123 has a ‘has_trapped’ property associated with it, the ontology can be used to infer that Tweet 123 is also part of the ‘trapped’ class of tweets. We don’t have to specify this in our dataset; the ontology enables this to happen.
  24. I put together a simple Ruby on Rails site at http://tweetneed.org that has been grabbing tweets since around the 19th of January using the main TtT hashtags. It's certainly not a complete set, but I’ve been using this as a platform for exploring the data and developing approaches to filter the tweets and extract useful information.
  25. Parse the tweet to pull out as much useful data as possible - for example, lat and long are now stored as specific fields. Each piece of data is expressed as a triple - subject (the tweet), predicate (the property of interest) and object (the value). These triples can be dumped out as RDF (N-Triples) and placed into a triple store.
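A minimal sketch of the parse-then-export step in note 25 follows. It assumes that fields are introduced by hashtags such as `#need` and `#loc`, and it uses a made-up `http://example.org/ttt/` namespace; neither the tag list nor the property names are confirmed by the notes, and the site's actual parser is not shown in the source.

```python
import re

BASE = "http://example.org/ttt/"  # hypothetical namespace

# Assumed mapping from hashtag to property name.
FIELD_TAGS = {"need": "has_need", "loc": "has_location", "name": "has_name"}


def parse_tweet(text):
    """Split a tweet on hashtags and keep the text after known field tags."""
    parts = re.split(r"#(\w+)", text)  # [prefix, tag1, text1, tag2, text2, ...]
    fields = {}
    for tag, value in zip(parts[1::2], parts[2::2]):
        prop = FIELD_TAGS.get(tag.lower())
        value = value.strip()
        if prop and value:
            fields[prop] = value
    return fields


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping in literals."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(tweet_id, fields):
    """Serialize parsed fields as one N-Triples statement per line."""
    subject = f"<{BASE}tweet/{tweet_id}>"
    return "\n".join(
        f'{subject} <{BASE}{prop}> "{escape_literal(value)}" .'
        for prop, value in fields.items()
    )


fields = parse_tweet("#haiti #need insulin #loc Bizoton 6")
print(to_ntriples(123, fields))
```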
  26. RDF data is a graph of nodes and edges - nodes are the subjects and objects, the edges correspond to the predicates in the RDF.
  27. This other tweet can be parsed to extract its relevant data - this one also contains Insulin as a need.
  28. The RDF generated for this tweet also corresponds to a graph of nodes and edges.
  29-30. Graphs can self-assemble based on shared properties; in addition, reasoners can be used to infer new class membership and organize the data in other ways as needed.
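The self-assembly idea can be sketched without a triple store: when two sets of triples are merged, any node that appears in both (here a shared need) links the two tweets together. The tweet identifiers and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples for two separately parsed tweets.
tweet_a = {("Tweet123", "has_need", "Insulin"),
           ("Tweet123", "has_location", "Bizoton")}
tweet_b = {("Tweet456", "has_need", "Insulin")}


def merge(*graphs):
    """Merging RDF-style graphs is a set union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def subjects_by_object(triples):
    """Index subjects by the (predicate, object) pair they share."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].add(subject)
    return index


merged = merge(tweet_a, tweet_b)
shared = subjects_by_object(merged)[("has_need", "Insulin")]
print(sorted(shared))
```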
  31. A feature of Twitter is that people can retweet an existing tweet. This is good in that the retweet will probably reach a different audience than the original; however, in a crisis this results in a lot of duplicated data that has to be filtered. Here’s just one tweet that appears in the tweetneed.org database 31 times over a period of 8 hours, and I'm sure some copies are missing, as the original tweet is itself an RT.
  32-33. The MD5 algorithm takes a text string and generates a fixed-length alphanumeric string (a hash) based on it - change any character and the MD5 value changes. It is helpful as a way to take a variable-length string of characters and boil it down to a fixed-length identifier that is, for practical purposes, unique to that text. It is often used as a checksum for digital files - change one bit in a file and the MD5 checksum changes, so you can detect if the file is the slightest bit different from the official version.
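The behaviour described above can be tried with Python's standard `hashlib` module. The example strings are made up; the point is only that the digest has a fixed length and changes when any character changes.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


original = "Need insulin at Bizoton"
altered = "Need insulin at Bizoton!"  # one extra character

print(md5_hex(original))
print(md5_hex(altered))
print(md5_hex(original) == md5_hex(altered))
```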
  34. An MD5 hash of the full text shows there are 3 other tweets in the database that are identical copies - this gets rid of obvious duplicates but it's very conservative. Using just the hashtags isn't much good - it matches 13 other tweets, many of which are different from this one, so it is too promiscuous. Using the keywords from the tweet (remove hashtags, @names, stop words and other short strings and take what is left) does a better job, identifying 4 duplicates.
  35. You can explore how this is working for a particular tweet on tweetneed.org. The tweets identified by the keyword hash include the original tweet.
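A sketch of the keyword-hash approach from note 34 is shown below. The stop-word list, the minimum word length, and the lowercasing step are assumptions for illustration; the notes do not give the exact rules used on tweetneed.org.

```python
import hashlib
import re

# Tiny illustrative stop-word list and length cutoff (both assumed).
STOP_WORDS = {"the", "and", "for", "are", "with", "this", "that", "please"}
MIN_LENGTH = 3


def keywords(text):
    """Drop hashtags, @names, stop words and short strings; keep the rest."""
    text = re.sub(r"[#@]\w+", " ", text.lower())
    words = re.findall(r"\w+", text)
    return [w for w in words if len(w) >= MIN_LENGTH and w not in STOP_WORDS]


def keyword_hash(text):
    """MD5 of the remaining keywords, joined in their original order."""
    joined = " ".join(keywords(text))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def full_text_hash(text):
    """MD5 of the unmodified tweet text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


tweet = "Need insulin at Bizoton #need #haiti"
retweet = "RT @someone: Need insulin at Bizoton #haiti #need"

print(full_text_hash(tweet) == full_text_hash(retweet))
print(keyword_hash(tweet) == keyword_hash(retweet))
```

The full-text hashes differ because of the added `RT @someone:` prefix and the reordered hashtags, while the keyword hashes match because both tweets reduce to the same keyword list.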
  36. A variety of organizations and groups are following the tweet stream and extracting useful facts that are stored in their local databases. Here are just a few, with Ushahidi and Sahana being two of the more central locations. Ushahidi is developing Swift River, a specific app to filter the stream of information from Twitter and other sources; this is still in development. One benefit of multiple organizations tracking the same source of data is that they may each add unique and useful information to the original source - one site may verify the info, another may find the lat/long of the location, and another may have other info that increases the urgency of a particular report. They may also serve and reach different communities with different needs. However, one downside is that each organization may do the same work multiple times, and the new info added by one organization may not be available to the others. Bringing this all back together to avoid duplication of effort and share data is not a trivial task.
  37-39. One potential solution is to export data in RDF, using shared ontologies to describe common attributes. Place the data into a triple store (or federate multiple triple stores), integrate around common identifiers, and use ontologies and reasoners to infer additional information not otherwise present. This could be a central location that people could query (RSS feed, REST, etc.) to access additional data added by their colleagues and to access novel inferred information that was not apparent until the different data sources were merged.
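To make "integrate around common identifiers" concrete, the sketch below combines triples exported by several sites and records which site contributed each fact. The site names, properties and values are all hypothetical, and a real deployment would use a triple store rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical exports from two sites that describe the same tweet.
SOURCES = {
    "site_a": {("Tweet123", "is_verified", "true")},
    "site_b": {("Tweet123", "has_location", "Bizoton"),
               ("Tweet123", "is_verified", "true")},
}


def merge_with_provenance(sources):
    """Group facts by subject, keeping the set of sites behind each fact."""
    store = defaultdict(lambda: defaultdict(set))
    for site, triples in sources.items():
        for subject, predicate, obj in triples:
            store[subject][(predicate, obj)].add(site)
    return store


def describe(store, subject):
    """Return every known fact about a subject, sorted for stable output."""
    return sorted(store[subject].items())


store = merge_with_provenance(SOURCES)
for fact, sites in describe(store, "Tweet123"):
    print(fact, sorted(sites))
```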
  40. For some perspective, here’s an animated timeline of incident reports flowing into the haiti.ushahidi.com site since January 12th. These reports come from a wide variety of sources: SMS messages, individuals entering data on the Ushahidi website, and also Twitter.
  41. TtT is one of a variety of Crisis Commons projects where developers from around the world are volunteering and getting engaged building software, some related to Haiti but also for use in the next crisis that comes along.