Standardizing for Open Data

W3C's plans for standardization activities in the area of Data on the Web

Published in: Technology, Education

  1. (1) Standardizing for Open Data
      Ivan Herman, W3C
      Open Data Week, Marseille, France, June 26, 2013
      Slides at: http://www.w3.org/2013/Talks/0626-Marseille-IH/
  2. (2) Data is everywhere on the Web!
      • Public, private, behind enterprise firewalls
      • Ranges from informal to highly curated
      • Ranges from machine readable to human readable
        - HTML tables, Twitter feeds, local vocabularies, spreadsheets, …
      • Expressed in diverse models
        - tree, graph, table, …
      • Serialized in many ways
        - XML, CSV, RDF, PDF, HTML tables, microdata, …
  3. (3) [no text on slide]
  4. (4) [no text on slide]
  5. (5) [no text on slide]
  6. (6) [no text on slide]
  7. (7) [no text on slide]
  8. (8) W3C’s standardization focus was, traditionally, on Web scale integration of data
      • Some basic principles:
        - use of URIs everywhere (to uniquely identify things)
        - relate resources among one another (to connect things on the Web)
        - discover new relationships through inferences
      • This is what the Semantic Web technologies are all about
  9. (9) We have a number of standards
      [diagram labels: RDF 1.1, SPARQL 1.1, URI, JSON-LD, Turtle, RDFa, RDF/XML]
      • RDF: data model, links, basic assertions; different serializations
      • SPARQL: querying data
      A fairly stable set of technologies by now!
  10. (10) We have a number of standards
      [diagram labels: RDB2RDF, RDF 1.1, RDFS 1.1, SPARQL 1.1, OWL 2, URI, JSON-LD, Turtle, RDFa, RDF/XML]
      • RDF: data model, links, basic assertions; different serializations
      • SPARQL: querying data
      • RDFS: simple vocabularies
      • OWL: complex vocabularies, ontologies
      • RDB2RDF: databases to RDF
      A fairly stable set of technologies by now!
  11. (11) We have Linked Data principles
  12. (12) Integration is done in different ways
      • Very roughly:
        - data is accessed directly as RDF and turned into something useful
          (relies on data being “preprocessed” and published as RDF)
        - data is collected from different sources, integrated internally
          (using, say, a triple store)
  13. (13) [no text on slide]
  14. (15) However…
      • There is a price to pay: a relatively heavy ecosystem
        - many developers shy away from using RDF and related tools
      • Not all applications need this!
        - data may be used directly, no need for integration concerns
        - the emphasis may be on easy production and manipulation of data with simple tools
  15. (16) Typical situation on the Web
      • Data published in CSV, JSON, XML
      • An application uses only 1-2 datasets; integration by direct programming is straightforward
        - e.g., in a Web application
      • Data is often very large; direct manipulation is more efficient
  16. (17) Non-RDF Data
      • In some settings, the data can be converted into RDF
      • But, in many cases, it is not done
        - e.g., CSV data is way too big
        - RDF tooling may not be adequate for the task at hand
        - integration is not a major issue
  17. (18) [no text on slide]
  18. (19) What that application does…
      • Gets the data published by NHS
      • Processes the data (e.g., through Hadoop)
      • Integrates the result of the analysis with geographical data
      I.e., the raw data is used without integration
  19. (20) The reality of data on the Web…
      • It is still a fairly messy space out there :-(
        - many different formats are used
        - data is difficult to find
        - published data are messy, erroneous
        - tools are complex, unfinished…
  20. (21) How do developers perceive this?
      ‘When transportation agencies consider data integration, one pervasive notion is that the analysis of existing information needs and infrastructure, much less the organization of data into viable channels for integration, requires a monumental initial commitment of resources and staff. Resource-scarce agencies identify this perceived major upfront overhaul as "unachievable" and "disruptive."’
      Source: Data Integration Primer: Challenges to Data Integration, US Dept. of Transportation
  21. (22) One may look at the problem through different goggles
      • Two alternatives come to the fore:
        1. provide tools, environments, etc., to help outsiders to publish Linked Data (in RDF) easily
           - a typical example is the Datalift project
        2. forget about RDF, Linked Data, etc., and concentrate on the raw data instead
  22. (24) But religions and cultures can coexist… :-)
  23. (25) Open Data on the Web Workshop
      • Had a successful workshop in London, in April:
        - around 100 participants
        - coming from different horizons: publishers and users of Linked Data, CSV, PDF, …
  24. (26) We also talked to our “stakeholders”
      • Member organizations and companies
      • Open Data Institute, Open Knowledge Foundation, Schema.org
      • …
  25. (27) Some takeaways
      • The Semantic Web community needs stability of the technology
        - do not add yet another technology block :-)
        - existing technologies should be maintained
  26. (28) Some takeaways
      • Look at the more general space, too
        - importance of metadata
        - deal with non-RDF data formats
        - best practices are necessary to raise the quality of published data
  27. (29) We need to meet app developers where they are!
  28. (30) Metadata is of major importance
      • Metadata describes the characteristics of the dataset
        - structure, datatypes used
        - access rights, licenses
        - provenance, authorship
        - etc.
      • Vocabularies are also key for Linked Data
  29. (31) Vocabulary Management Action
      • Standard vocabularies are necessary to describe data
        - there are already some initiatives: W3C’s data cube, data catalog, PROV, schema.org, DCMI, …
      • At the moment, it is a fairly chaotic world…
        - many, possibly overlapping vocabularies
        - difficult to locate the one that is needed
        - vocabularies may not be properly managed, maintained, versioned, provided persistence…
  30. (32) W3C’s plan:
      • Provide a space whereby
        - communities can develop
        - host vocabularies at W3C if requested
        - annotate vocabularies with a proper set of metadata terms
        - establish a vocabulary directory
      • The exact structure is still being discussed: http://www.w3.org/2013/04/vocabs/
  31. (34) CSV on the Web
      • Planned work areas:
        - metadata vocabulary to describe CSV data
          (structure, reference to access rights, annotations, etc.)
        - methods to find the metadata
          (part of an HTTP header, special rows and columns, packaging formats…)
        - mapping content to RDF, JSON, XML
      • Possibly at a later phase:
        - API standards to access CSV data
  32. (36) Open Data Best Practices
      • Document best practices for data publishers
        - management of persistence, versioning, URI design
        - use of core vocabularies (provenance, access control, ownership, annotations, …)
        - business models
      • Specialized metadata vocabularies
        - quality description (quality of the data, update frequencies, correction policies, etc.)
        - description of data access APIs
        - …
  33. (37) Summary
      • Data on the Web has many different facets
      • We have concentrated on the integration aspects in the past years
      • We have to take a more general view, look at other types of data published on the Web
  34. (38) In future…
      • We should look at other formats, not only CSV
        - MARC, GIS, ABIF, …
      • Better outreach to data publishing communities and organizations
        - WF, RDA, ODI, OKFN, …
  35. Enjoy the event!
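
Slide (8) names three principles: identify things with URIs, relate resources to one another, and discover new relationships through inference. A minimal sketch of how these fit together is given below, assuming plain Python tuples stand in for RDF triples. The `http://example.org/` URIs and the `infer_types` helper are hypothetical and do not come from the slides; only one inference rule is applied (an instance of a class is also an instance of its superclasses), which is far less than a real RDFS or OWL reasoner does.

```python
# Hypothetical namespace for illustration; not taken from the slides.
EX = "http://example.org/"
# Standard RDF and RDFS property URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def infer_types(triples):
    """Return the input triples plus rdf:type facts implied by rdfs:subClassOf."""
    closed = set(triples)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        subclass = [(s, o) for (s, p, o) in closed if p == RDFS_SUBCLASS]
        typed = [(s, o) for (s, p, o) in closed if p == RDF_TYPE]
        for instance, cls in typed:
            for sub, sup in subclass:
                fact = (instance, RDF_TYPE, sup)
                if cls == sub and fact not in closed:
                    closed.add(fact)
                    changed = True
    return closed


# Every thing is a URI; every statement relates two URIs.
data = {
    (EX + "marseille", RDF_TYPE, EX + "City"),
    (EX + "City", RDFS_SUBCLASS, EX + "Place"),
    (EX + "Place", RDFS_SUBCLASS, EX + "Thing"),
}

new_facts = sorted(infer_types(data) - data)
for fact in new_facts:
    print(fact)
```

The two derived statements say that the hypothetical `marseille` resource is also a `Place` and a `Thing`, although neither was stated explicitly in the input.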

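Slide (34) lists a metadata vocabulary for CSV data and a mapping of CSV content to RDF, JSON, and XML as planned work areas. The sketch below illustrates the general idea only, under stated assumptions: the `METADATA` field names, the `http://example.org/` URIs, the CSV content, and both helper functions are made up for this example and are not the vocabulary or mapping that W3C planned or later specified.

```python
import csv
import io
import json

# Hypothetical metadata record. The field names are illustrative only and are
# not taken from any W3C vocabulary.
METADATA = {
    "url": "http://example.org/data/visits.csv",
    "license": "http://example.org/licenses/open",
    "columns": [
        {"name": "name", "datatype": "string"},
        {"name": "visits", "datatype": "integer"},
    ],
}

# Made-up CSV content matching the metadata above.
CSV_TEXT = "name,visits\nalpha,10\nbeta,25\n"

CASTS = {"string": str, "integer": int}
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def rows_as_records(csv_text, meta):
    """Parse CSV text into dicts, casting each cell to its declared datatype."""
    casts = {col["name"]: CASTS[col["datatype"]] for col in meta["columns"]}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: casts[key](value) for key, value in row.items()} for row in reader]


def records_as_ntriples(records, meta):
    """Emit one N-Triples-style line per cell, using row and column URIs."""
    lines = []
    for index, record in enumerate(records, start=1):
        subject = f"<{meta['url']}#row={index}>"
        for col in meta["columns"]:
            predicate = f"<{meta['url']}#{col['name']}>"
            value = record[col["name"]]
            if col["datatype"] == "integer":
                obj = f'"{value}"^^<{XSD_INTEGER}>'
            else:
                obj = json.dumps(value)  # quoted, escaped string literal
            lines.append(f"{subject} {predicate} {obj} .")
    return lines


records = rows_as_records(CSV_TEXT, METADATA)
as_json = json.dumps(records)
as_ntriples = records_as_ntriples(records, METADATA)
print(as_json)
print("\n".join(as_ntriples))
```

Keeping the description of the columns outside the CSV file means the same raw file can be served unchanged to developers who only want the CSV, while tools that need JSON or RDF can derive it from the file plus its metadata.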