2. Why terminology validation?
• Consistency of descriptions/data during entry and editing (authority files and thesauri)
• When searching, the language of the searcher can be matched to the language of the data being searched, so multilingual terminology sources improve accessibility.
• Searches can be expanded via the relationships offered by the terminology source, e.g. hierarchical ones.
3. Using validation during entry/editing…
Terms and hierarchy are shown in a pop-up window
5. Examples of available terminology sources
• The Getty Art & Architecture Thesaurus (AAT)
http://www.getty.edu/research/conducting_research/vocabularies/aat/
• The Getty Union List of Artist Names (ULAN)
http://www.getty.edu/research/conducting_research/vocabularies/ulan/
• The Getty Thesaurus of Geographical Names (TGN)
http://www.getty.edu/research/conducting_research/vocabularies/tgn/in
• RKD-Artist
http://www.rkd.nl/rkddb/
• Iconclass
http://www.iconclass.nl/libertas/ic?style=index.xsl
• Dutch version of the AAT
http://www.aat-ned.nl/
6. How to make a terminology source available?
• Paper ???
• Word Document ???
• CSV file ???
• CD-ROM ???
• Website ???
• XML ???
• Webservice !!!
7. Problems with distribution on CD-ROM
• Specific software required
• Seamless integration only possible with software from the same vendor
• Data conversion is an alternative (from what to what?)
• Updates are hard to implement (technical and synchronization problems)
• Distribution by mail or (sometimes heavy) downloads
8. Problems with websites that offer terminology sources
• All websites look different and work differently
• (Search) facilities vary per website
• Integration only possible via ‘cut & paste’
9. Solution
• Instead of an interface for users we need an interface for programs, a so-called API (Application Program Interface)
• This lets users work with external terminology sources from within their own familiar working environment/application.
11. Requirements for a terminology API
• Platform-neutral
• Language-independent
• Able to handle hierarchies
• Easy to implement
• Usable over the internet
• Technology: HTTP + XML = web services
12. Just to be clear…
• A website is NOT a web service!
• A web service has NO user interface
• A web service accepts calls from software programs
• And returns ‘raw’ XML as its result
A web service (also Web Service) is defined by the W3C as "a software system designed to support interoperable machine-to-machine interaction over a network."
13. Two styles of web services
• 1 - Non-SOAP: simple RPC (Remote Procedure Call), RESTful (Representational State Transfer)
• 2 - SOAP
• Non-SOAP example: SRU (http://www.loc.gov/standards/sru/)
• SOAP example: MuseumsVokabular.de (http://museum.zib.de/museumsvokabular/webservice/museumvo)
• The two work differently and return results in different forms
• We want to support both systems in Adlib!
14. ‘Two’ possible solutions
• Harmonize all web services for terminology sources, so that they all use the same calling method, return the same XML and follow the same syntax……
• Use “translators” to transform requests for specific web service implementations
15. Gateway solution
[Diagram: an application on the client computer sends an HTTP or SOAP request to the gateway (translator); the gateway passes it on to web service 1 (Iconclass) or web service 2 (MuseumsVokabular) and returns the data in XML.]
Lists can be used to pick terms from. Hierarchies can be viewed to see terms in context. Non-preferred terms can be automatically replaced by their preferred forms. Scope notes can be shown to determine the appropriateness of a term. [demo: enter ‘afr’ for object name with show all domains switched on]
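The validation behaviour described above can be sketched in a few lines. This is only an illustration, not how Adlib implements it: the mini-thesaurus, the `Term` class and the function names `pick_list` and `validate` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    name: str
    use: Optional[str] = None  # preferred term to use instead (None = this term is preferred)
    scope_note: str = ""


# Hypothetical mini-thesaurus, invented for this sketch.
THESAURUS = {
    t.name: t
    for t in [
        Term("teapot", scope_note="Vessel for brewing and serving tea."),
        Term("tea pot", use="teapot"),
        Term("tea kettle", scope_note="Vessel for boiling water for tea."),
    ]
}


def pick_list(prefix):
    """Return all known terms starting with the typed prefix, sorted."""
    prefix = prefix.strip().lower()
    return sorted(name for name in THESAURUS if name.startswith(prefix))


def validate(entered):
    """Return the preferred form of an entered term, or raise ValueError."""
    term = THESAURUS.get(entered.strip().lower())
    if term is None:
        raise ValueError(f"unknown term: {entered!r}")
    if term.use is not None:
        # Non-preferred term: substitute its preferred form.
        term = THESAURUS[term.use]
    return term.name
```

Calling `validate("Tea Pot")` in this sketch returns the preferred form `teapot`, while an unknown term is rejected instead of being saved.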
Lists can be used for browsing. Hierarchies can be displayed to show terms in their context. Non-preferred terms can be substituted by their equivalent counterparts. Equivalent terms can be searched simultaneously. [demo: search for ‘t’ in object name in the search wizard, click on ‘tea kettle’] [demo2: click on ‘Generic’, press all keys and show that teapots are found]
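A rough sketch of how a search can be widened with equivalent and narrower terms is given below. The hierarchy, the equivalence table and the records are invented for the example, and `expand` and `search` are hypothetical names; a real system would read these relationships from the terminology source.

```python
# Hypothetical relationships, invented for this sketch.
NARROWER = {
    "containers": ["vessels"],
    "vessels": ["teapot", "tea kettle"],
}
EQUIVALENT = {"teapot": ["tea pot"]}

# Hypothetical catalogue records.
RECORDS = [
    {"id": 1, "object_name": "teapot"},
    {"id": 2, "object_name": "tea kettle"},
    {"id": 3, "object_name": "chair"},
    {"id": 4, "object_name": "tea pot"},
]


def expand(term):
    """Return the term plus all equivalent and (recursively) narrower terms."""
    result, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        stack.extend(EQUIVALENT.get(current, []))
        stack.extend(NARROWER.get(current, []))
    return result


def search(term):
    """Generic search: match records on the term or anything it expands to."""
    wanted = set(expand(term))
    return [r["id"] for r in RECORDS if r["object_name"] in wanted]
```

With this data, a generic search on `vessels` finds records 1, 2 and 4, even though none of them is catalogued under the word "vessels" itself.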
Various institutions have been working on controlled vocabularies. In the museum world the most successful thesaurus builder is the Getty Research Institute, which has produced three very valuable resources: the AAT, ULAN and TGN. The AAT has been translated into Dutch and is heavily used in the Netherlands and Flanders (the Dutch-speaking part of Belgium). Another widely used terminology source is Iconclass, a hierarchical system of rather complex notations that originated in the Netherlands in the 1950s. Because the notations were “designed” in the pre-computer era they look somewhat “chaotic”, to say the least, and are hard for a novice to use. On the other hand, the strength of Iconclass lies in its multilinguality and rich context. Iconclass is now maintained at the RKD (Rijksbureau voor Kunsthistorische Documentatie, or National Bureau for the History of Art). An experimental version of Iconclass is now available as a web service.
This list more or less shows how the distribution of controlled vocabularies has evolved. The early versions were indeed printed on paper and used as reference works in manual systems. The first steps in processing controlled vocabularies used the text-processing capabilities of computers to print out thesauri. This was already done in the pre-Word era: I was involved in writing bespoke printing software for the MARDOC 4 thesaurus, using 8080 assembler on a Z80 processor. The next phase was exchange as ASCII files. Of course accented characters were a big issue in those days. In 2002 the Dutch version of the AAT was completed. This vocabulary was distributed by the RKD on CD-ROM as a “test” version for use with Adlib Museum and TMS. This CD-ROM was quite successful, but never saw a final release or any updates. From this attempt we learned that a CD-ROM approach is rather static, user-unfriendly and expensive. A year later the Dutch AAT became available as a web site. Quite useful, but insufficient for machine-to-machine communication. Expressing thesauri in XML was a great step forward, not least because XML, unlike CSV files, supports repeated elements and is fully Unicode, relieving users of the accented-characters problem. SKOS was another step forward, towards standardization, although it still has a number of different implementations. The final solution for creating distributed vocabularies is web services.
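To make the point about repeated elements and Unicode concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element names (`term`, `name`, `narrower`, `label`) are made up for illustration and do not follow SKOS or any real exchange format.

```python
import xml.etree.ElementTree as ET


def term_to_xml(name, narrower, labels):
    """Serialize one term; <narrower> and <label> may repeat any number of times."""
    term = ET.Element("term")
    ET.SubElement(term, "name").text = name
    for child in narrower:
        ET.SubElement(term, "narrower").text = child
    for lang, label in labels.items():
        ET.SubElement(term, "label", lang=lang).text = label
    return ET.tostring(term, encoding="unicode")


def xml_to_term(xml_text):
    """Parse the XML back into plain Python values."""
    term = ET.fromstring(xml_text)
    return {
        "name": term.findtext("name"),
        "narrower": [e.text for e in term.findall("narrower")],
        "labels": {e.get("lang"): e.text for e in term.findall("label")},
    }
```

A term with several narrower terms and an accented French label such as "théière" survives the round trip unchanged, which is exactly where a flat CSV row with one fixed column per field starts to struggle.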
Although the “test” CD-ROM (or beta, if you like) was rather successful, some important lessons were learned from the experience. First of all, specific software was needed to access the database: Collection Connection from Cit, and Adlib Museum from Adlib. In addition, the files were placed on the CD in XML format. The “bespoke” software did not seem to be much of a problem, because both systems were the predominant systems in the Netherlands. But as a matter of principle, vocabularies should be software-agnostic. And what about the XML files? Although in theory these are system-agnostic, it turned out that virtually none of the potential AAT-NL users had the skills to make use of them. A pearl locked in an oyster. Updates are also a problem. The beta CD-ROM could only be produced with the financial support of C-It and Adlib. Although producing CD-ROMs has become a lot cheaper, and CD-Rs are perhaps an option, this remains a clumsy solution for any realistic distribution volume. After all, don’t we see even software distribution on CD-ROM disappearing?
Did the World Wide Web offer the final solution? The answer is clearly “No”. Since online vocabularies appeared on the web we have still been using them in a “manual” way, i.e. as electronic reference works. The reason is of course that a web browser is a program with a human interface and that HTML is a page layout language, which makes it extremely difficult, or even impossible, for computer programs to work with the resources these sites make available. Furthermore, all vocabulary web sites have a different look and feel, although admittedly the Getty made its AAT, ULAN and TGN implementations uniform.
So what do we really need? We need a computer-to-computer protocol instead, in other words an Application Program Interface. This API should be platform-neutral: it should be just as useful for a Linux, a Windows or an OS X program. It also needs to support Unicode, has to be able to deal with hierarchies and should be easy to implement. Web technology offers all of this in the form of web services.
There is a lot of confusion when people talk about web services: a web service is NOT the same as a web site. It is for machine-to-machine communication.
There are two basic styles of web services. One way is to send the request to a server in the form of a URL-encoded string, using CGI (Common Gateway Interface) style encoding for the parameters. This is also referred to as the REST way of doing things (REpresentational State Transfer). The other is to use the Simple Object Access Protocol (SOAP), although the “Simple” might not turn out to be really simple…
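The difference between the two styles is easiest to see by building one request of each kind. The sketch below only constructs the requests and sends nothing. The parameter names `operation`, `version`, `query` and `maximumRecords` are standard SRU request parameters; the base URL, the `searchTerm` operation and the `urn:example:vocabulary` namespace are hypothetical placeholders, not the interface of any real service.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
VOCAB_NS = "urn:example:vocabulary"  # hypothetical service namespace


def rest_request_url(base_url, query, max_records=10):
    """Non-SOAP style: the whole request is a URL with encoded parameters."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


def soap_request_body(term):
    """SOAP style: the request is an XML envelope that is POSTed to the server."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{VOCAB_NS}}}searchTerm")  # hypothetical operation
    ET.SubElement(call, f"{{{VOCAB_NS}}}term").text = term
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(rest_request_url("http://example.org/sru", "tea kettle"))
    print(soap_request_body("tea kettle"))
```

The first request fits in a browser address bar; the second needs an XML document to be built and posted, which is where the extra effort of SOAP comes from.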
Harmonizing the world is not a trivial thing to do. In most cases it does not happen through committees, but through natural selection. In the end one or perhaps two flavors of a certain technology will survive (sometimes three… Windows, Mac OS X and Linux?). So creating adapters is the thing to do. This is exactly what a gateway does: it accepts a web service request in one variant, passes it to one or more web servers and returns the XML in the expected form. XSLT is often used to implement this.
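The adapter idea can be sketched as follows. Note the assumptions: plain Python with `xml.etree.ElementTree` stands in for XSLT here (the standard library has no XSLT processor), and both source response formats and the common `<terms>` output format are invented for the example; they are not the actual Iconclass or MuseumsVokabular formats.

```python
import xml.etree.ElementTree as ET


def from_service_one(xml_text):
    """Translator for a hypothetical format: <record><label/><parent/></record>."""
    root = ET.fromstring(xml_text)
    return [
        {"term": r.findtext("label"), "broader": r.findtext("parent")}
        for r in root.iter("record")
    ]


def from_service_two(xml_text):
    """Translator for a hypothetical format: <concept name="..." broader="..."/>."""
    root = ET.fromstring(xml_text)
    return [
        {"term": c.get("name"), "broader": c.get("broader")}
        for c in root.iter("concept")
    ]


# One translator per supported source.
TRANSLATORS = {"service1": from_service_one, "service2": from_service_two}


def gateway(service, xml_text):
    """Return the source's response rewritten into one common XML form."""
    out = ET.Element("terms")
    for item in TRANSLATORS[service](xml_text):
        element = ET.SubElement(out, "term", broader=item["broader"] or "")
        element.text = item["term"]
    return ET.tostring(out, encoding="unicode")
```

The client application only ever sees the common `<terms>` form, so supporting a new source means adding one translator to the gateway rather than changing every client.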
For Adlib products there is a tool available that allows the museum to add external thesauri to a thesaurus-validated field. This assumes a non-SOAP connection. Adlib have implemented a number of external sources on a publicly accessible gateway server.
The “external” thesaurus that has been made available to Adlib Museum by entering its URL in Adlib Designer can now be accessed as if it were a native Adlib thesaurus. The user cannot see the difference between a “local” thesaurus and the “remote” thesaurus. The thesaurus can be browsed just like any local thesaurus, terms can be viewed in detail and generic searching can take place.