資料處理、轉換及搜尋: 開放近用的技術挑戰
Data-processing, Data-transfer and Search: Further Technical Challenges for Open Access / By
Wolfram Horstmann, Bielefeld University Library
資料處理、轉換及搜尋: 開放近用的技術挑戰 / 沃爾夫勒姆‧霍斯特曼, 比勒費爾德大學圖書館
Open Access is a child of the Internet. Theoretically, the World Wide Web (WWW) has made it
possible for everyone to have immediate access to news, media and communication everywhere.
Scientists created the Web almost 20 years ago in order to exchange academic information more
efficiently, and thus made direct access to information possible. Since then, for many people, the Web
has developed into the ultimate global information platform.
開放近用衍生自網際網路。從理論上說,全球資訊網使得每個人在每個地方都能立即近用新
聞、媒體和傳播。為了更有效地交換學術資訊, 20 年前, 科學家建立全球資訊網, 得以直接近用
資訊。自那時以來,對於很多人來說,全球資訊網已發展成為全球資訊的平台。
Today scientists and scholars are once again aiming at the goal of Open Access to academic
information on the Internet. This is not because access to academic information on the Internet has
meanwhile been closed off. Rather, information in the form we are concerned with today did not exist
in the infancy of the WWW. In those days, academic publications existed largely in printed form. It is
only in the past decade that they have become available electronically on a large scale. In addition,
today we are not merely concerned with publications: many other data can be found in academic
offices and laboratories on computers, storage media or servers that are not compatible with the
standards of the WWW. For example, we are talking about the digitalisation of cultural heritage,
experimental measurement data, computer programs for evaluation, modelling or simulation, and
learning materials.
今天,科學家和學者再次把目標對準網際網路上的開放近用學術資訊。這並不是因為從網際網
路近用學術資訊的管道曾經被關閉, 而是因為今天所關心的資訊形式, 並不是全球資訊網發展初
期的樣子。在過去的日子裡,學術出版品多半以印本形式出現, 不過十來年的光景, 絕大部份都
以電子方式呈現。此外,我們今天不僅關心出版品:還有許多未採用全球資訊網標準的資訊, 散
布在研究室和實驗室的電腦、儲存媒體、或伺服器上。舉例來說,我們談論的數位化文化遺產,
實驗產出的數據, 供評價、模型或模擬用的電腦程式,以及學習素材。
Manual processing of all these data is impossible, which is why the machine-readability of data plays
an important role. Firstly, machine-readability means that data must be recognisable from external
servers or digital services. This recognition mostly takes place via metadata, a kind of digital label for
data, which contains information about the form and content of the underlying object. In addition,
machine-readability requires a transfer protocol which allows the data to be transferred from one place
to another. In the traditional WWW this is primarily HTTP (Hypertext Transfer Protocol). However, for
the multitude of data types and uses to be found in science and scholarship today, this is not adequate
since far more multifarious information on the type and purpose of the data has to be exchanged before
any transfer can take place.
以人工方式處理這些數據是不可能的,所以, 機讀的數據就扮演重要角色。首先,機讀意味著數
據必須能夠被外部伺服器或數位服務辨識。主要經由後設資料進行此辨識,即資料的數位標示,
包括資料本身的形式及內容。此外,機讀用到傳輸協定, 讓資料從一個地方傳輸到另一個地方。在
傳統的全球資訊網裡, 主要用的是'http'(超文本傳輸協定)。然而,今天的科學和學術領域裡的多
樣性資料, 一個傳輸協定是不夠的, 在傳輸之前, 資訊的類型和資料的目的, 都可能改變。
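The idea of metadata as a "digital label" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class MetadataLabel, its field names, the function select_for_transfer and the repo.example.org URLs are all invented here and do not come from any standard. The point is that a service decides what to transfer by reading the labels alone, before any data are moved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataLabel:
    """Hypothetical 'digital label' describing an object's form and content."""
    identifier: str   # where the object lives (illustrative URL)
    media_type: str   # form: how the data are encoded
    kind: str         # content: what sort of academic object this is
    title: str        # content: short human-readable description


def select_for_transfer(labels, wanted_kind, readable_types):
    """Pick the objects a service can use, judged from the labels alone."""
    return [
        label.identifier
        for label in labels
        if label.kind == wanted_kind and label.media_type in readable_types
    ]


# Invented sample labels; no underlying data are opened or transferred here.
labels = [
    MetadataLabel("https://repo.example.org/obj/1", "text/csv",
                  "measurement data", "Spectrometer run 17"),
    MetadataLabel("https://repo.example.org/obj/2", "application/pdf",
                  "publication", "Annual report"),
    MetadataLabel("https://repo.example.org/obj/3", "application/zip",
                  "measurement data", "Spectrometer run 18"),
]

# A service that can only read CSV asks for measurement data.
to_fetch = select_for_transfer(labels, "measurement data", {"text/csv"})
print(to_fetch)
```

Only the first object matches both the requested kind and a format the service can read, so it is the only one that would be requested over the transfer protocol.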
For academic data, but also for other forms of data, labelling with the ‘Simple Dublin Core Metadata
Element Set’ (http://dublincore.org) has become standard practice. As a transfer protocol for open,
machine-readable data stores, the ‘Open Archives Initiative Protocol for Metadata Harvesting’
(http://www.openarchives.org) is often used. The combination of the two allows a new form of
technical networking based on the principles of Open Access: digital knowledge stores, known as
repositories, are coming into existence worldwide. Alongside the repositories that are created directly
for academic disciplines, many academic libraries function as systematic digital-age providers of
information by operating repositories. These repositories expose their data without restricting access
for digital ‘harvesters’, which collect metadata and structure them in intermediate storage facilities for
systematic access. After that, search engines give researchers, teachers and learners unrestricted and
targeted access to information that is distributed worldwide.
將學術及其他型式的數據,以'簡單都柏林核心後設資料元素集'(http://dublincore.org)標示, 已是
制式的做法。做為開放機讀資料的傳輸協定, 通常採用'開放式檔案後設資料擷取協定'
(http://www.openarchives.org)。將這兩個協定結合在一起, 形成開放近用的網路技術: 在世界各地
儲存數位知識, 即典藏所。除了學科的典藏所外, 很多學術圖書館也運作典藏所, 扮演數位資訊提
供者的角色。這些典藏所將資料釋放出來, 沒有任何近用的限制, 數位'擷取器'可收集後設資料,
並在中介儲存設備組織它們, 供系統化的近用。然後,搜索引擎就能使研究人員、教師和學者近
用資訊,進而以無限制且針對性地方式, 將之散布於全世界。
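The harvesting chain described above (repository, harvester, search) can be sketched roughly as follows. The sketch assumes a repository that answers the OAI-PMH ListRecords request with Simple Dublin Core (metadataPrefix oai_dc). The base URL, the record identifiers and the second sample record are invented, and the response is an embedded sample rather than a live request. The helper names list_records_url, parse_records and search are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 and Simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url):
    """Build a ListRecords request asking for Simple Dublin Core metadata."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return base_url + "?" + query


# Embedded sample response (invented identifiers; second record is made up).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Data-processing, Data-transfer and Search</dc:title>
          <dc:creator>Wolfram Horstmann</dc:creator>
          <dc:date>2008</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Learning materials for a simulation course</dc:title>
          <dc:creator>Example Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Harvester step: collect the Dublin Core fields of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # deleted records carry no metadata
            for element in dc:
                name = element.tag.split("}", 1)[1]  # strip the namespace
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


def search(records, term):
    """Search step: match a term against all harvested metadata values."""
    term = term.lower()
    return [
        r["identifier"]
        for r in records
        if any(term in v.lower() for values in r["fields"].values() for v in values)
    ]


print(list_records_url("https://repo.example.org/oai"))
records = parse_records(SAMPLE_RESPONSE)
print(search(records, "horstmann"))
```

A real harvester would also have to follow resumption tokens for long result lists and handle protocol errors. Both are left out to keep the sketch short.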
But even if the data are present in repositories, labelled with metadata and accessible from other servers
and services, there is still no guarantee that the results are actually usable by academics. Due to major
Internet protagonists like Google, scientists and scholars are accustomed to relatively comprehensive
and rapid access to the results. Google and others invest a great deal in the registration and computer-
based structuring of data, which relate not just to metadata but to every conceivable form of
information which subsists in the digital object itself. The approach of structuring academic
information exclusively via metadata is conceptually superior. However, in practice this approach still
needs to be turned from an individual testing application into a comprehensive everyday tool. A
‘future-proof’ solution could lie in collaboration between libraries, which guarantee the quality of the
metadata and data presentation, and experts in information sciences, media studies and informatics.
不過,雖然儲存在典藏所的數據已經採用後設資料標識,並可經由其他伺服器和服務近用,但
仍不能保證學者真的可使用到該數據。由於谷歌等網際網路主要參與者的影響, 科學家和學者已
經習慣於全面和快速地近用到結果。谷歌和其他投入者花了很大力氣將數據登錄及結構化, 不祗
是後設資料, 還包括數位本身所有可能想像的資訊形式。經由後設資料才能近用結構化的學術資
訊, 是很優的概念, 經由圖書館間的合作, 以及資訊科學、媒體、資訊學等領域的專家的參與, 保
證後設資料及資料呈現的品質。
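The practical gap between structuring by metadata alone and structuring by the full digital object can be illustrated with a toy inverted index. This is a sketch under assumptions, not a description of how any search engine works: the two documents, their field names and the helpers tokenize, build_index and lookup are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents, fields):
    """Map each token found in the chosen fields to the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for field in fields:
            for token in tokenize(doc.get(field, "")):
                index[token].add(doc_id)
    return index


def lookup(index, query):
    """Return the documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Invented sample objects: curated metadata plus the full text of the object.
documents = {
    "doc-1": {
        "title": "Annual climate report",
        "subject": "climatology",
        "full_text": "The appendix lists ice core measurements taken in Greenland.",
    },
    "doc-2": {
        "title": "Ice core sampling handbook",
        "subject": "glaciology",
        "full_text": "Procedures for drilling and storing samples.",
    },
}

metadata_index = build_index(documents, ["title", "subject"])
fulltext_index = build_index(documents, ["title", "subject", "full_text"])

print(sorted(lookup(metadata_index, "ice core")))
print(sorted(lookup(fulltext_index, "ice core")))
```

The metadata-only index finds just the handbook, whose label mentions the topic. The full-text index also finds the report, whose label does not. That is the kind of recall researchers have come to expect from general-purpose search engines, while the metadata index stays smaller and depends on the quality of the curated labels.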
Especially for the young generation of researchers and students, the WWW has developed into a highly
interactive environment. For many, the browser is a central switchboard
in their professional and social lives, in which communication, the exchange of data and the structuring
and configuration of their daily routines take place. The academic world also works on an increasingly
interactive basis. This means that not only access, but also the manipulation of data, collective editing à
la Wikipedia (http://www.wikipedia.org) or sharing à la Del.icio.us (http://del.icio.us) are expected.
尤其是對年輕一代的研究人員和學生來說,全球資訊網已經成為一個高度互動的環境。對於許
多人來說,瀏覽器是他們專業和社會生活中央的交換器, 在上面傳播、交換資料, 及結構化及組
態日常生活。學術世界的基礎, 越來越仰賴互動。這意味著,不僅是近用,還包括運作數據、共
同編輯維基百科(http://www.wikipedia.org)或網路書籤(http://del.icio.us)等行為, 也逐步地轉移到
網路上。
The reconfiguration of the WWW into an interactive environment suitable for science and scholarship
represents a challenge for service providers even with respect to publications with a relatively simple
structure. Increasingly, however, we also have to deal with the other materials mentioned above, such
as multifarious digital items, computer programs, and learning materials. These days, many academic
results are obtained with the help of precisely these new media; traditional publication with text and
graphics forms only a fraction of this academic work. Tracing scientific results back, let alone
verifying them, is becoming increasingly difficult on the basis of publications alone.
將全球資訊網重新組態成適合科學及學術的互動環境, 即使對出版品這種結構簡單的服務提供者,
也是相當的挑戰。而且還需面對前述的其他材料, 諸如: 多樣化數位品項、電腦程式、學習素材
等。近來,許多學術成果藉著這些新媒體呈現,由文字和圖形構成的傳統出版品祗佔這些學術作品的一部份。追
溯回去,更不用說驗證科學的結果,更難以單獨以出版品為基礎。
At the outset, there seem to be no limits to the possibilities offered by a new, virtualised academic
world in the context of such forms of electronic publishing. In such a comprehensive scenario,
however, it must not be forgotten that vast quantities of data are generated that are totally inconceivable
in the analogue, non-electronic world. Also, much of this information is not intended to be used by the
public or even by scientists or scholars in related disciplines. And not every piece of academic
information generated in such a scenario can or need be preserved and placed at the permanent disposal
of posterity. Science and scholarship have become more fugacious.
首先,在新的虛擬學術世界裡, 對電子出版似乎沒有限制的可能。從整個面向來看, 在類比的非
電子世界裡, 很難想像會生產出這麼大量的數據。此外,大部分資訊並不打算被社會大眾或相關
學科的科學家或學者使用。而且, 在這種情況下產生的學術資訊, 並不需要鉅細靡遺的永久保
存。科學和學術已變得更加無常。
In addition, the atomisation of science and scholarship into more and more sub-disciplines has made it
more and more difficult to provide interdisciplinary services for the academic community in the way
that university libraries have traditionally done. Today, only academics themselves know what
information and services they need for their work in their respective areas of research. The challenge
for information service providers will consist in offering to structure, process, and make accessible
academics' specialist knowledge with functional, generally valid information tools, be they search
engines, or tools for the administration of information, documentation, editing or communication.
此外,科學與學術越分越細, 形成愈來愈多的次學門, 大學圖書館的傳統服務, 很難再對學術社群
提供跨領域的服務。今天,只有學者才知道在各自研究領域的工作裡, 需要什麼樣的資訊和服
務。資訊服務者面臨的挑戰,以有效的資訊工具為學者提供結構化、處理過且可以近用的專業
知識, 成為他們管理資訊、文件的搜尋引擎或工具, 編輯或傳播這些資訊和文件。
p. 66-68
Open Access: Opportunities and challenges. A handbook [開放近用 : 機會及挑戰] / European
Commission/German Commission for UNESCO. -- Luxembourg: Office for Official Publications of
the European Communities, 2008. -- 144 pp., 14.8 x 21.0 cm. -- ISBN 978-92-79-06665-8. -- EUR
23459, http://tinyurl.com/3q8wo5