SlideShare uma empresa Scribd logo
1 de 7
Baixar para ler offline
104                                                             Inte   rcultural F aubline   s




Stubbs, Michael (1995a) 'Collocations and Semantic Profiles: On the Cause of the Trouble
     with Quantitative Studies', Functions of Language 2(l): 23-55 .
                                                                                                  8      Parallel Corpora in Thanslation Studies: Issues
------ (1995b) 'Collocations and Cultural Connotations of Common Words', Linguistics                     in Corpus Design and Analysis
   and Education 7: 379-390.
Toury, Gideon (1995) Desciptive Translation Studies     - and Beyond, Amsterdam and               FEDERICO ZANETTIN
    Philadelphia: John Benjamins.
Vanderauwera, fua (1985) Dutch Novels Translated into English: The Transformation                              Abstract: This chapter deals with the design and analysis of'translation-
    of a "Minority" Literature, Amsterdam: Rodopi.                                                             driven' corpora, i.e. principled collections of electronic texts compiled with
Wodin, Natascha (1983) Die gliiserne Stadt,Leipzig: Reclam Verlag.                                             the aim of studying translation products and processes, with special reference
Woolls, David ( 1997) Mubiconcord" Version 1.5, Birmingham: CFL Software Development.                          to parallel corpora.'The comparison and contrast ofpaired translation units
Ztirn, Unica ( 1917) Der Mann im Jasmin,Frankturt am Main and Berlin: Verlag Ullstein;                         is, of course, not new to
                                                                                                                                       translation research, but the possibility of retrieving
    translated by Malcolm Green as The House of lllnesses, 1993, and The Man of Jas'                           on a computer screen hundreds of similar contexts and their translations,
    mine & Other Texts,1994, London: Atlas Press.                                                              and the relative ease of combining this with statistical analysis and data
                                                                                                               manipulation, allow hypotheses to be tested on a larger scale as well as
                                                                                                               tentative generalilations to be made. Corpus linxuistics is seen as a
                                                                                                       .._.*^lrslbqip!.ost-wbLeh_ssryb^e-sw,ti.ed-ro^ustq;;f    a;;;jt;;;;,fi     i"";t"dir;
                                                                                                           *^Fa-idILet way^as it is app-lied to the sutdy,ottexruoniariilnrrs"i;nc
                                                                                                               in the same
                                                                                                                          cf,rpora, as well as orher ,rr".irt"ri"iirir7;;ffio,          ,o,
                                                                                                               complement other types of investigation of printed texts in translation
                                                                                                               studies. It is suggested in this chapter that issues of corpus design (which
                                                                                                               types of tefis are included, what languages are involved, which  citeria are
                                                                                                               used   for sampling, what the research aims and applications of the projects
                                                                                                               are), and corpus encoding (how the 'translation' from printed texts to
                                                                                                               electronic corpus comes about) deserve careful consideration insofar as
                                                                                                              these issues are likely to affect findings based on corpora. It is also argued
                                                                                                              that, in order to enhance and maximize the advantage which can be deived
                                                                                                              from research based on electronic texts in translation stud.ies, there is    a
                                                                                                               need for greater standardization and interchange of corpus resources.


                                                                                                 1.    Introduction
                                                                                                 In the past ten yea.rs or so there has been a growing interest in the application of
                                                                                                 computer-assisted methods of investigation to the study of translation and trans-
                                                                                                 lated texts. In narti                     istics has seryed as amethodological framework
                                                                                                 for the creation and use of what      I would like tb call     .t
                                                                                                 The corpora developed and used within this approach have quite airreienTitraiii:.
                                                                                                 teristics, depending on the aims and research interests upon which the various research
                                                                                                 projects are based, but they generally involve a comparison between two different
                                                                                                 corpus components.
                                                                                                     A first type of 'translation-driven corpus' is the monolingual comparable cor-
                                                                                                 pus, consisting of a set of translations and a comparable set of texts spontaneously
                                                                                                 created in the same language and selected according to similar design criteria. cor-
                                                                                                 pora belonging to this first type include the English Comparable Corpus (ECC),
Inte rc ultu ral F aultline s    Zanettin: Issues in Corpus Design ttntl Arutlysis
             106                                                                                                                                                                                       107



                                                                                                                 cerns which surface in the design of general-language corpora. These issues are
           developed at UMIST, Manchester under the direction of Mona Baker, and the
                                                                                                                 aspects of representativeness, diversification and encoding.
           Finnish Comparable Corpus at the University of Joensuu (see Jantunen 2000;
           Kemppanen 2000; Tirkkonen-Condit 2000). These monolingual comparable cor-
         ',p-o-{a
                  ar9 'tran.s!ati_93*{riy9l1.91-.t 11rjl4ion depeadent' in that the "non-tr_anslationaf-         2.     Representativeness in bi-directional parallel corpora
           component is modelled on the composition of the translatigl{ 1et" (Laviosa 1997:
           293). The motivations behind the construction of a corpus with these characteristics                  one of the major issues in corpus design is that of representativeness; what distin_
           are mostly theoretical; the comparison between the two components of the corpus                      guishes a corpus from a collection of electronic texts (or a text archive) is that a
           should enable investigation of the linguistic features of translated texts as opposed                corpus is put together in a principled way so as to be representative of a larger
           to those found in spontaneous text production. Since translation, is_'ig-tory-"-"*--                 textual population, in order to make it possible to generalize findings concerning
           tive event which is shaped by its own goals, pressures and context of produclion"                    that population. Thus, the most appropriate design for a corpus depends on what it
           @aker 1996: r5), tirxti pioauceo ii i ieiritt of tfiieciftiff'snbnldshow a distinc-                  is meant to represent (Biber 1993; Halverson l99g; Kennedy 199g; Biber et al.
             ti","_1ilgu.i.F,gake:up, riihiah ian   $ dt:{:gliTe:9'_gC.lir1gggg!+,"9!dg1:1                       1998). This should be remembered before making any general statements about
                                                                             -omparaule.o6g
                   A diicona type oT'transfation-Oriu"n             is q+l:lunguar                              language, texts, or translations based on corpus analysis; what is found in a corpus
     '   "    either of the t ws components i ncludes-tran
                                                          "olputt
                                                                  ated t.           cumptred-trc-feiis
             N                                               sl
                                                                       "i$Vf,ari "                              will only apply to what that particular corpus represents. The more general, large
           spontaneously produced in two languages under similar             circumstances and within           and varied the textual population to be represented, the more variables must be
           the same domains. Corpora of this kind have been created and used mainly with                        taken into account in the selection of the texts to be included in the corpus. To
    -=. ,. 3gpJ-99!_".!9f"s
                               t, mi4,4, _either for the extraction of biliagual terminology (Laffling          ensure representativeness of a certain genre, decisions must be taken as to sam_
            1992) or for the training of translators (Zanettin 1994, 1998, forthcoming; Gavioli                 pling criteria:
           and Zanettin 1997; Peters and Picchi 1998). A bilingual comparable corpus is trans-
           lation-driven in that the ultimate aim of its creation is to develop a tool and a resource              '    should translations be chosen at random from within the total population to
           for trainees and practitioners in the translation profession, and its composition is                         be represented, or should a motivated choice be made based on criteria such
           dictated by the provision of translating texts belonging to a specifrc genre. This type                      as text status and reception (e.g. prestige, readership)? Even when making a
           of corpus, when in printed rather than in electronic format, is also referred to in                          random selection, we still need to dehne the boundaries and internal catego-
           translation studies as 'parallel texts' (e.g. Piotrowska 1997; Scheffner 1998).                              ries of the total population, so all selections are bound to be subjective to a
.               A third type of'translation-driven corpus', and the one on which this chapter                           certain extent.
           will focus, is the parallel corpus, comprising a set of translations in one language                    '    Should the corpus be composed of complete texts or samples, and if the lat-
           and their respective source texts in another language. Parallel corpora can be used                      '   ter, what size and type should the samples be? There seems to be general
           for studying translated texts with two different goals in mind; they can be used by                          agreement among translation scholars that the basic unit in translation is a
           researchers to describe what translators actually do with texts and how they trans-                          ful1 text, but pracrical limitations may still lead corpus designers to opt for
           form them in the process of translation, and they can help practitioners to make                             samples.
           informed choices based on translation traditions and norms while translating or                         '    In any case, a compromise has to be reached between what is desirable and
           learning to translate. A parallel corpus can also be created to contain both directions                      what is feasible on practical grounds; constraints include availability of texts,
           of translations, thereby forming a bi-directional parallel (Aston 1999) or reciprocal                        copyright restrictions and project fundin!.
           (Teubert 1996) corpus, appearing to encompass all the different varieties described
           above. The design of such a corpus is not devoid of theoretical problems, as will be                     one of the best known parallel corpora is the English Norwegian parallel cor-
           argued in the following section. However, an examination of these problems may                       pus (ENPC), developed at the University of oslo and documented ir a number of
           also provide a means of improving the design of the different kinds of translation-                  publications (e.g. Johansson et al. 1996; Ebeling 199g). The ENpc has been taken
           driven corpora. In this chapter I will briefly exemplify some of the issues which                    as a model in a number ofprojects for other language pairs (Johansson 199g: 9-10)
            arise in the design of bi-directional parallel corpora with reference to English and                and as a starting point for the English Italian Translational corpus (cEXI) project
            Italian, focusing on some of those corpus design issues which are related to the                    at the school for Translators and Interpreters (SSLMIT) of the university of Bolo-

            specific translation-driven nature of a corpus and which may be different from con-                 gna at Forli, which is currently at the stage ofcorpus design.
Inte rc uhural F aultline s    Zanettin: Issues in Corpus Design and Analysis                                       109
108



    Figure 1, adapted from Johansson and Hofland (1994:26), shows the composi-                     tity, but also one of quality; that is to say, if we compare translated narrative fiction
tion of a bi-directional parallel corpus based on the design ofthe ENPC. Each corpus               in the two languages, we find that, while most genres are translated into Italian, and
component (CC) has a number from 1 to 4.                                                           within popular fiction (detective, romance and science-hction novels) translations
                                                                                                   may even constitute the majority of all published books, the vast majority of prose
                     Language A                          Language B
                                                                                                   fiction translated from Ttalian into English is what may be called high-quality, ,liter-
                                                                     _-l
   Translations                                      frc2                                          ary' fiction, by which I mean authors like Eco, Calvino or Tabucchi.
                                                     I Translations in    I                             This means that in a bi-directional parallel corpus of English and Italian narra-
                                                     I L-euug" e          I
                                                                                                   tive fiction, if the two translational components of t{e corpus are representative of
                                                                                                   the respective populations of translated books, strictly speaking they will not be
                         CC3                                 CC4                                   comparable with the non-translational components, i.e. the source texts for the trans-
  Source Texts     Source Texts  for                   Source Texts for                           lations in the two languages respectively. In short, for Ita1ian, mostly translated
                    Translations in                     Translations in                            'popular' fiction would be compared with original 'literary' fiction, while the re-
                                                                                                  verse would be true for English, in which translated 'literary' fiction would be
                                                                                                  compared with mostly original 'popular fiction'. Since the flow of translations and
                        Figure 1: A bi-directional parallel corpus                                the policies regarding them differ in different cultures and for different languages,
                                                                                                  it is difficult to envisage the same design for parallel corpora in different directions
In principle, such a composition should not only allow comparison of tfanslations                 of translation and in different language pairs.
and source texts (CC 1 :CC 4, CC2:CC3) but also of translations and non-translations
in the same language (CCl:CC3, CCZ:CC4), as one would do using a monolingual
comparable corpus. Furthermore, it would also seem possible to compare non-
                                                                                                  3.    Diversification in translation-driven corpora
translated texts in the two different languages (CC3:CC4), as one would using a
bilingual comparable corpus. I All three types of translation-driven corpora described            Another key issue in the design of translation-driven corpora concerns the need to
in Section 1 are then ideally represented in a bi-directional parallel corpus and, in             use different types of corpora according to researchers' aims and objectives. Full
addition, for two different languages. A closer examination, however, reveals that                 reciprocity in a bi-directional paral.lel corpus would, in fact, appear impossible, as
the different components of a bi-directional parallel corpus are hardly comparable                 the concern with representativeness clashes with the concern with comparability.
as they stand.                                                                                    Is there a way out of this apparent dead end? I believe this can only be found through
     Like the ENPC, the CEXI is a bi-directional paraliel co{pus, comprisirg both                 diversification, i.e. by using the same translated texts in corpora of different types
English translations and Italian source texts and Italian translations and English                created ad hoc according to specific design criteria. The comparison of translated
source texts. Like most translation-driven corpora, the Italian-English translational             texts with their source texts should be complemented with the comparison be-
corpus only aims at representing wrilten, published translations, and within these                tween these translated texts and a reference corpus in the same language. The ideal
only the universe of translated books (to the exclusion of journals or magazines),                translational corpus is then not a pre-formed set of texts but an open-ended corpus
rather than 'translation' in its entirefy. Translated books may represent only a small            comprising different components which allow different types of comparison to
percentage of all translated texts, but they constitute data which are more readily               be made.
available. In Italy, one in four published books is a translation, and half of these are              As an example, let us consider a simple quantitative analysis involving type/
translated from English (Vigini 1999:87). On the other hand, only about 27o - or                  token ratio for a quite straightforward single source author parallel corpus, com-
 one in fifty - of the books published in English-speaking countries such as the UK               prising five novels and a short story by salman Rushdie and their Italian published
 and the USA are translations (Venuti 7995: l2), of which only about 47o Ne from                  translations. Type/token ratio is taken to be an indicator of lexical variety in texts
 Itab.an (Index Translationum 1998). The difference is not only one of relative quan-             (Baker 1995 and 1996; Laviosa 1998a); the more word forms are used with respect
                                                                                                  to the total number of words in a text, the wider the range of vocabulary used in the
 I Yet another possibility would be to compare translated texts in two different languages        text, and more effort may be required on the part of the reader to process it than
 (CC1:CC2), i.e. acomparable bilingual corpus oftranslations (Johansson 1998: 8). This type of    texts with less varied vocabulary. Type/token ratio is also a function of the total
 comparison has not yet attracted the attention of scholars however.                              length of a text, so that to have comparable figures for texts of different length we
110                                                                              I nte   rcultural Faultline                s       Zanettin: Issue.s in Corpus Design arul Analysis
                                                                                                                                                                                                                                      11t


need a standard measure. Wordsmith Tools (Scott 1996) allows the standardized                                                           The data, however, are not directly comparable, as different values could be due
fype/token ratio to be calculated, and the base length chosen for this investigation                                                to structural differences between the two languages rather than to the process of
was the suggested 1000 words. Table 1 shows the standardized rype/token ratio for                                                   translation (Munday 1998:545). Therefore the typeftoken ratio for the texts in the
the Rushdie parallel corpus:                                                                                                        two languages has to be related to reference corpora for English and Italian.

                                                                                                                                               Corpus                                  Tokens     Types         T/T ratio
                                                              Types              T/T ratio                T/T ratio                                                                                                             T/T ratio
                                                  Tokens
                                                                                                                                                                                                                              (std 1M0J
                     Texts                   (running         (word                (whole                (std 10O0)                  Kq$Fi9,,cqs.'jri.:., . -.'f1
                                                  words)                            texts)                                          {ll:dfanifurs   d6mrj       r.'1       :j;lt iiit+iff!,,#t
                                                             .forms)                                                                                                   1


                                                                                                                                    Italian Reference comrrs                         856,00r     58.864             6.88          49.99

 I   figli della mezzanotte                   224,398        23.999                      10.69                  52.69
                                                                                                                                    Rrlshd4{drpus7""".
                                                                                                                                    (&slisli:ibffice
                                                                                                                                                          ir;
                                                                                                                                                       ffih]
                                                                                                                                                                  r.s1
                                                                                                                                                                       I

                                                                                                                                    Enslish Reference Comus                          843.629     27.46&             3.26          M44
 jMgij#,,EltiU                                                                       iiirS;:l:4             ,49
                                                                                                                                     Table 2: Type/token ratio: Rushdie parallel co{pus vs. comparable reference corpora
 I versi satanici                             796,&6         23,536                      11.97                    53.41

                                              ,                                                                t:gz*                The reference corpus for English was made up of 20 novels randomly selected from
                                                            ial.;F]q
                                                                                                                                 the written imaginative component of the British National corpus (1995). Six of
  Lliltimo sospiro del Moro                       167.630    21.843                      13.03                    53.63
                                                                                                                                                   full texts, while the remaining fourteen are extracts of about 40,000
                                                                                                                                    these novels are
                                                                                                                                 words each. The reference corpus for Italian comprises seventeen full text novels
                                                                                 iel;ffi                     ij                  by contemporary Italian writers downloaded from the Internet. Thus, the two refer-
  La vergogna                                     to7.946     15,884                     t4.11                    s3.57          ence corpora used in this experiment are only loosely comparable, as the main
                                                                                                                                 criterion for the inclusion of texts in the corpus was their availability in electronic
                                       ,{
                                                                                         1,1i9-5
                                                                                                                                 format. They should however provide useful indicative reference values.
     Harun e il mar delle stoie                    44.624       7,487                    16.78                    49.86              The overall type,/token ratio for the Italian reference corpus (49.g9) is 5.53 points
                                                                                                                                 higher than that of the English reference corpus (44.44). we can also see thar the
                                              i!*M           ::5,'5,.f.9
                                                                                         l?.6s,;    .l                           ratio for the two components of the Rushdie corpus (translations and source texts)
                                                                                                                  55.54
                                                                                                                                is higher than the average for both Italian and English. However, the difference
  Chekov e Zulu                                     s,o25        1,946                   38.73
                                                                                                                                between the source texts by Rushdie and the reference corpus for Engiish is higher
                                                             rrr:::.i,i,::i=         l- :rl.i,:,:
                                                            tEl:;654                                           ;15Oi45          (5.2) than that between the translations and the reference corpus for Italian (3.1), as
                                                                                     ',?"qiPq
                                                                                                                                can be seen in Figure 2.
     Italian trarslations (total)                 146,269    41 ,162                      6.40                    53.09
                                                                                                                                     A further check is necessary at this point to make sure that the results are statis-
                                                              ::::r::'.!l| i:i      't|i;#                l;!1-;;::49ffi        tically significant. what must be computed is the standard deviation for the two
                                                              ?l::s|,9                                     ::.;..ia ;   :
                                                                                                                                reference cofpora, that is to say, the range outside which variation between a reter-
                                                                                                                                ence corpus and the texts in the corresponding component of the Rushdie parallel
                         Table I : Type/token ratio for Rushdie parallel corpus                                                 corpus cannot be ascribed to chance. The standard rleviation is calculatecl through a
                                                                                                                                formula which correlates the average standardized type/token ratio for a sample to
                                                                                                                                the size of that sample.2 Table 3 shows the standardized type/token ratio for each
 The higher the figure obtained by dividing types by tokens, the higher the lexical
                                                                                                                                text in the parallel corpus (column 2), followed by the average standardized typel
 variety. As can be seen from the last column, the standardized type/token ratio for
 the translations is higher that that of the source texts, both as a whole and with
 regard to individual text pairs. This may appear to indicate that the vocabulary used
 in the translations is more varied than that of the source texts, and it runs counter to                                       2
                                                                                                                                    The formula used to calculate the standard cleviarion was rhe         following:
 the suggestion that translations are lexically less varied than originals as a result of                                                                                                                              E    O _ rr.
 the translation process (Baker 1996; Kohn 1996).
                                                                                                                                                                                                                    o h,,-
I
lt2                                                                        Int e rcr.t ltural F aultline s    Zanettin: Issues. in Corpus Design and Analysis
                                                                                                                                                                                                     u3

token ratio for the reference corpora (Column 3:44.44 forEnglish a1'd49.99 fot                                 high type/token ratio, while the last one, Haroun and the sea of stories, is,
                                                                                                                                                                                                at least
Italian) and the standard deviation (Column 4: +- 0.62 for English, + 0.88 for Ital-                           on one level of reading, a story for children, with a rather low type/token ratio,
                                                                                                                                                                                                    and
ian). The fifth column shows the gap between each text and the aYetage, which is,                              the difference from the average in the Italian translation is not statistically signifi-
in all but one case, significantly trigher than the standard deviation for the reference                       cant, i.e. the gap is within the range of the standard deviation. This could also
                                                                                                                                                                                                      be
corpora.                                                                                                      taken to suggest that above and below a certain threshold, there is no significant
                                                                                                              variation in fype/token ratio between translations and source texts.
  f-*'.-
  i
                                                                                                                  summing up, we can say that, for at least four novels by Rushdie, while both
  I
                                                                                                              translations and source texts have a higher tvpe/token ratio than the average for the
  160
  I


  I
                                                                                                              respective languages, the translations are much closer to this average than source
  i
                                                                                                              texts, leading to the tentative conclusion that they are lexically less varied than the
  lso
  I
                                                                                                              source texts as a result of the process of translation- It is Aue that type/token ratio
  I
                                                                                                                                                                                                       is
  iem                                                                                                         not by itself conclusive evidence of lexical variety. It should be considered along-
                                                                                                              side other indexes such as lexical density, i.e. the ratio of lexical to grammatical
  Io       lt-l
                                                                                                              words, or the ratio of hapaxes (i.e. those words occurring only once in a text) to the
      It                                                                                                      total number of words (Baker 1996; Laviosa l99gb). However, this example has
      l*
      lF
           ?n
                                                                                                              been used for methodological purposes to demonstrate that parallel corpora need
      I
      I
                                                                                                                                                                                                      to
      a
                                                                                                             be used in conjunction with other corpus resources. In this case, the use ofa refer_
      i     10
                                                                                                             ence corpus enabled us to go beyond a provisional finding based on the figures for
                                                                                                             a paraliel corpus alone.
                             English                             Italian                                          As much as analyses based on parallet corpora can profit from the comparison
                                                                                                             with data taken from monolingual reference corpora, other types oftranslation-driven
          Figure 2: Type/token ratio (std. 1,000): Rushdie parallel corpus vs. comparable                    corpora could benefit from diversification of analysis. For instance, if we were to
                                            reference corpora                                                compare the corpus of the Italian translations of Rushdie's novels with the Italian
                                                                                                             reference corpus alone, we would also have to hypothesize that (these) translations
                                                                                                             are more lexically varied than (the reference corpus of) texts spontaneously pro_
                                                                                                             duced in the same language.
                                                                                                                 The reference corpora used in this experiment were composed of non-translated
                                                                                                             texts only. However, other kinds ofreference corpora are also possible, depending
                                                                                                             on the purpose of the investigation. To be representative of a particular genre and
                                                                                                             thus reflect its norms and conventions, a reference corpus will have to be created by
                                                                                                             looking at the actual textual population for that genre, regardless of the stafus of the
                                                                                                             texts with respect to translation. Il for instance, what one wants to see is the expect
                                                                                                             ancy norrn for the language ofpopular narrative fiction in Italian, a reference corpus
                                                                                                             might well include translations rather than containing only texts spontaneously pro_
                                                                                                             duced in Italian, since expectations for this kind of texts are created partly by
 Harun e il       mr delle                                                                                   translations themselves (Chesterman 1997 : 67).

                             Table 3: Lexical variation in a parallel corpus
                                                                                                             4.   Corpus encoding
 As we can see, the difference between the English source texts and the average for
 the English reference corpus is almost double that between the Italian translations                         After texts have been selected for inclusion in    a corpus, they need to be acquired   in
 and the average for the ltalian reference corpus, except for the flrst and last parallel                    electronic format, and the process of'translating' texts from paper into digital for-
 texts. Significantly, the first text, Chekov and Zttltt, is a short story with a rather                     mat, as in all kinds of intersemiotic transiation, may involve losses as weli as gains.
I nt e rc u I t u r al   F ault line s     Zanettin: Issues in Corpus Design arul Arutlysi.s



What may be lost is useful or even essential information conceming the context of                             web; corpora encoded in this format could be accessed through the Internet,
                                                                                                                                                                                            effec_
reception of the printed texts, such as paratextual and visual information. This is                           tively constituting a large textual database. For instance, the same translated
                                                                                                                                                                                                text
why the investigations of electronic corpora should, if possible, be coupled with the                        could be part of a comparable monolingual corpus or of one or more
                                                                                                                                                                                     bilingual par_
analysis of the printed source texts, for which the electronic versions are not a sub-                       allel corpora, with different overall designs. The benefits to be derived from
                                                                                                                                                                                                 the
stitute but a complement. On the other hand, an electronic text can be 'enriched'                            encoding of texts, and specifically of translations, in accordance
                                                                                                                                                                                 with international
with linguistic and extralinguistic information, providing a means of carrying out                           standards, making possible the exchange of primary data (the
                                                                                                                                                                               electronic texts) as
analyses which would not be possible without using texts in electronic format-                               well as of secondary data (hndings based on such data) among researchers,
                                                                                                                                                                                               may
    At the linguistic level, corpora can be annotated, adding to each running word                           well compensate for the time and effort required to create ao.pu, ."aouaces
                                                                                                                                                                                             in ac_
part-of-speech tagging as well as syntactic and semantic information. The parsing                            cordance with international standards.
and tagging of the corpus can be automated to a large degree (Wilson and McEnery                                    Parallel corpora allow not only quantitative analyses of the kind
                                                                                                                                                                                               illustrated
 1996; Kennedy 1998), and computer applications have been developed to facilitate                              through the Rushdie example to be carried out. They also facilitate qualitative
                                                                                                                                                                                                     analy-
the inserlion of annotation ofdiscourse features, such as referring expressions (Biber                         ses, based on the examination of parallel concordances. A search
                                                                                                                                                                                           in an aligned
et al. 1998). A linguisticaliy annotated corpus makes possible more refined analy-                             parallel corpus could go in both directions, from source ro target
                                                                                                                                                                                    texts or vice versa.
ses; for instance, type/token ratio statistics carried out on a lemmatized corpus could                        In the first case, a concordance of a source language item or pattern would
                                                                                                                                                                                                      high_
probably provide a more accurate picture of lexical variety than figures based on                              light recurrent translation choices made by the translators; in the second
                                                                                                                                                                                                   case, a
mere word-form counts. A corpus tagged for parts of speech couid be searched for                              concordance of target language items or patterns would yield as a
                                                                                                                                                                                        result the range
just one use of a word form, i.e. to obtain a concordance of set as a noun rather than                        of source language features from which those items or patterns originated.
as an adjective or a verb.                                                                                         In any case, in order to carry out paraller concordances, parallel corpora
                                                                                                                                                                                                   need to
                                                                                                              be aligned. Alignment procedures can, to a rarge extent, be automated,
    As regards extralinguistic information, all the variables considered in the phase                                                                                                        and may be
                                                                                                              performed on the basis of statistical elaboration, taking into account
of corpus design can be encoded in each electronic text so that they can be retrieved                                                                                                     the number of
                                                                                                              sentences, words or even characters in the pairs of texts to be aligned (church
and used as selection criteria for inclusion in a different corpus. In a corpus of                                                                                                                      and
translated books, for example, coding could include bibliographical information
                                                                                                              Gale 1991), or using bilingual lexicons (peters and picchi l99g) or .anchor
                                                                                                                                                                                                      lists,
                                                                                                              (Hofland and Johansson 1998). However, while advancements
about the printed text, information about the translator and the process of transla-                                                                                              in automatic corpus
                                                                                                              alignment techniques will certainly help enhance not only machine transration
tion, and about the source text for the translation. We may, for example, want to                                                                                                                        re-
                                                                                                              search but also descriptive studies, a certain degree of interaction
select translations not only according to when they were published, but also accord-                                                                                                between machines
                                                                                                              and humans during the alignment process will not only continue to
ing to the amount of time that has elapsed between their publication and that of the                                                                                                  be necessary but
                                                                                                              may also provide a way of examining in some detail how transrations
source texts.3                                                                                                                                                                                 map onto
                                                                                                             source texts. Manual proofreading of aligned texts may help the researcher
    In order to be able to exchange and re-use corpus resources and in order to rep-                                                                                                                to ob_
                                                                                                             serve any regularities which may characteri ze a certain body of translations
licate and compare findings based on different corpora, the adoption of a common                                                                                                               as regards
                                                                                                             the segmentation, completeness and distribution of translated segments,
standard for the encoding of translated texts seems advisable. The Text Encoding                                                                                                             in order to
                                                                                                             investigate what Toury (1995: 58-59) calls .,matricial norms',.
Initiative (Sperberg-McQueen and Burnard 1994) provides useful guidelines for
                                                                                                                  The output of the alignment procedure can be either bilingual texts with
the encoding of electronic texts in accordance with the international SGML stand-                                                                                                                    alter_
                                                                                                            nating source text segments and their translations, or monolingual texts
ard (ISO 8879: 1986).4 A Corpus Encoding Standard (CES) compliant with these                                                                                                                    in which
                                                                                                            segments which correspond are numbered, pairgd, and retrieved on
guidelines has been developed and is already available in XML format (Ide and                                                                                                           the basis of an
                                                                                                            index. In order to ailow different types of comparison to be made, it
Bonhomme 2000), enabling the encoding of corpora in one or more languages, and                                                                                                                should be
                                                                                                            possible to take individual electronic texts from translation-driven
                                                                                                                                                                                      corpora, includ_
also of parallel corpora. XML is an abbreviated version of SGML for use on the                              ing parallel co{pora, and re-use them in different projects. The Corpus Encoding
                                                                                                            Standard mentioned above makes possible the creation of parallel
                                                                                                                                                                                     corpora in which
                                                                                                            translations and dource texts are encoded as individual entities and from
                                                                                                                                                                                              which the
r The Translational English Corpus (TEC) header (Baker 1999) includes many of                     these     retrieval of paraliel concordances is carried out on the basis of an external index
specifications.                                                                                             generated during the alignment process. The same source text could
ilnternational Organization for Standardization, /,lO 8879: Information processing Text and                                                                                            thus be used in
                                                                                  -                         a monolingual comparable colpus, aligned with translations in
                                                                                                                                                                                 different languages
office systems Standard Generalized Markup l"angtnge (SGML), ([Geneva]: ISO, 1986).                         or with different translations in the same language.
I
                                                                 Inte   rcuhural F aultline   s    Zanettin: Issues in Corpus Design and Analysis                                        117
116



5.     Conclusion
                                                                                                   Gavioli, Laura and Federico zanetin (1997) 'comparable corpora and rranslation: A
                                                                                                      Pedagogic Perspective', http://www.sslmit.unibo.itlcultpaps/
                                                                                                                                                                 (last checked on 15 sep-
                                                                                                      tember 2000).
Computerized parallel corpora together with other types of translation-driven cor-                 Halverson, Sandra (1998) 'Translations Studies and Representative Corpora: Establish-
pora can be an invaluable resource in both descriptive and applied translation studies,               ing Links between Translation corpora, Theoretical/Descriptive Categories and a
in that they allow the investigation of linguistic and extralinguistic features of trans-             Conception of the Object of Study' , Meta 43(4): 494-514.
lated texts on a much larger scale than can be achieved by manual analysis ofprinted              Hofland, Knut and Stig Johansson (1998) 'The Translation corpus Aligner: A program
texts. However, as more and more translation-driven corpora are created, there is a                   for Automatic Alignment of Parallel rexts', in Stig Johansson and Signe oksefjell
need for the criteria adopted in the design of these corpora to be carefully consid-                  (eds.) corpora and cross-Linguistic Research: Theory, Method, an^d case studies,
ered and made transparent, so tiat research can be replicated and findings can be                     Amsterdam and Atlanta: Rodopi, 87-100.
compared and evaluated. Compiling corpora can involve a lot of time and effort,                   Ide, Nancy and Patrice Bonhomme (2000) xML corpus Encoding standard Document
                                                                                                      xcES 0.2, http://www-cs.vassar.edurXCES/ (lasr checked on 15 September 2000).
and in order to avoid ending up with strands of isolated experiments, we should
                                                                                                  Index Translationum, 5d edition, UNESCO 1998.
make sure that the maximum advantage can be derived from ttre enterprise. The
                                                                                                  Jantunen, Jarmo (2000) 'what can corpora Tell us about Translated Language: A com-
criteria used in selecting the texts to be included in a corpus, and those adopted in                 parable Corpus of Finnish in use for Making Hypotheses', paper presented at the
preparing them for use with corpus soffware, e.g. the procedures for alignment of a                   uMlsrrucl- Research Models in Translation Studies conference, Manchester, 2g-
parallel corpus, have implications for the findings based on the corpora themselves.                 30 April 2000.
I would like to suggest that corpus design and encoding should not be seen as merely              Johansson, Stig and Knut Hofland (1994) 'Towards an English-Norwegian paraliel Cor-
preliminary to the actual analysis of a corpus, but as important moments in them-                    pus", in udo Fries, Gunnel rottie and Peter Schneider (eds.) creating and {Jsing
selves in the study of translation and translated texts.                                              English Language Corpora, Amsterdam: Rodopi. 25-37.
                                                                                                  ----,   Jarle Ebeling and Knur Hofland (1996) 'coding and Aligning the English-Norwe-
                                                                                                      gian Parailel Corpus', in Karin Aijmer, Bengt Altenberg and Mats Johansson (eds.)
                                                                                                      Languages in Contrast, Lund: Lund University Press, 87-112.
References                                                                                        ------ (1998) 'on the Role of Corpora in cross-Linguistic Research', in stig Johansson
                                                                                                      and signe oksefiell (eds.) corpora and cross-Linguistic Research: Theory, Method.,
Aston, Guy (1999) 'Corpus Use and Learning to Translate', TextusXll(2),289-314.                       and Case Studies, Amsterdam and Atlanta: Rodopi, 3-24.
Baker, Mona (1995) 'Corpora in Translation Studies: An Overview and Some Sugges-                  Kemppanen, Hannu (2000) 'Looking for Evaluative Keywords in Authentic and rrans-
    tions for Future Research', T ar g e t 7 (2) : 223 -243.                                          lated Finnish: Corpus Research on Finnish History Texts', paper presented at the
------ (1996) 'Corpus-Based Translation Studies: The Challenges that Lie Ahead', in                   UMISTTCL Research Models in Translation studies conference, Manchester, 2g-
    Harold Somers (ed.) Terminology, LSP, and Translation: Studies in Language En-
                                                                                                     30   April   2000.
    gineeing inHonour of Juan C. Sager, Amsterdam and Philadelphia: John Benjamins,
                                                                                                  Kennedy, Graeme (1998) An Introduction to Corpus Lingr jstlcs, London and New york:
      175-186.
                                                                                                      Longman.
------ (1999) 'The Role of Corpora in Investigating the Linguistic Behaviour of Profes-
                                                                                                  Kohn, Jiinos (1996) 'what Can (Corpus) Linguistics Do for Translation?', in Kinga
    sional Translators' , International Journal of Corpus Linguistics 4(2): 1-18.
                                                                                                      KJaudy, Josd Lambert and Anik6 Sohdr (eds.) Translation studies in Hungary,Bl-
Biber, Douglas (1993) 'Representativeness in Corpus Design', Literary and Linguistic
                                                                                                      dapest: Scholastica, 39 -52.
      Computing 8(4): 243-257   .
                                                                                                  Laffling, John (1992) 'on consrructing a Transfer Dictionary for Man and Machine',
------, Susan Conrad and Randi Reppen (1998) Corpus Linguistics: Investigating Lan-
    guage Structure and Use, Cambridge: Cambridge University Press.
                                                                                                      Target 4(1): 11-31.
                                                                                                  Laviosa, Sara (1997) 'How Comparable Can 'Comparable Corpora' Be?',Target 9(2)
BriishNational Corpus,version 1.0, 1995, Oxford: OxfordUniversity Computing Services.
                                                                                                      289-319.
Chesterman, Andrew (7997) Memes of Translation: The Spread of Ideas in Translation
   Theory, Amsterdam and Philadelphia: John Benjamins.                                            ----- (1998a) 'The English Comparable Corpus: a Resource and a Methodology',, in
Ebeling, Jarte (1998) 'Contrastive Linguistics, Translation, and Parallel Corpora' , Meta             Lynrre Bowker, Michael Cronin, Dorothy Kenny and Jennifer pearson (ed,s.) Unity
      43(4):602-615.                                                                                  in Diversity: current Trends in Translation studies, Manchester: St. Jerome pub-
 Church, Kenneth W. and William A. Gale (1991) 'Concordances for Parallel Texts', in                 lishing, l0l-112.
    Using Corpora: Proceedings of the 7'h Annual Conference of the UW for the New                 ------ (1998b) 'Core Patterns of Lexical Use in   a Comparable Corpus   of English Narra-
    OED andText Research, Oxford, Oxford University Press, 40-62.                                    tive Prose', Meta 43(4): 551-570.

Mais conteúdo relacionado

Semelhante a Zanettin

PA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.docPA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.doc
butest
 
PA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.docPA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.doc
butest
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Fulvio Rotella
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
University of Bari (Italy)
 

Semelhante a Zanettin (20)

PhD Poster
PhD PosterPhD Poster
PhD Poster
 
PA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.docPA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.doc
 
PA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.docPA5-2_iconf08.doc.doc
PA5-2_iconf08.doc.doc
 
A Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital LibrariesA Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital Libraries
 
What can a corpus tell us about discourse
What can a corpus tell us about discourseWhat can a corpus tell us about discourse
What can a corpus tell us about discourse
 
Corpus Linguistics
Corpus LinguisticsCorpus Linguistics
Corpus Linguistics
 
Ana's dissertation workshop 2
Ana's dissertation workshop 2Ana's dissertation workshop 2
Ana's dissertation workshop 2
 
Translation studies: Simplification and Explicitation Universals
Translation studies: Simplification and Explicitation UniversalsTranslation studies: Simplification and Explicitation Universals
Translation studies: Simplification and Explicitation Universals
 
Dr.saleem gul assignment summary
Dr.saleem gul assignment summaryDr.saleem gul assignment summary
Dr.saleem gul assignment summary
 
A Citation Centric Annotation Scheme For Scientific Articles
A Citation Centric Annotation Scheme For Scientific ArticlesA Citation Centric Annotation Scheme For Scientific Articles
A Citation Centric Annotation Scheme For Scientific Articles
 
A Bridge Not too Far
A Bridge Not too FarA Bridge Not too Far
A Bridge Not too Far
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
CL2009_ANNIS_pre
CL2009_ANNIS_preCL2009_ANNIS_pre
CL2009_ANNIS_pre
 
CL2009_ANNIS_pre
CL2009_ANNIS_preCL2009_ANNIS_pre
CL2009_ANNIS_pre
 
E43022023
E43022023E43022023
E43022023
 
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
 
Dictionary based concept mining an application for turkish
Dictionary based concept mining  an application for turkishDictionary based concept mining  an application for turkish
Dictionary based concept mining an application for turkish
 
Article - An Annotated Translation of How to Succeed as a Freelance Translato...
Article - An Annotated Translation of How to Succeed as a Freelance Translato...Article - An Annotated Translation of How to Succeed as a Freelance Translato...
Article - An Annotated Translation of How to Succeed as a Freelance Translato...
 
2015.ESP
2015.ESP2015.ESP
2015.ESP
 

Último

Último (20)

Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 

Zanettin

  • 1. 104 Inte rcultural F aubline s Stubbs, Michael (1995a) 'Collocations and Semantic Profiles: On the Cause of the Trouble with Quantitative Studies', Functions of Language 2(l): 23-55 . 8 Parallel Corpora in Thanslation Studies: Issues ------ (1995b) 'Collocations and Cultural Connotations of Common Words', Linguistics in Corpus Design and Analysis and Education 7: 379-390. Toury, Gideon (1995) Desciptive Translation Studies - and Beyond, Amsterdam and FEDERICO ZANETTIN Philadelphia: John Benjamins. Vanderauwera, fua (1985) Dutch Novels Translated into English: The Transformation Abstract: This chapter deals with the design and analysis of'translation- of a "Minority" Literature, Amsterdam: Rodopi. driven' corpora, i.e. principled collections of electronic texts compiled with Wodin, Natascha (1983) Die gliiserne Stadt,Leipzig: Reclam Verlag. the aim of studying translation products and processes, with special reference Woolls, David ( 1997) Mubiconcord" Version 1.5, Birmingham: CFL Software Development. to parallel corpora.'The comparison and contrast ofpaired translation units Ztirn, Unica ( 1917) Der Mann im Jasmin,Frankturt am Main and Berlin: Verlag Ullstein; is, of course, not new to translation research, but the possibility of retrieving translated by Malcolm Green as The House of lllnesses, 1993, and The Man of Jas' on a computer screen hundreds of similar contexts and their translations, mine & Other Texts,1994, London: Atlas Press. and the relative ease of combining this with statistical analysis and data manipulation, allow hypotheses to be tested on a larger scale as well as tentative generalilations to be made. Corpus linxuistics is seen as a .._.*^lrslbqip!.ost-wbLeh_ssryb^e-sw,ti.ed-ro^ustq;;f a;;;jt;;;;,fi i"";t"dir; *^Fa-idILet way^as it is app-lied to the sutdy,ottexruoniariilnrrs"i;nc in the same cf,rpora, as well as orher ,rr".irt"ri"iirir7;;ffio, ,o, complement other types of investigation of printed texts in translation studies. It is suggested in this chapter that issues of corpus design (which types of tefis are included, what languages are involved, which citeria are used for sampling, what the research aims and applications of the projects are), and corpus encoding (how the 'translation' from printed texts to electronic corpus comes about) deserve careful consideration insofar as these issues are likely to affect findings based on corpora. It is also argued that, in order to enhance and maximize the advantage which can be deived from research based on electronic texts in translation stud.ies, there is a need for greater standardization and interchange of corpus resources. 1. Introduction In the past ten yea.rs or so there has been a growing interest in the application of computer-assisted methods of investigation to the study of translation and trans- lated texts. In narti istics has seryed as amethodological framework for the creation and use of what I would like tb call .t The corpora developed and used within this approach have quite airreienTitraiii:. teristics, depending on the aims and research interests upon which the various research projects are based, but they generally involve a comparison between two different corpus components. A first type of 'translation-driven corpus' is the monolingual comparable cor- pus, consisting of a set of translations and a comparable set of texts spontaneously created in the same language and selected according to similar design criteria. cor- pora belonging to this first type include the English Comparable Corpus (ECC),
  • 2. Inte rc ultu ral F aultline s Zanettin: Issues in Corpus Design ttntl Arutlysis 106 107 cerns which surface in the design of general-language corpora. These issues are developed at UMIST, Manchester under the direction of Mona Baker, and the aspects of representativeness, diversification and encoding. Finnish Comparable Corpus at the University of Joensuu (see Jantunen 2000; Kemppanen 2000; Tirkkonen-Condit 2000). These monolingual comparable cor- ',p-o-{a ar9 'tran.s!ati_93*{riy9l1.91-.t 11rjl4ion depeadent' in that the "non-tr_anslationaf- 2. Representativeness in bi-directional parallel corpora component is modelled on the composition of the translatigl{ 1et" (Laviosa 1997: 293). The motivations behind the construction of a corpus with these characteristics one of the major issues in corpus design is that of representativeness; what distin_ are mostly theoretical; the comparison between the two components of the corpus guishes a corpus from a collection of electronic texts (or a text archive) is that a should enable investigation of the linguistic features of translated texts as opposed corpus is put together in a principled way so as to be representative of a larger to those found in spontaneous text production. Since translation, is_'ig-tory-"-"*-- textual population, in order to make it possible to generalize findings concerning tive event which is shaped by its own goals, pressures and context of produclion" that population. Thus, the most appropriate design for a corpus depends on what it @aker 1996: r5), tirxti pioauceo ii i ieiritt of tfiieciftiff'snbnldshow a distinc- is meant to represent (Biber 1993; Halverson l99g; Kennedy 199g; Biber et al. ti","_1ilgu.i.F,gake:up, riihiah ian $ dt:{:gliTe:9'_gC.lir1gggg!+,"9!dg1:1 1998). This should be remembered before making any general statements about -omparaule.o6g A diicona type oT'transfation-Oriu"n is q+l:lunguar language, texts, or translations based on corpus analysis; what is found in a corpus ' " either of the t ws components i ncludes-tran "olputt ated t. cumptred-trc-feiis N sl "i$Vf,ari " will only apply to what that particular corpus represents. The more general, large spontaneously produced in two languages under similar circumstances and within and varied the textual population to be represented, the more variables must be the same domains. Corpora of this kind have been created and used mainly with taken into account in the selection of the texts to be included in the corpus. To -=. ,. 3gpJ-99!_".!9f"s t, mi4,4, _either for the extraction of biliagual terminology (Laffling ensure representativeness of a certain genre, decisions must be taken as to sam_ 1992) or for the training of translators (Zanettin 1994, 1998, forthcoming; Gavioli pling criteria: and Zanettin 1997; Peters and Picchi 1998). A bilingual comparable corpus is trans- lation-driven in that the ultimate aim of its creation is to develop a tool and a resource ' should translations be chosen at random from within the total population to for trainees and practitioners in the translation profession, and its composition is be represented, or should a motivated choice be made based on criteria such dictated by the provision of translating texts belonging to a specifrc genre. This type as text status and reception (e.g. prestige, readership)? Even when making a of corpus, when in printed rather than in electronic format, is also referred to in random selection, we still need to dehne the boundaries and internal catego- translation studies as 'parallel texts' (e.g. Piotrowska 1997; Scheffner 1998). ries of the total population, so all selections are bound to be subjective to a . A third type of'translation-driven corpus', and the one on which this chapter certain extent. will focus, is the parallel corpus, comprising a set of translations in one language ' Should the corpus be composed of complete texts or samples, and if the lat- and their respective source texts in another language. Parallel corpora can be used ' ter, what size and type should the samples be? There seems to be general for studying translated texts with two different goals in mind; they can be used by agreement among translation scholars that the basic unit in translation is a researchers to describe what translators actually do with texts and how they trans- ful1 text, but pracrical limitations may still lead corpus designers to opt for form them in the process of translation, and they can help practitioners to make samples. informed choices based on translation traditions and norms while translating or ' In any case, a compromise has to be reached between what is desirable and learning to translate. A parallel corpus can also be created to contain both directions what is feasible on practical grounds; constraints include availability of texts, of translations, thereby forming a bi-directional parallel (Aston 1999) or reciprocal copyright restrictions and project fundin!. (Teubert 1996) corpus, appearing to encompass all the different varieties described above. The design of such a corpus is not devoid of theoretical problems, as will be one of the best known parallel corpora is the English Norwegian parallel cor- argued in the following section. However, an examination of these problems may pus (ENPC), developed at the University of oslo and documented ir a number of also provide a means of improving the design of the different kinds of translation- publications (e.g. Johansson et al. 1996; Ebeling 199g). The ENpc has been taken driven corpora. In this chapter I will briefly exemplify some of the issues which as a model in a number ofprojects for other language pairs (Johansson 199g: 9-10) arise in the design of bi-directional parallel corpora with reference to English and and as a starting point for the English Italian Translational corpus (cEXI) project Italian, focusing on some of those corpus design issues which are related to the at the school for Translators and Interpreters (SSLMIT) of the university of Bolo- specific translation-driven nature of a corpus and which may be different from con- gna at Forli, which is currently at the stage ofcorpus design.
  • 3. Inte rc uhural F aultline s Zanettin: Issues in Corpus Design and Analysis 109 108 Figure 1, adapted from Johansson and Hofland (1994:26), shows the composi- tity, but also one of quality; that is to say, if we compare translated narrative fiction tion of a bi-directional parallel corpus based on the design ofthe ENPC. Each corpus in the two languages, we find that, while most genres are translated into Italian, and component (CC) has a number from 1 to 4. within popular fiction (detective, romance and science-hction novels) translations may even constitute the majority of all published books, the vast majority of prose Language A Language B fiction translated from Ttalian into English is what may be called high-quality, ,liter- _-l Translations frc2 ary' fiction, by which I mean authors like Eco, Calvino or Tabucchi. I Translations in I This means that in a bi-directional parallel corpus of English and Italian narra- I L-euug" e I tive fiction, if the two translational components of t{e corpus are representative of the respective populations of translated books, strictly speaking they will not be CC3 CC4 comparable with the non-translational components, i.e. the source texts for the trans- Source Texts Source Texts for Source Texts for lations in the two languages respectively. In short, for Ita1ian, mostly translated Translations in Translations in 'popular' fiction would be compared with original 'literary' fiction, while the re- verse would be true for English, in which translated 'literary' fiction would be compared with mostly original 'popular fiction'. Since the flow of translations and Figure 1: A bi-directional parallel corpus the policies regarding them differ in different cultures and for different languages, it is difficult to envisage the same design for parallel corpora in different directions In principle, such a composition should not only allow comparison of tfanslations of translation and in different language pairs. and source texts (CC 1 :CC 4, CC2:CC3) but also of translations and non-translations in the same language (CCl:CC3, CCZ:CC4), as one would do using a monolingual comparable corpus. Furthermore, it would also seem possible to compare non- 3. Diversification in translation-driven corpora translated texts in the two different languages (CC3:CC4), as one would using a bilingual comparable corpus. I All three types of translation-driven corpora described Another key issue in the design of translation-driven corpora concerns the need to in Section 1 are then ideally represented in a bi-directional parallel corpus and, in use different types of corpora according to researchers' aims and objectives. Full addition, for two different languages. A closer examination, however, reveals that reciprocity in a bi-directional paral.lel corpus would, in fact, appear impossible, as the different components of a bi-directional parallel corpus are hardly comparable the concern with representativeness clashes with the concern with comparability. as they stand. Is there a way out of this apparent dead end? I believe this can only be found through Like the ENPC, the CEXI is a bi-directional paraliel co{pus, comprisirg both diversification, i.e. by using the same translated texts in corpora of different types English translations and Italian source texts and Italian translations and English created ad hoc according to specific design criteria. The comparison of translated source texts. Like most translation-driven corpora, the Italian-English translational texts with their source texts should be complemented with the comparison be- corpus only aims at representing wrilten, published translations, and within these tween these translated texts and a reference corpus in the same language. The ideal only the universe of translated books (to the exclusion of journals or magazines), translational corpus is then not a pre-formed set of texts but an open-ended corpus rather than 'translation' in its entirefy. Translated books may represent only a small comprising different components which allow different types of comparison to percentage of all translated texts, but they constitute data which are more readily be made. available. In Italy, one in four published books is a translation, and half of these are As an example, let us consider a simple quantitative analysis involving type/ translated from English (Vigini 1999:87). On the other hand, only about 27o - or token ratio for a quite straightforward single source author parallel corpus, com- one in fifty - of the books published in English-speaking countries such as the UK prising five novels and a short story by salman Rushdie and their Italian published and the USA are translations (Venuti 7995: l2), of which only about 47o Ne from translations. Type/token ratio is taken to be an indicator of lexical variety in texts Itab.an (Index Translationum 1998). The difference is not only one of relative quan- (Baker 1995 and 1996; Laviosa 1998a); the more word forms are used with respect to the total number of words in a text, the wider the range of vocabulary used in the I Yet another possibility would be to compare translated texts in two different languages text, and more effort may be required on the part of the reader to process it than (CC1:CC2), i.e. acomparable bilingual corpus oftranslations (Johansson 1998: 8). This type of texts with less varied vocabulary. Type/token ratio is also a function of the total comparison has not yet attracted the attention of scholars however. length of a text, so that to have comparable figures for texts of different length we
  • 4. 110 I nte rcultural Faultline s Zanettin: Issue.s in Corpus Design arul Analysis 11t need a standard measure. Wordsmith Tools (Scott 1996) allows the standardized The data, however, are not directly comparable, as different values could be due fype/token ratio to be calculated, and the base length chosen for this investigation to structural differences between the two languages rather than to the process of was the suggested 1000 words. Table 1 shows the standardized rype/token ratio for translation (Munday 1998:545). Therefore the typeftoken ratio for the texts in the the Rushdie parallel corpus: two languages has to be related to reference corpora for English and Italian. Corpus Tokens Types T/T ratio Types T/T ratio T/T ratio T/T ratio Tokens (std 1M0J Texts (running (word (whole (std 10O0) Kq$Fi9,,cqs.'jri.:., . -.'f1 words) texts) {ll:dfanifurs d6mrj r.'1 :j;lt iiit+iff!,,#t .forms) 1 Italian Reference comrrs 856,00r 58.864 6.88 49.99 I figli della mezzanotte 224,398 23.999 10.69 52.69 Rrlshd4{drpus7""". (&slisli:ibffice ir; ffih] r.s1 I Enslish Reference Comus 843.629 27.46& 3.26 M44 jMgij#,,EltiU iiirS;:l:4 ,49 Table 2: Type/token ratio: Rushdie parallel co{pus vs. comparable reference corpora I versi satanici 796,&6 23,536 11.97 53.41 , t:gz* The reference corpus for English was made up of 20 novels randomly selected from ial.;F]q the written imaginative component of the British National corpus (1995). Six of Lliltimo sospiro del Moro 167.630 21.843 13.03 53.63 full texts, while the remaining fourteen are extracts of about 40,000 these novels are words each. The reference corpus for Italian comprises seventeen full text novels iel;ffi ij by contemporary Italian writers downloaded from the Internet. Thus, the two refer- La vergogna to7.946 15,884 t4.11 s3.57 ence corpora used in this experiment are only loosely comparable, as the main criterion for the inclusion of texts in the corpus was their availability in electronic ,{ 1,1i9-5 format. They should however provide useful indicative reference values. Harun e il mar delle stoie 44.624 7,487 16.78 49.86 The overall type,/token ratio for the Italian reference corpus (49.g9) is 5.53 points higher than that of the English reference corpus (44.44). we can also see thar the i!*M ::5,'5,.f.9 l?.6s,; .l ratio for the two components of the Rushdie corpus (translations and source texts) 55.54 is higher than the average for both Italian and English. However, the difference Chekov e Zulu s,o25 1,946 38.73 between the source texts by Rushdie and the reference corpus for Engiish is higher rrr:::.i,i,::i= l- :rl.i,:,: tEl:;654 ;15Oi45 (5.2) than that between the translations and the reference corpus for Italian (3.1), as ',?"qiPq can be seen in Figure 2. Italian trarslations (total) 146,269 41 ,162 6.40 53.09 A further check is necessary at this point to make sure that the results are statis- ::::r::'.!l| i:i 't|i;# l;!1-;;::49ffi tically significant. what must be computed is the standard deviation for the two ?l::s|,9 ::.;..ia ; : reference cofpora, that is to say, the range outside which variation between a reter- ence corpus and the texts in the corresponding component of the Rushdie parallel Table I : Type/token ratio for Rushdie parallel corpus corpus cannot be ascribed to chance. The standard rleviation is calculatecl through a formula which correlates the average standardized type/token ratio for a sample to the size of that sample.2 Table 3 shows the standardized type/token ratio for each The higher the figure obtained by dividing types by tokens, the higher the lexical text in the parallel corpus (column 2), followed by the average standardized typel variety. As can be seen from the last column, the standardized type/token ratio for the translations is higher that that of the source texts, both as a whole and with regard to individual text pairs. This may appear to indicate that the vocabulary used in the translations is more varied than that of the source texts, and it runs counter to 2 The formula used to calculate the standard cleviarion was rhe following: the suggestion that translations are lexically less varied than originals as a result of E O _ rr. the translation process (Baker 1996; Kohn 1996). o h,,-
  • 5. I lt2 Int e rcr.t ltural F aultline s Zanettin: Issues. in Corpus Design and Analysis u3 token ratio for the reference corpora (Column 3:44.44 forEnglish a1'd49.99 fot high type/token ratio, while the last one, Haroun and the sea of stories, is, at least Italian) and the standard deviation (Column 4: +- 0.62 for English, + 0.88 for Ital- on one level of reading, a story for children, with a rather low type/token ratio, and ian). The fifth column shows the gap between each text and the aYetage, which is, the difference from the average in the Italian translation is not statistically signifi- in all but one case, significantly trigher than the standard deviation for the reference cant, i.e. the gap is within the range of the standard deviation. This could also be corpora. taken to suggest that above and below a certain threshold, there is no significant variation in fype/token ratio between translations and source texts. f-*'.- i summing up, we can say that, for at least four novels by Rushdie, while both I translations and source texts have a higher tvpe/token ratio than the average for the 160 I I respective languages, the translations are much closer to this average than source i texts, leading to the tentative conclusion that they are lexically less varied than the lso I source texts as a result of the process of translation- It is Aue that type/token ratio I is iem not by itself conclusive evidence of lexical variety. It should be considered along- side other indexes such as lexical density, i.e. the ratio of lexical to grammatical Io lt-l words, or the ratio of hapaxes (i.e. those words occurring only once in a text) to the It total number of words (Baker 1996; Laviosa l99gb). However, this example has l* lF ?n been used for methodological purposes to demonstrate that parallel corpora need I I to a be used in conjunction with other corpus resources. In this case, the use ofa refer_ i 10 ence corpus enabled us to go beyond a provisional finding based on the figures for a paraliel corpus alone. English Italian As much as analyses based on parallet corpora can profit from the comparison with data taken from monolingual reference corpora, other types oftranslation-driven Figure 2: Type/token ratio (std. 1,000): Rushdie parallel corpus vs. comparable corpora could benefit from diversification of analysis. For instance, if we were to reference corpora compare the corpus of the Italian translations of Rushdie's novels with the Italian reference corpus alone, we would also have to hypothesize that (these) translations are more lexically varied than (the reference corpus of) texts spontaneously pro_ duced in the same language. The reference corpora used in this experiment were composed of non-translated texts only. However, other kinds ofreference corpora are also possible, depending on the purpose of the investigation. To be representative of a particular genre and thus reflect its norms and conventions, a reference corpus will have to be created by looking at the actual textual population for that genre, regardless of the stafus of the texts with respect to translation. Il for instance, what one wants to see is the expect ancy norrn for the language ofpopular narrative fiction in Italian, a reference corpus might well include translations rather than containing only texts spontaneously pro_ duced in Italian, since expectations for this kind of texts are created partly by Harun e il mr delle translations themselves (Chesterman 1997 : 67). Table 3: Lexical variation in a parallel corpus 4. Corpus encoding As we can see, the difference between the English source texts and the average for the English reference corpus is almost double that between the Italian translations After texts have been selected for inclusion in a corpus, they need to be acquired in and the average for the ltalian reference corpus, except for the flrst and last parallel electronic format, and the process of'translating' texts from paper into digital for- texts. Significantly, the first text, Chekov and Zttltt, is a short story with a rather mat, as in all kinds of intersemiotic transiation, may involve losses as weli as gains.
  • 6. I nt e rc u I t u r al F ault line s Zanettin: Issues in Corpus Design arul Arutlysi.s What may be lost is useful or even essential information conceming the context of web; corpora encoded in this format could be accessed through the Internet, effec_ reception of the printed texts, such as paratextual and visual information. This is tively constituting a large textual database. For instance, the same translated text why the investigations of electronic corpora should, if possible, be coupled with the could be part of a comparable monolingual corpus or of one or more bilingual par_ analysis of the printed source texts, for which the electronic versions are not a sub- allel corpora, with different overall designs. The benefits to be derived from the stitute but a complement. On the other hand, an electronic text can be 'enriched' encoding of texts, and specifically of translations, in accordance with international with linguistic and extralinguistic information, providing a means of carrying out standards, making possible the exchange of primary data (the electronic texts) as analyses which would not be possible without using texts in electronic format- well as of secondary data (hndings based on such data) among researchers, may At the linguistic level, corpora can be annotated, adding to each running word well compensate for the time and effort required to create ao.pu, ."aouaces in ac_ part-of-speech tagging as well as syntactic and semantic information. The parsing cordance with international standards. and tagging of the corpus can be automated to a large degree (Wilson and McEnery Parallel corpora allow not only quantitative analyses of the kind illustrated 1996; Kennedy 1998), and computer applications have been developed to facilitate through the Rushdie example to be carried out. They also facilitate qualitative analy- the inserlion of annotation ofdiscourse features, such as referring expressions (Biber ses, based on the examination of parallel concordances. A search in an aligned et al. 1998). A linguisticaliy annotated corpus makes possible more refined analy- parallel corpus could go in both directions, from source ro target texts or vice versa. ses; for instance, type/token ratio statistics carried out on a lemmatized corpus could In the first case, a concordance of a source language item or pattern would high_ probably provide a more accurate picture of lexical variety than figures based on light recurrent translation choices made by the translators; in the second case, a mere word-form counts. A corpus tagged for parts of speech couid be searched for concordance of target language items or patterns would yield as a result the range just one use of a word form, i.e. to obtain a concordance of set as a noun rather than of source language features from which those items or patterns originated. as an adjective or a verb. In any case, in order to carry out paraller concordances, parallel corpora need to be aligned. Alignment procedures can, to a rarge extent, be automated, As regards extralinguistic information, all the variables considered in the phase and may be performed on the basis of statistical elaboration, taking into account of corpus design can be encoded in each electronic text so that they can be retrieved the number of sentences, words or even characters in the pairs of texts to be aligned (church and used as selection criteria for inclusion in a different corpus. In a corpus of and translated books, for example, coding could include bibliographical information Gale 1991), or using bilingual lexicons (peters and picchi l99g) or .anchor lists, (Hofland and Johansson 1998). However, while advancements about the printed text, information about the translator and the process of transla- in automatic corpus alignment techniques will certainly help enhance not only machine transration tion, and about the source text for the translation. We may, for example, want to re- search but also descriptive studies, a certain degree of interaction select translations not only according to when they were published, but also accord- between machines and humans during the alignment process will not only continue to ing to the amount of time that has elapsed between their publication and that of the be necessary but may also provide a way of examining in some detail how transrations source texts.3 map onto source texts. Manual proofreading of aligned texts may help the researcher In order to be able to exchange and re-use corpus resources and in order to rep- to ob_ serve any regularities which may characteri ze a certain body of translations licate and compare findings based on different corpora, the adoption of a common as regards the segmentation, completeness and distribution of translated segments, standard for the encoding of translated texts seems advisable. The Text Encoding in order to investigate what Toury (1995: 58-59) calls .,matricial norms',. Initiative (Sperberg-McQueen and Burnard 1994) provides useful guidelines for The output of the alignment procedure can be either bilingual texts with the encoding of electronic texts in accordance with the international SGML stand- alter_ nating source text segments and their translations, or monolingual texts ard (ISO 8879: 1986).4 A Corpus Encoding Standard (CES) compliant with these in which segments which correspond are numbered, pairgd, and retrieved on guidelines has been developed and is already available in XML format (Ide and the basis of an index. In order to ailow different types of comparison to be made, it Bonhomme 2000), enabling the encoding of corpora in one or more languages, and should be possible to take individual electronic texts from translation-driven corpora, includ_ also of parallel corpora. XML is an abbreviated version of SGML for use on the ing parallel co{pora, and re-use them in different projects. The Corpus Encoding Standard mentioned above makes possible the creation of parallel corpora in which translations and dource texts are encoded as individual entities and from which the r The Translational English Corpus (TEC) header (Baker 1999) includes many of these retrieval of paraliel concordances is carried out on the basis of an external index specifications. generated during the alignment process. The same source text could ilnternational Organization for Standardization, /,lO 8879: Information processing Text and thus be used in - a monolingual comparable colpus, aligned with translations in different languages office systems Standard Generalized Markup l"angtnge (SGML), ([Geneva]: ISO, 1986). or with different translations in the same language.
  • 7. I Inte rcuhural F aultline s Zanettin: Issues in Corpus Design and Analysis 117 116 5. Conclusion Gavioli, Laura and Federico zanetin (1997) 'comparable corpora and rranslation: A Pedagogic Perspective', http://www.sslmit.unibo.itlcultpaps/ (last checked on 15 sep- tember 2000). Computerized parallel corpora together with other types of translation-driven cor- Halverson, Sandra (1998) 'Translations Studies and Representative Corpora: Establish- pora can be an invaluable resource in both descriptive and applied translation studies, ing Links between Translation corpora, Theoretical/Descriptive Categories and a in that they allow the investigation of linguistic and extralinguistic features of trans- Conception of the Object of Study' , Meta 43(4): 494-514. lated texts on a much larger scale than can be achieved by manual analysis ofprinted Hofland, Knut and Stig Johansson (1998) 'The Translation corpus Aligner: A program texts. However, as more and more translation-driven corpora are created, there is a for Automatic Alignment of Parallel rexts', in Stig Johansson and Signe oksefjell need for the criteria adopted in the design of these corpora to be carefully consid- (eds.) corpora and cross-Linguistic Research: Theory, Method, an^d case studies, ered and made transparent, so tiat research can be replicated and findings can be Amsterdam and Atlanta: Rodopi, 87-100. compared and evaluated. Compiling corpora can involve a lot of time and effort, Ide, Nancy and Patrice Bonhomme (2000) xML corpus Encoding standard Document xcES 0.2, http://www-cs.vassar.edurXCES/ (lasr checked on 15 September 2000). and in order to avoid ending up with strands of isolated experiments, we should Index Translationum, 5d edition, UNESCO 1998. make sure that the maximum advantage can be derived from ttre enterprise. The Jantunen, Jarmo (2000) 'what can corpora Tell us about Translated Language: A com- criteria used in selecting the texts to be included in a corpus, and those adopted in parable Corpus of Finnish in use for Making Hypotheses', paper presented at the preparing them for use with corpus soffware, e.g. the procedures for alignment of a uMlsrrucl- Research Models in Translation Studies conference, Manchester, 2g- parallel corpus, have implications for the findings based on the corpora themselves. 30 April 2000. I would like to suggest that corpus design and encoding should not be seen as merely Johansson, Stig and Knut Hofland (1994) 'Towards an English-Norwegian paraliel Cor- preliminary to the actual analysis of a corpus, but as important moments in them- pus", in udo Fries, Gunnel rottie and Peter Schneider (eds.) creating and {Jsing selves in the study of translation and translated texts. English Language Corpora, Amsterdam: Rodopi. 25-37. ----, Jarle Ebeling and Knur Hofland (1996) 'coding and Aligning the English-Norwe- gian Parailel Corpus', in Karin Aijmer, Bengt Altenberg and Mats Johansson (eds.) Languages in Contrast, Lund: Lund University Press, 87-112. References ------ (1998) 'on the Role of Corpora in cross-Linguistic Research', in stig Johansson and signe oksefiell (eds.) corpora and cross-Linguistic Research: Theory, Method., Aston, Guy (1999) 'Corpus Use and Learning to Translate', TextusXll(2),289-314. and Case Studies, Amsterdam and Atlanta: Rodopi, 3-24. Baker, Mona (1995) 'Corpora in Translation Studies: An Overview and Some Sugges- Kemppanen, Hannu (2000) 'Looking for Evaluative Keywords in Authentic and rrans- tions for Future Research', T ar g e t 7 (2) : 223 -243. lated Finnish: Corpus Research on Finnish History Texts', paper presented at the ------ (1996) 'Corpus-Based Translation Studies: The Challenges that Lie Ahead', in UMISTTCL Research Models in Translation studies conference, Manchester, 2g- Harold Somers (ed.) Terminology, LSP, and Translation: Studies in Language En- 30 April 2000. gineeing inHonour of Juan C. Sager, Amsterdam and Philadelphia: John Benjamins, Kennedy, Graeme (1998) An Introduction to Corpus Lingr jstlcs, London and New york: 175-186. Longman. ------ (1999) 'The Role of Corpora in Investigating the Linguistic Behaviour of Profes- Kohn, Jiinos (1996) 'what Can (Corpus) Linguistics Do for Translation?', in Kinga sional Translators' , International Journal of Corpus Linguistics 4(2): 1-18. KJaudy, Josd Lambert and Anik6 Sohdr (eds.) Translation studies in Hungary,Bl- Biber, Douglas (1993) 'Representativeness in Corpus Design', Literary and Linguistic dapest: Scholastica, 39 -52. Computing 8(4): 243-257 . Laffling, John (1992) 'on consrructing a Transfer Dictionary for Man and Machine', ------, Susan Conrad and Randi Reppen (1998) Corpus Linguistics: Investigating Lan- guage Structure and Use, Cambridge: Cambridge University Press. Target 4(1): 11-31. Laviosa, Sara (1997) 'How Comparable Can 'Comparable Corpora' Be?',Target 9(2) BriishNational Corpus,version 1.0, 1995, Oxford: OxfordUniversity Computing Services. 289-319. Chesterman, Andrew (7997) Memes of Translation: The Spread of Ideas in Translation Theory, Amsterdam and Philadelphia: John Benjamins. ----- (1998a) 'The English Comparable Corpus: a Resource and a Methodology',, in Ebeling, Jarte (1998) 'Contrastive Linguistics, Translation, and Parallel Corpora' , Meta Lynrre Bowker, Michael Cronin, Dorothy Kenny and Jennifer pearson (ed,s.) Unity 43(4):602-615. in Diversity: current Trends in Translation studies, Manchester: St. Jerome pub- Church, Kenneth W. and William A. Gale (1991) 'Concordances for Parallel Texts', in lishing, l0l-112. Using Corpora: Proceedings of the 7'h Annual Conference of the UW for the New ------ (1998b) 'Core Patterns of Lexical Use in a Comparable Corpus of English Narra- OED andText Research, Oxford, Oxford University Press, 40-62. tive Prose', Meta 43(4): 551-570.