Maria Carolina
Filipe Rodrigues
Fabio Rondinelli
Erico Caetano
Corpus linguistics and
semantics studies
What is corpus linguistics?
• Collection and analysis of a specific set of data
• Corpus characteristics
• Technology and corpora
• Access to language in actual use
• Quantification
• Easier access to the material
Best-known corpora
• Brown Corpus
• British National Corpus
• Oxford English Corpus
• International Corpus of English
• Example: http://corpus.byu.edu/coca/
Corpus linguistics in semantic prosody
“Prosody” in the term “semantic prosody” is borrowed
from Firth (1957), who used it to refer to phonological
colouring that spreads beyond segmental boundaries. For
example, the word animal has so strong a nasal prosody
that the vowel a takes on a nasal quality through
assimilation, simply because it is adjacent to the nasal
sound n. Lexical items show a comparable kind of
“prosody” in their lexical patterning. Inspired by the
Firthian sense of “prosody”, Bill Louw coined the term
“semantic prosody” and gave it its first definition: a
“consistent aura of meaning with which a form is imbued
by its collocates” (Louw, 1993: 157).
Louw illustrates semantic prosody (SP) with examples such
as the adverb utterly, the phrase bent on and the
expression symptomatic of, all of which carry a negative
SP. These three items are typically followed by
expressions that refer to undesirable things, such as
destroying, ruining, clinical depression and multitude of sins.
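The idea that a form takes its prosody from its collocates can be explored by counting the words that follow a node item in a corpus. The sketch below is only an illustration: the sample sentences are invented rather than taken from a corpus, the helper names `tokenize` and `right_collocates` are hypothetical, and a real study would use concordance lines from a large corpus.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def right_collocates(tokens, node, span=4):
    """Count the words found up to `span` tokens after each hit of `node`.

    `node` may be a single word ("utterly") or a phrase ("bent on").
    """
    node_tokens = node.lower().split()
    n = len(node_tokens)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node_tokens:
            counts.update(tokens[i + n:i + n + span])
    return counts

# Invented sentences, for illustration only (not corpus data).
sample = (
    "They seemed bent on destroying the old town. "
    "He was bent on ruining her career. "
    "She was utterly ruined by the scandal."
)
tokens = tokenize(sample)
collocates = right_collocates(tokens, "bent on", span=2)
print(collocates.most_common())
```

Inspecting the most frequent right-hand collocates of an item is one simple way to judge whether it tends to co-occur with undesirable things.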
Semantics vs. Pragmatics
Semantic meaning and pragmatic meaning are the two
extremes of the meaning system: semantic meaning
can be seen as the meaning that arises only from
linguistic factors in a piece of communication, while
pragmatic meaning is the meaning imposed by the
non-linguistic elements that have an impact on
communication.
Studies on Corpus Linguistics and Semantics
- Chishman and Teixeira (2009) provide us with an
interesting study on nominal compounds based on Corpus
Linguistics.
- Data: 10 digital issues of National Geographic, analyzed
with software.
- It recalls a common question Brazilian students of English
may ask: when trying to say “bolo de maçã”, for instance, they
may try “cake of apple” or even “apple's cake” before getting
to “apple cake”, the correct nominal compound.
- Identification and categorization of recurrent
semantic relations within nominal compounds.
Examples: in memory drugs we find a relation of
telicity, for those drugs aim at serving memory
purposes. In school play, there is a relation of
location, while in rice bag the relation is one of
meronymy, for one element contains the other.
Such an analysis could inspire us to look at the relations
within compound nouns and even to suggest that Brazilian students of
English take a deeper look at them. For instance, what kind of
relation would students find in the following compounds?
How could they explain it in their own words?
- car accident - fruit bat - skin cancer
- island culture - lemon tree - cameraman
- metal armor - ethanol production
A Corpus-Driven Approach to Genre Analysis
- The paper shows that an exhaustive corpus-driven
approach, mixed with statistics, is the most effective
analytical method for comparing texts across genres.
- By using the resources above, the author examines
the characteristics of each genre, looking at words
and phrasal behavior
- According to the author, such an approach can
contribute much to the pragmatic
analysis of written texts
Genres
• Prior conceptions of genre relied on external
criteria (Biber 1988, 1993).
• With the new approach, genre can be based on internal
criteria.
• Instead of relying on a priori listings, genres can emerge
through quantitative linguistic research.
• Biber (1988) and the multianalytical approach: if
some linguistic features are frequent in a text,
other features will appear less frequently.
Corpus compilation
• The data include academic texts, newspaper texts
and literature from 6 pre-existing corpora.
• The sizes of the resulting genre corpora are as follows:
academic corpus (MicroConcord B + text category J of
the 4 corpora), 1,662,106 running words; newspaper
corpus (MicroConcord A + text categories A, B, C of the
4 corpora), 1,760,664 running words; literature corpus
(text categories K-R of the 4 corpora), 1,019,254
running words. The size of the general reference corpus
derived from mixing the 4 corpora (hereafter referred to
as the 'GR' corpus) was 4,071,830 running words.
Vocabulary variety and difficulty
• The ranked order is 1. newspaper, 2. literature and 3.
academic. Both S-TTR and Guiraud values therefore
suggest that newspaper English uses the most varied
vocabulary, literary English an intermediate one, and
academic English the least varied, when these estimators
of lexical density are used.
• From a solely empirical perspective, the inclusion of
longer words is taken to mean that a text has many
difficult words; on this measure the order is 1. academic,
2. newspaper and 3. literature.
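The two vocabulary-variety measures named on this slide can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: Guiraud's index is computed as types divided by the square root of tokens, and the standardized type-token ratio (S-TTR) as the mean TTR over consecutive fixed-size chunks. The chunk size of 1,000 tokens and the function names are assumptions made here.

```python
import math

def guiraud(tokens):
    """Guiraud's index: number of types divided by sqrt(number of tokens)."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / math.sqrt(len(tokens))

def standardized_ttr(tokens, chunk_size=1000):
    """Mean type-token ratio over consecutive full chunks of `chunk_size` tokens.

    A trailing partial chunk is ignored. The default chunk size is an
    assumption, not a value taken from the slides.
    """
    chunks = [
        tokens[i:i + chunk_size]
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    ratios = [len(set(chunk)) / len(chunk) for chunk in chunks]
    return sum(ratios) / len(ratios)

# Toy token list, for illustration only.
toy = "the cat sat on the mat the end".split()
print(guiraud(toy))
print(standardized_ttr(toy, chunk_size=4))
```

Averaging over equal-sized chunks is what makes the S-TTR comparable across corpora of different sizes, since a plain TTR falls as a text gets longer.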
N-gram analysis
• This analysis compared multi-word units across the
genre corpora, in particular the 4-word units
occurring in each genre corpus. Coniam (2004) used
KfNgram (Fletcher 2002) to compute 4-word units
occurring in specific genre texts taken from applied
linguistics articles.
• N-grams identify the commonest
collocations in a discourse far more effectively than
single-word analysis. Academic texts show an overall
tendency to use fixed multi-word units more than
the other genres.
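Counting 4-word units can be sketched as follows. This is a plain-Python illustration of the general technique, not a reproduction of KfNgram; the function names and the toy sentence are assumptions made here.

```python
from collections import Counter

def ngrams(tokens, n=4):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(tokens, n=4, k=10):
    """Return the k most frequent n-grams with their counts."""
    return Counter(ngrams(tokens, n)).most_common(k)

# Toy token list, for illustration only.
toy = "on the other hand the results on the other hand suggest".split()
print(top_ngrams(toy, n=4, k=1))
```

Running the same count on each genre corpus and comparing the resulting lists is one way to see which fixed units are typical of a genre.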
Personality in texts: I, we and passives
• Kuo researched the use of personal pronouns in
academic texts from an empirical viewpoint. Personal
pronouns create an environment for
interpersonal interaction between the writer and the
readers (Kuo 1999: 123).
• Literature overuses “I”, while academic texts and newspapers
underuse it. Academic texts and literature use “we” more often
than newspapers.
• The passive voice is used much more in academic texts.
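Overuse and underuse are relative notions, so they are usually judged by normalizing counts and comparing a genre corpus with a reference corpus. The sketch below assumes lowercased tokens and a per-million-words normalization; the function names and the toy data are hypothetical and do not come from the study.

```python
from collections import Counter

def per_million(tokens, word):
    """Frequency of `word` per million running words."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return Counter(tokens)[word] / len(tokens) * 1_000_000

def usage_ratio(genre_tokens, reference_tokens, word):
    """Ratio of normalized frequencies: above 1 suggests overuse, below 1 underuse."""
    reference_freq = per_million(reference_tokens, word)
    if reference_freq == 0:
        raise ValueError("word does not occur in the reference corpus")
    return per_million(genre_tokens, word) / reference_freq

# Toy token lists (already lowercased), for illustration only.
genre = "i said i would go".split()
reference = "we said he would go and i left".split()
print(usage_ratio(genre, reference, "i"))
```

A raw ratio like this says nothing about statistical significance; a real comparison would add a significance test over much larger samples.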
Nominalization
• Biber et al. (1998: 58) suggest that “studying a
morphological characteristic in a corpus can teach us
both about the frequency and distribution of the
characteristic and about the differing functions of
particular variants”.
• Nominalization creates forms ending in -tion,
-sion, -ness, -ment and -ity, including plural forms.
• Academic texts show a higher ratio of nominalization
than the other genres; they tend to use
nominalizations ending in -ity and -ment and, at a
much lower frequency, -ness.
• Newspapers use nominalization in a similar way to
academic texts, but the -ment form is predominant.
• Literary works use these three nominalizations
almost equally, and the -ness form is the most salient.
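Suffix-based counts of this kind can be approximated with regular expressions. The sketch below is a crude heuristic under stated assumptions: it matches word endings only, requires at least three letters before the suffix, and will still count false positives such as "station" or "moment". The pattern table and the function name are illustrative, not taken from the study.

```python
import re
from collections import Counter

# Endings from the slide, each with its plural form.
# The three-letter minimum stem is an assumption to skip words like "city".
SUFFIX_PATTERNS = {
    "-tion": re.compile(r"[a-z]{3,}tions?"),
    "-sion": re.compile(r"[a-z]{3,}sions?"),
    "-ness": re.compile(r"[a-z]{3,}ness(?:es)?"),
    "-ment": re.compile(r"[a-z]{3,}ments?"),
    "-ity": re.compile(r"[a-z]{3,}it(?:y|ies)"),
}

def count_nominalizations(tokens):
    """Count lowercase tokens by nominalizing suffix (first matching suffix wins)."""
    counts = Counter()
    for token in tokens:
        for label, pattern in SUFFIX_PATTERNS.items():
            if pattern.fullmatch(token):
                counts[label] += 1
                break
    return counts

# Toy token list, for illustration only.
toy = ["the", "development", "of", "abilities", "and",
       "happiness", "requires", "decisions", "city"]
print(count_nominalizations(toy))
```

Dividing each count by the corpus size would give the per-genre ratios that the comparison above relies on.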
References
• ZHANG, Changu. An Overview of Corpus-based Studies of
Semantic Prosody. Asian Social Science, vol. 6, June 2010.
• CHISHMAN, Rove; TEIXEIRA, Lilian F. A semântica dos
compostos nominais em língua inglesa: um estudo de
corpus. Veredas on-line, Linguística de Corpus e
Computacional, 2/2009, p. 84-99.
• NISHINA, Yasunori. A Corpus-Driven Approach to
Genre Analysis: The Reinvestigation of Academic,
Newspaper and Literary Texts. ELR Journal, 1 (2), 2007.
