1. How to Use Corpora in Language Teaching
Brody Bluemel
Department of Applied Linguistics
The Pennsylvania State University
LANGUAGE TEACHING WORKSHOP SERIES
The Pennsylvania State University, February 2014
Sponsored by the Center for Language Acquisition (CLA) and the Center for Advanced Language Proficiency Education and Research (CALPER).
2. Outline
URL: https://sites.google.com/site/corpusteaching/
Presentation:
What are language corpora?
Approaches to using corpora in language teaching
Introduction to several available resources
Collaborate:
What ideas do you have for using corpora in your classroom?
Discussion:
Share ideas
3. What are corpora?
Leech (1992): “an unexciting phenomenon, a helluva lot of text, stored on a computer”
Sinclair (1991): “a collection of naturally-occurring language text, chosen to characterize a state or a variety of language”
Sinclair (2004): “a collection of pieces of language text in electronic form, selected according to external criteria to represent, as far as possible, a language or language variety as a source of data for linguistic research”
Corpus (plural: corpora): a systematized set of texts, typically accessed electronically, that is used for linguistic research and pedagogy.
4. Types of Corpora
General vs. Specialized
Native vs. Learner Corpora
Monolingual vs. Translation Corpora
Parallel Corpora, Comparable Corpora, Equivalent Corpora
Language Variation Corpora
Synchronic vs. Diachronic Corpora
Spoken vs. Written Corpora
5. Approaches to using corpora in language teaching
General vs. Specialized Corpora
Grammar, lexicon, rhetoric, style, expressions, formulaic speech
British National Corpus
American National Corpus
BYU Corpus Interface
MICASE (Michigan Corpus of Academic Spoken English)
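Most of the general corpora listed above are searched through a concordancer, which shows every hit for a word with its surrounding context (keyword in context, or KWIC). The sketch below is a minimal, hypothetical KWIC function, not the interface of the BNC, ANC, BYU, or MICASE tools; the regex tokenizer and the sample sentences are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase word tokens via a deliberately simple regex (an assumption, not a real corpus tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(text, keyword, window=3):
    """Return keyword-in-context lines with `window` tokens of left and right context."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

if __name__ == "__main__":
    # Invented sample text: collocations of "made" (made a decision, made a mistake, ...).
    sample = ("She made a decision. He made a mistake. "
              "They did their homework and made a plan.")
    for line in kwic(sample, "made"):
        print(line)
```

Lining up the keyword in one column is what lets students scan for patterns such as "make a decision" versus "do homework" in real concordance output.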
6. Approaches to using corpora in language teaching
Native vs. Learner Corpora
Comparison, analysis, error analysis, L1-specific challenges
International Corpus of Learner English (ICLE)
Extensive List of Multilingual Learner Corpora
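A common way to compare native and learner corpora is to normalize word counts per million tokens, so that corpora of different sizes can be compared for overuse or underuse of a form. The sketch below assumes a simple regex tokenizer, and its two "corpora" are invented sentences, not data from ICLE or any other learner corpus.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens via a simple regex (an illustrative assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(tokens, word):
    """Normalized frequency of `word` per one million tokens."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * 1_000_000 / len(tokens)

# Invented toy "corpora": the learner text omits articles and overuses "to".
native = tokenize("I went to the store. The store was closed, so I went home.")
learner = tokenize("I went to store. Store was closed, so I went to home.")

if __name__ == "__main__":
    for w in ("the", "to"):
        print(w, round(per_million(native, w)), round(per_million(learner, w)))
```

On real corpora, a large gap in normalized frequency (for example, article use by learners with a given L1) is a starting point for error analysis rather than a conclusion in itself.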
7. Approaches to using corpora in language teaching
Translation Corpora
Parallel Corpora
Phrasing, conceptualizing complex concepts, reading comprehension
www.parallelcorpus.com
EU Joint Research Centre
E-C Concord
www.linguee.com
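Parallel-corpus tools such as those listed above return source sentences together with their aligned translations. The sketch below shows that idea over a hypothetical, hand-made list of aligned English-Spanish sentence pairs; it is not how Linguee or E-C Concord are implemented, and the sentences are invented for illustration.

```python
from typing import List, Tuple

# Toy sentence-aligned parallel corpus of (English, Spanish) pairs, invented for illustration.
ALIGNED: List[Tuple[str, str]] = [
    ("It depends on the weather.", "Depende del tiempo."),
    ("That depends on you.", "Eso depende de ti."),
    ("I am thinking about the trip.", "Estoy pensando en el viaje."),
]

def bilingual_concordance(pairs, query):
    """Return aligned pairs whose source sentence contains `query` (case-insensitive)."""
    q = query.lower()
    return [(src, tgt) for src, tgt in pairs if q in src.lower()]

if __name__ == "__main__":
    for src, tgt in bilingual_concordance(ALIGNED, "depends on"):
        print(f"{src}  ||  {tgt}")
```

Seeing "depends on" rendered as "depende de" across several pairs is the kind of phrasing evidence a learner can draw from a parallel corpus.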
8. Approaches to using corpora in language teaching
Language Variation Corpora
Exploration of dialects
Phonemica
International Corpus of English (ICE)
Synchronic vs. Diachronic Corpora
Language change, modern speech, understanding novels and other texts
Spoken vs. Written Corpora
Genre and use
9. Online Resources
Presentation URL: https://sites.google.com/site/corpusteaching/
Multilingual Corpora:
Additional Resources:
Non-English Corpora
Corpus Tools & Websites
www.linguee.com
Extensive list of Online Corpora
Learner Corpora
Bookmarks for Corpus-based Linguists
Athel Corpus Resources
The corpora list
CALPER Corpus Tutorial
One of my favorites:
http://dict.bing.com.cn/
10. Primary Resources
Books and journals
Aijmer (2009): Corpora and Language Teaching
Hunston (2002): Corpora in Applied Linguistics
McEnery, Xiao, & Tono (2006): Corpus-Based Language Studies
Sinclair (2004): How to Use Corpora in Language Teaching
International Journal of Corpus Linguistics
Corpora
11. Collaborate
In groups of 3-4, discuss ideas, innovations, and questions
you have about applying corpus technology in the classroom.
Specific questions to consider:
Questions or applications of corpora that haven’t been discussed?
What challenges do you foresee in applying corpora in teaching?
What unique features of YOUR classroom should be considered (characteristics of the language you teach, student population, etc.)?
How would this technology benefit you in your teaching?
How do you plan to use corpus technology in your classroom?
12. Discussion
Share:
Ideas and possible applications of corpora generated in your group discussion
Any key features or aspects of corpora we haven’t yet considered
Questions:
Any questions regarding using corpora, finding resources, or anything else
13. Thank You!
Contact: Brody Bluemel (btb5129@psu.edu)
The Pennsylvania State University
Department of Applied Linguistics
Editor's Notes
Dear Colleagues,
A friendly reminder of tomorrow's language teaching workshop:
"How to Use Corpus Tools in Language Teaching"
Wednesday, February 5, 2014
4:40-5:45 p.m.
267 Willard
This workshop offers an overview of how language corpora (collections of authentic textual and/or spoken language samples) can be highly valuable resources for the teaching and learning of second languages. Examples of available corpora in various languages, including a new corpus tool for learning Chinese, will be shown as models. Topics to be addressed include:
The event is free and open to the public. Light refreshments will be provided.
For further information, please contact mcd15@psu.edu. We hope you will join us!
This workshop is sponsored by the Center for Language Acquisition (CLA) and the Center for Advanced Language Proficiency Education and Research (CALPER).
What is a language corpus?
How can learners benefit from working with corpus materials?
What do corpus-based activities and assignments look like?
How can teachers find and use language corpora in their teaching?
Chinese: learning and using the orthographic system (Bluemel, in press; Tsai & Choi, 2005)
German: learning gender, case, prepositions, and word order (St. John, 2001)
EFL/ESL: learning articles, prepositions, and aspect (Frankenberg-Garcia, 2005; McEnery & Wilson, 2001)
Italian: verb tense (Laviosa, 2002)
Spanish: lexical and semantic analysis and differentiation (Lavid, Hita, & Zamorano-Mansilla, 2010)
Source information: learner (L1, gender, program); sample (date, mode, task, genre)
Which numbers matter:
Number of tokens, types, categories, samples in each category, and words in each sample
Descriptive adequacy:
A bigger corpus is generally better for low-frequency words, but note Zipf’s Law (1935)
100K words of spontaneous speech is enough for descriptive studies of prosody
0.5 million words is enough for the study of verb-form morphology
0.5-1 million words is enough for studies of most syntactic processes and high-frequency vocabulary
The reliability of a smaller corpus can be empirically tested against a larger corpus
Biber (1990):
Measured the internal variation of 50 pairs of samples from the same texts
Samples of 2,000-5,000 words were enough
Biber (1993):
Used the multivariate techniques of factor analysis and cluster analysis to study variation
Pilot studies are necessary to fine-tune the corpus structure
One million words is good for grammatical studies
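The counts named in these notes (tokens, types) and Zipf's Law can be made concrete with a few lines of code. The sketch below assumes a simple regex tokenizer and computes token count, type count, type-token ratio, and a rank-frequency list. Under Zipf's Law a word's frequency is roughly proportional to 1/rank, so rank × frequency stays roughly constant, but only in a large corpus; the tiny invented sample here just shows the mechanics.

```python
import re
from collections import Counter

def corpus_stats(text):
    """Return (tokens, types, type-token ratio, rank-frequency list) for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())  # simple tokenizer (an assumption)
    freqs = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freqs)
    ttr = n_types / n_tokens if n_tokens else 0.0
    # Each row: (rank, word, frequency, rank * frequency); Zipf predicts the last
    # column stays roughly constant when the corpus is large.
    ranked = [(rank, word, count, rank * count)
              for rank, (word, count) in enumerate(freqs.most_common(), start=1)]
    return n_tokens, n_types, ttr, ranked

if __name__ == "__main__":
    n_tokens, n_types, ttr, ranked = corpus_stats(
        "The cat sat on the mat and the dog sat on the rug.")
    print(f"tokens={n_tokens} types={n_types} TTR={ttr:.3f}")
    for row in ranked[:3]:
        print(row)
```

Because the type-token ratio falls as a text grows, it is only comparable across samples of similar length, which is one reason the notes stress reporting the number of words in each sample.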