2. What is a corpus?
A large collection of naturally occurring spoken and/or written language available in digital format.
Corpora are a useful resource for linguists, lexicographers, teachers, and students because they provide instant access to large amounts of authentic language.
3. Examples of Corpora
Michigan Corpus of Academic Spoken English (MICASE): a 1.7 million word corpus of transcribed lectures, discussion groups, etc., compiled by the English Language Institute at the University of Michigan (UM).
British National Corpus (BNC): a 100 million word corpus of written (90%) and spoken (10%) language, including telephone conversations, novels, student papers, and many other genres.
BYU Corpus of American English (BYU CAE): a 360+ million word corpus of written and spoken language.
4. MICUSP
The Michigan Corpus of Upper-level Student Papers
829 "A"-grade papers (roughly 2.6 million words)
Papers from a range of disciplines across four academic divisions* of the University of Michigan (U-M), Ann Arbor
(*Humanities and Arts, Social Sciences, Biological and Health Sciences, Physical Sciences)
MICUSP can be accessed through an online interface called MICUSP Simple, available at:
http://search-micusp.elicorpora.info/simple
5. Browsing Papers in MICUSP
Papers can be browsed by:
16 disciplines
Paper types: Argumentative Essay; Creative Writing; Research Paper; Report; Critique/Evaluation; Response Paper; Proposal
Textual features: Abstract; Definitions; Discussion of results; Literature review; Methodology section; Problem-solution pattern; Reference to sources; Tables, graphs or figures
Student levels: final-year undergraduate students (level G0); 1st-, 2nd- and 3rd-year graduate students (levels G1, G2 & G3)
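Browsing by these categories amounts to filtering papers on their metadata. The Python sketch below assumes each paper is stored as a simple record with a discipline, a paper type, a student level and a set of textual features. The `Paper` class, its field names, the `browse` function and the sample records are all hypothetical illustrations, not MICUSP's actual schema or the workings of MICUSP Simple.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    """Hypothetical metadata record for one paper (not MICUSP's real schema)."""
    paper_id: str
    discipline: str
    paper_type: str
    level: str  # "G0" = final-year undergraduate, "G1"-"G3" = graduate years
    features: set = field(default_factory=set)

# Invented sample records, for illustration only.
SAMPLE_PAPERS = [
    Paper("paper-1", "Biology", "Proposal", "G1", {"Literature review"}),
    Paper("paper-2", "Psychology", "Report", "G0", {"Tables, graphs or figures"}),
    Paper("paper-3", "Biology", "Research Paper", "G0", {"Abstract", "Methodology section"}),
]

def browse(papers, discipline=None, paper_type=None, level=None, feature=None):
    """Return the papers that match every criterion that is not None."""
    matches = []
    for p in papers:
        if discipline is not None and p.discipline != discipline:
            continue
        if paper_type is not None and p.paper_type != paper_type:
            continue
        if level is not None and p.level != level:
            continue
        if feature is not None and feature not in p.features:
            continue
        matches.append(p)
    return matches

if __name__ == "__main__":
    hits = browse(SAMPLE_PAPERS, discipline="Biology", feature="Abstract")
    print([p.paper_id for p in hits])  # prints ['paper-3']
```

Leaving a criterion as `None` means "any value", so combining criteria narrows the selection the same way as choosing several categories at once.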
6. General Uses of MICUSP
To search for uses of particular word(s)
To search for collocations of words, i.e. which word(s) a particular word co-occurs with
To search for examples of particular text types (e.g. a research proposal in Biology, a report in Psychology)
To search for examples of textual features (e.g. abstracts in Biology research papers, definitions in Mechanical Engineering, how to introduce a figure or graph)
To look at the frequency of a particular word or phrase
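Frequency and collocation searches both come down to counting tokens. The Python sketch below assumes the papers are available locally as plain-text strings; the function names (`tokenize`, `phrase_frequency`, `collocates`) and the sample sentences are hypothetical, and MICUSP Simple itself offers these searches through its web interface rather than through code.

```python
import re
from collections import Counter

# Invented sentences standing in for corpus papers (not MICUSP data).
SAMPLE_TEXTS = [
    "The results suggest a clear trend.",
    "These results suggest that the method works.",
    "Results were mixed.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def phrase_frequency(texts, phrase):
    """Count occurrences of a word or multi-word phrase across all texts."""
    target = tokenize(phrase)
    n = len(target)
    total = 0
    for text in texts:
        tokens = tokenize(text)
        total += sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)
    return total

def collocates(texts, node, window=2):
    """Count the words that occur within `window` tokens either side of `node`."""
    node = node.lower()
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token != node:
                continue
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

if __name__ == "__main__":
    print(phrase_frequency(SAMPLE_TEXTS, "results"))          # 3
    print(phrase_frequency(SAMPLE_TEXTS, "results suggest"))  # 2
    print(collocates(SAMPLE_TEXTS, "results", window=1).most_common(1))  # [('suggest', 2)]
```

Raw co-occurrence counts tend to favour very common words such as "the"; corpus tools usually rank collocates with an association measure such as mutual information or t-score instead.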
11. Potential Uses for MICUSP
I don't know how to use "nevertheless." Does it always go first in a sentence?
Is it "different from" or "different to"?
How do I introduce a figure or graph in a Poli Sci paper?
How do I write a lit review for a psychology paper?
How do I write a definition in a biology paper?
Is it OK to use "I" in a Mechanical Engineering report?
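The first two questions can be answered by counting. The Python sketch below tallies whether "nevertheless" opens a sentence and compares how often "different from" and "different to" occur. It assumes plain-text papers, the function names and sample sentences are hypothetical, and the naive sentence splitter will wrongly break after abbreviations such as "e.g.".

```python
import re

# Invented sentences standing in for student papers (not MICUSP data).
SAMPLE_TEXTS = [
    "Nevertheless, the model fits well. The data were, nevertheless, noisy.",
    "Group A was different from group B.",
    "Our sample is different from theirs and different to the 2010 sample.",
]

def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def word_pattern(phrase):
    """Compile a case-insensitive, whole-word pattern for a word or phrase."""
    return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)

def sentence_positions(texts, word):
    """Tally how often `word` opens a sentence versus appears later in one."""
    pattern = word_pattern(word)
    tally = {"initial": 0, "other": 0}
    for text in texts:
        for sentence in split_sentences(text):
            for match in pattern.finditer(sentence):
                tally["initial" if match.start() == 0 else "other"] += 1
    return tally

def compare_variants(texts, variants):
    """Count occurrences of each competing phrase across all texts."""
    return {v: sum(len(word_pattern(v).findall(t)) for t in texts) for v in variants}

if __name__ == "__main__":
    print(sentence_positions(SAMPLE_TEXTS, "nevertheless"))  # {'initial': 1, 'other': 1}
    print(compare_variants(SAMPLE_TEXTS, ["different from", "different to"]))
    # {'different from': 2, 'different to': 1}
```

Counts from a toy list like this prove nothing on their own; the point is that the same tallies, run over a whole corpus, show which pattern writers actually prefer.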
12. Acknowledgments
The Michigan Corpus Linguistics Team
(Ute Roemer and Matt O'Donnell)
John Swales, the initiator of corpus linguistics at Michigan and a great educator and inspiration
13. What is a Concordancer?
Software that can be used to search, access, and analyze the language in a corpus.
Useful for exploring the relationships between words; it can show how language is actually used.
A concordancer lets us enter a word or phrase and retrieve multiple examples of how that word or phrase is used in everyday speech or writing.
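Concordancers typically present their results as key-word-in-context (KWIC) lines, with each match aligned in one column between its left and right context. The Python sketch below shows the idea for plain-text strings; the `kwic` function, its `width` parameter and the sample sentences are hypothetical, and this is not how MICUSP Simple is implemented.

```python
import re

# Invented sentences standing in for corpus texts (not MICUSP data).
SAMPLE_TEXTS = [
    "However, the findings were robust.",
    "The effect was small; however, it was consistent.",
]

def kwic(texts, query, width=30):
    """Return key-word-in-context (KWIC) lines for every match of `query`.

    Each line shows up to `width` characters of left and right context,
    and the left context is right-aligned so every match starts in the same column.
    """
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for match in pattern.finditer(flat):
            left = flat[max(0, match.start() - width):match.start()]
            right = flat[match.end():match.end() + width]
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    for line in kwic(SAMPLE_TEXTS, "however", width=20):
        print(line)
```

Scanning the aligned column makes patterns visible at a glance, for example whether "however" tends to follow a semicolon or to open a sentence.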
Speaker Notes
Corpus (singular) / Corpora (plural)
http://141.211.123.105:8000/search/main/: point out how each feature is useful for teachers (e.g. PDF versions added at an instructor's request)