WebLEAP/DSR is a new implementation of the WebLEAP tool that helps English language learners learn usage by analyzing web corpora. It allows users to input sentences and see frequency graphs and keyword-in-context examples from search engines. The tool also allows domain specification to focus analysis. Examples show how it can help estimate appropriate prepositions and compare differences between UK and US English. The system records user interactions for computer-assisted language learning. Further research topics include improving precision and analyzing regional differences and collaborative writing.
Learning English Usage with WebLEAP/DSR KWIC Tool
1. Learning Usage of English KWICly
with WebLEAP/DSR
Takashi Yamanoue
Kagoshima University, Japan
Toshiro Minami
Kyushu Institute of Information Sciences &
Kyushu University, Japan
Ian Ruxton
Kyushu Institute of Technology, Japan
Wataru Sakurai
University of Tsukuba, Japan
ICITA2004@Harbin(04.01.08-11)
2. Contents
I. Motivation (Introduction)
II. WebLEAP/DSR: A New
implementation of WebLEAP
III. Examples and Experiments for
Evaluation
IV. Related Work
V. Concluding Remarks
3. Difficulties in writing in English
Is it really used?
•Spelling
→ spell checker
•Grammar
→ grammar checker
•Usage
→ Corpus Linguistic Tools
I. Motivation
4. Problems of Ordinary
Corpus Linguistic Tools
Time-consuming
Needs hard work in order to make a
good corpus
Copyright problem
(often) Outdated from the beginning
Tools are mainly for experts
Difficult to use for ordinary learners
5. A Solution
Use of “Web-Corpus”
= Using Web Documents as a Corpus
Maintenance free: Exists as it is
Always new, reflects current status of
languages
A lot of applications/services are available on
the Internet
6. WebLEAP
Shows
Frequencies of phrases in the given
sentence graphically
Using a search engine.
New Features
KWIC (Key Word In Context)
Domain Specification
8. II. WebLEAP/DSR: A New
implementation of WebLEAP
DSR: Distributed System Recorder
A Computer Assisted Teaching System,…
Recording and replaying every operation.
Draw, Programming, Web, WebLEAP, …
WebLEAP Basic:
Frequencies Graphically
New (using Google Web APIs):
KWIC
Domain Specification
19. Satoh’s system … webcorpus
SUIKO…detects wrong sentences
Applications using Google Web APIs
DSR: Distributed System Recorder
A Benchmarking tool for distributed
systems.
A Computer Assisted Teaching system
P2P, reliable multicast, …
IV. Related Work
20. WebLEAP:
A tool for helping with writing.
Popularities of expressions.
Frequencies from a Search engine.
KWIC
How the expression is used.
Filling in the missing word.
Domain specification
WebLEAP/DSR
An application of DSR.
V. Concluding Remarks
21. Precision
Discrimination between native speakers and
non-native speakers.
Differences from region to region
Collaborative Writing
Further Research
Topics
(Thank you, Mr. Chairman)
I’m Takashi Yamanoue from Kagoshima University, Japan.
I would like to talk about “Learning Usage of English KWICly with WebLEAP/DSR”.
This talk consists of an introduction; WebLEAP/DSR, a new implementation of WebLEAP;
examples and experiments for evaluation; related work; and concluding remarks.
It is hard work to write something. It is even harder when it is in a second language. We often cannot judge the appropriateness of sentences. We already have spell checkers and grammar checkers. However, an expression can be grammatically correct and yet never actually be used by native speakers. A corpus and a concordance program help us in such cases.
A corpus is a large collection of sample sentences.
Making a corpus is time-consuming.
Hard work is needed in order to make a good corpus,
and to solve copyright problems.
The corpus is often outdated from the beginning.
Concordancers, tools for using the corpus, are mainly for experts.
Many of them are difficult to use for ordinary learners.
In order to solve these problems, we use web documents as a corpus.
We call this kind of corpus a ‘Web-corpus’.
The Web-corpus is maintenance free. It exists as it is.
It is always new, and it reflects the current state of the language.
A lot of applications/services are available on the Internet.
WebLEAP is a program which graphically shows the user the frequencies of the phrases in a given sentence.
We added two new features to WebLEAP.
One is KWIC, Key Word In Context. The other is domain specification.
This figure shows the inside of WebLEAP.
The sentence given by the user is decomposed into phrases by this word sequence generator.
These phrases are sent to a search engine.
The search engine returns the corresponding result pages, which include the frequency of each phrase.
The frequency is extracted by the document analyzer.
These frequencies are shown to the user graphically by the user interface.
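The pipeline just described can be sketched in Python. This is a minimal sketch, not the authors' implementation: the talk does not say how the word sequence generator chooses phrases, so the code assumes it enumerates every contiguous run of words. The names `word_sequences`, `phrase_frequencies` and `count_hits` are hypothetical; `count_hits` stands in for the search-engine query plus the document analyzer.

```python
def word_sequences(sentence, min_len=1):
    """Return every contiguous word sequence of the sentence, longest first.

    Assumption: the real word sequence generator may select phrases differently.
    """
    words = sentence.split()
    n = len(words)
    phrases = []
    for length in range(n, min_len - 1, -1):
        for start in range(0, n - length + 1):
            phrases.append(" ".join(words[start:start + length]))
    return phrases


def phrase_frequencies(sentence, count_hits):
    """Map each phrase to the hit count reported by count_hits.

    count_hits is a hypothetical callable (phrase -> int) that hides the
    search-engine request and the extraction of the frequency from the result.
    """
    return {phrase: count_hits(phrase) for phrase in word_sequences(sentence)}
```

Under this assumption, a four-word input such as "at your own risk" yields ten phrases, from the full sentence down to the single words.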
WebLEAP/DSR is a new implementation of WebLEAP.
DSR is a distributed system recorder. It can be used as a computer-assisted teaching system and as a benchmarking tool for distributed systems.
It can record and replay users’ operations on DSR’s application programs on a distributed system.
WebLEAP/DSR is an application program of DSR.
By using the Google Web APIs, a web service provided by Google,
it can show a KWIC table of a phrase. It can also specify the domain of the source sentences.
This is the WebLEAP window of WebLEAP/DSR.
It is used to input the sentence and settings, and to control the outputs.
When the user clicks this [eval] button after entering a sentence in this field, the draw window is shown.
This is the draw window. This window shows the frequencies of phrases in the input sentence graphically.
A number in the colored bar shows the frequency of the phrase over the bar.
A pink bar shows a low frequency. A blue bar shows a high frequency.
When the user clicks a bar, for example this bar, the KWIC window is shown.
This is the KWIC window. This window shows the KWIC table.
In these fields, the keyword which corresponds to the bar clicked in the draw window
is shown in bold letters.
We can see how the keyword is used in the context.
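A KWIC table of this kind can be sketched as follows. This is only an illustration under stated assumptions: the talk does not describe how WebLEAP/DSR builds the table, and `kwic_rows`, `format_kwic`, the `snippets` input and the `width` parameter are all hypothetical. Each row holds the text to the left of the keyword, the keyword itself, and the text to the right, so the keyword lines up in one column.

```python
import re


def kwic_rows(snippets, keyword, width=30):
    """Build (left, keyword, right) rows for the first match in each snippet.

    snippets is assumed to be a list of text fragments returned by a search
    engine; snippets without the keyword are skipped.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    rows = []
    for text in snippets:
        match = pattern.search(text)
        if match is None:
            continue
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    return rows


def format_kwic(rows, width=30):
    """Right-align the left context so that the keyword forms a column."""
    return "\n".join(f"{left:>{width}} [{key}] {right}" for left, key, right in rows)
```

Square brackets mark the keyword here in place of the bold letters used in the KWIC window.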
When the user clicks a URL field, for example this field, the web browser window is shown.
This is the web browser window.
The page in this window includes the keyword and shows the context like this.
This is the setting window. This window is shown when the user clicks the setting button in the WebLEAP window.
We can select the search engine used for the evaluation and set its search options.
In this figure, we have selected Google as the search engine and are going to set the
search domain as a search option for Google.
We have experimented with a variety of cases.
Let's have a look at two of them.
One is estimating the appropriate preposition. The other is comparing English in specific countries.
Let’s think about which preposition is the most appropriate for “your own risk”.
Is it by? With? At?
This figure shows the frequencies of “by your own risk”, “with your own risk” and “at your own risk”.
The frequencies are 41, 138 and 434000. It is easy to see that “at your own risk” is the most appropriate one.
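The comparison can be made explicit with a little arithmetic. The sketch below uses the three frequencies quoted above; the function name `rank_candidates` and the idea of reporting each candidate's share of the total are illustrative assumptions, not part of WebLEAP.

```python
# Frequencies quoted in the talk for the three candidate phrases.
frequencies = {
    "by your own risk": 41,
    "with your own risk": 138,
    "at your own risk": 434000,
}


def rank_candidates(freqs):
    """Sort candidates by frequency and attach each one's share of the total."""
    total = sum(freqs.values())
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    return [
        (phrase, count, count / total if total else 0.0)
        for phrase, count in ranked
    ]


for phrase, count, share in rank_candidates(frequencies):
    print(f"{phrase:<20} {count:>7} {share:.4%}")
```

With these numbers "at your own risk" accounts for more than 99.9% of the hits for the three candidates, which is why the bar graph makes the choice obvious.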
Now suppose that “at” did not come to mind at first.
This figure shows that the frequencies of “by your own risk” and “with your own risk” are far too small compared with the frequency
of “your own risk”. So we click the frequency bar which corresponds to “your own risk”.
Then this KWIC window is shown. This KWIC table shows how “your own risk” is used in each context.
In this table, “at” is used in most cases.
Then we can ask for the frequency of “at your own risk” and confirm that it is the most appropriate expression.
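In the talk the user reads the missing word off the KWIC table by eye. The same step can be sketched in code by counting the word that immediately precedes the keyword in each row. This is a hypothetical helper, not a feature described for WebLEAP/DSR; the name `left_neighbors` and the (left, keyword, right) row layout are assumptions, and the sample rows are made up for illustration.

```python
import string
from collections import Counter


def left_neighbors(rows):
    """Count the word directly before the keyword in each KWIC row.

    rows is assumed to be a list of (left_context, keyword, right_context).
    """
    counts = Counter()
    for left, _keyword, _right in rows:
        words = left.split()
        if not words:
            continue
        word = words[-1].strip(string.punctuation).lower()
        if word:
            counts[word] += 1
    return counts


# Made-up rows for illustration only; real rows would come from search results.
sample_rows = [
    ("Use this software at ", "your own risk", "."),
    ("Proceed At ", "your own risk", " if you must."),
    ("You travel with ", "your own risk", " assessment."),
]
print(left_neighbors(sample_rows).most_common(1))
```

The most frequent left neighbor is then a candidate to try in a new frequency query, as done above with "at your own risk".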
Non-native English speakers are sometimes confused when writing a sentence in a specific English dialect such as British English or American English.
WebLEAP/DSR can filter the Web corpus by a domain name in the page’s URL.
This figure shows WebLEAP outputs for comparing two phrases, “living in a flat” and “living in an apartment”, in the UK domain and the US domain.
This figure shows that “living in a flat” is used much more than “living in an apartment” in the UK domain,
and “living in an apartment” is used much more than “living in a flat” in the US domain.
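Filtering by a domain name in the URL can be sketched as below. The talk only says that the search domain is set as a search option, so this client-side check is an assumption made for illustration; `in_domain` and `filter_by_domain` are hypothetical names, and the example URLs are placeholders.

```python
from urllib.parse import urlparse


def in_domain(url, domain):
    """Return True if the URL's host is the domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def filter_by_domain(urls, domain):
    """Keep only the URLs whose host falls under the given domain."""
    return [url for url in urls if in_domain(url, domain)]


# Placeholder URLs for illustration only.
urls = [
    "http://www.example.co.uk/flats",
    "http://www.example.us/apartments",
    "http://www.example.com/uk/flats",
]
print(filter_by_domain(urls, "uk"))
```

Matching on the host name rather than on the whole URL string avoids counting pages that merely have "uk" somewhere in their path.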
Satoh’s system is similar to our system in the sense that it also uses Web documents through a search engine. This system outputs the KWIC index of a keyword, whereas our system outputs not only KWIC but also a graphical representation of the frequencies of words or phrases. WebLEAP can also specify the domain of the web-corpus.
SUIKO detects incorrect Japanese sentences, but it does not show whether an expression is really used or not.
There are other applications which use the Google Web APIs. Most of them provide only an interface to the search engine.
DSR is a distributed system recorder. It can be used as a benchmarking tool and as a computer-assisted teaching system.
WebLEAP is a tool for helping with writing by showing the user the popularity of expressions. It uses a search engine to get the frequencies of the expressions.
We added two new features to WebLEAP.
One is KWIC and the other is domain specification.
By using KWIC, the user can see how the given expression is used. The user can also fill in a missing word using KWIC.
WebLEAP/DSR is an application of DSR.
In the next step of this research, we would like to improve the precision of
WebLEAP and to make WebLEAP support collaborative writing.
We’d like to distinguish native speakers’ expressions from non-native speakers’.
We’d like to know the differences between sentences from region to region more precisely.
We thank Google for making the Google Web APIs publicly available and letting us use them.