34016665 translation-and-paraphrasing

Translated and
Paraphrased Plagiarism
The cat and mouse game continues...

One man’s rigor is another man’s mortis.
- CF Bohren and DR Huffman, 1983

Overview

The ever-changing counter-detection
landscape
Paraphrasing versus textual entailment
Ways to paraphrase
Tools of the trade

Many Roads to Plagiarism
The ‘old fashioned’ way

Translated plagiarism.

Paraphrased plagiarism
Back-translation: the latest form of plagiarism
Michael Jones University of Wollongong, Australia

4th Asia Pacific Conference on Educational Integrity (4APCEI) 28–30 September 2009

Paraphrased plagiarism is not new either. However, there are new
tools to aid in automatically paraphrasing text which are accelerating
this form of detection avoidance.

Paraphrase plagiat n'est pas nouveau non plus. Toutefois, il existe
de nouveaux outils pour l'aide dans le texte paraphrase
automatiquement qui sont l'accélération de cette forme d'évasion de
détection.

Paraphrase plagiarism is not new either. However, there are
new tools to help in paraphrasing the text automatically, which are
accelerating this form of escape detection.

Paraphrased plagiarism

Paraphrasing vs Textual
Entailment
Two sentences are paraphrased if they
“mean the same thing”:
1) Similarity: they share a substantial
amount of information
2) Dissimilarities are extraneous: if
extra information in the sentences
exists, the effect of its removal is not
signiﬁcant.

Paraphrasing vs Textual
Entailment
A paraphrase is a special case of textual
entailment. A paraphrase is symmetric
(each sentence entails the other),
whereas textual entailment only indicates
that two sentences overlap to a degree,
with one sentence being subsumed by
the other.

Ways to Paraphrase
Lexical substitution/synonymy
• Hypo/Syno/Hyper-nym replacement: article,
paper or red, crimson
• Acronym replacement: Mr., mister
• Contractions: do not, don’t
• Compounding/decompounding: ballgame, ball
game
• Numeric/Alphabetic numbers: 11, eleven;
12/1/2010, December first two-thousand-ten
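
The lexical substitutions listed above can be undone before comparison by mapping each variant to one canonical form. The following is a minimal sketch, assuming tiny hand-written lookup tables; the table contents and the `normalize` function are illustrative and not taken from the presentation.

```python
# Hypothetical lookup tables; a real system would need far larger ones.
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "isn't": "is not"}
ABBREVIATIONS = {"mr.": "mister", "dr.": "doctor"}
NUMBER_WORDS = {"3": "three", "11": "eleven", "12": "twelve"}
COMPOUNDS = {"ballgame": "ball game"}

def normalize(text):
    """Return a token list with known surface variants mapped to one form."""
    out = []
    # Treat typographic apostrophes like plain ones, then split on whitespace.
    for tok in text.lower().replace("’", "'").split():
        # Keep the period on known abbreviations; strip other edge punctuation.
        if tok not in ABBREVIATIONS:
            tok = tok.strip(".,;:!?")
        for table in (CONTRACTIONS, ABBREVIATIONS, NUMBER_WORDS, COMPOUNDS):
            if tok in table:
                tok = table[tok]
                break
        # A replacement may expand to several words ("do not", "ball game").
        out.extend(tok.split())
    return out

print(normalize("Mr. Smith don't play ballgame."))
print(normalize("Mister Smith do not play ball game"))
```

Both calls produce the same token list, so a plain word-matching step that runs afterwards no longer sees the substitutions as differences.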

Ways to Paraphrase

Active and passive exchange
The gangster killed 3 innocent people.
vs Three innocent people were killed by
the gangster.
• Re-ordering of sentence components
Tuesday they met vs They met Tuesday
Zhou, Ming and Niu, Cheng. “Principled Approach to Paraphrasing.” U.S. Patent 11,934,010. 1 Nov 2007.

Ways to Paraphrase
Realization in different syntactic
components
Palestinian leader Arafat vs
Arafat, Palestinian leader
Prepositional phrase attachment
The Alabama plant vs
A plant in Alabama

Ways to Paraphrase
Change into different sentence types
Who drew this picture? vs
Tell me who drew this picture.
Morphological derivation
He is a good teacher. vs
He teaches well. vs
He is good at teaching.

Ways to Paraphrase

Light verb construction
The film impressed him. vs
The film made an impression on him.
Comparatives vs. superlatives
He is smarter than everyone else. vs
He is the smartest one.

Ways to Paraphrase

Converse word substitution
John is Mary's husband. vs
Mary is John's wife.
Verb nominalization
He wrote the book. vs
He was the author of the book.

Ways to Paraphrase
Substitution using words with
overlapping meanings
Bob excels at mathematics. vs
Bob studies mathematics well.
Inference
He died of cancer. vs
Cancer killed him.

Ways to Paraphrase
Different semantic role realization
He enjoyed the game. vs
The game pleased him.
Subordinate clauses vs separate
sentences linked by anaphoric pronouns.
The tree healed its wounds by growing
new bark. vs
The tree healed its wounds. It grew
new bark.

Tools of the Trade
Microsoft paraphrase corpus
Used to test algorithms
WordNet: English only :(
Synonyms, hypernyms, hyponyms,
and antonyms.
Algorithms: Finite State Transducers
(FSTs) and/or iterative Longest Common
Subsequence (LCS) on sets.
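
For reference, the basic LCS computation on two token lists can be sketched as below. This is the textbook dynamic-programming version, not the iterative set-based variant the slide mentions, whose details are not given here; the function name `lcs_length` is an assumption.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

# Re-ordering example from the earlier slide: only "they met" stays in order.
print(lcs_length("tuesday they met".split(), "they met tuesday".split()))  # 2
```

Because LCS respects word order, a re-ordered sentence scores lower than a bag-of-words comparison would give it, which is one reason to combine it with other matching steps.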

Tools of the Trade

Stemming or lemmatization
am, are, is → be
car, cars, car's, cars' → car
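
The two mappings above can be reproduced with a very small sketch: a lookup table for irregular forms plus naive suffix stripping. The `IRREGULAR` table and the `lemma` function are assumptions for illustration; the stripping rule is deliberately crude and will mangle words such as "this", so a real lemmatizer would be needed in practice.

```python
IRREGULAR = {"am": "be", "are": "be", "is": "be"}  # illustrative entries only

def lemma(token):
    """Reduce a token to a base form using a lookup table and naive rules."""
    tok = token.lower().replace("’", "'")
    if tok in IRREGULAR:
        return IRREGULAR[tok]
    # Possessives: car's -> car, cars' -> cars
    if tok.endswith("'s"):
        tok = tok[:-2]
    elif tok.endswith("'"):
        tok = tok[:-1]
    # Naive plural stripping: cars -> car (words ending in "ss" are left alone)
    if tok.endswith("s") and not tok.endswith("ss") and len(tok) > 3:
        tok = tok[:-1]
    return tok

print([lemma(t) for t in ["am", "are", "is"]])
print([lemma(t) for t in ["car", "cars", "car's", "cars'"]])
```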

Word Alignment Examples
According to the MS paraphrase corpus:
This is a paraphrase

12/14 = 86%
12/16 = 75%
Not Paraphrased (However, the first sentence is textually entailed by the second.
Turnitin would currently match this.)

18/19 = 95%
18/26 = 69%
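
Each pair of figures above divides the number of matched words by the length of each sentence in turn. A minimal sketch of that calculation follows, assuming simple bag-of-words matching on lowercased tokens; the alignment method actually used for the slides is not recorded here, and `overlap_ratios` is a hypothetical name.

```python
from collections import Counter

def overlap_ratios(sent_a, sent_b):
    """Return (matched, matched/len(a), matched/len(b)) for two sentences."""
    a = sent_a.lower().split()
    b = sent_b.lower().split()
    if not a or not b:
        return 0, 0.0, 0.0
    # Multiset intersection: each word matches at most as often as it
    # occurs in both sentences.
    matched = sum((Counter(a) & Counter(b)).values())
    return matched, matched / len(a), matched / len(b)

m, ra, rb = overlap_ratios(
    "the tree healed its wounds",
    "the tree healed its wounds by growing new bark",
)
print(m, round(ra, 2), round(rb, 2))
```

When one ratio is close to 100% and the other is much lower, the shorter sentence is largely contained in the longer one, which is the entailment-without-paraphrase pattern described above.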

Slippery Slope
When does textual entailment become arbitrary/noise?

14/48 = 29%
14/34 = 41%

Slippery Slope
When does textual entailment become arbitrary/noise?

13/24 = 54%
13/21 = 62%

Translated Plagiarism

Non-English markets, in particular, are
concerned about English-as-a-second-language
students taking English documents,
translating them into their native
language, and submitting the result.

Initial approach:
Non-English documents searched as
they are now
Additional search performed:
Translate document to English, search
English documents, and then display
English matches with translations (or
vice versa)
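
The two-pass search described above can be sketched as follows. The `translate` and `search` callables are hypothetical stand-ins passed in by the caller; nothing here reflects an actual product API.

```python
def find_matches(document, source_lang, translate, search):
    """Search in the original language, then again on an English translation.

    `translate(text, src, dst)` and `search(text, lang)` are assumed,
    caller-supplied functions; `search` returns a list of matching passages.
    """
    results = [
        {"via": "original", "match": hit, "shown_as": hit}
        for hit in search(document, source_lang)
    ]
    if source_lang != "en":
        english = translate(document, source_lang, "en")
        for hit in search(english, "en"):
            # Display each English match together with a translation
            # back into the submission's language.
            results.append({
                "via": "translated",
                "match": hit,
                "shown_as": translate(hit, "en", source_lang),
            })
    return results

# Usage with stub functions standing in for real services:
def fake_translate(text, src, dst):
    return f"[{src}->{dst}] {text}"

def fake_search(text, lang):
    return [f"hit for {lang}"]

for row in find_matches("bonjour", "fr", fake_translate, fake_search):
    print(row)
```

Passing the services in as parameters keeps the sketch independent of any particular translation or search provider.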

Our new strategic partner:

On demand SaaS statistical machine translation

Translated Plagiarism: Need
for Paraphrasing?
Machines and humans translate text in
many different ways.
Paraphrase detection allows us to
match the variations.
Google translate: The zeitgeist is thinking and feeling one age. The term describes
the characteristics of a particular period, or an attempt to remind us it. The German
word Zeitgeist is transferred through English as a loanword into numerous other
languages been.

Bing translate: Zeitgeist is thinking and feeling how an age. Is the nature of a
particular era or trying to understand them. The German word Zeitgeist is taken from
English as a loanword in many other languages.

http://de.wikipedia.org/wiki/Zeitgeist

Finis

Thank you for listening!
Questions?
