The document provides information about the University of Wolverhampton's Research Group in Computational Linguistics and Statistical Cybermetrics Research Group. It discusses the groups' expertise in various areas of natural language processing and information retrieval. Key personnel are mentioned, including Ruslan Mitkov, Constantin Orasan, and Mike Thelwall. Ongoing and past projects funded by sources like the EC and NBME are summarized.
4. Mission Statement
RIILP produces internationally leading research,
offers first class research supervision and teaching
in the interdisciplinary areas of information and
language processing and delivers cutting-edge
practical (including commercial) applications to the
benefit of society based on its research output.
5. Structure and context
Research Group in Computational Linguistics
Statistical Cybermetrics Research Group
Benchmark: the very best national and international
expertise in every area
Both groups enjoy considerable national and
international reputation
External income generation > £4,000,000 over last
five years
6. Statistical Cybermetrics
Research Group
Statistical Cybermetrics entered in Unit of Assessment “Library and
Information Management”: in national context, Wolverhampton
ranked joint second with four more universities.
According to league tables (The Guardian, The Times and
Research Fortnight), research in Library and Information
Management at the University of Wolverhampton is one of the six
best in the UK.
Head of SCRG – Prof. Mike Thelwall
Rated 3rd most successful UK library and information science
researcher of all time (Jan. 2007)
8. RAE’2008 results
Computational Linguistics entered in Unit of Assessment
“Linguistics”: Wolverhampton ranked joint third with two
more universities in a large company of old, research-intensive universities.
According to league tables (The Guardian, The Times
and Research Fortnight), research in Linguistics at the
University of Wolverhampton is one of the six best in the
UK
Due to Computational Linguistics, in Linguistics we are
ahead of Oxford, Cambridge, UCL, Lancaster,
Manchester, Reading...
9. Research Group in Computational Linguistics:
People
Founded in 1997 by Ruslan Mitkov
Currently:
1 full-time Professor
2 part-time Professors
2 Readers
1 Senior Lecturer
7 research fellows and research associates
12 PhD students
4 Administrators
Project research assistants, Masters students, Visiting
Professors, Honorary Research Fellows, guest
researchers
10. Research Group in Computational Linguistics: key personnel
Ruslan Mitkov
Publications: more than 200 publications in areas including:
anaphora resolution (>2,000 citations, >40 keynote speeches)
generation of multiple-choice tests (>15 keynote speeches)
Key books:
Mitkov R. 2002. Anaphora resolution. Longman.
Mitkov R. (Ed). 2003, 2005. Oxford Handbook of Computational Linguistics. Oxford
University Press.
Current editorial distinctions:
Executive Editor of the Journal of Natural Language Engineering (Cambridge University
Press)
Editor of the Oxford Handbook of Computational Linguistics (Oxford University Press)
Editor-in-Chief of John Benjamins’ book series in Natural Language Processing (NLP)
Editor Consultant of Oxford University Press publications in Computational Linguistics
Chair or member of a number of Programme Committees and Editorial Boards
11. Research Group in Computational Linguistics: key personnel
Dr. Constantin Orasan (Deputy Head of the Group)
Reader in Computational Linguistics; 60+ publications, PI on a
number of projects, leading figure in summarisation, extensively
involved in Master programme teaching
Dr. Michael Oakes
Reader in Computational Linguistics, leading figure in areas
such as information retrieval, authorship identification, statistical
methods for linguistics and translation
12. Research Group in Computational Linguistics: key personnel
Prof. Patrick Hanks:
Professor of Lexicography, leading figure in
(computational) lexicography and corpus-based methods
to dictionary compilation
Dr. Le An Ha:
Lecturer in industrial Natural Language Processing, Project
manager of the US NBME-funded project, involved in NLP
application to e-learning and industrial projects
Richard Evans
Research fellow, involved in NLP application to healthcare
Led the FIRST project proposal process
13. Delivering cutting-edge research in
Coreference/ anaphora
resolution
Automatic generation of
multiple-choice tests
Text summarisation
Question answering
Temporal processing
Named entity recognition
Lexical knowledge acquisition
Discourse processing
Information extraction
Computational lexicography
Text simplification
Plagiarism detection
Evaluation of NLP
Topics related to translation
Term extraction
Machine Translation
Multilingual NLP
Translation Memory
Translation Universals
Comparable corpora compilation
for translators
• Statistical methods to translation
• Generation of test items
14. Some recent highlights
RAE’2008 feedback: research output internationally leading, internationally
excellent and internationally recognised
World’s best-performing system in temporal processing (Georgiana Puscasu)
World’s best cross-lingual information retrieval system with English as target
language (Iustin Dornescu, Constantin Orasan, Georgiana Puscasu)
World’s best system at GikiP (a competition dealing with geographical questions
on Wikipedia) (Iustin Dornescu)
Best anaphora resolution system (Iustin Dornescu)
Oxford University Press statement that the Oxford Handbook of
Computational Linguistics has been the most successful OUP Handbook
ever.
15. Project in focus/success story: Rapid Item Generation
Two projects funded by a major US Board of Medical
Examiners (NBME) on generation of test questions for the
medical domain
Pioneering computer-aided approach
First stage successfully passed real user testing
For English, with the possibility of extending to other languages
Since then, the NBME have gone on to request an annual
rolling contract of around £100,000 for us to continue working
on items for them.
They are currently trialling a second project with us which, if
successful, will bring in an additional £45,000 p.a.
16. Recent EC-funded projects
QALL-ME (Question Answering Learning technologies in a
multiLingual and Multimodal Environment)
Funding body: European Commission FP6 ICT
Total EC contribution: €2,400,000. WLV share: €700,000.
Runs from October 2006 – September 2009
TELL-ME (Towards English Language Learning for
MEdical professionals)
Lifelong Learning Programme Leonardo da Vinci
Total EC contribution: €370,401. WLV share: €95,375.
Runs from January 2012 – December 2013
FIRST (A Flexible Interactive Reading Support Tool)
EC FP7
Total EC contribution: €2,008,754. WLV share €487,440.
Runs from October 2011 – September 2014
17. Other ongoing projects
DVC, AHRC, £605,586
Funding body: AHRC
Total contribution: £605,586. WLV share: £605,586.
Runs from October 2012 – September 2015
NBME projects, NBME, more than £1,000,000
Funding body: NBME
Total contribution: > £ 1,000,000. WLV share: > £ 1,000,000.
Runs from January 2004
18. Strategic topics
Language technology for medical applications (including
language disorders)
E-learning
Translation Technology
Bridging the gap between academia and the industry
Impact on society
20. University of Málaga (Spain)
Research Group in Lexicography and
Translation (Lexytrad, HUM-106)
21. Index
1. Aims and activities of UMA
2. Research Group HUM-106
3. Expertise - HUM 106
4. Key staff involved in TELL-ME
22. Aims and activities of UMA
The Universidad de Málaga (UMA): over 36,000 students
and over 2,500 teaching staff.
Well established history in regional, national and European
project management: 73 international projects (at present
23 ongoing European projects).
National and international patents for the results of its
research.
UMA is an International Campus of Excellence
(Andalucía TECH) since 2010.
Watch http://www.youtube.com/watch?v=_nXoV8oiGvo
23. Research Group HUM 106 (I)
The research group Lexicography and Translation (HUM-106) at UMA is
an international leader in the field of corpus-based Translation
Studies, E-Learning and Translation Technologies.
Directed by Prof. Gloria Corpas since 1997.
The group comprises 14 researchers and is a recognised leader in
areas of E-Learning, Linguistics, Corpus Compilation, Multilingual
Lexicography, Terminology, Translation Training, Translation Studies,
including Revision, Quality Control, Translation Technologies and User-centred Translation Evaluation.
24. Research Group HUM 106 (II)
The group works with a number of languages, including Spanish,
German, Italian, French and English.
The research group HUM-106 was rated as one of the top
performing units within Arts and Humanities in the 2010
assessment exercise by the Andalusian regional
government (97 points out of 100).
Further information at http://www.uma.es/hum106
25. Expertise - HUM 106 (I)
International R&D Projects
2004-2006
- Standard Linguistico Europeo per il Settore del Turismo (SLEST) [Linguistic
standard for the tourism industry]. Funding source: European Commission (2004-2006).
Funding source: Lifelong Learning Programme (LLP)
2004-2007
- HESPERIA. Repertorio analítico de lexicografía bilingüe: diccionarios italiano-español y español-italiano. [HESPERIA: Analytical bilingual lexicography index:
Spanish/Italian – Italian/Spanish dictionaries].
Funding source: Italian Ministry of University and Scientific Research (MIUR).
26. Expertise - HUM 106 (II)
2005-2008
- ACTUAL: Lingüística contrastiva [Actual: Contrastive Linguistics]. Funding
source: Italian Ministry of University and Scientific Research (MIUR).
2008-2010
- CHINESECOM – Competences in Elementary Chinese as a means to improve
competitiveness of European Union companies. Funding source: Lifelong
Learning Programme, (LLP) - Key Activity 2 - Multilateral project.
2012-2013
- TELL-ME (Towards European Language Learning for MEdical professionals).
Funding source: Lifelong Learning Programme, (LLP) - Key Activity 2 - Multilateral project.
27. Expertise - HUM 106 (III)
National R&D Projects
1999-2002
- Diseño de un tipologizador textual para la traducción automática de textos jurídicos
(español → inglés/alemán/italiano/árabe). [A Textual Typologiser for Machine-Translation
of Legal Texts (Spanish → English/German/Italian/Arabic)].
Funding source: Spanish Ministry of Education: Research & Development National
Programme.
2003-2006
- TURICOR: Compilación de un corpus de contratos turísticos (alemán, español, inglés,
italiano) para la generación textual multilingüe y la traducción jurídica. [TURICOR: A
multilingual corpus of tourism contracts (German, Spanish, English, Italian) for automatic
text generation and legal translation].
Funding source: Spanish Ministry of Science and Technology.
28. Expertise - HUM 106 (IV)
2008-2011
- Espacio único de sistemas de información ontológica y tesauros sobre el
medio ambiente: Ecoturismo. [Single space for ontological information systems and
thesauri on the environment: Ecotourism].
Funding source: Spanish Ministry of Education: Research & Development
National Programme.
2012-2015
- INTELITERM: Sistema inteligente de gestión terminológica para traductores.
[INTELITERM: Intelligent terminology management system for translators].
Funding source: Spanish Ministry of Education: Research & Development
National Programme.
29. Expertise - HUM 106 (V)
Regional R&D Projects
2006-2009
- La contratación turística electrónica multilingüe como mediación intercultural: aspectos
legales, traductológicos y terminológicos. [Multi-lingual Tourism E-contracts: legal,
translational and terminological aspects].
R&D Project for Excellence. Andalusian Ministry of Education, Science and Technology.
2008-2012
- Nuevo diccionario de aprendizaje (learners' dictionary) del español como lengua
extranjera de difusión on-line. [New on-line learners’ dictionary of Spanish as a Foreign
Language].
R&D Project for Excellence. Andalusian Ministry of Education, Science and Technology.
30. Expertise - HUM 106 (VI)
Others
2 Coordinated Research Activities
5 Networks
More than 20 E-learning and Innovation Projects
More than 20 Thesis Dissertations
More than 40 M.A. Dissertations
For further information see http://www.uma.es/hum106/investigacion_en.html
31. Key staff involved in EXPERT (I)
1. Prof. Gloria CORPAS (gcorpas@uma.es)
- Professor in Translation and Interpreting at UMA.
- Prof. G. Corpas is no. 2 in the Spanish national ranking of Translation and Interpreting
(http://hindexscholar.com).
- She acts as a Ministry advisor on the Bologna Process via the Spanish Agency ANECA.
- She has been actively involved in the development of the UNE-EN 15038:2006 as AEN/CTN
174 and CEN/BTTF 138 Spanish delegate. Spanish expert for the future ISO Standard (ISO
TC37/SC2-WG6 "Translation and Interpreting").
- Her publications also deal with didactic innovation, the design of virtual university knowledge
communities for Translation studies, virtual collaborative environments, e-learning platforms and
virtual teaching of subjects specializing in scientific and technical translations.
- She has one patent (ReCor); she received the Euralex Verbatim Award in 1995 and the
Spanish Translation Technologies Observatory Award in 2007, with Dr. M. Seghiri.
32. Key staff involved in EXPERT (II)
2. Dr. Jorge LEIVA (leiva@uma.es)
- Senior Lecturer in Translation and Interpreting at UMA and professional translator.
- His research fields include specialised translation and phraseology.
- He was awarded a research grant for Harvard University (Massachusetts, USA)
from September 2008 to March 2009.
- He has also been a member of a variety of research projects
focusing on specialised translation, text corpora and e-learning.
- University’s 2005 Ph. D. Best Student Prize.
33. Key staff involved in EXPERT (III)
3. Dr. Miriam SEGHIRI (seghiri@uma.es)
- Senior lecturer in Translation and Interpreting at UMA.
- She has also worked at Dickinson College (PA, USA), the University of Murcia and the
University of Cordoba.
- She has participated in several European, national and regional R&D projects.
- She has been awarded several research grants for Dickinson College (PA, USA) and Università
di Perugia (Italy).
- Her research fields range from specialised translation to corpus linguistics and ICTs, the
outcome of which has been made public in national and international academic conferences
and publications.
- She has one patent (ReCor), and she received the Spanish Translation Technologies
Observatory Award in 2007, with Dr. G. Corpas. University’s 2006 Ph. D. Best Student Prize.
34. Key staff involved in EXPERT (V)
5. ESRs
ESR1:
Anna Zaretskaya, from Russia. Investigation of translators’
requirements from translation technologies (Supervisor: Miriam Seghiri at UMA, and
co-supervised by Elia Yuste from Pangeanic). Permit Visa: pending.
ESR3:
Hernani Costa, from Portugal. Collection and preparation of multilingual
data for multiple corpus-based approaches to translation (Supervisor: Dr. Gloria Corpas
at UMA and co-supervised by Marco Trombetti from Translated and ER1). ESR3 signed
his contract on the 2nd of September 2013.
6. ERs
ER1 will work on Investigation of automatic methods for collection and preparation of
multilingual data (Supervisor name: Marco Trombetti at Translated and co-supervised by
Jorge Leiva from UMA).
36. - An academia-industry research consortium dedicated to delivering
disruptive innovations in digital media and intelligent content such
as multilingual content analysis
- Led by Trinity College Dublin and co-hosted by Dublin City
University
- Sponsored by both Science Foundation Ireland and Industry
Partners including Symantec, DNP, Microsoft, Intel, Xanadu,
WeLocalize, Alchemy
37. CNGL Research Themes
Tuning Text
Analytics
Event & Opinion
Extraction
Content Aware
Multilingual
Search
Contextualisation
Modality
Independent
Intelligent Machine
Translation
Social Localisation
Intelligent
Post-Editing
38. CNGL @ Dublin City University
Professor Josef van Genabith: NLP, MT
Professor Qun Liu: MT, NLP
Dr. Gareth Jones: IR, Multi-Modal
Dr. Sharon O'Brien: Translation Technology
Dr. Jennifer Foster: NLP
40+ staff and PhD students
41. 15th company in Southern Europe and 154th in the world
according to Common Sense Advisory’s 2013 listing
42. hermestr@hermestrans.com
www.hermestrans.com
Madrid Office:
Cólquide, 6 - portal 2, 3.º - I
Edificio Prisma
28230 Las Rozas (Madrid, Spain)
Telephone: (+34) 91 640 7640
Fax: (+34) 91 637 8023
Malaga Office:
Parque Tecnológico de Andalucía
Av. Juan López Peñalver, 17 - 3.ª - 6
Edificio Centro de Empresas
29590 Campanillas (Malaga, Spain)
Telephone: (+34) 952 020525
Fax: (+34) 952 020529
43. COMMITMENT TO QUALITY:
Cooperation with official agencies
• Company present in the Spanish Technical Committee #174 at
AENOR for quality translation services, with the support of the
European Committee for Standardisation (CEN), the Spanish
Standardisation Association (AENOR) and the European Union of
Translation Companies Association (EUATC).
• Juan José Arevalillo, Hermes Traducciones Managing Director, is
the current Chairman of the Spanish Technical Committee #174 in
AENOR for translation and related services.
44. SGR
PERFORMANCE MANAGEMENT SYSTEM
PRODUCTIVITY AND QUALITY CONTROL
• Daily monitoring of quality and productivity of our team in order to
guarantee an improved control over our translations
• Review, revision and editing of our translations by a second or third
specialist other than the original translator
• Use of proprietary templates for revising,
reviewing and editing our translations in
compliance with EN15038 quality standard
• Use of the LISA QA MODEL standard for
localisation review and SAE J2450
standard for automotive translation review
45. PLUNET-BASED
TRANSLATION PROJECT MANAGEMENT
• End to end translation project management system through a Plunet
platform
• Compliant with our double quality certification requirements
46. HERMES DIFFERENCES
• Founded in 1991 by former employees of the Localisation Group of Digital
Equipment Corporation (currently Hewlett-Packard).
• Specialising in software and website localisation, as well as technical
translation.
• 70% of our production is done by our own in-house resources.
• Translation services in 30 language pairs.
• Ongoing training of our staff.
• End-to-end solutions for our customers.
• Internal department of applied technology, including MT.
Image of Hermes god at
Louvre Museum.
47. HERMES EXPERIENCE
• 28 years of localisation experience (22 as a company and 6 at Digital
Equipment Corporation, currently Hewlett-Packard).
• Over 60,000 localisation projects in 22 years, including multi-lingual projects.
• Comprehensive expertise and know-how in computer-assisted translation and localisation-specific
applications: SDL-Trados product family, SDL Studio 2011, memoQ, Déjà-Vu, IBM Translation Manager,
Star Transit, WordFast, Catalyst, Passolo, across, Idiom World Server, Microsoft Helium, Microsoft
Localisation Studio and many others.
• Comprehensive expertise and know-how in quality control programs: HelpQA, HTML HelpQA, Apsic
Xbench, MS Help Workshop, MS HTML Help Workshop and others.
• Comprehensive know-how in DTP, text processing and imaging applications: Adobe FrameMaker,
Microsoft Word, Adobe InDesign, Adobe PageMaker, PaintShop Pro,
Adobe Illustrator, Adobe Photoshop, etc.
• Proprietary terminology database covering more than 1,000,000 entries of different languages and
domains.
• 35 million words managed per year, and an average of 6,000 translations per year.
• Centralized Plunet-based translation project management system.
52. Pangeanic
• Pangeanic adopted the initial versions of Moses in 2009 as an in-house project
to help with translation production needs. It was the first company in the world to
transition Moses from an academic to a commercial environment, as reported
in Euromatrixplus.
• The small in-house project grew into a full platform overcoming many of its
limitations with a full set of new features and offering the translation
community the opportunity to have machine translation for the masses.
• The platform now includes full re-training features, glossary upload, a full
TMX / training material management system, the ability to create engines on
the fly as well as the possibility to hybridise with pre- and post- modules.
• Our presentation will describe the tool we have made available for the
project.
54. Who is Translated?
Web-based Language Service Provider
Since 1999, providing human translation in 80 languages to over
35,000 customers thanks to 70,000 professional translators.
Tech Company
Focus on technology to automate processes and make
translation more efficient.
55. Workflow Automation
Fully automated translation management system
that connects customers and translators.
Automate all repetitive tasks and focus only on
what brings value to our customers.
56. Content Reuse
MyMemory
Largest translation memory server (6 billion words)
Integrated in most computer-assisted translation tools
100% Free
Leverage existing linguistic content to make translators
more productive.
57. Translation Environment
MateCat
Deep integration of MT - MT technology that learns from the
users in real time
Collaborative environment - Online translation with multiple
users
Fast and easy to use - Virtually no learning curve
Increased privacy protection – Clients’ documents are not
sent out to translators
60. USAAR: institution
Number of students: 18,500
16% international students
• Dept. of Applied linguistics, Translation
and Interpreting
• Dept. of Computational Linguistics &
Phonetics
• German Research Centre for Artificial
Intelligence
• Cluster of Excellence on Multimodal
Computing and Interaction
• Max Planck Institute for Computer Science
• Max Planck Institute for Software Systems
61. USAAR: WP4
WP4 Language technology,
domain ontologies and terminologies
Dr. Paul Schmidt
Chair of Machine Translation
in charge of scientific and
technical/technological
aspects
Prof. Elke Teich
Chair of English Linguistics
and Translation Science
in charge of administrative,
legal and financial aspects
José Manuel Martínez
research assistant
administration
62. USAAR: ESRs
Santanu Pal – ESR2
Investigation of an ideal translation
workflow for hybrid translation
approaches
India
B.Tech, Computer Science &
Engineering
Certification course on Linguistics
M.Tech, Computer Technology
Thesis: “Improved Alignment in
Statistical Machine Translation”
Liling Tan – ESR5
Use of terminologies and
ontologies to improve corpus-based approaches to translation
Singapore
BA in Linguistics
MA in Computational Linguistics
Thesis: Examining Crosslingual
Word Sense Disambiguation
64. University of Sheffield
Natural Language Processing Group
• Since 1993
• Areas: language resources and architectures (GATE), information access
(Q&A, summarisation), foundational topics
• Collaboration with Machine Learning and Speech groups
• Newly created MT lab
Academics doing research on MT
• Lucia Specia
• Trevor Cohn
• Rob Gaizauskas
Other MT people
• 3 post-docs, 2 ESRs/PhD students, 5 PhD students
65. Projects and areas
of interest (I)
• Modist (EPSRC): Modeling Discourse in Statistical
Translation
• Barista (EPSRC): Non-Parametric Models of Phrase-based
Machine Translation
• Expert (EU): EXPloiting Empirical appRoaches to Translation
• QTLaunchpad (EU): Preparation and Launch of a Large-Scale Action for Quality Translation Technology
66. Projects and areas
of interest (II)
• SlaTr (Google): A Joint Model of Spoken Language
Translation
• QuEst (PASCAL2 Harvest): Open source tool for MT
Quality Estimation
• TaaS (EU): Terminology as a Service
• ACCURAT (EU): Analysis and Evaluation of Comparable
Corpora for Under Resourced Areas of Machine Translation