According to CNGL's Prof. Vincent Wade, one of the most significant changes to people’s lives in recent years has been the explosion of content. The volume generated, the accuracy of audience targeting and the means of accessing content via different media and devices are changing continuously. While organisations, communities and individuals increasingly want content and services delivered in their own language, according to their needs, preferences and context, corporations increasingly want to leverage content created by users and engage in closer interaction with global customers and communities. This presentation outlines the embedded intelligence for interaction with dynamic global content, i.e. content that can leverage cloud services for advanced analytics, search optimization, translation, localisation, personalisation and interaction across the delivery chain. This enables the aggregation and machine translation of, and interaction with, multilingual user-generated, corporate and open content to satisfy the global customer.
2. Objective
• To Examine the Possible Influences and
Future Solutions for Global Content
Ask the Questions
• Revolution or Evolution in Global Content?
• What Scientific Challenges for Next
Generation Content ?
• How will it impact Business in the Future?
4. • Commoditisation of Content and its localisation
– Business Model on ‘cost per word’ rather than value
– Constant optimisation of workflow process
• Translation/Localisation becoming increasingly
‘continuous’ rather than ‘batch’ oriented
– Granular & continuous
• Increase in Variety of Content
– Traditional, multimedia, games based, social media,
multimodal, video, ……
• Velocity (of Content)
6. But Localisation is just a component in the
Content Value Chain
– Upstream dependencies
– Downstream implications
7. MultiLingual Web
English only 27% of all languages on Internet in 2010
Facebook 50M 2007 600M 2010 …. 1L 2007 77L 2010
Twitter 55% tweets non-English February 2010
The Mobile Web
Multimodal
Mobile
2 Billion Web users
5+ Billion mobile phones
Multimodal access/interactions
Text
Speech
Video
Pictures
Audio
User Generated Content
Beginning to outpace professionally
edited content
New business models
Business Intelligence
Customer support
Customer loyalty
Challenge:
Noisy data, ungrammatical, spelling,
…:
“Thank you for the wiki links, do you
compiled adn updated list ?”
Diversity of Users
Prior Knowledge &
Expertise
Device and
Modality
Context of Use
Aims and Goals
History, Preferences &
Culture
Language
& Communication
Style
10.
As Velocity Climbs…
– Comprehension Falls
As Variety Climbs…
– Consumability Falls
As Volume Climbs…
– Findability Falls
11.
12.
Build and Sustain world leading centre of excellence in
Global Intelligent Content and revolutionise technology for
the global content value chain
Deliver significant impact for industry and create
commercialisation benefits for Irish-based enterprises and
Irish entrepreneurs
Empower people and organisations in Ireland to achieve
global impact via social and commercial improvements
16. PAPER: Computational Stylometry: Who's in a Play
(Vogel & Lynch, 2008)
PATENT: A Data Processing System and Method
for Assessing Quality of a Translation
17.
18. Personalisation in the wild: providing personalisation
across semantic, social and open-web resources.
(Steichen, et al 2009)
PATENT: A network system for generating application
specific hypermedia content from multiple sources
19.
As Velocity Climbs…
– Comprehension Falls
As Variety Climbs…
– Consumability Falls
As Volume Climbs…
– Findability Falls
21. Where the content
proactively enables its
Own Discovery,
Analysis, Transformation
& Delivery
By embedding new
levels of knowledge and
intelligence into
the content
22. • Understand +
In Perspective
• Discover + Relate +
Link
• With regards to
Language + Culture
• Adapting to Individual
and context of use
• Considering Right-Time
+ Right-Way Delivery
39. CASE STUDIES Global Intelligent Content :
Personalisation for Corporate & User
Generated Content
PROF VINCENT WADE CNGL DIRECTOR,
TRINITY COLLEGE DUBLIN
40. Problem / Market Need
One size doesn’t fit all!
People have different:
• Roles, Needs, Abilities, Preferences,
Contexts of use, Devices …
Technology Solution
Bringing the right Content and
Services to the end-user
• Driven by Strategies, Models,
Meta-data & Behaviour
• Flexible and configurable
• Service based
41.
42. 1. Personalised Information Portal (automated
presentation generation)
2. Dynamic Personalisation of Activities, Content &
Portal Services
3. Personalised Corporate Portals for Informal
Learning (across Social, Open and Corporate
Media)
4. Personalisation in Casual Games
47. Key Innovation: Personalized, point-of-need learning
•e-Learning content disaggregated into learning objects and
meta-tagged
•Recomposed into a Learning Episode tailored to searcher’s role
within organisation
•Learning Episode also presents searcher with content related
to their search term which they may not have considered
Increases the potential reuse of content
• 67 Courses (402 topics) from Leadership
domain
• 402 topics tagged to 45 competencies
Example: Delivering a difficult message tagged
31 times in different contexts
Personalised Corporate Portal
48. Key Innovation: Enhanced Search
Combination of Semantic and Social Search
searches over all content repositories to
return resources that are:
• Context-relevant
• Highly rated by community of peers
Personalised Corporate Portal
49. Key Innovation: Informal Learning Data
Analytics
•Quantitative data (contribution level)
•Qualitative data (contribution value)
•Manager View for 360° Appraisals
•Employee View
•Updates in real time
52. • Instantaneous provision of context-appropriate
analysis, translation, (re)composition and
delivery
• Dynamic adaptation, personalisation and
recomposition for user empowerment and
content contextualisation
• Enhancement of interaction through multi-
modal and multi-device delivery
• Integration and optimisation of intelligent
content processes across the value chain
53.
54. Effective Content Curation, Management and
Personalisation
for Corporate & User Generated Content
PROF VINCENT WADE CNGL DIRECTOR,
TRINITY COLLEGE DUBLIN
55. Content Curation and Search
for Corporate & User Generated Content
PROF VINCENT WADE CNGL DIRECTOR,
TRINITY COLLEGE DUBLIN
56. • Automated Generation of Metadata
from Corporate and User Generated
Content for usage across the content
value chain
– Text Analytics
– Entity Extraction
– Event Extraction
– Sentiment analysis
– Metadata Generation
– Content Quality Estimation
– Data Visualisation
Welcome to CNGL II Director’s Overview. Delighted to introduce the CNGL Centre plans for the next five years and beyond.
A "perfect storm" is an expression that describes an event where a rare combination of circumstances aggravates a situation drastically.
1. Content has become overwhelming.
2. I know for me: I average about 250 emails (and that’s not just from Stev and his colleagues in SFI!). I respond to my work colleagues, I’m on the organising committees of about five conferences per year, and I have continuous email contact with research and undergrad students, not to mention friends, and not to mention the important ones: wife, children. Such content ranges from simple planning of meetings/events to drafting project proposals, reviewing PhD theses and journal paper collaborations!
2B. But it doesn’t end there. I get announcements via my numerous social and professional networks (through different social network providers: Facebook, LinkedIn). I regularly use forums, wikis and blogs for my daytime work as well as home life. That’s all before I mention the SMS texts, tweets, track announcements, blogs, etc., and location-aware applications notifying me based on proximity.
3. PROBLEM: content is coming so fast that it is DIFFICULT to MAKE SENSE of it and PRIORITISE. There is SO MUCH that it is hard to keep track of it, even if you can recognise it as important (of the conversation). USEFUL CONTENT is difficult to find, even if you’ve used it before. It is INTERTANGLED with so much other content, e.g. email to do with work, home, family/friends, events, lifestyle interests….
The future can only bring even more content and even faster velocity and variety!
Today our approach is to develop yet more applications:
- Twitter for SHORT messaging/broadcasting
- SMS for interpersonal messaging
- location-aware applications for ‘in context’ messaging and notification
- email for longer messages
- forums, wikis, blogs, voice mail, conferencing
We are even now inventing applications to manage our applications!!! AND we are consuming on more mobile devices & embedded devices.
Volume: sheer size of UGC, Big Data. Variety: text, multilingual, multimodal. Velocity: speed at which such content becomes available.
WORLD LEADING centre for GLOBAL INTELLIGENT CONTENT. Significant impact for INDUSTRY and COMMERCIAL BENEFITS for Irish-based enterprises and Irish entrepreneurs. Empower people and organisations for GLOBAL IMPACT.
Unique blend of world-leading industry and academia. 5 YEARS' track record in successful scientific collaboration and commercialisation impact.
Leading technology companies (MICROSOFT, SYMANTEC, DNP, WELOCALISE); indigenous industry in the multilingual content technology space (ANCHEMY, VISTATEC); innovative start-ups looking to exploit and commercialise early results of CNGL (Xcelerator in the cloud-based translation tool industry, Emizar in multilingual customer care (self-service)); and the Rosetta Foundation, which looks beyond the scope of normal translation to support and enable community-based (crowdsourced) translation.
Leading academics in their fields from world-class universities: Trinity College Dublin, Dublin City University, University College Dublin and University of Limerick. They provide a complementary set of excellent academics and researchers.
What if we could attack the problem at source, rather than by applications? What if the content could be more intelligent? What if the content itself could:
- TRANSFORM from unstructured content to metadata-rich structured content?
- MAKE ITSELF MORE DISCOVERABLE (multilingually), and make speech content more searchable?
- TRANSLATE itself, OR ASSIST its translation into multiple languages?
- RECOMPOSE itself based on the intended context of use and the reactions/interactions of users?
- ADAPT to the device or delivery environment?
This is the challenge for CNGL II over the next five years and beyond…..
CNGL II is focusing on the grand challenge of ‘Global Intelligent Content’, where the content proactively enables its OWN DISCOVERY, ANALYSIS, TRANSFORMATION and DELIVERY. We do this by embedding new levels of knowledge and intelligence into the content.
Intelligent multimodal content & communications enabled by advanced embedded language & content technologies: multilingual, personalised, mobile, accessible & delivered, multimedia & multimodal (text, graphic, video, speech).
Localisation and CNGL I experience has taught us that content technologies permeate all parts of the lifecycle, and THAT to address the CONTENT PROBLEM we MUST LOOK ACROSS the content value chain and not just focus on specific pieces. It’s the COMBINATIONS of functionalities that provide the IMPACT.
Need for a CONTENT-CENTRIC APPROACH
Need to UNDERSTAND the content IN PERSPECTIVE
Need to DISCOVER, RELATE and INTERCONNECT the content ACROSS LANGUAGES and variety
Need to TAKE LANGUAGE AND CULTURE into CONTEXT
Need to EMPOWER THE INDIVIDUAL and COMMUNITIES
Need to CONSIDER RIGHT-TIME and RIGHT-WAY DELIVERY
Understand the content & analyse it in perspective
Discover, relate and interlink the content
Transform the content for native language access and sensitivity
Adapt, recompose and personalise the content for the individual’s need, context or community
Deliver the content in the right way, at the right level of interaction
Manage the content for appropriate performance and optimisation
Semantically aware – The word semantic means “meaning”. Semantically aware content is content which has been tagged with metadata to identify the kind of content within it. For example, you might tag your content with industry, role or audience, and product, making it possible to automatically build customized information sets based on audience or industry. As more organizations begin to offer personalized content on demand (content that matches the specific needs of an individual user and that is dynamically assembled and delivered upon request), semantically-rich content becomes critical. Where content is pushed to others, or repurposed in mashups, it becomes even more important to ensure that our content is semantically tagged. Semantic metadata makes it possible to automatically locate the right content, deliver it at the right time, to the right person, for the right purpose, in the right language, and in the right format.
Discoverable – Content that is semantically-aware and structurally-rich is more easily discoverable content, especially when it’s in XML, it’s tagged with rich metadata, and you’re using XQuery to help you find, retrieve, prepare and publish it.
Reusable – Reusable content is content created once and used many times throughout a variety of information products. A warning statement, for example, might be created for use on a product box, but it may also be reused in a printed user guide, on the product website, in training materials, etc. Reusable content is common in technical documentation, but it is fast becoming an approach used in the creation of business documents like marketing materials, proposals, contracts, and policies and procedures. There are various reuse types that can be employed to help us identify, locate, retrieve, and reuse content.
Reconfigurable – Structured content is content separate from format; in other words, the look and feel of the content is not embedded in the content. That makes it very powerful. Knowing the structure of the content in advance, we can programmatically output it to multiple channels, reconfiguring it to best meet the needs of the channel, or we can automatically mix and match content to provide us with specific information personalized to meet customers' needs. We can even transform content (reconfigure it) from one structure to another, but only if we know what the structure is in the first place.
Adaptable – We frequently create our content for a particular need or audience, but content can be adapted (used in a different way), often without our knowledge, to meet a new need. Think of mashups: we don’t know how our content is being aggregated, but it can be because we have structured and tagged it intelligently.
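As a rough illustration of the tagging and reconfiguring ideas in these notes, here is a minimal Python sketch. The `ContentItem` class, the tag names (`kind`, `audience`), the sample sentences and the `web`/`sms` channels are all assumptions made up for the example; they do not describe any CNGL system.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """A unit of content kept separate from its presentation."""
    body: str
    tags: dict = field(default_factory=dict)  # hypothetical semantic metadata


def select(items, **wanted):
    """Return the items whose tags match every requested key/value pair."""
    return [i for i in items
            if all(i.tags.get(k) == v for k, v in wanted.items())]


def render(item, channel):
    """Format the same content differently for each delivery channel."""
    if channel == "web":
        return f"<p>{item.body}</p>"
    if channel == "sms":
        return item.body[:160]  # illustrative length cap for short messages
    return item.body


# Made-up sample content, tagged by kind and audience.
items = [
    ContentItem("Keep away from water.",
                {"kind": "warning", "audience": "consumer"}),
    ContentItem("Torque the bolts to spec.",
                {"kind": "procedure", "audience": "engineer"}),
]

# Build a customised information set, then output it to one channel.
consumer_warnings = select(items, kind="warning", audience="consumer")
web_output = [render(i, "web") for i in consumer_warnings]
```

Because the tags live beside the body rather than inside it, the same warning can be selected for a different audience or rendered for a different channel without editing the content itself.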
Whole focus on semantic uplifting… That involves: 1) tuning text analytics; 2) event & opinion extraction; 3) purpose-driven curation.
Whole focus on making MULTILINGUAL information more discoverable: CONTEXT-AWARE MULTILINGUAL DISCOVERY; ADAPTIVE QUERIES; SEARCH CONTEXTUALISATION; modality independent.
Gaps/Challenges: Content
1) Adaptive video
2) “Useful” social content: what are the actionable elements of social content that are out there… beyond sentiment, what are the individual viewpoints/perspectives? Allows for finer-grained personalisation.
Gaps/Challenges: Users
1) Context: how do we connect different perspectives & types of information in different ways?
2) Behaviour
Gaps/Challenges: Intelligence
1) Service adaptation: how do we compose new environments & connect information sources together?
2) Game-inspired adaptivity (gamification = high-fructose corn syrup of interface design!). More interested in this notion of flow.
Research Projects
1) Community adaptation
2) Multimodal adaptation: how can affective & emotional cues allow us to improve adaptation, and what can we flow backwards?
3) Video adaptation: recomposing, summarising & personalising
4) Social media adaptation
5) Micro-adaptation
6) Semantic slicing
7) Linked-data search
Corporate portal providing access to personalised point-of-need learning…. PERCOLATE technologies deployed within a social and collaborative learning environment. Views learners as active users and co-producers of learning resources.
Training content meta-tagged according to different contexts of use and role appropriateness. Has applications in the integrated eHR (Sigal) system you use in Bombardier, in that each role has associated training…
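To make the idea of recomposing meta-tagged training content for a role more concrete, here is a small Python sketch. The dictionary layout, the `compose_episode` function and every catalogue entry other than the "Delivering a difficult message" title are invented for illustration; the selection rule (direct title matches first, then objects sharing a competency with them) is an assumption, not the portal's actual logic.

```python
def compose_episode(learning_objects, role, search_term):
    """Assemble a learning episode for a role: direct matches, then related items."""
    term = search_term.lower()
    # Keep only objects tagged as appropriate for the searcher's role.
    for_role = [lo for lo in learning_objects if role in lo["roles"]]
    # Direct matches: the search term appears in the title.
    direct = [lo for lo in for_role if term in lo["title"].lower()]
    # Related items: share at least one competency tag with a direct match.
    competencies = {c for lo in direct for c in lo["competencies"]}
    related = [lo for lo in for_role
               if lo not in direct and competencies & lo["competencies"]]
    return direct + related


# Hypothetical catalogue of meta-tagged learning objects.
catalogue = [
    {"title": "Delivering a difficult message",
     "roles": {"manager", "team lead"}, "competencies": {"communication"}},
    {"title": "Active listening",
     "roles": {"manager"}, "competencies": {"communication"}},
    {"title": "Budget planning",
     "roles": {"manager"}, "competencies": {"finance"}},
    {"title": "Giving feedback",
     "roles": {"team lead"}, "competencies": {"communication"}},
]

episode = compose_episode(catalogue, "manager", "difficult message")
titles = [lo["title"] for lo in episode]
```

The related items are what surface content the searcher may not have considered, while the role filter keeps the episode appropriate to their position.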
Searches over all content repositories within an organisation, formal and informal, to provide context-relevant point-of-need learning. A Google search, which relies on keywords, would not pick up related terms or context.
As well as returning personalised Formal Content, the same search also returns relevant, highly rated Informal Content from the company Wiki and Communities of Practice. Mark finds relevant Wiki material near the top of the returned list. The informal learning list is ranked according to items which other learners, of similar profile to himself, have found useful. He, in turn, rates items which he finds useful so that others will find them in future searches.
Employees are encouraged to contribute to the Wiki (you can add articles) and rate articles in the Wiki so that the most useful ones rise to the top of the search. It is the same situation in the role-based communities of practice. Managers monitor such contributions as part of the 360 appraisal process.
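The ranking behaviour described above (relevant items first, ordered by how peers of similar profile rated them) could be sketched roughly as follows in Python. The function name, the data layout, the use of job role as the "similar profile" test and the simple topic-overlap relevance measure are all assumptions for illustration only.

```python
def rank_informal(items, ratings, searcher_role, query_terms):
    """Rank items by topic overlap, then by mean rating from same-role peers."""

    def peer_score(item_id):
        # Average only the ratings given by peers with the searcher's role.
        scores = [r["score"] for r in ratings
                  if r["item"] == item_id and r["role"] == searcher_role]
        return sum(scores) / len(scores) if scores else 0.0

    query = {t.lower() for t in query_terms}
    scored = []
    for item in items:
        overlap = len(query & {t.lower() for t in item["topics"]})
        if overlap:  # drop items with no topical match at all
            scored.append((overlap, peer_score(item["id"]), item["id"]))
    # Highest overlap first, then highest peer rating, then id for stability.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [item_id for _, _, item_id in scored]


# Hypothetical wiki articles and peer ratings.
items = [
    {"id": "wiki-1", "topics": ["feedback", "appraisal"]},
    {"id": "wiki-2", "topics": ["feedback"]},
    {"id": "wiki-3", "topics": ["travel"]},
]
ratings = [
    {"item": "wiki-1", "role": "manager", "score": 3},
    {"item": "wiki-2", "role": "manager", "score": 5},
    {"item": "wiki-2", "role": "engineer", "score": 1},
    {"item": "wiki-1", "role": "engineer", "score": 5},
]

ranked = rank_informal(items, ratings, "manager", ["feedback"])
```

With this made-up data a manager sees `wiki-2` ahead of `wiki-1`, while an engineer running the same query would see the opposite order, which is the point of rating by similar peers.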
Provides data analytics on informal learning activities… He also realises that he has appraisals coming up: his own with his manager and also the appraisals that he conducts with his own team. First he has a look at how he himself is doing in terms of contributing to the Wiki, Communities of Practice (not for this iteration) and Reflections. He can see not only the level of his contributions in relation to the company average but also, by clicking the More button, the quality of his contributions (how they are rated by others). Similarly, because he is a manager, he can get the same information for each of his team members through the Management Tab, which will improve the efficiency of the 360 review process by identifying rising stars and employees to retain. Walk through the different report types using the More button. There are explanations for each on the left-hand side of the reports. Show the option of selecting specific dates etc. to monitor levels of contributions over time.
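The two measures in this walkthrough, contribution level against the company average and contribution quality as rated by others, can be sketched in a few lines of Python. The record layout, the `contribution_report` name and the sample figures are hypothetical; the real reports may well be computed differently.

```python
from statistics import mean


def contribution_report(contributions, employee):
    """Compare one employee's contribution level and rating with the company."""
    # Count contributions per author to get the company-wide average level.
    # Assumes at least one contribution exists; mean() raises on empty input.
    counts = {}
    for c in contributions:
        counts[c["author"]] = counts.get(c["author"], 0) + 1
    own_ratings = [c["rating"] for c in contributions
                   if c["author"] == employee]
    return {
        "level": len(own_ratings),                        # quantitative
        "company_average_level": mean(counts.values()),
        "quality": mean(own_ratings) if own_ratings else None,  # qualitative
    }


# Made-up contribution records with peer ratings.
contributions = [
    {"author": "mark", "rating": 4},
    {"author": "mark", "rating": 5},
    {"author": "mark", "rating": 3},
    {"author": "ann", "rating": 2},
]

report = contribution_report(contributions, "mark")
```

A manager view would simply call the same function once per team member.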
Improved Learner Satisfaction
Improved Quality and Effectiveness
Improved Motivation
Improved Relevance
Reduced Time
Reduced Cognitive Load
Based on Evaluations at Schools (K12), University and Corporate Learning
Instantaneous provision of context-appropriate translation and localisation
Dynamic adaptation, personalisation and recomposition for user empowerment and content contextualisation
Enhancement of interaction through multi-modal and multi-device delivery
Integration and optimisation of intelligent content processes across the value chain
Web becomes fluid or malleable: it transforms itself to suit your INTENT (content & services). Have to move beyond personalised search and discovery / recommendation. Personalisation proven for effectiveness, efficiency, user satisfaction.
You have copies of the slides, so I am not going to dwell on these, but suffice to say that there is a strong connection from our previous research that is informing our current research themes.
Jennifer Foster, 2010. "cba to check the spelling": Investigating Parser Performance on Discussion Forum Posts. Proceedings of Human Language Technologies: 2010 Annual Conference of the North American Chapter of the ACL, pp. 381-384, Los Angeles, CA. Best Short Paper.
Joel Tetreault, Jennifer Foster and Martin Chodorow, 2010. Using Parse Features for Preposition Selection and Error Detection. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden.
Jennifer Foster, Joachim Wagner and Josef van Genabith, 2008. Adapting a WSJ-Trained Parser to Grammatically Noisy Text. Proceedings of ACL08:HLT, Columbus, Ohio.
Joachim Wagner, Jennifer Foster and Josef van Genabith, 2007. A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors. Proceedings of EMNLP-CoNLL 2007, pp. 112-121, Prague, Czech Republic.
The shared task system description paper is: Joseph Le Roux, Jennifer Foster, Joachim Wagner, Rasoul Samad Zadeh Kaljahi and Anton Bryl, 2012. DCU-Paris-13 Systems for the SANCL 2012 Shared Task. Working Notes on the First Workshop on Syntactic Analysis of Non-canonical Language.
RIAO 2010 - Jinming Min, Johannes Leveling, Dong Zhou, Gareth JF Jones (Doc expansion for image retrieval)
SPIRE 2011 - Min & Jones
SIGIR - Leveling et al 2010
UMAP 2012 - Rami et al
Moore, A., Macarthur, V., & Conlan, O. (2012). Core Aspects of Affective Metacognitive User Models. In L. Ardissono & T. Kuflik (Eds.), Advances in User Modeling: selected papers from UMAP 2011 workshops. Lecture Notes in Computer Science 7138 (pp. 47-59). Springer-Verlag GmbH. doi:10.1007/978-3-642-28509-7
Berthold, M., Moore, A., Steiner, C., Gaffney, C., Dagger, D., Albert, A., Kelly, F., Donoghue, G., Power, G. and Conlan, O. (2012). An Initial Evaluation of Metacognitive Scaffolding for Experiential Training Simulators. Seventh European Conference on Technology Enhanced Learning (EC-TEL 2012), Saarbrücken (Germany), 18-21 September 2012 (In Press).
GESPIN 2011 - Campbell and Scherer (2011)
HCI 2011 - Schlogl Schneider Lux and Doherty 2011 Human computer interaction 2011
T3L conference - Vazquez & Filip 2012, Campell & Tabata 2010 LREC 2010
VSPSI 2011 - Campbell 2001 International workshop on Voice and speech processing social interaction 2011
ICASSP 2011 - Campbell & Kane 2011
Steichen et al - Hypertext 2011
MLMI 2007 - Campbell & Douxchamps 2007, CSCW Journal 2012 (accepted) Gavin, et al
Bachwerk, M., & Vogel, C. (2010). Modelling social structures and hierarchies in language evolution. In M. Bramer, M. Petridis, & A. Hopgood (Eds.), Research and Development in Intelligent Systems XXVII (pp. 49-62). London: Springer.
Bachwerk, M., & Vogel, C. (2011). Establishing linguistic conventions in task-oriented primeval dialogue. In A. Esposito et al. (Eds.), Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issues (pp. 48-55). Heidelberg: Springer.
Bachwerk, M., & Vogel, C. (2012). Language and friendships: a co-evolution model of social and linguistic conventions. In the 9th International Conference on the Evolution of Language. Kyoto, Japan.
Campbell, N., & Scherer, S. (2011). On the fragmented nature of spoken interaction and an attempt to model its apparent regularities. In M. Karpinski, Gesture & Speech in Interaction.
Campbell, N., Kane, J. (2011). Processing 'yup' and other short utterances in interactive speech. Proceedings of ICASSP 2011. Prague, Czech Republic.