1. TWB_ / LEADER IN COMMUNICATING TECHNOLOGY 1
The Impact of Automation
on Enterprise Content
Rakesh Shukla [i], Founder & CEO, TWB_ [ii]
Natural Language Generation and Natural Language
Processing are going to be the third stage of Automation.
Are you ready?
The machines are coming. Not only to automate content – that has been in play with
Enterprise Content Management (ECM) for many decades – but to generate it as
Natural Language Generation (NLG).
Let’s begin with a simple version of a Turing test. One of these passages was written
by a seasoned sports correspondent of a national newspaper, the other by a machine
in a few seconds. Can you spot the machine-generated text?
Sample 1
“Having jumped 13 places in a year, leaping from fourteenth to first,
Leicester City are easily the most improved side in the league and Jamie
Vardy’s role in their staggering rise cannot be overstated. The second top
scorer in the league with 24 goals, Vardy has scored 35.29% of Leicester’s 68
goals. Only Harry Kane and Odion Ighalo were a bigger source of goals for
2. 2 INDIA 0 96860.59020 | ISRAEL +972.544.603.612| US 832-689-6456/ WWW.TWB.IN
their team, with Kane scoring 25 of third-placed Tottenham Hotspur’s 69
goals (36.23%) and Ighalo scoring 15 of 13th-placed Watford’s 40 goals
(37.5%).
That underlines Leicester’s overall effectiveness. Although they conceded as
many goals as second-placed Arsenal, and one more than Tottenham, they
have been more consistent. They were first at Christmas, while Arsenal were
second and Tottenham were fourth. ‘It’s a magical season,’ Claudio Ranieri,
Leicester’s manager, says, justifiably so, given that a summer expenditure of
£26.7m on transfers made them the eighth lowest spenders.”
Sample 2
“It was a season for the ages for Leicester City as they lifted the Premier
League Trophy and were crowned champions of England. Leicester City
featured one of the league’s most skillful attacks, netting 68 goals. Jamie
Vardy led the way with an incredible 24 goals. In addition to their offensive
prowess, Leicester City possessed one of the strongest defenses in England.
Shipping only 36 goals all season, their defense was able to frustrate even the
most potent of attacks. Hoping to finish in the top ten after a fourteenth place
finish last season, Leicester City splashed out 26.70 million in the summer
transfer period. Leicester City sat in first place at Christmas after an
incredible start to the season, and they continued to impress the second half
of the season. After taking a few moments to reflect on the season, the
Leicester City manager weighed in with, ‘It’s a magical season.’”
As it happens, sports and financial services have led the march on content automation
with NLG engines. Once the data is fed into a spreadsheet – whether match and
individual scores or annual or quarterly reports – whole articles, investment
research, and fund fact sheets can be created using today’s commercial technology.
This content is device-appropriate and interactive. In the case of financial
reporting, it supplies investment managers with content that enables split-second
investment decisions.
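As a minimal sketch of how a data-driven NLG engine of this kind might work (it is not the method of any commercial product), the Python below turns one spreadsheet-style row of season data into sentences using fixed templates. The figures are taken from the samples above; the field names, the `describe_season` function, and the 30% wording threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SeasonRow:
    """One spreadsheet-style row of season data (field names are hypothetical)."""
    team: str
    goals_for: int
    goals_against: int
    top_scorer: str
    top_scorer_goals: int


def scorer_share(row: SeasonRow) -> float:
    """Percentage of the team's goals scored by its top scorer."""
    return 100.0 * row.top_scorer_goals / row.goals_for


def describe_season(row: SeasonRow) -> str:
    """Fill fixed sentence templates from the data; word choice depends on the numbers."""
    share = scorer_share(row)
    # Illustrative threshold only: pick the wording based on the scorer's share.
    adjective = "a decisive" if share >= 30 else "a useful"
    return (
        f"{row.team} scored {row.goals_for} goals and conceded {row.goals_against}. "
        f"{row.top_scorer} made {adjective} contribution with "
        f"{row.top_scorer_goals} goals ({share:.2f}% of the total)."
    )


leicester = SeasonRow("Leicester City", 68, 36, "Jamie Vardy", 24)
print(describe_season(leicester))
```

The same arithmetic reproduces the share quoted in Sample 1: 24 of 68 goals is 35.29%. A production engine would add many more templates and variation rules, but the principle of data in, language out is the same.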
Content automation has been growing in banking and insurance too, where standard
operating procedures must be made available across languages and geographies in a
regulated environment.
Stages of Enterprise Content Automation
Why does an enterprise need content? For the simple reason that better content makes
for better decisions by both external and internal customers. Nearly everyone
has come across poorly presented content and recognizes the reach of its impact,
whether it is encountered before purchase, during use, or while troubleshooting. The
result may be that consumers do not buy the product or, if they do, do not
recommend it to others.
Content quality depends on people and teams with three principal attributes: great
English, a good grasp of the technical domain, and the understanding of tools and
technologies to present content. As any C-level executive or hiring manager will tell
you, this combination of skills isn’t easy to find. Content quality has always included
the dimensions of style, accuracy, consistency, and ease of comprehension. In the
digital age, the timeliness of content – getting it published and delivered faster – has
been increasingly flagged as high-priority. In the mobile age, quality concerns now
include ideas such as “device-appropriate” and “interactive” which are becoming the
new minimum requirements to satisfy content consumers.
Content Automation allows enterprises to meet their deadlines while matching
the quality and pace of content production that today’s business world
requires. Beyond solving today’s problems, content automation can also make a
company more agile, including the ability to create new information products
and communications dynamically, as well as quickly support new generations of
devices and formats such as eyewear displays, smart watches, and more.
As AI becomes stronger, non-data-led content recognition and generation will
explode. The capability to create product descriptions for e-commerce and
datasheets for print and PDF already exists. Product companies that make easily
componentized or versioned products have been using content automation for
technical support and technical documentation for at least a decade. Until now,
however, there has been no capability to parse the actual content; as it becomes
available, we will see a surge in actual content generation.
Content Automation comes into being in three stages.
Content Automation 1.0: Enterprise Content Management
The problem with the traditional process of writing content in independent
organizations is the huge amount of rework and time involved. The quality is low,
and the content is not publication-ready.
• Tightly-coupled Content and Design: Content is often locked to one media
type because the author and designer commit content to design very early in
the process.
• Low reuse: Documents are typically disconnected, often leading to the same
content being recreated and translated for multiple documents and mediums.
With traditional content creation tools, content is reused by authors and
designers by copying and pasting content between documents and media,
increasing the opportunity for errors and inconsistencies.
• Updates and Collaboration: Reviews and approvals involve Word documents
and PDFs being emailed back and forth, which is time-consuming, error-
prone and expensive. Updates to content must be made manually across
multiple documents and media, requiring further rounds of review and
approval.
• No Metadata: Content has traditionally been very document-centric, and
doesn’t contain metadata, which makes the reuse of content for different
audiences a manual and lengthy process.
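To illustrate why metadata matters for reuse, here is a minimal sketch, assuming content is stored as small components tagged with audience and language. The `Component` class, its fields, the `select_for` helper, and the sample texts are all hypothetical; real ECM systems use their own schemas.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A reusable piece of content plus metadata (all field names are hypothetical)."""
    component_id: str
    text: str
    audiences: frozenset = field(default_factory=frozenset)
    language: str = "en"


def select_for(components, audience, language="en"):
    """Return the components tagged for the given audience and language."""
    return [
        c for c in components
        if audience in c.audiences and c.language == language
    ]


library = [
    Component("install-steps", "Mount the unit before wiring.", frozenset({"installer"})),
    Component("safety-note", "Disconnect power first.", frozenset({"installer", "end-user"})),
    Component("quick-start", "Press the power button.", frozenset({"end-user"})),
]

# Assemble the end-user view without opening or copying any document by hand.
for component in select_for(library, "end-user"):
    print(component.component_id, "->", component.text)
```

Without such tags, the same selection has to be made by a person reading each document, which is the manual and lengthy process described above.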
The first generation of Content Automation helps global organizations streamline
their content processes, and enables them to deliver business-critical content with
precision – typically with the use of an Enterprise Content Management system. This
automation of process and single instance of content (single sourcing) found two
uses:
• Billing Statements have taken advantage of this automation to improve
quality and timeliness. The automation is typically limited to converting and
publishing relational data from a database as a PDF and, increasingly, as Web
and Mobile HTML.
• Technical Documentation teams use this to improve efficiency for publishing
to multiple formats such as a printed user manual, one or more customer help
systems, and sometimes custom applications such as aircraft maintenance
systems. One of the biggest challenges in Technical Documentation is that
target formats are always changing and expanding.
Content Automation immediately shows key benefits:
• Productivity: Subject matter experts such as financial and legal analysts,
product managers, and government officials who contribute authored content
are 30-70 per cent more productive. They no longer have to waste time
manually “formatting” the content, and they gain the ability to reuse already
existing components of content.
• Reduced Content Maintenance and Increased Agility: Content
componentization and managed reuse remove the need to copy/paste or
rewrite content that already exists. Rather than store content as monolithic
documents, a scalable content automation system enables authors to create,
manage, and deploy text, data, and media components as “single source of
truth” assets. For example, if a publication requires one or more legal
disclaimers, that disclaimer is stored once and used by reference in multiple
publications. If changes are required, the disclaimer is edited once and all
references to that disclaimer are automatically updated as well. Usability of
components is similar to copy/paste, but without the associated problems of
trying to manually update hundreds of different documents where paste was
used.
• Collaboration: The ability for cross-department teams to work in parallel on
complex and/or large publications by leveraging componentization and
automation creates better results faster.
• Quality: When managed reuse is deployed and content maintenance costs are
reduced, information quality increases dramatically: accuracy is improved;
consistency is dramatically improved; and time-to-market is reduced
significantly. Further, content automation can also generate omni-channel
outputs without manual intervention. So the resulting publications have
consistent style and branding, and can include interactivity features such as
slideshows, pop-ups, animated text, and more.
• Time-To-Market: Content Automation can reduce the cycle time of a content
production workflow from months to days. A custom electronic component
manufacturer that used to take 4 weeks of human effort to create a 40-page
product data sheet for each customer request for a specific configuration can
now do so in hours, and can produce output for print, web, and mobile
seamlessly.
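The legal-disclaimer example in the list above can be sketched in a few lines of Python. This is a minimal illustration of "stored once, used by reference", not the design of any particular ECM product; the `ContentStore` class, the `render` function, the component ids, and the sample texts are hypothetical.

```python
class ContentStore:
    """Holds each content component exactly once, keyed by an identifier."""

    def __init__(self):
        self._components = {}

    def put(self, component_id, text):
        """Create or update a component in place."""
        self._components[component_id] = text

    def get(self, component_id):
        return self._components[component_id]


def render(publication, store):
    """Resolve component references at publish time and join them into one document."""
    return "\n".join(store.get(component_id) for component_id in publication)


store = ContentStore()
store.put("fund-summary", "The fund invests in large-cap equities.")
store.put("product-sheet", "Model X ships with a two-year warranty.")
store.put("legal-disclaimer", "Past performance is no guarantee of future results.")

# Publications hold references (ids), not copies of the text.
fact_sheet = ["fund-summary", "legal-disclaimer"]
brochure = ["product-sheet", "legal-disclaimer"]

# Edit the disclaimer once; every publication that references it picks up the change.
store.put("legal-disclaimer", "Past performance does not predict future returns.")

print(render(fact_sheet, store))
print(render(brochure, store))
```

Because the publications store only identifiers, there is no pasted copy to hunt down when the disclaimer changes.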
Content Automation 2.0: Natural Language Generation on data underlay
The cost of custom, hand-crafted (and often programmer-intensive) Content
Automation 1.0 is high, and its adaptability is low. Also, for single
sourcing to work, the initial content must still be written by a team with a grasp of
English, technology, and presentation. The InfoTrends™ Content Automation
Research survey [iii] illustrates this well:
• 50% of respondents say increasing customer satisfaction is the cornerstone of
their content strategy for the next 12 months
• 76% of respondents say their stakeholders want more mobile and interactive
content
• 30% of respondents say their current ECM is difficult to configure for their
specific requirements
• 25% of respondents say their ECM doesn’t support automated content
reuse and updating
• 50% of respondents say PDFs are difficult to review and annotate
• 70% of respondents find email an inefficient way to review and approve content
The next stage of Content Automation addresses two aspects:
• Generation of actual language output based on underlying data context, and
• Separation of the content from its presentation
The initial example of the Leicester City reporting falls into this stage of automation,
where the machine generates the text from data and a larger pool of
contextual data. Already Google Rankings is an AI program, and Google Analytics
uses Narrative Science technology to present dense analytics in a readable format; the
resultant reports provide context in an accessible way.
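The second aspect, separating content from its presentation, can be sketched as follows. This is a minimal illustration under stated assumptions: the `summary` structure and the two renderer functions are hypothetical, and the figures come from the Leicester City samples earlier in this paper.

```python
from html import escape

# Content is held as plain structured data, with no layout decisions baked in.
summary = {
    "title": "Leicester City season summary",
    "points": ["Goals scored: 68", "Goals conceded: 36", "Top scorer: Jamie Vardy (24)"],
}


def render_text(content):
    """Plain-text presentation, e.g. for an email or SMS digest."""
    lines = [content["title"].upper()]
    lines += [f"- {point}" for point in content["points"]]
    return "\n".join(lines)


def render_html(content):
    """HTML presentation of the same content, e.g. for web or mobile."""
    items = "".join(f"<li>{escape(point)}</li>" for point in content["points"])
    return f"<h1>{escape(content['title'])}</h1><ul>{items}</ul>"


print(render_text(summary))
print(render_html(summary))
```

Supporting a new device or format then means adding one more renderer; the content itself is not touched.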
But penetration in business content is still limited. While early adopters see
the clear benefit of single sourcing, reuse, and republishing of content, branded,
design-rich, and interactive content with an overlay of analytics still requires a
large amount of new and different business language to be created, along with the
creative. The answer lies in Language Processing.
Content Automation 3.0: Machine parses, understands and generates content
Kris Hammond, Chief Scientist at Narrative Science and a Professor of Computer
Science at Northwestern University, estimates that content written by algorithms will
make up 90% of journalistic reporting by 2030! This suggests that long-term,
automated content is going to have to get vastly more interesting as well as more
personalized.
AI can research swathes of data far quicker than a person and compile relevant
information and present it in relevant ways. So the author comes in to analyze the
automated report and add insight, context and flair to the piece. This is not so
different to the way national papers currently work. Local press agencies source
stories and send copy to the national dailies. The newspaper then uses in-house
writers to meet house guidelines.
AI relies on two things: smart algorithms, and data points that build the context base
against which content is parsed and understood. Thanks to the Internet of Things, data will be
available from cars, CCTV, social media, the internet, live video, people’s homes and
much more.
How should you go about Automating Content?
If you want to start on Content Automation, first prepare the groundwork. It starts
with creating a Content Automation strategy.
Laying down the strategy should be simple: figure out the structure, pour your content
in, automatically extract content as needed, and publish it everywhere. Once that is
achieved, create engines that generate content to fill in. Therein lies the complexity.
Even before we get to the automation piece, we need to recognize that information
exists in multiple areas and it differs in content, style, tone, and message. Customers
don’t know which one is correct, most up-to-date, or comprehensive. This can be
confusing and lead to poor customer experience.
• Pre-automation stage: First, create the framework to provide the right content
to the right person at the right time. Few brands know how to do this well. If
you want to create an exceptional experience, you need to figure this out.
• Content Automation: Second, we need processes and workflows to manage
all this content. While quality content is always a priority, we also need to
figure out how to automate what we can so that our efforts scale.
• Automating content generation: Once this is achieved, we figure out how to
generate content; make it distributable as marketing content, learning content,
and technical content; and make it ubiquitous across distribution channels.
TWB_ and Content Automation
Ten years ago, TWB_ pioneered the offshore creation of technology content which
was hitherto created by companies either internally or with ‘consultants’ hired by
companies from vendors. TWB_ changed that by creating domain depth that mirrored
the customer’s own technology capability with deep SME teams for a variety of
industries, including Information Technology, Defense & Aerospace, Engineering,
and Life Sciences. TWB_ then integrated this technology capability with high-quality
content creation teams that could create product, learning and marketing content
across these industries, with the rigor of the software engineering process.
As TWB_ started scaling, it was natural to look for efficiencies in developing
content and to help customers break through the stovepipes in which their content
lay. That need, coupled with our understanding of content technologies, became the base
of the first level of automation we could provide our customers.
Content Automation 1.0 available since 2010
TWB released India’s first cross-ECM integration platform, the “TWB Center
of Excellence in Technical Communication” (TWB COE), in 2010; it is a
copyrighted and trademarked process. The TWB COE allowed integration of
different ECMs, technologies, and programming to give enterprise customers a unified
view of content and to automate content flow while integrating existing ECMs. It also
allowed enterprise customers to measure and manage the ‘quality of content’ at each
content node. TWB Consulting and research teams proved that single sourcing, managing
the quality of content, and automating information flow can save enterprises up to 80
percent in cost and time. The platform brought together (a) content reuse, by
having information available where required, when required, without duplication of
effort, and (b) integration of existing information and content management architectures to
deliver content automation.
Legacy, ECM, and open-source integrations include TWB partners such as
Microsoft®, Author-it®, Madcap®, and Adobe® suites, various open-source ECM, CMS,
and LMS solutions, as well as native XML/DITA implementations.
Two examples of Content Automation 1.0 with the TWB COE platform are:
• Content Automation 1.0 for the world’s largest software company: The
customer, which develops its own editors and workflow platforms, needed to
bring in collaboration, workflow, and versioning capability for internal use by
its distributed teams while using a familiar interface. The data, however, is not
XML and is proprietary to the company. The TWB COE solution combined
workflow from SharePoint®, versioning and single sourcing from
Madcap®, manual conversion of some legacy data to XML, and migration of
SGML content to XML, thereby bringing XML into a proprietary system.
• TWB COE for one of Europe’s leading insurers: The customer serviced 13 markets
in as many languages in Europe, each with different regulatory requirements, and
twice as many markets in the Americas, Asia, and Africa. The base legal
documentation was managed manually, and different versions were maintained
combining (a) one of ‘n’ insurance products for the market, (b) regulatory
compliance(s) for the market, and (c) localization of this content from English.
The flow of information was via e-mail. The TWB COE content automation
solution combined workflow and versioning from Author-it®, information
fields from the SAP HANA® databases, and some content migrated to XML.
Content Automation 3.0 available for pilots
TWB_ is already piloting Pāṇini, which uses NLP and machine learning algorithms with
supervised learning techniques to analyze documents for scope and for legal, financial,
and regulatory compliance, provide meaningful insights, and redline documents for user
intervention.
[i] Rakesh Shukla is the Founder of TWB_, the pioneer in technology content. At a graduate
level he wrote several papers on Fuzzy Set Theory and Self-learning in Neural Networks, and
now there is the exciting possibility of combining both AI and content in creating new levels
of content automation.
[ii] TWB_ has a customer base of more than forty Fortune 500 technology majors and a
footprint of customers ranging from Silicon Valley, the US East Coast, France,
Luxembourg, Germany, India, Taiwan, Korea and Japan that have consumed more than 1
million person hours of content. AXA, ABB, Cisco, Fidelity, IBM, Intel, Microsoft, Oracle,
Société Générale, Siemens, and Toyota are some of the clients that leverage TWB_’s content back
office to deliver higher content quality, faster, and at lower cost.
[iii] InfoTrends Content Automation Research Survey 2016