An update to an earlier presentation on DITA metrics, drawing on my experiences at AMD and looking at production metrics as well as ROI.
Information Architecture Course Part 2 - Spring 2013 - Class 1
DITA Metrics in Production: How, When, Where, and Why (and How Much) Redux
1. DITA METRICS IN PRODUCTION:
How, When, Where, and Why
(and How Much?)
Keith Schengili-Roberts, Manager
Documentation & Localization
June 16, 2011
2. BIOGRAPHICAL INFORMATION | a.k.a. Who is this Guy?
Keith Schengili-Roberts
– Manager for AMD’s Documentation and
Localization department for the Professional
Graphics division (formerly ATI); formerly its
Information Architect (migrated team to DITA)
– Award-winning lecturer at University of Toronto’s
Professional Learning Center since 1999,
teaching courses on information architecture and
content management
– Author of four titles on Internet technologies; last
title was Core CSS, 2nd Edition (2001)
– Co-Founder of the Semiconductor DITA
Implementer’s Group (SDIG) that holds monthly
free talks/webinars on using DITA for our domain:
http://tech.groups.yahoo.com/group/semidita
– DITA blog at: www.ditawriter.com
2 | DITA Metrics in Production: How, When, Where and Why (and How Much)? | June 16, 2011 | Public
3. WHAT MY GROUP DOES
Documentation & Localization Group at AMD's Graphics Product Group (GPG)
– Formerly ATI, based in Markham, Ontario
– 4 writers, 1½ process engineers, 2 localization co-ordinators,
1 manager
– CMS: DITA CMS from Ixiasoft (www.ixiasoft.com)
– Have been using DITA in production for over four years
Responsible for:
– End-user documentation, including online help
– Engineering documentation for ODM/OEM partners
– Technical training documentation for partners
– Localize in up to 25 languages (mostly end-user and UI)
– Primary outputs are PDF and XHTML
4. DITA METRICS AND R.O.I.
Most typical context in which one hears about metrics in relation to DITA:
– Scenario: documentation team is trying to justify expense of a DITA-
based Content Management System to upper management. The two
key costing arguments are:
Higher re-use rates = more efficient process → faster delivery/lower costs
Separation of form from content = more efficient work process, greatly
reduced localization costs
– If estimated $ R.O.I. > $ CMS and attendant costs, get CMS
5. PROVING THE PRODUCTIVITY R.O.I. ARGUMENT
How?
– Take metrics before switching over to CMS in order to set a baseline
– Here are comparative numbers from 2007, covering two quarters
immediately pre- and post-CMS implementation:
Demonstrates that using the DITA CMS made us faster, and that we more
than doubled output (docs were comparable)
6. PROVING THE PRODUCTIVITY R.O.I. ARGUMENT (CONT.)
What’s happened since 2007?
7. PROVING THE REDUCED LOCALIZATION COST ARGUMENT
Track localization budget and compare to actual amount spent; any
positive difference goes towards R.O.I.
[Chart: localization spend over time (y-axis in $), with periods labelled “Bad Old Days”, “CMS ROI” and “Content audit + Single-sourcing”]
Blue line = localization budget for year, Red line = actual localization spend
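The localization side of the argument is simple arithmetic. A minimal sketch, assuming budget and actual spend are available per period (the period keys and dollar figures below are hypothetical). Following the rule above, only periods that came in under budget count; overruns are not netted against savings, which is an assumption.

```python
from __future__ import annotations


def localization_savings(budget: dict[str, float], actual: dict[str, float]) -> float:
    """Sum the positive (budget - actual) differences across periods.

    Assumption: periods that ran over budget contribute zero rather than
    being subtracted. Every budgeted period must have an actual figure.
    """
    savings = 0.0
    for period, budgeted in budget.items():
        difference = budgeted - actual[period]
        if difference > 0:
            savings += difference
    return savings


# Hypothetical figures for illustration only.
budget = {"2008": 400_000.0, "2009": 350_000.0}
actual = {"2008": 300_000.0, "2009": 360_000.0}
print(localization_savings(budget, actual))  # 100000.0
```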
8. R.O.I. PROVEN, NOW WHAT?
In our case the Ixiasoft DITA CMS had demonstrably paid for itself in
terms of increased efficiency and reduced localization cost by the end of
the second year
– Any additional savings from that point on are “gravy”
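“Paid for itself by the end of the second year” amounts to checking when cumulative savings first cover the cost of the CMS. A minimal sketch, assuming yearly savings (productivity plus localization) have already been estimated; the cost and savings figures are hypothetical, not AMD's.

```python
from __future__ import annotations

from itertools import accumulate
from typing import Optional


def break_even_year(cms_cost: float, yearly_savings: list[float]) -> Optional[int]:
    """Return the 1-based year in which cumulative savings first reach cms_cost.

    Returns None if the savings never cover the cost within the given years.
    """
    for year, cumulative in enumerate(accumulate(yearly_savings), start=1):
        if cumulative >= cms_cost:
            return year
    return None


# Hypothetical: CMS plus attendant costs vs. estimated yearly savings.
print(break_even_year(250_000.0, [110_000.0, 160_000.0, 170_000.0]))  # 2
```

Anything saved after the returned year is the “gravy” referred to above.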
Metrics now take on a related but different purpose:
– Focus is now on relative production and quality using the CMS; or in
other words:
“how do we track what we are doing?”
“how do we do what we do better?”
“what is the ongoing cost of production?”
9. DITA + CMS = ABILITY TO MEASURE OUTPUTS
The DITA CMS has the capability to process large amounts of data in its
repository, so managers can answer the following questions that relate to
production in a DITA-based environment:
– What is the average size of a topic?
– How much content was created?
– How much does it cost to create a topic?
– What is the topic reuse rate?
– What DITA tags are most/least used? Are we using them efficiently?
– What is the ratio of topic types used?
– How much of the same document type have we produced year over
year?
10. WHAT NOT TO MEASURE | DITA Makes these More Problematic
Technical Writing metrics that are of questionable use:
– Hours or days to produce a page
What exactly is a “page” in XML?
Different output formats change outcome (double-spaced output, anyone?)
– Documents released per writer
Differences in size (Datasheets vs. Databooks)
Documents with high reuse rates vs. low reuse rates
Measure aggregate values rather than individual effort
– Otherwise you run the risk of getting more of what you measure
11. GOOD BACKGROUND MATERIAL ON DOC METRICS
Pre-DITA, good general overviews:
– Documentation Metrics: What Do You Really Want to Measure, by Donald S. Le Vie Jr.
http://www.stc.org/intercom/PDFs/2000/200012_06-09.pdf
– Practical and Effective Metrics, by Geoffrey J.S. Hart
http://www.stc.org/PDF_Files/myjob/geofferyHart.pdf
DITA-specific:
– What is the Best Metric to Measure the Success of Your Reuse of DITA Topics?, by Bill Hackos
http://www.infomanagementcenter.com/enewsletter/200806/third.htm
– DITA Metrics: Cost Metrics, by Mark A. Lewis
http://dita.xml.org/resource/dita-metrics-cost-metrics (look for PDF link)
– The Illusive, Writing Productivity Metric: Making Unit Cost a Competitive Advantage, by Mike Eleder (“Best Practices”, Feb 2011)
12. DITA TOPICS AS “ATOMIC” UNIT OF MEASURE
Each DITA topic can be thought of as
an “atomic” unit of production; while
they vary in size, that difference can
be measured and averaged in a
meaningful way
– Different topic types have their
own characteristics; graph shows
average, max, min, and count for
concepts, references and tasks
in our repository
– References average 2.6 pages
of lorem ipsum text in MS Word,
roughly twice the size of
concepts and tasks
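Per-type statistics like those in the graph (average, max, min, count) can be approximated from exported topic files. This sketch is not how the Ixiasoft CMS computes them: it assumes topics are available as XML strings, takes the root element name (concept, task, reference) as the topic type, and measures size in words rather than in Word pages.

```python
from __future__ import annotations

import statistics
import xml.etree.ElementTree as ET
from collections import defaultdict


def topic_size_stats(topic_sources: list[str]) -> dict[str, dict[str, float]]:
    """Return average, max, min and count of topic size (in words) per topic type."""
    sizes: dict[str, list[int]] = defaultdict(list)
    for source in topic_sources:
        root = ET.fromstring(source)
        # The root element name (concept, task, reference, ...) is the topic type.
        words = " ".join(root.itertext()).split()
        sizes[root.tag].append(len(words))
    return {
        topic_type: {
            "avg": statistics.mean(values),
            "max": max(values),
            "min": min(values),
            "count": len(values),
        }
        for topic_type, values in sizes.items()
    }


topics = [
    '<concept id="c1"><title>Overview</title><conbody><p>The board supports two displays.</p></conbody></concept>',
    '<reference id="r1"><title>Pins</title><refbody><section>Pin 1 is VDDC and pin 2 is ground.</section></refbody></reference>',
]
print(topic_size_stats(topics)["reference"]["count"])  # 1
```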
13. ONE EXAMPLE: TOPIC PRODUCTION PER QUARTER
Tracked items:
– # of topics created per author
– # of topics modified per author
Results then collated, showing amount of work being done in DITA CMS
Sample search:
Authoring cycle selected (current work); all document types selected; select a date range for topic creation, then select a person
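Once the search results are exported, the same collation can be sketched outside the CMS. This assumes a hypothetical export of (author, action, date) records, where action is "created" or "modified", and uses calendar quarters (a fiscal calendar may differ).

```python
from __future__ import annotations

from collections import Counter
from datetime import date


def quarter_of(day: date) -> str:
    """Label a date with its calendar quarter, e.g. 2010-Q3."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def topics_per_author(events: list[tuple[str, str, date]]) -> Counter:
    """Count topic events keyed by (quarter, author, action)."""
    return Counter((quarter_of(day), author, action) for author, action, day in events)


events = [
    ("writer_a", "created", date(2010, 7, 5)),
    ("writer_a", "modified", date(2010, 8, 12)),
    ("writer_b", "created", date(2010, 9, 30)),
]
counts = topics_per_author(events)
print(counts[("2010-Q3", "writer_a", "created")])  # 1
```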
14. TOPIC PRODUCTION PER QUARTER | One View of the Resulting Data
Collated results show all topic production over a quarter
[Chart: # of Topics]
– Results show ongoing work necessary to meet product release cycle
An effective tool with upper management; shows productivity and
predicted trends based on previous cycles
15. TOPIC PRODUCTION PER QUARTER | Another View of the Resulting Data
Can further break this down to show how much groups are contributing:
[Chart: # of Topics]
– Same total, but this shows relative contribution per group
Uses document type/author to group topic outputs
16. COST-PER-TOPIC | Identifying the cost of production
Idea came from “Best Practices” article by Mike Eleder (“The Illusive,
Writing Productivity Metric: Making Unit Cost a Competitive Advantage”)
Basically:
cost per topic = (monthly tech writer team cost) / (topics produced monthly)
This is a unit cost measure, giving the average monthly cost of producing a
topic
– Can estimate the cost of future work based on result
– Original article suggests using only published topics; I use topics
created + modified (i.e. using the same figures used in previous two
charts) as those numbers are effectively normalized and better reflect
ongoing work
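With topics created plus modified as the denominator, as described above, the monthly calculation is a single division. The team cost and topic counts in the example are hypothetical.

```python
def cost_per_topic(monthly_team_cost: float, topics_created: int, topics_modified: int) -> float:
    """Monthly unit cost: team cost divided by topics created plus topics modified."""
    topics = topics_created + topics_modified
    if topics == 0:
        raise ValueError("no topics were created or modified this month")
    return monthly_team_cost / topics


# Hypothetical month: $30,000 team cost, 700 topics created, 500 modified.
print(cost_per_topic(30_000.0, 700, 500))  # 25.0
```

Multiplying the result by the number of topics a future project is expected to touch gives the rough estimate of future work mentioned above.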
17. COST-PER-TOPIC | Example chart
Initial version doesn’t provide a clear picture; cost varies based on
production figures (dips in August of both years represent busy times:
more topics created in a given timeframe = lower cost per topic)
18. COST-PER-TOPIC | Example chart with trend line added
The trend line records the average cost-per-topic over the time measured
This shows an overall downward trend: Avg $28 per topic in 2009, and $25 per
topic in 2010
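Both figures on this slide, the yearly averages and the downward trend, can be computed from the monthly cost-per-topic series. A minimal sketch, assuming months keyed as "YYYY-MM" and fitting an ordinary least-squares line that treats the sorted months as evenly spaced; the values are made up to illustrate the arithmetic and are not the data behind the chart.

```python
from __future__ import annotations

import statistics
from collections import defaultdict


def yearly_averages(monthly_cost: dict[str, float]) -> dict[str, float]:
    """Average cost-per-topic per year, from keys formatted as YYYY-MM."""
    by_year: dict[str, list[float]] = defaultdict(list)
    for month, cost in monthly_cost.items():
        by_year[month[:4]].append(cost)
    return {year: statistics.mean(costs) for year, costs in sorted(by_year.items())}


def trend_slope(monthly_cost: dict[str, float]) -> float:
    """Least-squares slope of cost per topic, in dollars per month."""
    values = [monthly_cost[month] for month in sorted(monthly_cost)]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two months to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(values)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


monthly = {"2009-01": 30.0, "2009-02": 26.0, "2010-01": 26.0, "2010-02": 24.0}
print(yearly_averages(monthly))  # {'2009': 28.0, '2010': 25.0}
print(trend_slope(monthly) < 0)  # True
```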
19. COST-PER-TOPIC | Example chart based on published data
This graph shows cost per topic based on what is published (i.e. what goes out the door) each month;
the release cycle is more variable than our production schedule, but it does show the costs of
publishing content (High = $81.60, Low = $6.61, Avg = $17.79)
The previous topics created/modified charts provide a more accurate result, as they
reflect producing topic content as a running cost
20. COST-PER-LOCALIZED-TOPIC | Example chart
This looks at the cost per localized topic, comparing the total number of localized topics produced against
actual localization costs invoiced for that month
Invoicing cycles don’t always match our localization cycles, so the same caveat applies here as
with published docs
Removing outliers, results are interesting: High = $28.15, Low = $0.02, Avg = $5.25
21. REUSE RATES | What to Measure?
On the surface, this seems like a simple question to answer: if a topic is
used again, that equals 100% reuse, doesn’t it?
– True, but that’s not the whole picture; topics are often copied (cloned)
and then only a small portion of the new topic is changed (a sentence is
added, a numerical value is changed, etc.)
– So to get a true value for reuse, you need to go past the topic level
Localization tools provide us with a usable option: segments
22. SEGMENTS AS “QUARKS” OF MEASUREMENT FOR REUSE
If topics are the “atomic” unit of production measurement,
then segments are “sub-atomic”
– Topic reuse can take place at the whole-topic level, but
more often it happens at the sub-topic level, as topics are copied and then
modified
– Segmentation is a concept from the localization realm: a segment is a
sentence or (more usually) a portion of a sentence
Defined by segmentation rules, which ensure that you are always comparing
like to like
Localization tools look for changes (the “delta”) at this level
– Data within our CMS enables us to look at this
– If you don’t have a CMS, try a localization workbench
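A rough approximation of segment-level reuse is to split each topic into segments and count how many of a new topic's segments already exist, unchanged, in earlier content. The single segmentation rule below (break after ., ! or ? followed by whitespace) is a simplified assumption; localization tools apply fuller rule sets (for example SRX) and also score fuzzy matches, which this sketch ignores.

```python
from __future__ import annotations

import re

# Simplified segmentation rule (assumption): break after ., ! or ? plus whitespace.
SEGMENT_BREAK = re.compile(r"(?<=[.!?])\s+")


def segments(text: str) -> list[str]:
    """Split text into sentence-level segments."""
    return [s.strip() for s in SEGMENT_BREAK.split(text) if s.strip()]


def reuse_rate(new_text: str, existing_texts: list[str]) -> float:
    """Percentage of the new text's segments that exactly match an existing segment."""
    memory = {seg for text in existing_texts for seg in segments(text)}
    new_segments = segments(new_text)
    if not new_segments:
        return 0.0
    reused = sum(1 for seg in new_segments if seg in memory)
    return 100.0 * reused / len(new_segments)


original = "Connect the cable. Power on the board. Wait ten seconds."
clone = "Connect the cable. Power on the board. Wait twenty seconds."
print(round(reuse_rate(clone, [original]), 2))  # 66.67
```

At the topic level the clone would count as new content; at the segment level two of its three segments are reuse.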
23. SAMPLE OF SEGMENTATION AS APPLIED TO REUSE
[Chart: monthly topic reuse rate (%) and # of topics]
Average monthly topic reuse rate for this time period = 53.53%
24. OTHER QUESTIONS RELATED TO PRODUCTIVITY IMPROVEMENTS
Constraints, added in the DITA 1.2 spec, allow you to limit the number of tags
you use
– This is useful, because it helps enforce a “house style” of DITA usage,
where authors can more readily find the tags that they need to use
– But how do you usefully determine what you should and should not
keep?
If you have an existing base of content, examine how often tags are used and
then set a threshold below which tags are excluded
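The approach can be sketched as a tag count over all topics followed by a threshold test. Iterating over a vocabulary list rather than over the counted tags means tags that never appear (count = 0) are flagged too. The vocabulary, threshold and sample topics are hypothetical; a real run would use the full DITA 1.2 element list and the repository's topics.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(topic_sources: list[str]) -> Counter:
    """Count how often each element name occurs across all topics."""
    counts: Counter = Counter()
    for source in topic_sources:
        counts.update(element.tag for element in ET.fromstring(source).iter())
    return counts


def exclusion_candidates(counts: Counter, vocabulary: list[str], threshold: int) -> list[str]:
    """Tags used fewer than `threshold` times, including tags never used at all."""
    return sorted(tag for tag in vocabulary if counts[tag] < threshold)


topics = [
    '<task id="t1"><title>Install</title><taskbody><steps><step><cmd>Insert the card.</cmd></step></steps></taskbody></task>',
    '<concept id="c1"><title>About</title><conbody><p>Overview.</p><p>More.</p></conbody></concept>',
]
vocabulary = ["title", "p", "cmd", "note", "q"]  # hypothetical excerpt of the DITA element list
print(exclusion_candidates(tag_counts(topics), vocabulary, threshold=2))  # ['cmd', 'note', 'q']
```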
25. WORDCLOUD SHOWING RELATIVE TAG USAGE
Produced by doing a search on the frequency of use of each DITA tag (e.g. #
of times <note> is used in all topics) in our repository, and then using a
WordCloud app (wordle.com) to produce the results
In this case it is not the “big” words in which we are interested, but the
smallest, which ought to be excluded from ongoing use
26. EXAMINING THE OUTLIERS
In this case we are looking at the
tags that are the least used
– These are candidates for
exclusion in the constraint file
– This type of search does not
reveal tags with a count = 0;
these would need to be
identified and excluded as well
Note that at 73 tags total, we are
using well under half of all the DITA
tags available
Based on final results, an
effective DITA 1.2 constraint can
be created
27. TRACKING TOPIC TYPE USAGE | Tracking History
Year-over-year comparisons of topic types used provide an idea as to
the composition of a “typical” DITA-based document
– While I expected a high percentage of reference topics, I wondered
whether there were more topics that ought to be tasks which were
instead done as references
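Year-over-year topic-type composition comes down to counting topics by year and type and converting the counts to percentages. The (year, topic type) records are hypothetical stand-ins for a CMS export.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def topic_type_share(topics: list[tuple[int, str]]) -> dict[int, dict[str, float]]:
    """Percentage of each topic type among the topics created in each year."""
    by_year: dict[int, Counter] = defaultdict(Counter)
    for year, topic_type in topics:
        by_year[year][topic_type] += 1
    return {
        year: {topic_type: 100.0 * n / sum(counts.values()) for topic_type, n in counts.items()}
        for year, counts in sorted(by_year.items())
    }


records = [(2009, "reference"), (2009, "reference"), (2009, "task"), (2009, "concept"),
           (2010, "reference"), (2010, "task")]
print(topic_type_share(records)[2009]["reference"])  # 50.0
```

A rising task share from one year to the next is the kind of change described on the following slide.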
28. TRACKING TOPIC TYPE USAGE | Making Change
Asked writers to be more diligent
about writing task topics where
they might be tempted to write
them as references instead
– Result was a measurable
increase in the percentage of
task topics created over the
course of the year
– A larger number of our
documents now include a
procedural, step-by-step
approach to the subject, which
is beneficial to the end-user
29. COMPARING LIKE PRODUCTION YEAR OVER YEAR
Comparing the number of datasheets vs. databooks is not useful, but
comparing how many of each are made over several years is
– Provides insight into relative production efficiency
[Chart: # of docs produced per year]
30. LOOKING MORE CLOSELY AT YEARLY PRODUCTION
This was created by outputting a list of all maps published within the year,
and then manually sorting them by type of publication
31. BREAKING PRODUCTION NUMBERS DOWN FURTHER
32. PRODUCTION AND LOCALIZATION NUMBERS
The following was produced by outputting a list of all non-English maps
created within the year, and then sorting (and counting) by language
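If each localized map records its language in the standard xml:lang attribute on its root element (an assumption about how the maps are stored), the sorting and counting can be automated. The sample maps are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def non_english_maps_by_language(map_sources: list[str]) -> Counter:
    """Count maps per xml:lang value, skipping English and unlabelled maps."""
    counts: Counter = Counter()
    for source in map_sources:
        lang = ET.fromstring(source).get(XML_LANG)
        if lang and not lang.lower().startswith("en"):
            counts[lang] += 1
    return counts


maps = [
    '<map xml:lang="fr-FR"><title>Guide</title></map>',
    '<map xml:lang="de-DE"><title>Handbuch</title></map>',
    '<map xml:lang="fr-FR"><title>Notes</title></map>',
    '<map xml:lang="en-US"><title>Guide</title></map>',
]
print(non_english_maps_by_language(maps).most_common(1))  # [('fr-FR', 2)]
```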
33. COMPARING ENGLISH TO LOCALIZED PUBLISHED MAPS
A year-over-year comparison of published map output
34. OTHER POSSIBILITIES | Looking at Quality
Another avenue we are looking at is automated quality metrics
– We are actively investigating the Flesch Reading Ease metric
It follows a simple formula, which could be applied to the content of any topic
We do a lot of engineering documentation, so low scores (indicating words with
many syllables, aimed at readers with more than a basic education) are
expected
However, topics containing basic information that score especially low (such
as most concepts) may serve as a guide to the writer to re-examine a topic to
make it easier to read
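The Flesch Reading Ease formula is 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words); higher scores mean easier text. The sketch applies it to plain topic text. Its sentence splitting and vowel-group syllable counter are crude heuristics (an assumption, not a dictionary-based count), so scores will differ somewhat from established readability tools.

```python
from __future__ import annotations

import re


def count_syllables(word: str) -> int:
    """Crude estimate: groups of consecutive vowels, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


print(round(flesch_reading_ease("The cat sat. The dog ran."), 2))  # 119.19
print(flesch_reading_ease("Interoperability considerations necessitate comprehensive documentation.") < 0)  # True
```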
35. OTHER POSSIBILITIES | Localization
There are further gains to be had on the localization side:
Ixiasoft (DITA CMS) and XML Int’l (XTM Suite) have integrated the CMS
with a translation memory (TM) suite that will allow us to use TM within
the CMS
– We currently use segmentation matching, which produces 100% matches that still need
to be verified; with integrated TM, only new content will be sent to the localization vendor.
36. SLIGHT TANGENT #1: CREATING A DITA STYLE GUIDE
A recommendation for any tech docs group that uses DITA extensively:
– Helps new writers/contributors come up to speed
– Usefully narrows the scope of the XSL work that needs to be done
– Many things are “legal” in DITA but may be poor from a “house style”
standpoint, for example:
Can have unformatted block content between a header and a table in a
section
Tables and figures do not have to have a title
Can have unlimited nested lists
Alpha lists can contain more than 26 items
Lists can contain only a single item
37. SCHEMATRON CAN HELP ENFORCE DITA STYLE
What is Schematron? “Schematron is a rule-based validation language for
making assertions about the presence or absence of patterns in XML
trees.” (Wikipedia)
We use Schematron to point out to the writers potential errors/lapses in
our DITA House Style:
Text between a section and table not wrapped in block tags:
A list ought to have more than one item:
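The two checks above are implemented in Schematron. For readers without a Schematron processor, a rough Python equivalent using ElementTree is sketched below; it is not the group's actual rule set. It flags text placed directly inside a section rather than in a block element (broader than the section/table case, by assumption) and lists with fewer than two items.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def house_style_warnings(topic_source: str) -> list[str]:
    """Rough Python stand-in for two Schematron house-style rules."""
    root = ET.fromstring(topic_source)
    warnings: list[str] = []
    for section in root.iter("section"):
        # Text directly inside <section>, or trailing a child element, is not in a block tag.
        loose_text = [section.text or ""] + [child.tail or "" for child in section]
        if any(chunk.strip() for chunk in loose_text):
            warnings.append("section contains text not wrapped in a block element")
    for element in root.iter():
        if element.tag in ("ul", "ol") and len(element.findall("li")) < 2:
            warnings.append(f"<{element.tag}> ought to have more than one item")
    return warnings


topic = (
    '<concept id="c1"><title>Ports</title><conbody>'
    '<section><title>Outputs</title>Loose text<table/></section>'
    '<ul><li>Only item</li></ul>'
    '</conbody></concept>'
)
for warning in house_style_warnings(topic):
    print(warning)
```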
38. XSL CAN ALSO HELP ENFORCE DITA HOUSE STYLE
We have a DITA house style that says nested lists should be no more
than two levels deep.
Here’s Schematron doing its job:
And here is the result if you try to output it:
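The two-level rule can also be checked before output. This sketch is an assumption, standing in for neither the group's Schematron nor its XSL: it walks a topic and reports the deepest nesting of ul/ol elements so topics over the limit can be flagged.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def max_list_depth(element: ET.Element, depth: int = 0) -> int:
    """Deepest level of <ul>/<ol> nesting at or below this element."""
    if element.tag in ("ul", "ol"):
        depth += 1
    return max([depth] + [max_list_depth(child, depth) for child in element])


def exceeds_list_depth(topic_source: str, limit: int = 2) -> bool:
    """True if the topic nests lists more than `limit` levels deep."""
    return max_list_depth(ET.fromstring(topic_source)) > limit


three_levels = (
    '<concept id="c1"><title>Menus</title><conbody>'
    '<ul><li>Display<ul><li>Color<ol><li>Gamma</li></ol></li></ul></li></ul>'
    '</conbody></concept>'
)
print(exceeds_list_depth(three_levels))  # True
```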
39. SLIGHT TANGENT #2: AUTOMATION EFFORTS
General rule: if a process can be
automated, do so.
We have a dedicated programmer who
builds tools and automation processes for
our group.
Sample tools built include:
“Table-izer”, app that creates DITA XML
table from clipboard
Automated T.O.C.
DITA Topic Editor, a tag-less editor
aimed at SMEs
Automated index based on a list of
keywords
Change-bar mechanism that diffs output
Dynamic trademark attribution
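As an illustration of the “Table-izer” idea, the sketch below turns tab-separated text (what spreadsheets typically place on the clipboard) into a DITA simpletable. Reading the clipboard itself is left out, since that needs a platform-specific or third-party library, and the choice of simpletable over a full table element is an assumption; the original tool's output format is not described here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def tsv_to_simpletable(clipboard_text: str, has_header: bool = True) -> str:
    """Convert tab-separated rows into a DITA <simpletable> XML string."""
    rows = [line.split("\t") for line in clipboard_text.splitlines() if line.strip()]
    table = ET.Element("simpletable")
    for index, cells in enumerate(rows):
        # The first row becomes <sthead> when the copied range includes headings.
        row = ET.SubElement(table, "sthead" if has_header and index == 0 else "strow")
        for cell in cells:
            ET.SubElement(row, "stentry").text = cell.strip()
    return ET.tostring(table, encoding="unicode")


print(tsv_to_simpletable("Signal\tVoltage\nVDDC\t1.1 V"))
```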
40. QUESTIONS AND ANSWERS