302 sargent word2007-ssp2008

Math Editing and Display
in Word 2007

Murray Sargent III
Publisher Text Services
28-may-2008

Overview
 8 math infrastructures enable better math
display/editing
 New Office math edit/display environment
 Interoperate with math programs such as
Mathematica, Maple, publisher workflow
 Input methods and formats
 Layout
 Math font

Complex Project
 Intricacies of math typesetting
 Creating and using a large set of glyph variants
 Vagaries of math notation
 Embedding math zones into international text
environments
 Interaction with complex scripts
 Math in other objects like hyperlinks, ruby
 Input with nonASCII keyboards

Eight Math Infrastructures
 [La]TeX: current tech-doc standards
 Unicode 5.0: includes ~2000 math symbols
 MathML 2.0: math K – 12 and beyond
 OpenType font technology: special math tables
 New math font (Cambria Math)
 Math layout handler
 Shared math input components
 MS Office environment, autocorrect

[La]TeX
 Widely used, high-quality tech document
preparation language
 Simple ASCII keyboard entry
 Usage and math typography are very well
documented
 Stable since 1990
 Complex scenarios are hard to edit
 Numerous dialects, user macros, and lack of
Unicode complicate interchange
 Fonts aren’t well suited to screen display

Unicode 5.0
 340 math chars exist in ASCII, U+2200 block,
arrows, combining marks
 1016 math alphanumeric characters are in
Unicode Plane 1 or Letterlike Symbols
 591 new math symbols and operators are on
BMP
 One math variant selector
 One new combining character (reverse solidus)
 New math characters were requested by STIX

Basic Set of Alphanumeric
Characters
 Latin digits (0 - 9)
 Upper- & lowercase Latin letters (a - z, A - Z)
 Uppercase Greek letters Α - Ω plus the nabla ∇
and a variant of theta Θ
 Lowercase Greek letters α - ω plus the partial
differential sign ∂ and glyph variants of ε, θ, κ, φ,
ρ, and π
 Only unaccented forms of letters are used

Legibility Loss
Without math alphabetics, the Hamiltonian formula

ℋ = ∫dτ [εE² + μH²]

becomes an integral equation

H = ∫dτ [εE² + μH²]

Math Alphanumeric Characters
• Math needs various Latin and Greek styles like
normal, bold, italic, script, Fraktur, and open-face
• May appear to be font variations, but have distinct
semantics and spacings
• Without these distinctions, you get gibberish, violating
Unicode rule: plain text must contain enough info to
permit text to be rendered legibly, and nothing more
• Plain-text searches should distinguish between
alphabets, e.g., a search for script H shouldn’t match
H, etc.
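
A minimal Python sketch of the distinction described above. The helper name `to_math_italic` is hypothetical; the code points are the Unicode math italic letters, with italic h taken from Letterlike Symbols (U+210E) because its slot in Plane 1 is reserved.

```python
import unicodedata

MATH_ITALIC_CAPITAL_A = 0x1D434  # MATHEMATICAL ITALIC CAPITAL A
MATH_ITALIC_SMALL_A = 0x1D44E    # MATHEMATICAL ITALIC SMALL A
PLANCK_CONSTANT = "\u210E"       # italic h is encoded in Letterlike Symbols

def to_math_italic(text):
    """Map ASCII letters to math italic letters; leave everything else alone."""
    out = []
    for ch in text:
        if ch == "h":
            out.append(PLANCK_CONSTANT)
        elif "A" <= ch <= "Z":
            out.append(chr(MATH_ITALIC_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(MATH_ITALIC_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

# Script H (U+210B) and ASCII H are different characters in plain text.
hamiltonian = "\u210B = \u222Bd\u03C4 [\u03B5E\u00B2 + \u03BCH\u00B2]"
script_h_found = "\u210B" in hamiltonian
ascii_h_count = hamiltonian.count("H")
# Compatibility normalization folds script H back to H, losing the distinction.
folded = unicodedata.normalize("NFKC", "\u210B")
```

A code-point search for script H finds only the left-hand side here, while NFKC normalization would merge the two alphabets, which is why a math-aware search should not normalize this way.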

MathML
 MathML 1.0 (April, 1998) was the first World
Wide Web Consortium (W3C) endorsed XML
vocabulary
 Low-level format for describing mathematics as
a basis for machine to machine communication
 MathML facilitates the use and re-use of
scientific content on the Web
 MathML 2.0 (second edition released in late 2003) is now
widely used in exchanging mathematical text
 MathML 2.0 spec has a wealth of math info

MathML Presentation Markup
 Presentation markup directs how the math
should be rendered.

<mrow>
  <mi>E</mi>
  <mo>=</mo>
  <mrow>
    <mi>m</mi>
    <mo>&InvisibleTimes;</mo>
    <msup>
      <mi>c</mi>
      <mn>2</mn>
    </msup>
  </mrow>
</mrow>

renders as E = mc²
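
As an illustration, the same presentation tree can be built with Python's standard `xml.etree.ElementTree`. This is only a sketch: the function name `build_emc2` is hypothetical, and the invisible times operator is written as the character U+2062 rather than as a named entity.

```python
import xml.etree.ElementTree as ET

def build_emc2():
    """Build the presentation MathML tree for E = mc^2."""
    outer = ET.Element("mrow")
    ET.SubElement(outer, "mi").text = "E"
    ET.SubElement(outer, "mo").text = "="
    inner = ET.SubElement(outer, "mrow")
    ET.SubElement(inner, "mi").text = "m"
    ET.SubElement(inner, "mo").text = "\u2062"  # INVISIBLE TIMES
    msup = ET.SubElement(inner, "msup")
    ET.SubElement(msup, "mi").text = "c"        # base
    ET.SubElement(msup, "mn").text = "2"        # superscript
    return outer

# Serialize to a string; the invisible operator is kept as a raw character.
markup = ET.tostring(build_emc2(), encoding="unicode")
```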

Office MathML (OMML)
<m:oMath>
  <m:r><m:t>E=m</m:t></m:r>
  <m:sSup>
    <m:e>
      <m:r><m:t>c</m:t></m:r>
    </m:e>
    <m:sup>
      <m:r><m:t>2</m:t></m:r>
    </m:sup>
  </m:sSup>
</m:oMath>

renders as E = mc²
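
A rough Python sketch of reading this OMML back into linear format. It handles only the two element kinds used in the example (`m:r` runs and `m:sSup` superscripts); the function name `omml_to_linear` is hypothetical, and the namespace URI is the Office Open XML math namespace.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
NS = {"m": M}

OMML = f"""<m:oMath xmlns:m="{M}">
  <m:r><m:t>E=m</m:t></m:r>
  <m:sSup>
    <m:e><m:r><m:t>c</m:t></m:r></m:e>
    <m:sup><m:r><m:t>2</m:t></m:r></m:sup>
  </m:sSup>
</m:oMath>"""

def collect_text(element, path):
    """Concatenate the text of all m:t elements found under `path`."""
    return "".join(t.text or "" for t in element.findall(path, NS))

def omml_to_linear(xml_text):
    """Convert a tiny OMML subset (runs and superscripts) to linear format."""
    root = ET.fromstring(xml_text)
    parts = []
    for child in root:
        tag = child.tag.split("}")[1]  # drop the "{namespace}" prefix
        if tag == "r":
            parts.append(collect_text(child, "m:t"))
        elif tag == "sSup":
            base = collect_text(child, "m:e//m:t")
            sup = collect_text(child, "m:sup//m:t")
            parts.append(f"{base}^{sup}")
    return "".join(parts)
```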

MathML with Custom XML
 Can put arbitrary namespace attributes in
MathML tags
 More complicated embellishments can use
<semantics>
  MathML representation
  <annotation-xml>
    Enhancements
  </annotation-xml>
</semantics>

MathML Parsing
MathML can be tricky to parse. For sin x:
<mrow>
  <mi>sin</mi>
  <mo>&ApplyFunction;</mo>
  <mi>x</mi>
</mrow>
The parser doesn’t know it has a function-apply object
until it reaches &ApplyFunction;, so it has to analyze
expressions as with the linear format
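
The look-ahead problem can be sketched in Python as follows. This is not Word's parser; `classify_mrow` is a hypothetical helper that scans the children of an `<mrow>` and only decides it has a function once it meets U+2061. The numeric character reference `&#x2061;` is used because a plain XML parser does not know MathML's named entities.

```python
import xml.etree.ElementTree as ET

FUNCTION_APPLY = "\u2061"  # FUNCTION APPLICATION

def classify_mrow(xml_text):
    """Return ("function", name, argument) if the row applies a function,
    otherwise ("row", None, None)."""
    row = ET.fromstring(xml_text)
    children = list(row)
    for i, child in enumerate(children):
        is_apply = child.tag == "mo" and child.text == FUNCTION_APPLY
        # The operator must have a neighbor on each side to be meaningful.
        if is_apply and 0 < i < len(children) - 1:
            return ("function", children[i - 1].text, children[i + 1].text)
    return ("row", None, None)

SIN_X = "<mrow><mi>sin</mi><mo>&#x2061;</mo><mi>x</mi></mrow>"
A_PLUS_B = "<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>"
```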

Linear Format

E=mc^2

E = mc²

Math RTF
 Math RTF is OMML in RTF syntax
 Somewhat simplified (doesn’t need text tag)
 For example,
<m:f> ... </m:f> → {\mf ... }
 Thoroughly defined in latest RTF spec
 Reading spec is great way to learn how Word
represents math

Accented characters
 Accents are handled by math accent
object
 Accents may apply to multiple characters
 Accents may be flattened

Vagaries of Math Notation
 Choice of subscript/superscript base
 Function arguments like
 Integrands and n-aryands
 Absolute value ambiguities like ||a|-|b||.
Actually this example is unambiguous, but
|a|b - c|d| has two possible meanings
 Context sensitive ellipses: … vs ⋯

Math Spacing
 Operators have math spacing given by extended
TeX spacing rules
 Function object gives correct spacing between
object and neighbors, and between function
name and argument
 n-aryand object gives correct spacing between
n-ary operator and its n-aryand
 Removes much of the need for TeX spacing “tweaks”
 Context-dependent operator spacing like + - . , :

Font Sizing
 Text style, script style (70%), script script
style (60%)
 Sub/sups…, fractions in line
 Cramped
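
The size steps listed above amount to a small lookup. The sketch below assumes the percentages are relative to the text-style size and that nesting deeper than two levels stays at the script-script size; `script_size` is a hypothetical name.

```python
# Percent of the text-style size used at each script nesting level.
SCRIPT_PERCENT = {0: 100, 1: 70, 2: 60}

def script_size(base_pt, nesting_level):
    """Return the font size in points for a given script nesting level."""
    level = min(nesting_level, 2)  # deeper nesting does not shrink further
    return base_pt * SCRIPT_PERCENT[level] / 100
```

For example, with 10 pt text a first-level subscript is set at 7 pt and a subscript of a subscript at 6 pt.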

Confusables
 1 vs l
 𝑎 vs �
 � vs � vs �
 𝒳 vs �
 Y vs Υ
Other letter similarities are so close that they
are avoided, e.g., UC alpha and LC omicron
are never used.

Math Input Methods
 Linear format input and manual buildup
 Formula autobuildup (FAB)
 Math ribbons
 Recognition of handwritten formulae
 Hex code input
 WYSIWYG editing
 Hybrid editing (combination of WYSIWYG
and FAB)

Hex to Unicode Input Method
 Type Unicode character hexadecimal code
 Make corrections as needed
 Type Alt+x to convert to character
 Type Alt+x to convert back to hex (useful
especially for “missing glyph” character)
 Resolve ambiguities by selection
 Input higher-plane chars using 5 or 6-digit code
 MS Word and RichEdit standard

Autocorrect Examples
 Type \delta and get δ, \Delta and get Δ
 Define \quadratic to be
x = (-b ± √(b^2 - 4ac))/2a
 Then typing \quadratic<space> inserts the built-up quadratic formula
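
A minimal sketch of the autocorrect mechanism, assuming entries are keyed by backslash-prefixed control words and triggered by a space that is then discarded. The table and the function name `type_space` are illustrative only, not Word's implementation.

```python
# Illustrative autocorrect table: control word -> replacement text.
AUTOCORRECT = {
    "\\delta": "\u03B4",
    "\\Delta": "\u0394",
    "\\quadratic": "x = (-b \u00B1 \u221A(b^2 - 4ac))/2a",
}

def type_space(buffer):
    """Simulate typing a space after `buffer`.

    If the text from the last backslash onward is a known entry, it is
    replaced and the space is consumed; otherwise the space is appended.
    """
    start = buffer.rfind("\\")
    if start != -1 and buffer[start:] in AUTOCORRECT:
        return buffer[:start] + AUTOCORRECT[buffer[start:]]
    return buffer + " "
```

In this sketch the replacement for `\quadratic` is linear-format text; building it up into a displayed fraction would be a separate step.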

Math Alphabetics
 \scriptA, \frakturA, \doubleA, etc., are used to
insert math script, Fraktur, and double-struck
alphabetics
 Italic and bold are controlled by italic & bold
format tools and only apply to math alphabetics
 Italic and/or bold is ignored for characters that
don’t have corresponding Unicode math alphanumerics

Linear format math
• Simple operand is a span of alphanumeric
characters
• E.g., simple numerator or denominator is
terminated by any nonalphanumeric
character
• abc/d gives abc stacked over d
• More complicated operands use parentheses
( ), brackets [ ], or { }
• Outermost parens in fractions aren’t
displayed in built-up form
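
A simplified Python sketch of these operand rules, assuming the whole input is a single fraction. It does not implement the full linear-format precedence rules; `split_fraction` and `strip_outer_parens` are hypothetical names.

```python
def strip_outer_parens(operand):
    """Remove one pair of parentheses if it encloses the whole operand."""
    if len(operand) >= 2 and operand[0] == "(" and operand[-1] == ")":
        depth = 0
        for i, ch in enumerate(operand):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                # The first "(" closed before the end: not an outermost pair.
                if depth == 0 and i < len(operand) - 1:
                    return operand
        return operand[1:-1]
    return operand

def split_fraction(linear):
    """Split at the first top-level "/" into (numerator, denominator)."""
    depth = 0
    for i, ch in enumerate(linear):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "/" and depth == 0:
            numerator = strip_outer_parens(linear[:i].strip())
            denominator = strip_outer_parens(linear[i + 1:].strip())
            return numerator, denominator
    return None  # no top-level fraction bar
```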

Linear format math (cont)
E.g., plain text (a + c)/d displays as a + c over d, without the parentheses
• Easier to read than TeX’s, e.g., {a + c \over d}
• MathML: <mfrac><mrow><mi>a</mi><mo>+</mo>
<mi>c</mi></mrow><mrow><mi>d</mi>
</mrow></mfrac>
• Neat feature: linear-format text looks like math

Subscripts and Superscripts
 Unicode has numeric subscripts and
superscripts along with some operators
(U+2070-U+208E): convert to regular subscripts and superscripts
 Others need some kind of markup like <msup>…
</msup>
 Use TeX’s _ and ^ subscript/superscript ops for
input; they can be displayed as a subscripted
down arrow and superscripted up arrow
 Use parentheses as for fractions to overrule
built-in precedence order
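
One way to convert Unicode subscript and superscript characters to the _ and ^ operators is to read their compatibility decompositions, as in this Python sketch. The function names are hypothetical, and runs of more than one character are parenthesized so they stay grouped.

```python
import unicodedata

def script_kind(ch):
    """Return ("^" or "_", base character) for a Unicode super/subscript,
    or (None, ch) for an ordinary character."""
    decomposition = unicodedata.decomposition(ch)
    if decomposition.startswith("<super>"):
        return "^", chr(int(decomposition.split()[1], 16))
    if decomposition.startswith("<sub>"):
        return "_", chr(int(decomposition.split()[1], 16))
    return None, ch

def to_linear(text):
    """Rewrite Unicode super/subscripts with linear-format ^ and _ operators."""
    out = []
    run_op, run = None, ""

    def flush():
        nonlocal run_op, run
        if run_op:
            body = run if len(run) == 1 else "(" + run + ")"
            out.append(run_op + body)
        run_op, run = None, ""

    for ch in text:
        op, base = script_kind(ch)
        if op is None:
            flush()
            out.append(ch)
        elif op == run_op:
            run += base          # extend the current script run
        else:
            flush()
            run_op, run = op, base
    flush()
    return "".join(out)
```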

Formula Autobuildup
 Enter formulas in linear format in a math zone
 When a character is typed that renders an
expression syntactically unambiguous, the
expression is built up
 Edit expressions in built-up form or in linear form
 For integrals, type \int (which autocorrects to ∫ )
optionally followed by subscript and superscript
for limits, which auto build up
 Can autocorrect <letters> to built-up characters
or expressions

Roles of Space (U+0020)
 The ASCII space is rarely needed inside math
expressions, since math spacing is automatic
 Use to terminate autocorrect entries and to
terminate expressions. When so used, the space is deleted
 Use as command to build up math objects
 Use to define spacings for , . and : and to force a
unary operator to display with binary spacing
 A space builds up one subexpression; other
operators build up as many as they can

Unicode Spaces
Space          Unicode   Autocorrect
0 em           U+200B    \zwsp
1/18 em        U+200A    \hairsp
3/18 em        U+2009    \thinsp
4/18 em        U+205F    \medsp
5/18 em        U+2005    \thicksp
6/18 em        U+2004    \vthicksp
9/18 em        U+2002    \ensp
18/18 em       U+2003    \emsp
(digit width)  U+2007    \numsp
(space width)  U+00A0    \nbsp
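
The table can be turned into a small lookup for computing space widths. The sketch below uses the widths listed above in eighteenths of an em and assumes 1 em equals the font size; the digit-width and space-width entries are omitted because they depend on the font. The names are hypothetical.

```python
# Autocorrect name -> (character, width in eighteenths of an em).
MATH_SPACES = {
    "zwsp": ("\u200B", 0),
    "hairsp": ("\u200A", 1),
    "thinsp": ("\u2009", 3),
    "medsp": ("\u205F", 4),
    "thicksp": ("\u2005", 5),
    "vthicksp": ("\u2004", 6),
    "ensp": ("\u2002", 9),
    "emsp": ("\u2003", 18),
}

def space_width_pt(name, font_size_pt):
    """Return the width in points of a named space at the given font size."""
    _, eighteenths = MATH_SPACES[name]
    return font_size_pt * eighteenths / 18
```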

Operators
Operator                    Precedence
CR                          0
opOpen                      1
opClose                     2
opSeparator                 3
concatenation               4
/ atop                      5
opNary                      6
_ ^ opFApply above below    7
□ ∛ ∜ ■ opHbracket          8
opAccent                    9
opUniSubSup                 10

Four Math Invisibles
There are four “invisible” math control codes

Math control code          Unicode
Invisible Function Apply   U+2061
Invisible Times            U+2062
Invisible Comma            U+2063
Invisible Plus             U+2064

Used for semantic content and usually don’t
display a glyph. May have a small width, e.g.,
Function Apply has thinsp
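
For illustration, the four code points can be inspected and filtered with Python's standard `unicodedata` module. Note that the formal Unicode character names differ slightly from the informal labels above; the helper names are hypothetical.

```python
import unicodedata

# Informal label -> character, following the table above.
INVISIBLES = {
    "function apply": "\u2061",
    "times": "\u2062",
    "comma": "\u2063",
    "plus": "\u2064",
}
INVISIBLE_CHARS = set(INVISIBLES.values())

def strip_invisibles(text):
    """Remove the invisible math operators, e.g. for a display-only copy."""
    return "".join(ch for ch in text if ch not in INVISIBLE_CHARS)

def describe_invisibles(text):
    """List the formal Unicode names of the invisible operators in `text`."""
    return [unicodedata.name(ch) for ch in text if ch in INVISIBLE_CHARS]
```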

Math Layout
Collaboration between 5 entities:
 Unicode rich-text text processing program
such as Word or RichEdit
 LineServices math handler
 Page/TableServices math handler
 Math font, e.g., Cambria Math
 Math-font handler

Equation Breaking & Numbering
 PTS math handler can break equations into
multiple lines automatically or by user breaks
 PTS can handle layout of equation numbers
 Client needs to support “math paragraph”
 Two kinds of user breaks: at operator via context
menu, at line break (Shift+Enter)
 At operator indentation: each TAB indents to
next binary/relational operator
 Line break: align at specific operators, e.g., =

Glyph Variants
 Subscripts/superscripts
 Primes
 Dotless i, j used in bases of accent objects
 Flattened and wide accents
 Growable brackets, integrals, arrows
 Display of differentials using U+2146
 Mirror images for right-to-left math
 Variation selector U+FE00

Cambria Math Font
 Cambria typeface designed by Jelle Bosma
 Extended for math by Ross Mills and Andrei
Burago in collaboration with the ClearType and
math-layout groups
 Contains extensive math tables, glyph variants
and much of the Unicode math set
 Is designed with ClearType and excellent screen
readability in mind
 Enables best screen-resolution display of math

New Math Fonts
 Cambria Math has new version with more math
characters, e.g., U+2900..U+2AFF
 202 math characters still needed for Unicode 5.1
 STIX Times Roman math font is in beta; doesn’t
support Word 2007 math well
 STIX has full math character set + some
 STIX font is Type 1, so it doesn’t work with the
Office pdf writer
 Font demos

Font Math Tables
 Specialized math tables have been created to
control glyph placements
 Position subscripts/superscripts horizontally
using cut-ins and italic corrections
 Many math constants: axis height, fraction rule
thickness, etc.
 Compare kerning of
 The math tables are formalized as OpenType
tables accessible via mathfont.dll

User Spacing Adjustments
 Layout engine attempts to render with high
typographic quality
 Users can spoil layout by inserting space where
engine would insert it automatically
 Have autocorrect procedure to reduce this
 Users can insert Unicode spaces
 Phantoms and smashes
 Size and placement overrides

Phantoms and Smashes
 Phantoms have size but no display. Can
have both width & height, ascent only,
descent only
 Smashes display, but remove one or more
sizes, e.g., descent, ascent, and/or width
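
The two objects can be modeled as operations on a box with width, ascent, and descent. This is a conceptual sketch with hypothetical names, not the layout engine's data model.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Box:
    width: float
    ascent: float
    descent: float
    visible: bool = True

def phantom(box, keep_width=True, keep_ascent=True, keep_descent=True):
    """Keep the selected dimensions of the box but do not display it."""
    return Box(
        width=box.width if keep_width else 0.0,
        ascent=box.ascent if keep_ascent else 0.0,
        descent=box.descent if keep_descent else 0.0,
        visible=False,
    )

def smash(box, width=False, ascent=False, descent=False):
    """Display the box but zero out the selected dimensions."""
    return replace(
        box,
        width=0.0 if width else box.width,
        ascent=0.0 if ascent else box.ascent,
        descent=0.0 if descent else box.descent,
    )
```

For instance, an ascent-only phantom reserves vertical room above the baseline without printing anything, while smashing the descent of a glyph keeps it visible but stops it from pushing the line below it down.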

Word 2007 Math Facility
 Elegant math entry and display
 Display is competitive with TeX
 Automatic line breaking, special kerning
 More math semantics than TeX: greater
interoperability (Presentation MathML)
 Input with math ribbon, context menus
 Formula autobuildup input method
 WYSIWYG editing as well as linear format
 MS Math graphing calculator add-in

What Word 2007 doesn’t have
 Built-in equation numbering
 Math Find/Replace
 OpenType enhancements (aside from math
table functionality)
 Optimal line breaking
 Configurable math-zone vertical spacing
 [La]TeX import/export
 Document wide MathML support (only MathML
for a single math zone)

Conclusions
 Eight infrastructures allow us to do math display and
editing better than ever before
 High quality math handler and font enable typography
competitive with or better than TeX
 Best screen-resolution display of mathematics
 Streamlined input methods such as Formula Autobuildup
 Incorporated into Word 2007, Word down-level
converter, Microsoft Math calculator
 Cambria Math font: state-of-the-art math font
