
Markup For Dummies (Russ Ward)


By now, you have heard how important structured content is. But, maybe you poked around with something like DITA and were baffled by the complexity. Or, maybe you still aren’t sure what XSLT stands for. This workshop will take participants back to the basics, to provide a foundation for higher-level concepts that have taken hold of our industry. Topics will include:

- What XML looks like, what it does, and how to create it.
- How to define a structure model, including whether to use a DTD, Schema, etc.
- What XSLT looks like, what it does, and how to make it work.
- What DITA and DocBook really are and whether one is right for you.

Russell Ward is an experienced technical writer and structured technologies developer. He has spent many years working with structured content to maximize efficiency in the techcomm environment, both as an employee and as an independent consultant. He is also an experienced trainer and speaks periodically at conferences and other peer events.

Published in: Education

  1. Markup for Dummies Workshop
     Russell Ward
     STC-PMC 2018 CONDUIT Conference, Willow Grove, PA
  2. Speaker contact information
     Russ Ward
     - Senior Technical Writer at Spirent Communications in Frederick, MD.
       5280 Corporate Drive, Suite A100, Frederick, MD 21703; 301.444.2489; russ.ward@spirent.com; www.spirent.com
     - Owner of West Street Consulting, a part-time enterprise specializing in Structured FrameMaker plugins and custom development.
       357 W. North St., Carlisle, PA 17013; 717.240.2989; russ@weststreetconsulting.com; www.weststreetconsulting.com
  3. Workshop purpose
     Overall purpose
     - To give you a functional knowledge of markup at a basic level, so you know where to get started.
     Things we will cover in some depth
     - What markup really means (and clarify some of the jargon).
     - The essential structure of XML.
     - How to make rules about the structure of XML (DTD, Schema, etc.).
     - How to do anything interesting with XML once you have it (XSLT, DOM, etc.).
     Things we will cover with less depth
     - Common applications for XML in the technical communications space, such as DITA, DocBook, and other standards.
     - Other types of markup.
  4. Disclaimers
     - You cannot go from a dummy to an expert in an afternoon! If you really want to make markup work for you, plan for a dedicated pursuit of knowledge which could last your entire career.
     - The concepts of markup are as big as the universe of technology. We will attempt to focus on areas of interest within the technical communications field.
     - If you want expertise with markup, you have to want it.
  5. XML/markup myths (according to Russ Ward)
     - I’m a technical writer, so I’m concerned about words, not XML.
     - Markup is easy… it’s just a bunch of tags.
     - I only care about markup if I want to use a CMS / reuse content / (insert specific reason here).
     - All I have to do is convert to XML and the magic starts to happen.
     - I don’t have the time to study this stuff / convert my content / (insert task here).
     - I’m a lone writer, so it’s not worth my time to fuss with markup.
     - XML authoring = DITA.
  6. XML!
     - eXtensible Markup Language… one of many ways to mark up content in text format. By far, it is the most widely used format in technical communication and the greater IT universe. Therefore, it is the primary subject of this workshop.
     - XML markup follows a hierarchical tree format, similar to the inherent structure of a written document. Therefore, not only does XML provide technical advantages, the words of a document naturally fit into it.
     - By itself, XML is just a text file that does nothing! The magic happens when you use XML-aware tools to read the markup and do cool stuff.
     - XML must be well-formed and normally should be validated.
     - Several validation formats are available, with DTD and Schema as the most popular.
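To make "well-formed" concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The helper name `is_well_formed` and the two sample strings are made up for illustration. It only checks well-formedness (tags properly nested and closed); validation against a DTD or Schema is a separate step that this parser does not perform.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<book><page>This is page one.</page></book>"
bad = "<book><page>This is page one.</book>"  # <page> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A parser rejects the second string outright, which is why well-formedness is a hard requirement while validation is a design choice.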
  7. DTD vs. Schema
     - Two competing methods to define a structure and validate any compliant document.
     - DTD:
       - Older and originated with SGML.
       - Uses a unique syntax.
       - Still works just fine; that is, remains supported by any mainstream XML tool.
     - Schema:
       - Newer, applicable to XML only.
       - Uses an XML syntax which makes it easier for a computer to read, but also more difficult for a human to read.
       - More features than DTD.
  8. DTD vs. Schema (cont’d)
     - The choice should be made based on what works for you and the tools you intend to use. Here are some reasons you might choose Schema*:
       - It is easier to describe allowable document content
       - It is easier to validate the correctness of data
       - It is easier to define data facets (restrictions on data)
       - It is easier to define data patterns (data formats)
       - It is easier to convert data between different data types
     *(taken from https://www.w3schools.com/xml/schema_intro.asp)
  9. RELAX NG
     - Another method to define a data structure, less common than DTD or Schema but still in use.
     - Stands for REgular LAnguage for XML Next Generation.
     - Can be written in XML, like Schema, or in an alternative compact syntax.
     - Simple example:
       - XML document:

             <book>
               <page>This is page one.</page>
               <page>This is page two.</page>
             </book>

       - Corresponding RELAX NG schema in XML format:

             <element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
               <oneOrMore>
                 <element name="page">
                   <text/>
                 </element>
               </oneOrMore>
             </element>

       - Compact syntax:

             element book { element page { text }+ }

     *(Some info taken from https://en.wikipedia.org/wiki/RELAX_NG)
  10. Element vs. attribute markup
      Data can be stored within element tags or attribute values. Why choose one or the other? Here are some reasons why attribute data is more limited*:
      - Attributes cannot contain multiple values (child elements can).
      - Attributes are not easily expandable (for future changes).
      - Attributes cannot describe structures (child elements can).
      - Attributes are more difficult to manipulate by program code.
      - Attribute values are not easy to test against a DTD.
      *(taken from https://www.w3schools.com/xml/xml_dtd_el_vs_attr.asp)
      In technical communication, the most common use of attributes is to store formatting, filtering, and reuse information. The body of elements is reserved for the literary content.
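The split between attribute data and element content can be sketched in a few lines of Python. The `<note>` element and its `audience` attribute are invented for this illustration and do not come from any particular standard; the point is only that the attribute carries one filtering value while the child elements carry the repeatable, structured content.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: "audience" is filtering metadata, <para> holds the words.
doc = ET.fromstring(
    '<note audience="admin">'
    "<para>Restart the service.</para>"
    "<para>Check the log.</para>"
    "</note>"
)

audience = doc.get("audience")                  # one flat value per attribute
paras = [p.text for p in doc.findall("para")]   # child elements can repeat

print(audience)
print(paras)
```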
  11. So now we have markup. What to do with it?
      Markup is not just for fun, although some people think it is fun. Markup should serve some useful purpose, like:
      - Facilitate content reuse
      - Direct automated formatting processes
      - Enhance options for content storage and portability
      ABOVE ALL ELSE, REMEMBER THIS: Markup provides a roadmap for your content that a computer can read. That is, it makes your content look like data.
      Once you make your content easily processable by a computer algorithm, the computer can do more work for you. In other words, it can automate things like:
      - Repetitious busywork
      - The movement of content for any nature of enhanced reuse or publishing process
      The more busywork that the computer does, the more reliable the results. Furthermore, you have more time to WRITE THE CONTENT THAT YOUR AUDIENCE NEEDS.
  12. A brief intro to XSLT and publishing concepts
      - XML is not useful by itself! Nobody wants to read an XML file. Therefore, some type of publishing process is necessary.
      - A publishing process should:
        - Use the markup as the fundamental guide to generate output.
        - Be as automated as possible.
        - Have a close relationship with the original design of the structure definition, typically having been developed in parallel.
      - Countless publishing processes exist in the world… some simple, some complex… some based on off-the-shelf (OTS) tools and others completely custom… there is no right answer for every situation, although some consultants and vendors will say otherwise!
      - Many publishing processes, OTS and custom, use XSLT as the foundation for creating something new from an XML source. “Something new” might include common human-readable formats such as HTML (in a browser) and PDF (in a reader).
  13. What is XSLT?
      - Stands for eXtensible Stylesheet Language Transformations.
      - It is a mature standard for converting XML to some other text format, such as HTML, CSV, other XML, or anything text.
      - Well-supported in the IT community through forums, tools, tutorials, etc.
  14. Components required to make XSLT happen
      - An XML file with the data to transform
      - An XSLT stylesheet with the instructions for the transformation
      - An engine that applies the stylesheet to the XML file; that is, does all the work
      - Some mechanism to capture the output
      [Diagram labels: XML file, Stylesheet, Engine, Output]
  15. Key concepts about the XSLT processing flow
      - By default (using an empty stylesheet), the output of a transform is all the text node data of the original XML file. Effectively, the process starts at the root element and automatically walks through every branch of the tree.
      - When you want something specific to happen, your stylesheet must effectively put up a “red light” to stop this flow at some node. Once you stop the flow, you can start to customize the output however you want.
      - The key element to stop the automatic flow is <xsl:template>. All instructions for a customized output live in one of these elements.
      - To resume the flow (if desired), the <xsl:apply-templates> element is effectively a green light.
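The red-light/green-light flow can be imitated in plain Python. This is not an XSLT engine and none of these function names come from one: `apply_templates`, `process_children`, `page_template`, and `drop_title` are hypothetical stand-ins, and the "stylesheet" is simply a dictionary mapping element names to functions. It is a rough sketch of the idea that an empty stylesheet yields only the text, while a template takes over at its node and decides whether to resume the walk.

```python
import xml.etree.ElementTree as ET

def process_children(node, templates):
    """Green light: emit this node's text and continue into its children."""
    out = node.text or ""
    for child in node:
        out += apply_templates(child, templates)
        out += child.tail or ""
    return out

def apply_templates(node, templates):
    """Use a matching template if one exists; otherwise keep walking."""
    template = templates.get(node.tag)
    if template is None:
        return process_children(node, templates)
    return template(node, templates)  # red light: the template decides the output

def page_template(node, templates):
    # Wrap the page in <p> tags, then resume the flow for its content.
    return "<p>" + process_children(node, templates) + "</p>"

def drop_title(node, templates):
    # Stop the flow and never resume it: the title is left out of the output.
    return ""

root = ET.fromstring(
    "<book><title>Guide</title><page>One.</page><page>Two.</page></book>"
)

plain = apply_templates(root, {})  # "empty stylesheet": text only
styled = apply_templates(root, {"page": page_template, "title": drop_title})
print(plain)
print(styled)
```

With no templates the result is just the concatenated text; with the two templates, pages are wrapped and the title disappears because its template never resumes the flow.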
  16. About processing engines
      Many different processing engines are available. Some are free and some are not. All are designed for operation within some particular context. For example:
      - XML editors: Any worthy XML editor will include XSLT processing. In this context, it is often used for stylesheet design and testing. The output is typically rendered in some window within the editor interface. To learn more, Wikipedia has a decent article on XML editor comparisons.
      - Programming and scripting languages: All mainstream languages have some nature of built-in libraries for XSLT. When invoking XSLT with a language, it typically means that you have your stylesheet ready and you are looking to automate the process, for whatever reason.
      - Web browsers: All major browsers can do XSLT. For a browser, the output is normally the browser window. Therefore, the typical use case for XSLT in a browser is to dynamically transform some kind of XML content into a browser-ready format (HTML).
  17. Off-the-shelf XML standards
      - DITA
        - Stands for Darwin Information Typing Architecture.
        - The newest and most mainstream structure standard.
        - A downloadable package includes all DTDs to write your XML and a full toolkit for publishing a variety of formats.
        - Very technically complex, both in the structure definition and the publishing components.
        - Has a strong emphasis on topic-level authoring.
        - Implements a clever mechanism that allows customization of DTDs while still allowing any DITA-compliant tool to render the content, at least basically.
        - Is an OASIS standard and has an organized committee that maintains it.
  18. Off-the-shelf XML standards (cont’d)
      - DocBook
        - Older than DITA, less popular but still used.
        - Does not have a toolkit as advanced as DITA, but stylesheets and other components for publishing are available.
        - Maintained by Norman Walsh and a DocBook Project development team.
        - Traditionally focused on full-document authoring and print publishing, although supporters are quick to note that it is not limited to this methodology.
        - Sample file:

              <?xml version="1.0" encoding="UTF-8"?>
              <book xml:id="simple_book" xmlns="http://docbook.org/ns/docbook" version="5.0">
                <title>Very simple book</title>
                <chapter xml:id="chapter_1">
                  <title>Chapter 1</title>
                  <para>Hello world!</para>
                  <para>I hope that your day is proceeding <emphasis>splendidly</emphasis>!</para>
                </chapter>
                <chapter xml:id="chapter_2">
                  <title>Chapter 2</title>
                  <para>Hello again, world!</para>
                </chapter>
              </book>
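As one small illustration of markup acting as data, the DocBook sample can be read with Python's standard library to build a chapter list. This is only a sketch: the variable names are invented, the sample is trimmed (the XML declaration and most paragraphs are left out), and a real publishing process would more likely use the DocBook stylesheets. Note that every element lives in the DocBook namespace, so lookups need a namespace prefix.

```python
import xml.etree.ElementTree as ET

# Prefix "db" is an arbitrary local choice; the URI comes from the sample file.
NS = {"db": "http://docbook.org/ns/docbook"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Trimmed copy of the sample file (XML declaration omitted).
sample = (
    '<book xml:id="simple_book" xmlns="http://docbook.org/ns/docbook" version="5.0">'
    "<title>Very simple book</title>"
    '<chapter xml:id="chapter_1"><title>Chapter 1</title>'
    "<para>Hello world!</para></chapter>"
    '<chapter xml:id="chapter_2"><title>Chapter 2</title>'
    "<para>Hello again, world!</para></chapter>"
    "</book>"
)

book = ET.fromstring(sample)
book_title = book.find("db:title", NS).text

# Build a simple table of contents: (xml:id, chapter title) pairs.
toc = [
    (chapter.get(XML_ID), chapter.find("db:title", NS).text)
    for chapter in book.findall("db:chapter", NS)
]

print(book_title)
for chapter_id, title in toc:
    print(chapter_id, title)
```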
  19. Just for fun, another type of markup: AsciiDoc
      A super-simplified markup language designed to get readable words on the page, as quickly as possible.
      Sample from https://en.wikipedia.org/wiki/AsciiDoc:
  20. Email from the IRS – A good reason to know XML
      Dear Free File Taxpayer:
      The IRS has rejected your federal return. This means that your return has not been filed. . . .
      Here's the reason for the rejection:
      Issue: Business Rule X0000-005 - The XML data has failed schema validation. cvc-complex-type.2.4.b. The content of element 'EgyPropCrMainHomeUSAddress' is not complete. One of '{"http://www.irs.gov/efile":ZIPCd}' is expected.
      The following information may help you determine the form at issue:
      Field/Xpath: /efile:Return[1]/efile:ReturnData[1]/efile:IRS5695[1]/efile:NonBusinessEgyEffcntPropCrGrp[1]/efile:EgyPropCrMainHomeUSAddress[1]/efile:StateAbbreviationCd[1]
      If you are unable to fix the issue, you will have to print the return and file by mail.
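The rejection message becomes readable once the XPath is treated as a trail through the tree. The sketch below rebuilds a tiny, hypothetical fragment with the element names and namespace quoted in the email (the real return and its schema contain far more than this, and the state value is made up), follows the path to the address element, and confirms that the `ZIPCd` child the schema expects is missing.

```python
import xml.etree.ElementTree as ET

NS = {"efile": "http://www.irs.gov/efile"}

# Hypothetical, heavily trimmed fragment; only the names come from the email.
fragment = (
    '<Return xmlns="http://www.irs.gov/efile">'
    "<ReturnData><IRS5695><NonBusinessEgyEffcntPropCrGrp>"
    "<EgyPropCrMainHomeUSAddress>"
    "<StateAbbreviationCd>PA</StateAbbreviationCd>"
    "</EgyPropCrMainHomeUSAddress>"
    "</NonBusinessEgyEffcntPropCrGrp></IRS5695></ReturnData>"
    "</Return>"
)

root = ET.fromstring(fragment)  # root is the efile:Return element

# Same trail as the email's XPath, relative to the root element.
path = (
    "efile:ReturnData/efile:IRS5695/"
    "efile:NonBusinessEgyEffcntPropCrGrp/efile:EgyPropCrMainHomeUSAddress"
)
address = root.find(path, NS)

# Strip the "{namespace}" prefix ElementTree puts on tag names.
children = [child.tag.split("}")[1] for child in address]
has_zip = address.find("efile:ZIPCd", NS) is not None

print(children)
print(has_zip)
```

In this fragment the address stops at the state abbreviation, which matches the email's complaint that the element "is not complete" and that a ZIP code element is expected next.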