Pandoc: the deep dive (PDXFunc presentation)

Luc Perkins

Exposing the guts of Pandoc, the marvelous Haskell-driven document conversion tool.

Pandoc: The Deep Dive
All that is great stands in the storm

What is Pandoc?
● Universal markup converter == "the Swiss army knife of text markup formats"
● ALL HASKELL
● Examples:
pandoc -o myDoc.md myDoc.html
pandoc -f html -t latex https://hackage.haskell.org
pandoc myDoc.txt -o myDoc.pdf

What is Pandoc? (pt. 2)
● Reads:
○ Markdown (GitHub, Strict, etc.), HTML, LaTeX, Textile, reStructuredText, JSON
● Writes:
○ Markdown, reStructuredText, HTML, DocBook XML, OpenDocument XML, ODT, RTF, groff man, MediaWiki markup, GNU Texinfo, LaTeX, ConTeXt, EPUB, Textile, Emacs org-mode, Slidy, S5
● Extensions for LaTeX math, tables, etc.
● Note to self: Pandoc in the CLI

Why Haskell?
● Performance vis-à-vis scripting languages
● Type safety
● Text.Parsec library
● Hypermuscular list processing (more about FP generally than about Haskell)

Possible approaches
● One possibility: a function devoted to each format-to-format combination
○ markdownToHTML
○ HTMLtoEPUB
○ readers × writers functions (6 × 17 = 102 for the formats listed above)
○ FUCK THAT
● Vastly better possibility?
Reader -->
Neutral Haskell data type -->
Writer -->
Converted document
(only readers + writers pieces to write: 6 + 17 = 23)
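To make the arithmetic and the pipeline concrete, here is a toy sketch. `Doc`, `readPlain`, `writeHtml`, `writeShout`, `convert`, and `converterCount` are hypothetical names invented for illustration, and a plain list of paragraph strings stands in for Pandoc's real AST. The only point is that any reader composes with any writer through one shared type.

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- Hypothetical neutral representation: one String per paragraph.
newtype Doc = Doc [String]
  deriving (Eq, Show)

type Reader = String -> Doc
type Writer = Doc -> String

-- Hypothetical reader: paragraphs are separated by blank lines.
readPlain :: Reader
readPlain = Doc . map unwords . splitParas . lines
  where
    splitParas ls = case dropWhile null ls of
      []  -> []
      ls' -> let (p, rest) = break null ls' in p : splitParas rest

-- Hypothetical writers (no HTML escaping in this sketch).
writeHtml :: Writer
writeHtml (Doc ps) = concatMap (\p -> "<p>" ++ p ++ "</p>") ps

writeShout :: Writer
writeShout (Doc ps) = intercalate "\n\n" (map (map toUpper) ps)

-- Any reader composes with any writer through Doc.
convert :: Reader -> Writer -> String -> String
convert r w = w . r

-- n readers, m writers: n * m direct converters vs n + m pipeline pieces.
converterCount :: Int -> Int -> (Int, Int)
converterCount n m = (n * m, n + m)
```

For example, `convert readPlain writeHtml "Hello\nworld\n\nBye"` yields `"<p>Hello world</p><p>Bye</p>"`, and `converterCount 6 17` gives `(102,23)`. Adding a new input format means writing one reader, not one converter per output format.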

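Text.Parsec
● Parser combinators: like a semi-stateful, non-opinionated regex machine, but built from ordinary Haskell functions
○ Accumulative: return (x:xs)
○ getParserState
○ modifyState
● Core functions
○ parse
■ parse parser sourceName input (sourceName only labels positions in error messages)
■ parse numbers "" "a,b,2,3"
○ many
○ skipMany
○ manyAccum
● type Parser t s = Parsec t s (Pandoc's alias: t is the input stream type, s the user state)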
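A small, self-contained Parsec sketch of the pieces on this slide, assuming the parsec 3 package that ships with GHC. The parsers `number`, `numbers`, `numbersOnly`, `myMany`, `word`, and `wordsAndCount` are our own illustrations, not Pandoc code. Note that `parse` runs parsers whose user state is `()`; a parser with user state is run with `runParser`, which takes the initial state as an extra argument.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)  -- Parser = Parsec String (), i.e. no user state

-- One or more digits, read as an Int.
number :: Parser Int
number = read <$> many1 digit

-- Comma-separated numbers; skipMany drops any spaces after each comma.
numbers :: Parser [Int]
numbers = number `sepBy` (char ',' >> skipMany (char ' '))

-- Same, but the whole input must be consumed.
numbersOnly :: Parser [Int]
numbersOnly = numbers <* eof

-- The "accumulative" idiom written by hand: parse one item, recurse,
-- then return (x:xs). Like Parsec's own `many`, it loops forever if
-- `p` can succeed without consuming input.
myMany :: Parsec s u a -> Parsec s u [a]
myMany p = step <|> return []
  where
    step = do
      x <- p
      xs <- myMany p
      return (x : xs)

-- A stateful parser: the user state is an Int counting words seen so far.
word :: Parsec String Int String
word = do
  w <- many1 letter
  modifyState (+ 1)  -- update the user state
  return w

wordsAndCount :: Parsec String Int ([String], Int)
wordsAndCount = do
  ws <- myMany (word <* spaces)
  n <- getState      -- read the user state back
  eof
  return (ws, n)
```

`parse numbers "" "1, 2,3"` returns `Right [1,2,3]`. On the slide's input `"a,b,2,3"` this particular `numbers` succeeds with `Right []`, because `sepBy` accepts zero items; `numbersOnly` adds `eof` and reports a parse error instead. `runParser wordsAndCount 0 "" "pandoc reads markdown"` returns `Right (["pandoc","reads","markdown"],3)`.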

Basic flow
● Neutral data types
○ Pandoc = Meta + [Block]
○ Block: constructors hold [Inline] (e.g. Para) or nested [Block] (e.g. BlockQuote)
○ Inline (Str, Emph, Link, ...)
○ etc.
● Reader
○ Applies parsers to documents
○ Documents are treated as lists
● Writer
○ Converts the neutral data type into the target format
○ Again, documents are just structured lists
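A cut-down sketch of the neutral types and a writer. These declarations only mirror the general shape of pandoc-types; the real types carry more (for example an `Attr` on headers, `Meta` on the document, and many more constructors), and `writeHtml` and `strings` are hypothetical helpers. Writing and querying are both plain recursion over lists of blocks and inlines.

```haskell
-- Simplified mirror of the AST shape (not the real pandoc-types definitions).
data Inline
  = Str String
  | Space
  | Emph [Inline]
  deriving (Eq, Show)

data Block
  = Para [Inline]
  | Header Int [Inline]
  | BlockQuote [Block]
  deriving (Eq, Show)

-- The real type also carries document metadata (Meta).
data Pandoc = Pandoc [Block]
  deriving (Eq, Show)

-- Writer: structural recursion over lists of blocks and inlines.
writeHtml :: Pandoc -> String
writeHtml (Pandoc bs) = concatMap block bs

block :: Block -> String
block (Para is)       = tag "p" (concatMap inline is)
block (Header n is)   = tag ('h' : show n) (concatMap inline is)
block (BlockQuote bs) = tag "blockquote" (concatMap block bs)

inline :: Inline -> String
inline (Str s)   = s  -- no HTML escaping in this sketch
inline Space     = " "
inline (Emph is) = tag "em" (concatMap inline is)

tag :: String -> String -> String
tag t body = "<" ++ t ++ ">" ++ body ++ "</" ++ t ++ ">"

-- Queries are folds over the same lists: collect every Str.
strings :: Pandoc -> [String]
strings (Pandoc bs) = concatMap blockStrs bs
  where
    blockStrs (Para is)       = concatMap inlineStrs is
    blockStrs (Header _ is)   = concatMap inlineStrs is
    blockStrs (BlockQuote xs) = concatMap blockStrs xs
    inlineStrs (Str s)   = [s]
    inlineStrs Space     = []
    inlineStrs (Emph is) = concatMap inlineStrs is
```

For `Pandoc [Header 1 [Str "Pandoc"], Para [Str "All", Space, Emph [Str "Haskell"]]]`, `writeHtml` produces `"<h1>Pandoc</h1><p>All <em>Haskell</em></p>"`.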

Markdown to HTML
● Readers/Markdown.hs
● Writers/HTML.hs
● Pandoc/Builder.hs
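Text.Pandoc.Builder lets a reader assemble the AST with monoid operations instead of building nested lists by hand; it wraps a `Data.Sequence.Seq` in a newtype called `Many`. The sketch below imitates that idea in a few lines, assuming a simplified `Inline`/`Block`. It borrows names like `str`, `emph`, `para`, and `text` for readability, but the bodies are our own simplification, not the module's actual code.

```haskell
import Data.Foldable (toList)
import Data.List (intersperse)
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Simplified AST for this sketch.
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Show)

data Block = Para [Inline]
  deriving (Eq, Show)

-- A Seq-backed builder: appending is cheap and (<>) combines pieces.
newtype Many a = Many (Seq a)
  deriving (Eq, Show)

instance Semigroup (Many a) where
  Many xs <> Many ys = Many (xs <> ys)

instance Monoid (Many a) where
  mempty = Many Seq.empty

type Inlines = Many Inline
type Blocks  = Many Block

toL :: Many a -> [a]
toL (Many xs) = toList xs

str :: String -> Inlines
str = Many . Seq.singleton . Str

space :: Inlines
space = Many (Seq.singleton Space)

emph :: Inlines -> Inlines
emph = Many . Seq.singleton . Emph . toL

para :: Inlines -> Blocks
para = Many . Seq.singleton . Para . toL

-- Split on whitespace, interleaving Str and Space.
text :: String -> Inlines
text = mconcat . intersperse space . map str . words
```

With this, `toL (para (text "All is" <> space <> emph (str "Haskell")))` is `[Para [Str "All",Space,Str "is",Space,Emph [Str "Haskell"]]]`; a reader can return such fragments from each parser and glue them together with `<>`.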

General lessons
● When doing big, complex things with FP, you're probably going to end up thinking in terms of lists
● Lists are infinitely flexible
● Hard to escape state entirely
○ ReaderState
○ WriterState
● Don't give up
● Force yourself to give a presentation at PDXFunc
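One way to see why state creeps back in: a writer that numbers footnotes has to remember a counter and the notes collected so far while it walks the document. This sketch threads a hypothetical `WriterState` record (not Pandoc's actual one) through the traversal with `Data.List.mapAccumL`, which keeps the state explicit without a State monad. The mini AST and `writeDoc` are likewise made up for illustration.

```haskell
import Data.List (mapAccumL)

-- Hypothetical mini AST: plain text and footnotes.
data Inline = Str String | Note String
  deriving (Eq, Show)

-- Hypothetical writer state: next note number and notes seen so far.
data WriterState = WriterState
  { stNext  :: Int
  , stNotes :: [String]  -- most recent first
  }

-- Render one inline, returning the updated state alongside the output.
renderInline :: WriterState -> Inline -> (WriterState, String)
renderInline st (Str s)  = (st, s)
renderInline st (Note n) =
  let k = stNext st
  in ( st { stNext = k + 1, stNotes = n : stNotes st }
     , "<sup>" ++ show k ++ "</sup>" )

-- Render a paragraph, then emit the collected notes in order at the end.
writeDoc :: [Inline] -> String
writeDoc is =
  let (final, body) = mapAccumL renderInline (WriterState 1 []) is
      notes = concatMap (\n -> "<li>" ++ n ++ "</li>") (reverse (stNotes final))
  in "<p>" ++ concat body ++ "</p><ol>" ++ notes ++ "</ol>"
```

For instance, `writeDoc [Str "Pandoc", Note "first", Str " is Haskell", Note "second"]` gives `"<p>Pandoc<sup>1</sup> is Haskell<sup>2</sup></p><ol><li>first</li><li>second</li></ol>"`: the notes can only be written once the whole paragraph has been walked, so something has to carry them along.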
