SlideShare uma empresa Scribd logo
1 de 69
Baixar para ler offline
Apache POI
           Recipes
           Paolo Mottadelli - ApacheCon Oakland 2009




  http://chromasia.com
Thursday, November 5, 2009
paolo@apache.org



   my to-do list




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   POI @ Content Tech
      ✴ Document to application (and back)
               ✴ Publish data

               ✴ Build a doc from your content

      ✴ Know your documents
               ✴ Extract text

               ✴ Extract content



                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             1
                             A-B-C
paolo@apache.org




   POI modules (1): OLE2
      ✴ POIFS: reading/writing Office
               Documents
      ✴ HSSF r/w Excel Spreadsheets
      ✴ HWPF r/w Word Docs
      ✴ HSLF r/w PowerPoint Docs
      ✴ HPSF r/w property sets

                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   POI modules (2): OOXML
      ✴ XSSF: r/w OXML Excel
      ✴ XWPF: r/w OXML Word
      ✴ XSLF: r/w OXML PowerPoint




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
POI 3.5
  http://chromasia.com
Thursday, November 5, 2009
paolo@apache.org




   OOXML dev status
      ✴ XSSF: Final in POI-3.5
      ✴ XWPF: Draft (basic features)
      ✴ XSLF: Not covered (only text ext.)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   HSSF & XSSF
      ✴ Common user model interface
      ✴ User model based on existing HSSF
      ✴ Using OpenXML4J and SAX




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             2
                             Same recipe,
                             different flavours
paolo@apache.org




   Common H/XSSF access
      ✴ org.apache.poi.ss.usermodel




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Upgrading to POI-3.5
      ✴ HSSFFormulaEvaluator.CellValue
               ✴ convert from .hssf. to .ss.

      ✴ HSSFRow.MissingCellPolicy
               ✴ convert from .hssf. to .ss.

      ✴ RecordFormatException in DDF
               ✴ convert from .hssf. to .util.                           Dreadful Drawing
                                                                             Format


                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             3
                             Meet
                             Office Open XML
paolo@apache.org



                                               made (very) simple
   Open XML
      ✴ XML based
               ✴ WordprocessingML

               ✴ SpreadsheetML

               ✴ PresentationML

      ✴ Stored as a package
               ✴ Open Packaging Conventions



                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Package concepts
      ✴ Package (the container)
      ✴ Part (xml file)
      ✴ Relationship
               ✴ package-relationship

               ✴ part-relationship




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Expanded package, Excel




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   WordprocessingML
      ✴ body
               ✴ paragraphs
                      ✴ runs


      ✴ properties (for runs and pars)
      ✴ styles
      ✴ headers/footers ...

                               - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   SpreadsheetML
      ✴ workbook
               ✴ worksheets
                      ✴ rows

                             ✴ cells



      ✴ styles
      ✴ formulas
      ✴ images ...
                                       - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   PresentationML
      ✴ presentation
               ✴ slides

               ✴ slides-masters

               ✴ notes-masters

      ✴ layout, animation, audio, video,
               transitions ...

                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             4
                             openxml4j
paolo@apache.org




   openXML4J
      ✴ Package, parts, rels


                                                                          "/xl/worksheets/sheet1.xml"




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             5
                             Text Extraction
paolo@apache.org




   Extractors
      ✴ POITextExtractor
               ✴ POIOLE2TextExtractor
                                                                    getT xt()
                                                                        e
               ✴ POIXMLTextExtractor
                      ✴ XSSFExcelExtractor

                      ✴ XWPFWordExtractor

                      ✴ XSLFPowerPointExtractor


      ✴ If text is all what you need

                              - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Text extraction
      ✴ made simple




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             6
                             EXCEL
                             Simple Tasks
paolo@apache.org




   New Workbook




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   New Sheet




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Creating Cells




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Cell types




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Fills and colors




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             7
                             EXCEL
                             Imp/Exp to XML
paolo@apache.org




   Export to XML




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   xmlMaps.xml




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   XML Import/Export




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             8
                             WORD
                             Simple Doc
paolo@apache.org




   A simple doc




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             9
                             Use Case 1
                             Alfresco Search
paolo@apache.org




   Use Case
      ✴ Upload a document
      ✴ Detect document mimetype
      ✴ Extract text and metadata
      ✴ Create search index
      ✴ Search (and find) the document


                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Without Tika
   ✴ Detect the document mimetype
               ✴ (source/target mimetype)

      ✴ Get the proper ContentTransformer
               ✴ (ContentTransformerRegistry)

      ✴ Tranform Doc Content to Text
               ✴ (PoiHssfContentTransformer) I here
                                          PO
      ✴ Create Lucene index
                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   With Tika




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Extension use case
      ✴ Adding support for Office Open
               XML documents (Office 2007+)
               ✴ Word 2007+

               ✴ Excel 2007+

               ✴ PowerPoint 2007+




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   POI text extractors
      ✴ Remember?




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Apache Tika (Excel)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Apache Tika




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Apache Tika (Word)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Apache Tika (Word)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             10
                             Use Case 2
                             JM Lafferty
                             Financial Forecasting
paolo@apache.org




   Make your wb look pro-
      ✴ Rich text
      ✴ Graphics
      ✴ Formulas & Named Ranges
      ✴ Data validations
      ✴ Conditional formatting
      ✴ Cell comments
                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
Thursday, November 5, 2009
paolo@apache.org




   Formula evaluation
      ✴ The evaluation engine enables you
               to calculate formula results from
               within a POI application
      ✴ Formulas may be added to your
               workbook by POI
      ✴ Evaluation is available for .xls
               and .xlsx
                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Formula evaluation (continued)
      ✴ All arithmetic operators are
               implemented
      ✴ Over 280 Excel built in functions
               are supported




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Formula evaluation (code)




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
                             11
                             Use Case 3:
                             CQ5 Import
Thursday, November 5, 2009
Thursday, November 5, 2009
paolo@apache.org




   importDocument()




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   getParagraphs(...)
      ✴ Makes use of
               ✴ org.apache.poi.hwpf.usermodel.Range




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   importDocument()




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   getTitle(...)
      ✴ Gets the first paragraph’s text




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   importDocument()




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
Thursday, November 5, 2009
Thursday, November 5, 2009
Thursday, November 5, 2009
                             12
                             Want more?
paolo@apache.org




   More Examples
      ✴ http://poi.apache.org/spreadsheet/examples.html




                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




   Even more
      ✴ Get in touch
               ✴ http://poi.apache.org/

      ✴ Get informed
               ✴ dev@poi.apache.org

      ✴ Get involved
               ✴ http://svn.apache.org/repos/asf/poi/trunk/


                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009
paolo@apache.org




      ✴ Get slides
               ✴ http://www.slideshare.net/paolomoz/apache-poi-recipes




   Thanks


                             - ApacheCon US 2009, Oakland - Apache POI Recipes -
Thursday, November 5, 2009

Mais conteúdo relacionado

Destaque

Intro to Apache Spark - Lab
Intro to Apache Spark - LabIntro to Apache Spark - Lab
Intro to Apache Spark - LabMammoth Data
 
Catalogo Planet Network da Spark Controles
Catalogo Planet Network da Spark ControlesCatalogo Planet Network da Spark Controles
Catalogo Planet Network da Spark ControlesSpark Controles
 
Content Analysis with Apache Tika
Content Analysis with Apache TikaContent Analysis with Apache Tika
Content Analysis with Apache TikaPaolo Mottadelli
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MorePaco Nathan
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBaseCarol McDonald
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingPaco Nathan
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesPaco Nathan
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSAmazon Web Services
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkDataWorks Summit
 
Spark machine learning & deep learning
Spark machine learning & deep learningSpark machine learning & deep learning
Spark machine learning & deep learninghoondong kim
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamFlink Forward
 
Machine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMachine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMeeraj Kunnumpurath
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tikaJukka Zitting
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionGuido Schmutz
 

Destaque (20)

Intro to Apache Spark - Lab
Intro to Apache Spark - LabIntro to Apache Spark - Lab
Intro to Apache Spark - Lab
 
Catalogo Planet Network da Spark Controles
Catalogo Planet Network da Spark ControlesCatalogo Planet Network da Spark Controles
Catalogo Planet Network da Spark Controles
 
Introdução ao desenvolvimento de jogos com unity3d
Introdução ao desenvolvimento de jogos com unity3dIntrodução ao desenvolvimento de jogos com unity3d
Introdução ao desenvolvimento de jogos com unity3d
 
Content Analysis with Apache Tika
Content Analysis with Apache TikaContent Analysis with Apache Tika
Content Analysis with Apache Tika
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and MoreStrata 2015 Data Preview: Spark, Data Visualization, YARN, and More
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More
 
Apache Spark streaming and HBase
Apache Spark streaming and HBaseApache Spark streaming and HBase
Apache Spark streaming and HBase
 
QCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark StreamingQCon São Paulo: Real-Time Analytics with Spark Streaming
QCon São Paulo: Real-Time Analytics with Spark Streaming
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache Spark
 
Spark machine learning & deep learning
Spark machine learning & deep learningSpark machine learning & deep learning
Spark machine learning & deep learning
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and Beam
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Machine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMachine Learning by Example - Apache Spark
Machine Learning by Example - Apache Spark
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tika
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
 

Mais de Paolo Mottadelli

Open Architecture in the Adobe Marketing Cloud - Summit 2014
Open Architecture in the Adobe Marketing Cloud - Summit 2014Open Architecture in the Adobe Marketing Cloud - Summit 2014
Open Architecture in the Adobe Marketing Cloud - Summit 2014Paolo Mottadelli
 
Integrating with Adobe Marketing Cloud - Summit 2014
Integrating with Adobe Marketing Cloud - Summit 2014Integrating with Adobe Marketing Cloud - Summit 2014
Integrating with Adobe Marketing Cloud - Summit 2014Paolo Mottadelli
 
Evolve13 cq-commerce-framework
Evolve13 cq-commerce-frameworkEvolve13 cq-commerce-framework
Evolve13 cq-commerce-frameworkPaolo Mottadelli
 
AEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkAEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkPaolo Mottadelli
 
Adobe AEM Commerce with hybris
Adobe AEM Commerce with hybrisAdobe AEM Commerce with hybris
Adobe AEM Commerce with hybrisPaolo Mottadelli
 
Jira as a Project Management Tool
Jira as a Project Management ToolJira as a Project Management Tool
Jira as a Project Management ToolPaolo Mottadelli
 
Interoperability at Apache Software Foundation
Interoperability at Apache Software FoundationInteroperability at Apache Software Foundation
Interoperability at Apache Software FoundationPaolo Mottadelli
 
Content analysis for ECM with Apache Tika
Content analysis for ECM with Apache TikaContent analysis for ECM with Apache Tika
Content analysis for ECM with Apache TikaPaolo Mottadelli
 

Mais de Paolo Mottadelli (11)

Open Architecture in the Adobe Marketing Cloud - Summit 2014
Open Architecture in the Adobe Marketing Cloud - Summit 2014Open Architecture in the Adobe Marketing Cloud - Summit 2014
Open Architecture in the Adobe Marketing Cloud - Summit 2014
 
Integrating with Adobe Marketing Cloud - Summit 2014
Integrating with Adobe Marketing Cloud - Summit 2014Integrating with Adobe Marketing Cloud - Summit 2014
Integrating with Adobe Marketing Cloud - Summit 2014
 
Evolve13 cq-commerce-framework
Evolve13 cq-commerce-frameworkEvolve13 cq-commerce-framework
Evolve13 cq-commerce-framework
 
AEM (CQ) eCommerce Framework
AEM (CQ) eCommerce FrameworkAEM (CQ) eCommerce Framework
AEM (CQ) eCommerce Framework
 
Adobe AEM Commerce with hybris
Adobe AEM Commerce with hybrisAdobe AEM Commerce with hybris
Adobe AEM Commerce with hybris
 
Java standards in WCM
Java standards in WCMJava standards in WCM
Java standards in WCM
 
JCR and Sling Quick Dive
JCR and Sling Quick DiveJCR and Sling Quick Dive
JCR and Sling Quick Dive
 
Open Development
Open DevelopmentOpen Development
Open Development
 
Jira as a Project Management Tool
Jira as a Project Management ToolJira as a Project Management Tool
Jira as a Project Management Tool
 
Interoperability at Apache Software Foundation
Interoperability at Apache Software FoundationInteroperability at Apache Software Foundation
Interoperability at Apache Software Foundation
 
Content analysis for ECM with Apache Tika
Content analysis for ECM with Apache TikaContent analysis for ECM with Apache Tika
Content analysis for ECM with Apache Tika
 

Último

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Apache Poi Recipes

  • 1. Apache POI Recipes Paolo Mottadelli - ApacheCon Oakland 2009 http://chromasia.com Thursday, November 5, 2009
  • 2. paolo@apache.org my to-do list - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 3. paolo@apache.org POI @ Content Tech ✴ Document to application (and back) ✴ Publish data ✴ Build a doc from your content ✴ Know your documents ✴ Extract text ✴ Extract content - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 4. Thursday, November 5, 2009 1 A-B-C
  • 5. paolo@apache.org POI modules (1): OLE2 ✴ POIFS: reading/writing Office Documents ✴ HSSF r/w Excel Spreadsheets ✴ HWPF r/w Word Docs ✴ HSLF r/w PowerPoint Docs ✴ HPSF r/w property sets - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 6. paolo@apache.org POI modules (2): OOXML ✴ XSSF: r/w OXML Excel ✴ XWPF: r/w OXML Word ✴ XSLF: r/w OXML PowerPoint - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 7. POI 3.5 http://chromasia.com Thursday, November 5, 2009
  • 8. paolo@apache.org OOXML dev status ✴ XSSF: Final in POI-3.5 ✴ XWPF: Draft (basic features) ✴ XSLF: Not covered (only text ext.) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 9. paolo@apache.org HSSF & XSSF ✴ Common user model interface ✴ User model based on existing HSSF ✴ Using OpenXML4J and SAX - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 10. Thursday, November 5, 2009 2 Same recipe, different flavours
  • 11. paolo@apache.org Common H/XSSF access ✴ org.apache.poi.ss.usermodel - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 12. paolo@apache.org Upgrading to POI-3.5 ✴ HSSFFormulaEvaluator.CellValue ✴ convert from .hssf. to .ss. ✴ HSSFRow.MissingCellPolicy ✴ convert from .hssf. to .ss. ✴ RecordFormatException in DDF ✴ convert from .hssf. to .util. Dreadful Drawing Format - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 13. Thursday, November 5, 2009 3 Meet Office Open XML
  • 14. paolo@apache.org made (very) simple Open XML ✴ XML based ✴ WordprocessingML ✴ SpreadsheetML ✴ PresentationML ✴ Stored as a package ✴ Open Packaging Conventions - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 15. paolo@apache.org Package concepts ✴ Package (the container) ✴ Part (xml file) ✴ Relationship ✴ package-relationship ✴ part-relationship - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 16. paolo@apache.org Expanded package, Excel - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 17. paolo@apache.org WordprocessingML ✴ body ✴ paragraphs ✴ runs ✴ properties (for runs and pars) ✴ styles ✴ headers/footers ... - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 18. paolo@apache.org SpreadsheetML ✴ workbook ✴ worksheets ✴ rows ✴ cells ✴ styles ✴ formulas ✴ images ... - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 19. paolo@apache.org PresentationML ✴ presentation ✴ slides ✴ slides-masters ✴ notes-masters ✴ layout, animation, audio, video, transitions ... - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 20. Thursday, November 5, 2009 4 openxml4j
  • 21. paolo@apache.org openXML4J ✴ Package, parts, rels "/xl/worksheets/sheet1.xml" - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 22. Thursday, November 5, 2009 5 Text Extraction
  • 23. paolo@apache.org Extractors ✴ POITextExtractor ✴ POIOLE2TextExtractor getT xt() e ✴ POIXMLTextExtractor ✴ XSSFExcelExtractor ✴ XWPFWordExtractor ✴ XSLFPowerPointExtractor ✴ If text is all what you need - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 24. paolo@apache.org Text extraction ✴ made simple - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 25. Thursday, November 5, 2009 6 EXCEL Simple Tasks
  • 26. paolo@apache.org New Workbook - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 27. paolo@apache.org New Sheet - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 28. paolo@apache.org Creating Cells - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 29. paolo@apache.org Cell types - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 30. paolo@apache.org Fills and colors - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 31. Thursday, November 5, 2009 7 EXCEL Imp/Exp to XML
  • 32. paolo@apache.org Export to XML - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 33. paolo@apache.org xmlMaps.xml - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 34. paolo@apache.org XML Import/Export - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 35. Thursday, November 5, 2009 8 WORD Simple Doc
  • 36. paolo@apache.org A simple doc - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 37. paolo@apache.org - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 38. Thursday, November 5, 2009 9 Use Case 1 Alfresco Search
  • 39. paolo@apache.org Use Case ✴ Upload a document ✴ Detect document mimetype ✴ Extract text and metadata ✴ Create search index ✴ Search (and find) the document - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 40. paolo@apache.org Without Tika ✴ Detect the document mimetype ✴ (source/target mimetype) ✴ Get the proper ContentTransformer ✴ (ContentTransformerRegistry) ✴ Tranform Doc Content to Text ✴ (PoiHssfContentTransformer) I here PO ✴ Create Lucene index - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 41. paolo@apache.org With Tika - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 42. paolo@apache.org Extension use case ✴ Adding support for Office Open XML documents (Office 2007+) ✴ Word 2007+ ✴ Excel 2007+ ✴ PowerPoint 2007+ - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 43. paolo@apache.org POI text extractors ✴ Remember? - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 44. paolo@apache.org Apache Tika (Excel) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 45. paolo@apache.org Apache Tika - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 46. paolo@apache.org Apache Tika (Word) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 47. paolo@apache.org Apache Tika (Word) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 48. Thursday, November 5, 2009 10 Use Case 2 JM Lafferty Financial Forecasting
  • 49. paolo@apache.org Make your wb look pro- ✴ Rich text ✴ Graphics ✴ Formulas & Named Ranges ✴ Data validations ✴ Conditional formatting ✴ Cell comments - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 52. paolo@apache.org Formula evaluation ✴ The evaluation engine enables you to calculate formula results from within a POI application ✴ Formulas may be added to your workbook by POI ✴ Evaluation is available for .xls and .xlsx - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 53. paolo@apache.org Formula evaluation (continued) ✴ All arithmetic operators are implemented ✴ Over 280 Excel built in functions are supported - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 54. paolo@apache.org Formula evaluation (code) - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 55. Thursday, November 5, 2009 11 Use Case 3: CQ5 Import
  • 58. paolo@apache.org importDocument() - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 59. paolo@apache.org getParagraphs(...) ✴ Makes use of ✴ org.apache.poi.hwpf.usermodel.Range - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 60. paolo@apache.org importDocument() - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 61. paolo@apache.org getTitle(...) ✴ Gets the first paragraph’s text - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 62. paolo@apache.org importDocument() - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 63. paolo@apache.org - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 66. Thursday, November 5, 2009 12 Want more?
  • 67. paolo@apache.org More Examples ✴ http://poi.apache.org/spreadsheet/examples.html - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 68. paolo@apache.org Even more ✴ Get in touch ✴ http://poi.apache.org/ ✴ Get informed ✴ dev@poi.apache.org ✴ Get involved ✴ http://svn.apache.org/repos/asf/poi/trunk/ - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009
  • 69. paolo@apache.org ✴ Get slides ✴ http://www.slideshare.net/paolomoz/apache-poi-recipes Thanks - ApacheCon US 2009, Oakland - Apache POI Recipes - Thursday, November 5, 2009