Excel and R: data exchange (updated)
R-meetup of Los Angeles
Eric Kostello
(Corrections appreciated, especially in summary capability table.)




Sunday, December 19, 2010
Topics
                   •        General remarks on spreadsheets
                   •        Overview of options for R and Excel to exchange
                            data
                   •        Example: Pre-formatted template + automated
                            population
                   •        Possible comprehensive solution: package xlsx
                            •   but only for .xlsx file formats currently
                   •        Sorry: nothing on non-Excel spreadsheets


On spreadsheets
                   •        Power of spreadsheets: you can do “anything”
                            •   Unfortunately, anything can happen
                   •        Spreadsheets are ubiquitous
                            •   Very handy for certain types of problems
                            •   Users like the control they give
                   •        When spreadsheets are confined to an appropriately scoped part of the
                            data creation/analysis/production work, they may be useful
                            •   More generally: when the scope of [ specific technology ] is
                                appropriate to [ the problem ], it may be useful to use it.



A few perils (among many)
                   •        Spreadsheets are typically built with little or no enforcement of data
                            integrity
                            •   Errors can creep into spreadsheets
                            •   Huge challenge for automated interaction
                                •   No solution proposed here. Often the only way to work with data
                                    from such sources is by manual cleaning.
                   •        This is not a talk about why not to use spreadsheets, but check these out...
                            •   http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html
                                •   Encyclopedia of the Evils, but acknowledges utility when limited in
                                    scope
                            •   “spreadsheet addiction”: search the web with this phrase to see that
                                problems with spreadsheets are not confined to data analysis




Living with spreadsheets
                   •        R users often must exchange data with spreadsheet
                            users
                            •   Data is stored in spreadsheets because...
                                •   That is the way it was archived/sent/obtained
                                •   It is still being created that way and change is
                                    difficult/impossible
                   •        So, communication is essential
                            •   Easier communication may make your day easier and
                                your exchange more reliable
                   •        With that in mind...


Data exchange between R and Excel
Method/package | RW | Details | Pros | Cons | Cross-platform
---------------+----+---------+------+------+---------------
Avoid | RW | Import/Export CSV | Avoid some Excel pitfalls | Manual steps required | Yes
RODBC + drivers | R | Adaptation of SQL APIs | Can read rows and columns. Some writing ability on Windows. | Complexity & inconsistencies | With driver purchase (if non-MS OS)
read.xls (gdata, Perl) | R | Automates creation of CSV using Perl, then imports | Reads xls and xlsx | Data frame to sheet only. Trouble with quotes. Perl dependencies nuisance. | Yes
write.xls (dataframe2xls, Python) | W | Automates creation of CSVs, then converts | Some formatting ability | Data frame to sheet only. (Coerces to dataframe.) | Yes
WriteXLS (WriteXLS & Perl) | W | Automates creation of CSVs, then converts | Some formatting ability. Multiple sheets/one call. | Limited flexibility. Data frame to sheet only. | Yes
RDCOMClient | RW | via Windows APIs | Cell level control | Not fully vectorized? | No
xlsReadWrite | RW | Free version and Pro version (shareware) without dependencies | Fast, mature. Pro version can read/write rows/columns, ranges. | .xls format only (.xlsx a future possibility). No formatting. | No
xlsx (rJava & xlsxJars) | RW | Using Java library from Apache | Data frames and smaller. Fine formatting control. xlsx file format. | Slow. xlsx format only. Not fully vectorized. | Yes
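
The "Avoid" row (exchange via CSV) needs no extra packages. A minimal sketch using only base R, reusing the exampleData values from the RDCOMClient example in this deck; the file name and the use of tempdir() are assumptions made for illustration:

```r
# Example data with the same values used elsewhere in this deck.
exampleData <- data.frame(X = 10:19, Y = 656:647)

# Write a CSV that Excel can open directly.
# row.names = FALSE avoids an unnamed first column of row numbers.
csvFile <- file.path(tempdir(), "exampleData.csv")  # hypothetical location
write.csv(exampleData, file = csvFile, row.names = FALSE)

# Read a CSV back (for example, one saved from Excel via "Save As").
# stringsAsFactors = FALSE keeps any text columns as character vectors.
roundTrip <- read.csv(csvFile, stringsAsFactors = FALSE)
```

The manual step listed under Cons is the "Save As CSV" or "Open" action on the Excel side; formatting and formulas do not survive the trip.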




Write to pre-formatted
               spreadsheet using RDCOMClient
                   •        Hybrid approach to repeated report creation
                            •   Windows-only approach to creating .xls Excel spreadsheets without
                                programmatic formatting
                   •        Inherit/create formatting and/or formulas in Excel (“by hand”)
                            •   Save a template file to copy and populate for each new report
                   •        Use shell commands to
                            •   Copy the template into a new version with an appropriate name
                   •        Use RDCOMClient functions to...
                            •   Open the copy
                            •   Write to specific cells in the spreadsheet
                            •   Close the copy
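
The copy step above can also be sketched without a shell command. This is an illustration rather than the talk's method: it swaps in base R's file.copy() for the shell copy, and copyTemplate(), the directory name, and the empty stand-in template are all hypothetical.

```r
# Copy a pre-formatted template to a dated report file and return its full path.
# copyTemplate() is a hypothetical helper, not part of any package.
copyTemplate <- function(templateFile, reportsDir, reportDate = Sys.Date()) {
  if (!file.exists(templateFile)) stop("Template not found: ", templateFile)
  dir.create(reportsDir, showWarnings = FALSE, recursive = TRUE)
  newFile <- file.path(
    reportsDir,
    paste0("Report_for_", format(reportDate, "%d_%b_%Y"), ".xls")
  )
  # overwrite = FALSE so an existing report is never silently replaced.
  copied <- file.copy(templateFile, newFile, overwrite = FALSE)
  if (!copied) stop("Could not copy template to ", newFile)
  normalizePath(newFile)
}

# Demonstration with an empty stand-in file in place of a real workbook.
templateFile <- file.path(tempdir(), "Example_Template.xls")
file.create(templateFile)
reportsDir <- tempfile("reportsDirectory")
reportPath <- copyTemplate(templateFile, reportsDir, as.Date("2010-12-19"))
```

Returning a full path is deliberate: a COM-driven Excel instance may not share R's working directory, so a relative path may not resolve to the same file.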




RDCOMClient example
              library ( "RDCOMClient")


              # The template would have all necessary formatting and formulas in place
              exampleTemplateFilename <- "Example_Template.xls"
              newExcelReportInstance <- paste ( "reportsDirectoryReport_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls", sep = '')
              copyCommand <- paste ( "copy", exampleTemplateFilename, newExcelReportInstance )
              shell ( copyCommand, shell = 'cmd      %WINDIR%')
              print ( "Ignore the error message about UNC paths if it occurs; it does not matter.")


              exampleData <- data.frame (X = 10:19, Y = 656:647 )
              .COMInit() # Start server
              exl <- COMCreate("Excel.Application") # Hook to Excel
              books <- exl[["workbooks"]] # Talk to workbooks


              # Excel resolves relative paths against its own default directory, so pass a full path
              exampleBook <- books$open( normalizePath ( newExcelReportInstance ) ) # Talk to book
              exampleSheets <- exampleBook[["sheets"]] # Talk to sheets
              exampleSheet <- exampleSheets$Item(as.integer(1)) # Talk to a specific sheet


              # But I cannot figure out how to get the "Range" to be larger than 1x1, so iterate through rows.
              # Does Range only apply to rows[??]


              headerRowPadding <- 1 # Allow for this many header rows
              for ( ithRow in seq_len ( nrow ( exampleData ) ) ) {
                            # Create a reference to worksheet Column A, row ithRow + headerRowPadding
                            cellReferenceA <- exampleSheet$Range( paste ( "A", ithRow + headerRowPadding, sep = '') )
                            cellReferenceA[["Value"]] <- exampleData[ ithRow, "X" ]
                            # Same row, Column B
                            cellReferenceB <- exampleSheet$Range( paste ( "B", ithRow + headerRowPadding, sep = '') )
                            cellReferenceB[["Value"]] <- exampleData[ ithRow, "Y" ]
                            }
              exampleBook$save()
              exampleBook$close()
              exl$Quit() # Shut down the Excel instance started above
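
On the open question in the example about ranges larger than 1x1: Excel's Range accepts block addresses such as "A2:B11", so one possible direction is to compute the address of the whole block and assign it in a single call. The sketch below only builds the address, in plain R. columnLetters() and blockAddress() are hypothetical helpers, and the commented-out COM lines at the end are an untested assumption about RDCOMClient (its asCOMArray() function), not something the talk confirms.

```r
# Convert a 1-based column number to Excel column letters (1 -> "A", 27 -> "AA").
columnLetters <- function(colNumber) {
  lettersOut <- character(0)
  while (colNumber > 0) {
    remainder <- (colNumber - 1) %% 26
    lettersOut <- c(LETTERS[remainder + 1], lettersOut)
    colNumber <- (colNumber - 1) %/% 26
  }
  paste(lettersOut, collapse = "")
}

# Address of the block a data frame would occupy below the header rows.
blockAddress <- function(data, headerRowPadding = 1, firstColumn = 1) {
  firstRow <- headerRowPadding + 1
  lastRow <- headerRowPadding + nrow(data)
  lastColumn <- firstColumn + ncol(data) - 1
  paste0(columnLetters(firstColumn), firstRow, ":",
         columnLetters(lastColumn), lastRow)
}

exampleData <- data.frame(X = 10:19, Y = 656:647)
address <- blockAddress(exampleData, headerRowPadding = 1)

# Untested assumption, Windows only, with exampleSheet opened as in the example:
# blockRange <- exampleSheet$Range(address)
# blockRange[["Value"]] <- asCOMArray(as.matrix(exampleData))
```

Here address covers the same cells the row-by-row loop fills (rows 2 to 11 of columns A and B).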




xlsx package overview
                   •        Philosophy: Use Excel interface capabilities created in a more
                            widely used codebase: The Apache Java API to Microsoft
                            documents.
                   •        Many capabilities are obtained “for free.”
                   •        Full-featured cross-platform solution
                   •        This is a suitable candidate for one-stop shopping in R-to-Excel
                            communications
                            •   but requiring it may be a problem for some installations
                                (rJava dependency)
                            •   It is somewhat slow, which is noticeable for larger Excel files



xlsx package capabilities
                   •        Easy data frame import/export: read.xlsx and write.xlsx
                            •   write.xlsx ( exampleData, file = "exampleData Workbook.xlsx")
                            •   read.xlsx ( file = ..., sheet = ... )
                                •   One sheet at a time. Can keep formulas, provide colClasses.
                   •        Reads/writes at the cell level (but writing not fully vectorized)
                   •        Formatting control (using Excel native capabilities, such as
                            borderColor)
                   •        Reads/Writes comments in cells
                   •        Merging regions, freezing panes, set print area, set zoom
                   •        Can insert images (dib, emf, jpeg, pict, png, wmf).
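
A minimal round-trip sketch of the data frame import/export listed above. The argument names sheetName, sheetIndex, and row.names follow the xlsx package as currently documented and may differ from the version described in these slides; the temporary file location is an assumption. The code is guarded so that it does nothing when xlsx (or its rJava dependency) is not installed.

```r
exampleData <- data.frame(X = 10:19, Y = 656:647)
xlsxFile <- file.path(tempdir(), "exampleData Workbook.xlsx")  # hypothetical location

roundTrip <- NULL
if (requireNamespace("xlsx", quietly = TRUE)) {
  # Export one data frame to one sheet; row.names = FALSE omits the row-number column.
  xlsx::write.xlsx(exampleData, file = xlsxFile,
                   sheetName = "exampleData", row.names = FALSE)
  # Import one sheet at a time, selected here by position.
  roundTrip <- xlsx::read.xlsx(xlsxFile, sheetIndex = 1)
} else {
  message("Package 'xlsx' (with rJava) is not installed; skipping the round trip.")
}
```

The guard reflects the caveat from the overview slide: requiring rJava may be a problem for some installations.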



Los Angeles R users group - Dec 14 2010 - Part 4 updated
