Top 10 challenges of making big data real
– and tips to overcome them

   Rich Dill
   Solutions Engineer, SnapLogic
   rdill@snaplogic.com
A play on Dave Letterman’s top 10

• 1. A miracle occurs here
    - Of course we can connect to it…
• 2. There is always more data than you expected
    -   Unless there is not enough data to be meaningful
• 3. Never mistake a memo for reality
    - Did you hear what I said or what I meant?
• 4. It is logically impossible to schedule for the unknown
    - Or the relationship between developers and weathermen
• 5. There is life beyond American English
    - Eventually you will have to deal with other languages



A play on Dave Letterman’s top 10

• 6. Of course the data is accurate, clean and ready
    - Data quality issues can kill project schedules
• 7. Dealing with unstructured data is fun
    - Somewhere buried inside is your delimiter where you least
      expect it
• 8. The data and process are subject to…
    - Pick your acronym: PCI, FIX, HIPAA, SOX
• 9. The requirements once defined are set in stone
    - Requirements almost always evolve
• 10. The most critical data will be on the most difficult
  platform to access
    - “A good deal of our case data is on Notes running on AS400”
A miracle occurs here

• Of course we can connect to it…




And we know the image resonates, v2…




SnapLogic Solution

[Diagram; labels: Users, ESB, RDBMS, Data Center, Mobile, Enterprise, Amazon Redshift, Cloud, Big Data]
There is always more data than you expected



• Unless there is not enough data to be
  meaningful
    - It’s feast or famine
    - Distributed systems replicate data
      • At the site level and at the network level
         - 3x at the data center in Houston and 3x in Chicago
         - Replicated data can increase the cost of hardware,
           network and software
    - We are far from normal
      • Data is organized for performance and reliability
        not space efficiency
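
The replication arithmetic above can be sketched in a few lines. This is only a rough illustration: the function name `raw_storage_tb` and the 100 TB figure are hypothetical, and the only detail taken from the slide is the layout of 3 copies at each of two sites.

```python
def raw_storage_tb(logical_tb: float, copies_per_site: int, sites: int) -> float:
    """Raw capacity needed when every site keeps its own full set of replicas."""
    return logical_tb * copies_per_site * sites


# Hypothetical example: 100 TB of logical data,
# kept 3x in Houston and 3x in Chicago (2 sites).
total = raw_storage_tb(100, copies_per_site=3, sites=2)
print(f"{total} TB of raw storage for 100 TB of logical data")
```

Under these assumptions the raw footprint is six times the logical data size, before counting any network or software licensing costs that scale with it.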
It is logically impossible to schedule for the unknown


• Or my theory of the relationship between developers
  and weathermen




• The accuracy of an estimate is a function of the
  number of variables and the length of the project
Never mistake a memo for reality

• Did you hear what I said or what I meant?
•   Are you a literal listener?
     -   Psycholinguistics should be required reading for project managers
• Waterfall process
     - Allows you to build something the user wants today that you deliver in
       9 months or two years
• Iterative process
     - We’ll figure it out as we go along
     - Not really suited for deep architectural designs
•   Process
     -   Listen
     -   Process
     -   Repeat back “this is what I heard you say”
•   Nothing beats showing a functioning prototype, demo or wireframe


There is life beyond American English

• Eventually you will have to deal with other languages
     - German will test your user interface spacing
     - Cyrillic will add to the character set
• Middle Eastern languages
     - Read right to left
     - Some languages don’t have consistent spelling
• Far Eastern languages
     - There is no such thing as “Chinese”
        •   Mandarin is the “Speech of Officials”
        •   Cantonese is used in Hong Kong
     - Korean is written in Hangul
     - Japanese mixes scripts
        •   Kanji are adopted Chinese characters
        •   Kana are the two syllabaries, Hiragana and Katakana
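
To make the character-set and text-direction points above concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` and the sample words are illustrative assumptions, not part of the talk; the sketch only checks the first character of each word, which is a simplification of real bidirectional text handling.

```python
import unicodedata


def describe(text: str) -> dict:
    """Report character count, UTF-8 byte count, and direction of the first character."""
    first_dir = unicodedata.bidirectional(text[0])
    return {
        "chars": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # "R" (e.g. Hebrew) and "AL" (Arabic letter) are right-to-left classes.
        "right_to_left": first_dir in ("R", "AL"),
    }


# Illustrative sample words (assumed for this sketch).
samples = {
    "English": "data",
    "Russian": "данные",
    "Chinese": "数据",
    "Korean": "데이터",
    "Hebrew": "\u05d0",
}

for language, word in samples.items():
    print(language, describe(word))
```

The same word can need two or three times as many bytes as characters, which matters for fixed-width columns and field length limits.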

Of course the data is accurate, clean and ready


• How good is the data?
     -   Profiling the data is key to accurate project estimates
     -   What percentage of the data is null, blank, invalid?

• Data lifecycle includes
     -   Acquisition or creation
     -   Validation
          •   Business rules
          •   Which may result in…

• Data cleansing
     -   Zip code tables, barcodes, D & B credit ratings
     -   Public data resources: www.data.gov
• Storage in an accessible format/location
• Archiving
     -   Industry or legal rules for archiving
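
The profiling question above ("what percentage of the data is null, blank, invalid?") can be answered with a very small script before any estimate is made. The sketch below is an assumption-laden illustration: `profile_column`, `is_us_zip`, and the sample values are hypothetical, and the column is assumed to hold only strings or `None`.

```python
import re

# Assumed validity rule for this sketch: 5-digit US ZIP, optional +4 suffix.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_us_zip(value: str) -> bool:
    """Return True when the value looks like a US ZIP code."""
    return ZIP_PATTERN.fullmatch(value.strip()) is not None


def profile_column(values, is_valid):
    """Return the percentage of null, blank, and invalid entries in a column."""
    counts = {"null": 0, "blank": 0, "invalid": 0}
    for value in values:
        if value is None:
            counts["null"] += 1
        elif value.strip() == "":
            counts["blank"] += 1
        elif not is_valid(value):
            counts["invalid"] += 1
    total = len(values)
    return {key: (100.0 * n / total if total else 0.0) for key, n in counts.items()}


# Hypothetical sample column.
zip_codes = ["94402", None, "", "9440", "77002-1234", "   ", "ABCDE", "60601"]
print(profile_column(zip_codes, is_us_zip))
```

Running a check like this per column gives a first, rough measure of how much cleansing work the project schedule has to absorb.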


Dealing with unstructured data is fun

• Somewhere buried inside is your delimiter where you
  least expect it
• Email is one of the most complex to handle
• Hierarchical data structures must be mapped or
  navigated
• XML is not the be-all and end-all of structured data
  formatting
     -   JSON
     -   BSON
     -   SomethingImissedSON
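
The "delimiter where you least expect it" problem from the first bullet is easy to reproduce. The sketch below compares a naive split on commas with Python's standard `csv` module; the sample record is made up for illustration.

```python
import csv
import io

# Made-up record: the customer name and the note both contain commas,
# and the note also contains escaped double quotes.
raw = 'id,customer,notes\n1,"Acme, Inc.","Said ""call back"", then hung up"\n'

# Naive approach: split the data row on every comma.
naive = raw.splitlines()[1].split(",")

# Quote-aware approach: let the csv module handle quoting rules.
parsed = list(csv.reader(io.StringIO(raw)))[1]

print(len(naive), "fields from the naive split")
print(len(parsed), "fields from csv.reader:", parsed)
```

The naive split breaks the three-column row into five pieces, while the quote-aware parser recovers the intended three fields.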




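Big Data Reference Architecture

[Diagram; three numbered stages: 1 Collect, 2 Translate & Enrich, 3 Distribute. Labels on the collect side: Structured Data, Unstructured Data. Labels on the distribute side: DB, DB, Data View.]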
The data and process are subject to…

• Pick your acronym: PCI, FIX, HIPAA, SOX
• Almost every industry has some form or another of data
  handling protocols that must be addressed
• These protocols are a combination of
     -   Data creation
     -   Data access
     -   Technology and workflow
     -   It is not just encryption and access
• Know your customer’s requirements!




The requirements once defined are set in stone


 • What your users know today is not what they will know
   tomorrow…
 • Requirements evolve
 • Why do you think they call them users?
      - If you are successful they will want more
 • Things change
      -   Economy
      -   Budgets
      -   Timeframe
      -   Management
 • Feature creep is not a bad thing if budgets and
   timelines also creep
The most critical data will be on the most difficult
platform to access

• “A good deal of our case data is on Notes running on AS400”
• Discover where the data is first
• When can you access it?
     - 24x7, after hours, on demand
• Throughput is key
     - Either during business hours or afterwards
• What conditions?
     -   One time download
     -   Scheduled
     -   Event based
     -   Stream
• What about security requirements?
     - There is a performance impact of encryption during transmission

Containerization with Snaps

• BUY
     - SnapStore
     - Certified and supported by SnapLogic
• BUILD
     - SDK + API
     - Java, Python
     - Customer, Partner or SnapLogic
The eleventh rule

• Free software sometimes is worth the cost
     - Or the money you save on licenses is multiplied by
       the cost of training and consultants
     - In most cases labor is one of the biggest costs of
       a software project
• Open source is NOT the same as free!
     - Subscription vs. perpetual licenses
     - Does the customer need to expense or capitalize software licenses?



Thank you
For more information
www.snaplogic.com
BDaaS - BigData as a Service

Editor's Notes

  1. 1990s
     - Valuable data was being generated but lived in siloed environments. The term MDM was not even coined until 2003.
     - As long as you could connect different systems together via a nightly, or sometimes even a weekly, feed, that was pretty darn awesome!
     - Technologies like ESBs, EAIs, ETLs… flourished.
     - Data was mostly structured, sitting in RDBMS systems.
     2000s
     - Network speeds increased and costs went down.
     - Players like Salesforce and NetSuite started getting traction in the SMB market.
     - Immense value on cost and agility.
     - Flexibility to subscribe vs. buying perpetual licenses.
     2005: Consumer / social data
     - FB, Twitter, LinkedIn, amazon.com consumer reviews…
     - Humans generating massive amounts of preference data, likes and dislikes.
     - Data was different: non-relational, unstructured, real-time.
     - Huge volumes: petabytes.
     - Providing immense value to the business about its customers.
     2010: Machine data
     - RFID tags, various other sensors, weblogs.
     - ArcSight got bought out for $1.5B by HP.
     - Massive amounts of data: exabytes.
     - Splunk had a successful IPO last month.
     SnapLogic
     - These 4 sources create an impedance mismatch! Good luck doing all of this with an ESB.
        •   Structured vs. unstructured
        •   Streaming vs. batch
        •   Petabytes and exabytes vs. gigabytes
        •   Pull vs. push
        •   Hub and spoke
     - Unprecedented opportunity and desire to use data.
     - Data silos (data fragmentation) are unavoidable.
        •   Legacy apps, cloud apps, and Hadoop are driving this
        •   Different locations, protocols, formats, and architectures
        •   Data is more distributed and less accessible (less useful)
        •   Compounding due to volume and variety of apps and data
        •   ESB is just another connection
     - Enterprises must share data between their apps.
        •   Collect, combine, and process data into valuable information
        •   Competitive advantage will become a necessity for survival
     - SnapLogic = data sharing platform
  2. Apple-like model: we offer an API and about 200 Snaps
     - Build or buy
     - Easy to build with Java or Python: an intern out of school built Snaps in 4 days
     - Build or buy: containerization of access
     - Abstraction of the endpoint, so you do not need to know everything