2. www.dimensionality.ch @Nephentur freenode | obihackers slide 2
• Oracle ACE Director Business Analytics
• Hacking OBI since 2001 (nQuire + Peregrin aquisitions by Siebel)
• Speaker at OpenWorld, KScope, regional Oracle User Groups...
• Part-time blogger on Analytics, BI, DWH (http://dimensionality.ch)
• Full-time IRC (freenode | #obihackers) and OTN addict
• Oracle Analytics trainer for Oracle University since 2006
Who am I?
3. www.dimensionality.ch @Nephentur freenode | obihackers slide 3
Why this presentation?
• BI / Analytics / Data Science has matured
• Considered as being a commodity know-how
• As a result it’s often done worse than ever…
• …especially compared to when it was a niche skill
• Hence: Back to basics presentation series
• And: Worst Practices means someone else ran into the wall!
6. www.dimensionality.ch @Nephentur freenode | obihackers slide 6
Basics
Being a professional means
understanding as well as questioning
helping companies fulfill their needs
– not wants.
9. www.dimensionality.ch @Nephentur freenode | obihackers slide 9
Data types
Incorrect handling of data types
Example? INT vs. DOUBLE
Why?
• “In our source we store it as an integer.”
• Proper data type handling underestimated or simply ignored
12. www.dimensionality.ch @Nephentur freenode | obihackers slide 12
Hint
Never assume you know exactly how a data type
will behave.
A DOUBLE is not an INT just because it’s numeric.
Think of the lowest common denominator when
working with heterogeneous sources.
13. www.dimensionality.ch @Nephentur freenode | obihackers slide 13
Database specificities
Randomly setting database features
Why?
• “Because you read it on a blog / forum“
Result?
• Generated SQL changes sometimes drastically
• Painful troubleshooting
16. www.dimensionality.ch @Nephentur freenode | obihackers slide 16
Database specificities
Especially more specific
ones people like to pick out
of SRs / blog pposts and use
out of context!
18. www.dimensionality.ch @Nephentur freenode | obihackers slide 18
Hint
Never just copy / paste things from a blog or white paper.
Especially not if the document is called “Best Practices” or
“Performance Tuning”!
19. www.dimensionality.ch @Nephentur freenode | obihackers slide 19
Hint
Don’t fall into the trap of “I know this database so I will work
with others the same way”.
Again: source-agnostic tool means relational databases, XML,
multi-dimensional, Hadoop, etc.
20. www.dimensionality.ch @Nephentur freenode | obihackers slide 20
Joins and mashups
Data type conversion on columns used for join specifications and
mashups
(or similar “workarounds”)
22. www.dimensionality.ch @Nephentur freenode | obihackers slide 22
Joins and mashups
Why?
• Existing data models are either not properly built and clean or were
never meant to be used together for analytical purposes
• Database developers can’t be bothered to change their table creation
scripts
• “Best practices” of how to create tables are organizational rules
Result?
• Bye-bye data source optimization, hello full table scans
25. www.dimensionality.ch @Nephentur freenode | obihackers slide 25
Hint
Push back on inadequate or inappropriate physical design.
“The tool should be able to do this” is not a justification.
Do it right once physically vs do it “ok” 50’000 times logically.
26. www.dimensionality.ch @Nephentur freenode | obihackers slide 26
Aggregations
Unwittingly returning wrong calculation results due to ignorance
of the impact of pre-aggregation / post-aggregation
Background?
• Physical mapping calculations are executed pre-aggregation
• Row-by-row
29. www.dimensionality.ch @Nephentur freenode | obihackers slide 29
Aggregations
Unwittingly returning wrong calculation results due to ignorance
of the impact of pre-aggregation / post-aggregation
Background?
• Derived logical columns are execute post-aggregation
• Data set
32. www.dimensionality.ch @Nephentur freenode | obihackers slide 32
Aggregations
Why?
• Simply one of the most often misunderstood facts about aggregations
• “It’s the same formula so it will do the same thing!”
Result?
• “Wrong” results depending on interpretation and expectation
33. www.dimensionality.ch @Nephentur freenode | obihackers slide 33
Hint
Question things when you are handed a formula.
In case of doubt just go the extra miles and prepare both.
34. www.dimensionality.ch @Nephentur freenode | obihackers slide 34
Mixing models
• Not everything can be ETL’ed together
• Not all data matches on all attributes and dimensionalities
• Multi-fact models and non-conformed dimensionality are still crucial
Problems:
• Missing results from cross-star queries
• Analyses simply won’t run due to front-end post-calculations failing
• Wrong results when queries implicitly hit wrong facts
40. www.dimensionality.ch @Nephentur freenode | obihackers slide 40
Hint
Try to be as precise as possible from the start. Going through a 5
fact, 30 dimension model post-construction is a recipe for
disaster.
A workaround is never a solution.
You will believe me once you have effectively your first
environment which combines detailed relational data with huge,
sparse cubes, XML files and a Hadoop cluster.
41. www.dimensionality.ch @Nephentur freenode | obihackers slide 41
Dimensional hierarchies
• Don’t forget your business users and ease of use
• Some developers don’t see the point of content levels!
(Don’t believe me? Just do a search StackOverflow.)
• “We didn’t need that in OtherReportDrawingTool42.”
43. www.dimensionality.ch @Nephentur freenode | obihackers slide 43
Hint
Work invested here will pay off a hundred-fold.
Be smart. Think LEGO! Think re-usable pieces.
Forget cell-oriented “reporting tool” … aka Excel!
45. www.dimensionality.ch @Nephentur freenode | obihackers slide 45
Time dimensions
• Business should be able to use time in a consistent way
• Temporal calculations should be easy
48. www.dimensionality.ch @Nephentur freenode | obihackers slide 48
Hint
Be smart and think analytical.
Blindly reproducing Excel formulas isn’t being productive.
The how you do something is at least as important as what it
achieves.
49. www.dimensionality.ch @Nephentur freenode | obihackers slide 49
Hint
Your clients are actually paying you to do this.
Have the courtesy of delivering something of quality.
Guess what: repeat customers and follow-up work because they
liked things trump you coming back to fix rubbish.
Anybody doing it nice and usable from day 1 will win the
business’ heart and budgets.
58. www.dimensionality.ch @Nephentur freenode | obihackers slide 58
Excel Excel everywhere
Using analytical platforms as an Excel exporting tool
Why?
• Everyone has Excel
• Data on their Excel sheets makes users feel secure
• “BI systems never give you enough.”
• “I must able to change the data.”
59. www.dimensionality.ch @Nephentur freenode | obihackers slide 59
Excel Excel everywhere
Using OAC / OBI as an Excel exporting tool
Result?
• Data pushed DB -> OBIS -> OBIPS -> Browser -> Excel
• Even worse for cloud roundtrips
• Data multiplied in static form and open to change for everyone
Supreme surprise: Even Oracle says this is not a good idea!
https://support.oracle.com/epmos/faces/DocumentDisplay?id=1558070.1
60. www.dimensionality.ch @Nephentur freenode | obihackers slide 60
Hint
Never be afraid to reject data dump “requirements”.
Focus on what are the actual needs.
Focus on what your users are trying to achieve.
61. www.dimensionality.ch @Nephentur freenode | obihackers slide 61
Front-end usage
Using OBI as an Excel exporting tool
Why? • Everyone has Excel
• Data on their Excel sheets makes users feel secure
•“BI systems never give you enough.”
•“I must able to change the data.”
Result? • Data pushed DB -> OBIS -> OBIPS -> Browser -> Excel
• Data multiplied in static form and open to change for everyone
Multi-million row settings and hundreds of columns in instanceconfig.xml
Why? See above
Result? Taking down whole OBI farms when generating 5m row, 200 column PDFs
based on an analysis with alternating row highlighting and conditional icons.
62. www.dimensionality.ch @Nephentur freenode | obihackers slide 62
Write-back
Using analytical tools as a data entry tool
Why?
• “It is a web application and must support this”.
• “We don’t want to buy or use another tool.”
Result?
• Have you ever heard of APEX?
• Analytical write-back or scenario updates does NOT mean arbitrary data entry!
•
64. www.dimensionality.ch @Nephentur freenode | obihackers slide 64
Cache
Turning on caching for performance
Why?
• It’s a quick fix and an easy smoke screen to hide actual issues in the DB.
Result?
• Without proper cache purging and seeding strategy stale data is assured.
• The ACTUAL problem for performance is never addressed and will persist!
65. www.dimensionality.ch @Nephentur freenode | obihackers slide 65
Data Quality
Using your analytical tool to “solve” data quality issues
Example? CAST, CASE WHEN, FILTER as smoke
Why? As above plus: visible data quality issues = YOU are responsible
Result? • Performance degradation
• Data quality issues are hidden rather than solved
• Additional errors when source data changes
• Disappearing data when new source data outside the hardcoded cases
• ELSE statement becomes the only possible savior (“Invalid conversion”)
66. www.dimensionality.ch @Nephentur freenode | obihackers slide 66
Source Control and Versioning
Not using source control for all your artifact
Why? Admin effort perceived as waste of time / non-productive
Result?
• Deployment pains when chasing files and configurations
• Ever tried to go back to a certain state of an environment?
• Source Control and Concurrent Development for OBIEE
• Several tools exist
- http://redpillanalytics.com/checkmate/
- https://github.com/RittmanMead/obi-concurrent-develop
68. www.dimensionality.ch @Nephentur freenode | obihackers slide 68
Usage Tracking
• Trace and monitor usage
• Who does what, when and how
• Who accesses what, when and how
70. www.dimensionality.ch @Nephentur freenode | obihackers slide 70
GDPR
• Dangerously misunderstood / underestimated
• Will become law 25th May 2018
• “Who can access which data why?”
• “Which data gets transformed how?”
– Think “data lineage on steroids”…
75. www.dimensionality.ch @Nephentur freenode | obihackers slide 75
Hint
Graphing out your environment helps you for more than just EU
law
Security assessments and reviews become child’s play
Impact analysis is instantaneous
Just ask and we’ll follow up