31. MIKE DRISCOLL CO-FOUNDER + CTO METAMARKETS @medriscoll making sense of data: lessons for start-ups questions?
Editor’s Notes
Feedback loops.
Over the next set of slides, I’m going to discuss some lessons about how data moves through a start-up’s organization...
So this is how we frame our technology stack at my start-up, Metamarkets. It’s a four-tiered stack. I believe that many start-ups have similar stacks when they think about how data moves through them. But there’s something important missing here: your technology stack doesn’t exist in a vacuum.
To be successful, we’ve got to incorporate feedback, both from customers and from the larger world. Feedback is critical. Steve Blank and Eric Ries have talked about not iterating in a vacuum. The feedback you can achieve by managing your data can be incredibly important.
The stack begins at ingestion and ends at the top with products.
ETL often gets a bad rap. Nothing could be more important to your company than moving data between systems. That is what ETL does. It should be a first-class piece of your architecture, and you should put one of your top engineers at this layer of the stack. (At Metamarkets, we have a former VP of BlackRock working on ETL, and he’s been outstanding.) When our ETL breaks down, the data stops flowing, and our business stops moving.
* Don’t invest in real-time data if you’re making weekly decisions.
* Moving away from batch systems is hard work.
Alternatively, some systems, such as those required for monitoring, may need sub-millisecond response times. But as a general rule, reducing latency in systems creates value in unexpected ways.
Don’t get bogged down in discussions of the perfect data format for your company; there is no such thing. “All models are wrong, some models are useful.”
You will likely end up using a variety of data stores in your organization. So don’t agonize over your data store choices.
As you scale and grow, you will have to change storage layers. We went through three different versions, first Postgres, then Greenplum, then HBase, before developing our own.
Embrace standards.
Simple, flat formats wherever possible (XML is the clamshell packaging of data).
We recently onboarded a client who gave us JSON data. It’s a beautiful thing.
Everyone knows SQL: Cloudera found that Hadoop cluster use went up 10x when Hive was installed.
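As a rough illustration of what "simple, flat formats" can mean in practice, here is a minimal sketch that flattens a nested record into a single-level JSON line. The record layout and the `flatten` helper are hypothetical, made up for this example, and are not taken from the talk.

```python
import json


def flatten(record, prefix=""):
    """Collapse nested dicts into one level, joining key names with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical nested event, of the kind a client might send.
nested = {"user": {"id": 7, "geo": {"country": "US"}}, "event": "click"}

# One flat JSON object per line is easy to load into most data stores.
line = json.dumps(flatten(nested), sort_keys=True)
print(line)
```

A flat record like this maps directly onto a table row, which is part of what makes it easy to query with SQL later.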
But Hive isn’t going to cut it for getting quick insights into data. No one wants to wait 15 minutes for answers. Put in ETL flows that summarize data, and keep a core set of key business metrics in a “hot” database, one that can be queried in real time.
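A minimal sketch of the summarize-then-serve idea, assuming a made-up event schema (`ts`, `campaign`, `revenue`) and using an in-memory SQLite database as a stand-in for the "hot" store. None of these names come from the talk, and a production system would use a different store and a scheduled job.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events, as an ETL job might read them from a batch store.
RAW_EVENTS = [
    {"ts": "2011-06-01T10:05:00", "campaign": "a", "revenue": 1.50},
    {"ts": "2011-06-01T10:40:00", "campaign": "a", "revenue": 2.00},
    {"ts": "2011-06-01T11:10:00", "campaign": "a", "revenue": 0.25},
    {"ts": "2011-06-01T10:15:00", "campaign": "b", "revenue": 4.00},
]


def summarize(events):
    """Roll raw events up to one row per (hour, campaign)."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [event_count, revenue]
    for e in events:
        hour = datetime.fromisoformat(e["ts"]).strftime("%Y-%m-%dT%H:00")
        bucket = totals[(hour, e["campaign"])]
        bucket[0] += 1
        bucket[1] += e["revenue"]
    return [(h, c, n, r) for (h, c), (n, r) in sorted(totals.items())]


def load_hot_store(rows):
    """Write the summary rows into a small database that answers quickly."""
    conn = sqlite3.connect(":memory:")  # stand-in for the "hot" database
    conn.execute(
        "CREATE TABLE hourly_metrics ("
        "hour TEXT, campaign TEXT, events INTEGER, revenue REAL, "
        "PRIMARY KEY (hour, campaign))"
    )
    conn.executemany("INSERT INTO hourly_metrics VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load_hot_store(summarize(RAW_EVENTS))

# Dashboards query the small summary table instead of scanning raw events.
result = conn.execute(
    "SELECT campaign, SUM(events), SUM(revenue) "
    "FROM hourly_metrics GROUP BY campaign ORDER BY campaign"
).fetchall()
print(result)
```

The raw events stay in the slower batch system for deep analysis, while the summary table holds only the handful of metrics people look at every day.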
Requirements for systems should be driven by their business needs.
but remember...
4sq Explore, PYMK, Kaggle winners: written by individuals who were engineers first, statisticians second.
When hiring folks to do your analytics, you want those who can roll up their sleeves and actually code the models themselves.
Don’t make your analytics team compete for resources or jeopardize production systems; they will only get burned and then cut out.
Set up systems where analytics folks can play with data safely.
Analytics often falls into the class of problems that are important but not urgent. Don’t let this happen to your organization.
Data represents the totality of a start-up’s sensory experiences. Absent a well-developed digital nervous system to respond to these inputs, you are blind to your deficiencies, deaf to your customers, and dumb to your opportunities.
Either externally, as Klout, FlightCaster, and BillGuard have done. 4SQ’s Explore and LinkedIn’s PYMK have both improved user experience. Having strong analytical talent in your organization is critical to success here.