This document provides an overview of MapR Technologies and their products. It discusses how MapR helps companies harness big data by providing an enterprise-grade distribution of Apache Hadoop that includes data protection, security, and high performance capabilities. It also highlights MapR partnerships with companies like Syncsort to provide data integration, migration, and analytics solutions that help customers derive more value from their data.
The first trend is that the industry leaders have shown how to use big data to compete and win in their markets. It’s no longer a nice to have – you need big data to compete
Google pioneered MapReduce processing on commodity hardware and used that to catapult themselves to into the leading search engine even though they were 19th in the market
Yahoo! Leveraged these ideas to create Hadoop to keep up with Google and many mainstream companies have followed with new data-driven applications such as “people you may know” (started by LinkedIn and now used by Facebook, Twitter, and every social application), product recommendation engines, contextual and personalized music services (beats), measuring digital media effectiveness (comScore), serving more relevant/targeted ads(Comcast, rubicon project), fraud and risk detection, healthcare efficacy, and more
What makes the difference? A lot of attention is given to data science and developing sophisticated new algorithms, but in many cases just having more data beats better algorithms. (make point on collecting more consumer interaction as well as transaction data, as an example).
In addition, competitive advantage is decided by very small percentages. Just 1% improvement in fraud can mean hundreds $millions in savings. A ½% lift in advertising effectiveness means millions in new product sales and profitability. The same can be applied to customer churn, disease diagnosis, and more.
A second trend in enterprise architecture has been big data overwhelming the existing workload-specific systems which are in production. (list of requirements for each of these on the side in text)
People started with mainframes or operational systems which run ERP, finance, CRM and other mission-critical applications. They require… (pick out attributes you want to stress on the left)
You also have data warehouses, marts, data mining, and other analytical systems which pull data from these operational and other systems for providing insights to the business for decision making
The amount/variety of data has been overloading these systems. You reach a certain point as you try to ingest new types of data when these systems are not cost-effective to scale to terabytes or petabytes of data
The first reality is that as people put Hadoop into production, to relieve the pressure from other systems in their enterprise architecture it needs to reliable . Hadoop needs to be held to the same enterprise standards as your Oracle, SAP, Teradata, NetApp storage, or any other enterprise system.
Many organizations are putting Hadoop into their data center to provide (list of use cases underneath) … it can do all of this and more, but
For Hadoop to act as a system of record , it must provide the same guarantees for SLA’s, performance, data protection, and more
Most importantly, Hadoop has the potential for both analytics AND operations. It can be used to optimize the data warehouse provide batch data refining or storage. But Hadoop can provide many operational analytics or database operations/jobs when done right.
Choosing the right big data architecture is critical for success with your Hadoop projects and business applications
One analogy is building a sky scraper. Before you can start building up, you have to lay a rock-solid foundation.
This building is the new Wilshire Grand project in Los Angeles. In Feb of this year they set a Guinness World Record for pouring a 21,000 cubic yard (16,000 cubic meters) foundation over 26 hours (http://www.theguardian.com/cities/2014/feb/14/world-largest-concrete-pour-la-trucks-los-angeles)
When completed in 2017, the building will be the tallest in the US outside of NY and Chicago.
This analogy applies as well to building a data platform – you have to architect for the future. This allows you to build higher, stronger, and faster, without retrofitting later down the road (anyone who has added a second story to their house can attest to the additional cost and construction delays if you have to reinforce a foundation which wasn’t designed to hold the stress)
For business-critical applications you must have data protection and security (availability, data protection, and recovery), high performance (with random read-write system), multi-tenancy (to support multiple business units, isolate applications or user data,…), provide good resource and workload management to support multiple applications, and open standards to integrate with the rest of the enterprise data architecture
This data foundation allows you to support new data-driven applications (both operational and analytical) , maintain service level agreements with the business, provide information you can trust and count on being there when you need it, and ultimately being the best TCO for the long-run. Supporting enterprise systems without retrofits or multiple clusters to work around platform deficiencies (e.g., to support operational/online applications in Hadoop today, you need a separate HBase cluster – separate from the rest of your Hadoop cluster/investment)
The power of MapR begins with the power of open source innovation and community participation.
In some cases MapR leads the community in projects like Apache Mahout (machine learning) or Apache Drill (SQL on Hadoop)
In other areas, MapR contributes, integrates Apache and other open source software (OSS) projects into the MapR distribution, delivering a more reliable and performant system with lower overall TCO and easier system management.
MapR releases a new version with the latest OSS innovations on a monthly basis. We add 2-4 new Apache projects annually as new projects become production ready and based on customer demand.
The power of MapR begins with the power of open source innovation and community participation.
In some cases MapR leads the community in projects like Apache Mahout (machine learning) or Apache Drill (SQL on Hadoop)
In other areas, MapR contributes, integrates Apache and other open source software (OSS) projects into the MapR distribution, delivering a more reliable and performant system with lower overall TCO and easier system management.
MapR releases a new version with the latest OSS innovations on a monthly basis. We add 2-4 new Apache projects annually as new projects become production ready and based on customer demand.
The MapR distribution for Hadoop is globally recognized as the technology leader
Forrester published a Wave for Big Data Hadoop Solutions where it placed MapR as the highest ranking product based on current offering as well as roadmap.
Cloud: MapR has been selected by two of the companies most experienced with MapReduce technology which is a testament to the technology advantages of MapR’s distribution. Amazon through its Elastic MapReduce service (EMR) hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold and supported as a service by Amazon to its customers.
MapR was also selected by Google – the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop – has also selected MapR to make our distribution available on Google Compute Engine.