3. Big Data Challenges
This “Big Data” usually has one
or more of the following
characteristics:
Very Large Data Volumes –
Measured in terabytes or
petabytes
Variety – Structured and
Unstructured Data
High Velocity – Rapidly
Changing Data
4. Abstract.
The challenge of building consistent, available, and
scalable data management systems capable of serving
petabytes of data for millions of users as large internet
enterprises.
Though scalable data management has been a vision for
more than three decades and much research has focussed
on large scale data management in traditional enterprise
setting, cloud computing brings its own set of novel
challenges that must be addressed to ensure the success
of data management solutions in the cloud environment
in future.
5. Scalable Data Management
• Scalable database management systems
(DBMS)—both
• For update intensive application workloads as
well as decision support systems
• For descriptive and deep analytics—are a critical
part of the cloud infrastructure
• and play an important role in ensuring the
smooth transition of applications from the
traditional enterprise infrastructures to next
generation cloud infrastructures.
6. Different class of scalable
data management
Google’s
Bigtable [5], PNUTS [6] from
Yahoo!, Amazon’s Dynamo [7]
and other similar but
undocumented systems. All of
these systems deal with
petabytes of data, serve online
requests with stringent latency
and availability requirements,
accommodate erratic
workloads, and run on cluster
computing architectures; staking
claims to the territories
used to be occupied by database
systems.
8. Cloud Computing
• Cloud computing is an extremely successful
paradigm of service oriented computing, and has
revolutionized the way computing infrastructure
is abstracted and used. Three most popular cloud
paradigms include:
• Infrastructure as a Service (IaaS),
• Platform as a Service (PaaS), and
• Software as a Service (SaaS).
• The concept however can also be extended to
Database as a Service or Storage as a Service.
11. Cloud Database System
• a system in the cloud must posses some features to
be able to effectively utilize the cloud economies.
These cloud features include:
• scalability
• elasticity
• fault-tolerance
• self-manageability
• ability to run on commodity hardware.
Most traditional relational database systems were designed
for enterprise infrastructures and hence were not designed to
meet all these goals. This calls for novel data management
systems for cloud infrastructures.
12. Scalable database system to supports large application with
lots of data and supporting hundreds of thousands of clients
13. Large data analysis tools
• Hadoop
• MapReduce
Open source tools like Hadoop and MapReduce
provide tremendous data processing power to big
data.
15. Large multitenant system
• Another important domain for data
management in the cloud is the need to
support large number of applications. This is
referred to as a large multitenant system.
18. REFERENCES
•
• [1] An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
• http://www.cs.uwaterloo.ca/~kmsalem/courses/CS848W10/presentations/Chalamalla-HadoopDB.pdf
•
• [2] Divyakant Agrawal Sudipto Das Amr El Abbadi
• Department of Computer Science
• University of California, Santa Barbara
• Santa Barbara, CA 93106-5110, USA
• http://www.edbt.org/Proceedings/2011-Uppsala/papers/edbt/a50-agrawal.pdf
•
• [3] Big Data Challenge: Future is Right Here!
• http://www.e-zest.net/blog/big-data-challenge-future-is-right-here/
•
• [4] Hadoop and Big Data challenge
• http://www.apexcloud.com/blog/category/big-data
•
• [5] Big data and cloud computing: New wine or just new bottles?
• http://www.vldb2010.org/proceedings/files/papers/T02.pdf
•
• [6] C. Curino, E. Jones, Y. Zhang, E. Wu, and S. Madden. Relational
• Cloud: The Case for a Database Service. Technical Report 2010-14,
• CSAIL, MIT, 2010. http://hdl.handle.net/1721.1/52606.
•
• [7] D. Agrawal, A. El Abbadi, S. Antony, and S. Das. Data Management
• Challenges in Cloud Computing Infrastructures.
• http://www.just.edu.jo/~amerb/teaching/2-11-12/cs728/paper1.pdf
•
• [8] Bigtable: A Distributed Storage System for Structured Data.
• http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf
•
• [9] J. Cohen, B. Dolan, M. Dunlap, J. M. Hellerstein, and C. Welton.
• Mad skills: New analysis practices for big data. PVLDB,
• http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf
•
• [10] Divyakant Agrawal Sudipto Das Amr El Abbadi
• Department of Computer Science
• University of California, Santa Barbara
• Santa Barbara, CA 93106-5110, USA
• {agrawal, sudipto, amr}@cs.ucsb.edu
19. REFERENCES
• [11] Scalable Data Management in the Cloud: Research Challenges
• http://cse.vnit.ac.in/comad2010/pdf/Keynotes/Key%20Note%202.pdf
•
•
• [12] C. Curino, E. Jones, Y. Zhang, E. Wu, and S. Madden. Relational
• Cloud: The Case for a Database Service. Technical Report 2010-14,
• CSAIL, MIT, 2010. http://hdl.handle.net/1721.1/52606.
•
• [13] S. Das, D. Agrawal, and A. El Abbadi. ElasTraS: An Elastic
• Transactional Data Store in the Cloud. In USENIX HotCloud, 2009.
• http://www.cs.ucsb.edu/~sudipto/diss/Dissertation_Single.pdf
•
• [14] S. Das, D. Agrawal, and A. El Abbadi. G-Store: A Scalable Data
• Store for Transactional Multi key Access in the Cloud. In ACM
• SOCC, 2010.
•
• [15] D. Jacobs and S. Aulbach. Ruminations on multi-tenant databases. In
• BTW, pages 514–521, 2007.
• http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf
•
• [16] T. Kraska, M. Hentschel, G. Alonso, and D. Kossmann. Consistency
• Rationing in the Cloud: Pay only when it matters. PVLDB,
• http://www.vldb.org/pvldb/2/vldb09-759.pdf
•
• [17] C. D. Weissman and S. Bobrowski. The design of the force.com
• multitenant internet application development platform. In SIGMOD,
• http://cloud.pubs.dbs.uni-leipzig.de/sites/cloud.pubs.dbs.uni-leipzig.de/files/p889-weissman-1.pdf
•
• [18] F. Yang, J. Shanmugasundaram, and R. Yerneni. A scalable data
• platform for a large number of small applications. In CIDR, 2009.
• https://database.cs.wisc.edu/cidr/cidr2009/Paper_17.pdf
•
• [19] S. Das, S. Agarwal, D. Agrawal, and A. El Abbadi. ElasTraS: An
• Elastic, Scalable, and Self Managing Transactional Database for the
• Cloud. Technical Report 2010-04, CS, UCSB, 2010.
•
• [20] S. Das, S. Nishimura, D. Agrawal, and A. El Abbadi. Live Database
• Migration for Elasticity in a Multitenant Database for Cloud
• Platforms. Technical Report 2010-09, CS, UCSB, 2010.
•