BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case

High-level use case description of one hospital department, and a comparison of two solutions: 1) a big data solution using Cloudera Impala; and 2) a traditional RDBMS solution using Oracle DB.

EndoMine System
Jewish General Hospital

by David Lauzon
and Anton Zakharov
Big Data Montreal #9
February 5th, 2013

1 / 18

Presentation

• Our Objectives
• Requirements and context
• Project scope
• Hadoop Solution
– Big Data Solution Overview
– Hive Table Schema
– Compression Performance
– Data Architecture in Hadoop
– Hadoop/Impala Prototype Demo
• Oracle Solution
• Hadoop vs Oracle comparison
• What are expensive queries?

2 / 18

Our Objectives

• Lead an end-of-study project in an
industrial context
– Requirements elicitation
– Implement a « proof-of-concept » prototype

• Experiment with big data technologies
– Compare with RDBMS

3 / 18

Requirements and context

• Department of Medical Diagnostic
(medical test results DB, e.g. blood, urine, ...)
– Dr. Shaun Eintracht
• « ad hoc » Query
• ETL Query
– Dr. Elizabeth Mac Namara
• « business intelligence » requirements
• Realtime Dashboard

• Department of Endocrinology
– Dr. Mark Trifiro
• Data mining

4 / 18

Project scope

• First iteration = improve ad-hoc queries
– Slow analytical queries and ETL (MS Access)
– Risk of « crashing » production DB
– Some queries impossible to process

5 / 18

Solutions

• Solution 1 : Hadoop + Impala

• Solution 2 : Tune the existing Oracle RDBMS

7 / 18

Compression Performance

[Bar chart comparing Impala, Hive, and Oracle across storage formats: Oracle FS, Text File, Sequence File, SeqFile + Gzip, SeqFile + Snappy. Vertical axis from 0 to 250; unit not recoverable.]

10 / 18
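
To make the storage trade-off behind this slide concrete, here is a minimal Python sketch. It is not the benchmark from the talk: the rows are invented, the column layout is an assumption, and gzip from the standard library stands in for the compressed SequenceFile variants (Snappy is not part of the standard library). It only illustrates why repetitive lab-result rows shrink considerably under compression, which reduces the bytes a full scan has to read.

```python
import gzip

# Hypothetical lab-result rows; the column layout is an assumption,
# not the hospital schema.
rows = [
    f"{patient_id},2012-{month:02d}-15,GLUCOSE,{5.0 + (patient_id % 7) / 10:.1f},mmol/L\n"
    for patient_id in range(2000)
    for month in range(1, 13)
]
raw = "".join(rows).encode("utf-8")

# gzip stands in for the compressed file formats named on the slide.
compressed = gzip.compress(raw)

ratio = len(raw) / len(compressed)
print(f"raw bytes:         {len(raw)}")
print(f"compressed bytes:  {len(compressed)}")
print(f"compression ratio: {ratio:.1f}x")
```

The ratio printed here depends entirely on the synthetic data and says nothing about the real dataset; the point is only that less data on disk usually means less I/O per scan, at the price of CPU time for decompression.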

Data Architecture in Hadoop

• All big tables are pre-joined
– With specimen (1)
– Without specimen (2)
• Partitioned using two schemes
– Year-month (3)
– Year and Test (4)
• 4 different versions of the same data:
– stay_order_results_yearmonth
– stay_order_results_year_and_test
– stay_order_results_specimen_yearmonth
– stay_order_results_specimen_year_and_test

11 / 18
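
The two partitioning schemes above can be sketched in a few lines of Python. This is an illustration under assumptions, not the prototype's code: the rows and column layout are hypothetical, and a dictionary of lists stands in for one storage directory per partition. It shows why a query that filters on the partition key only needs to read a fraction of the data.

```python
from collections import defaultdict

# Hypothetical pre-joined rows: (result_date "YYYY-MM-DD", test_code, value).
rows = [
    ("2011-12-30", "GLUCOSE", 5.4),
    ("2012-01-03", "GLUCOSE", 6.1),
    ("2012-01-09", "TSH", 2.3),
    ("2012-02-11", "GLUCOSE", 5.0),
    ("2012-02-20", "TSH", 1.9),
]

def partition_by(rows, key_fn):
    """Group rows into partitions, mimicking one directory per partition key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key_fn(row)].append(row)
    return partitions

# Scheme (3): one partition per year-month.
by_yearmonth = partition_by(rows, lambda r: r[0][:7])
# Scheme (4): one partition per (year, test).
by_year_and_test = partition_by(rows, lambda r: (r[0][:4], r[1]))

# A query restricted to January 2012 reads a single year-month partition.
january = by_yearmonth["2012-01"]
# A query for all 2012 glucose results reads a single (year, test) partition.
glucose_2012 = by_year_and_test[("2012", "GLUCOSE")]

print(len(january), "of", len(rows), "rows read for January 2012")
print(len(glucose_2012), "of", len(rows), "rows read for 2012 glucose")
```

Keeping both schemes means storing the same data twice, which is consistent with the four table versions listed on the slide: each query can then use whichever layout matches its filter.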

Oracle Solution

• Same tables as source DB
– A big pre-joined table is not a good solution
• Techniques explored :
– Partitioning
• Partitions automatically created
– Compression
• Inefficient for joins
– Clustering
– Join multiple partitioned tables

13 / 18

Oracle Solution (continued)

• Avoid too many indexes on the big tables:
– Takes a lot of memory
– Slow to create
– May not be used if a query touches more than 5% of the
rows

14 / 18
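
The 5% rule of thumb can be illustrated with a deliberately simplified cost model. Everything here is an assumption made for the sketch: a full scan reads every block once, an index lookup costs one block read per matching row (the worst case, with matching rows scattered across the table), and a block holds 20 rows. Under those assumptions the break-even point is 1/20 = 5%. A real optimizer uses a much richer model, so treat this as intuition rather than a description of Oracle's behaviour.

```python
def full_scan_cost(total_rows: int, rows_per_block: int) -> int:
    """Blocks read by a full table scan: every block exactly once."""
    return -(-total_rows // rows_per_block)  # ceiling division

def index_cost(total_rows: int, selectivity: float) -> int:
    """Worst-case blocks read via an index: one block fetch per matching row."""
    return round(total_rows * selectivity)

TOTAL_ROWS = 10_000_000   # assumed table size
ROWS_PER_BLOCK = 20       # assumed; yields a 5% break-even point

for selectivity in (0.001, 0.01, 0.05, 0.20):
    scan = full_scan_cost(TOTAL_ROWS, ROWS_PER_BLOCK)
    idx = index_cost(TOTAL_ROWS, selectivity)
    choice = "index" if idx < scan else "full scan"
    print(f"{selectivity:>6.1%} of rows: index={idx:>9} scan={scan:>9} -> {choice}")
```

In this model the index only wins while the query touches a small fraction of the rows; beyond the break-even point, the memory and build time spent on the index buy nothing for that query.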

Comparison: Hadoop Solution

• Pros
– Crunches massive amounts of data
– Scalability
– Free software
• Cons
– Needs better UI and tune-ups
– Maintenance cost
– Requires ETL time to merge data into one table
– Big joins should be avoided

15 / 18
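
The ETL step that merges data into one table can be sketched as a simple denormalization. The table and column names below are hypothetical, loosely inspired by the stay_order_results naming used in the deck; the real schema is not shown in the slides. The sketch walks each result row up to its order and stay and emits one wide row, which is the work that has to be redone whenever the source data changes.

```python
# Hypothetical source tables, keyed by id; names and columns are assumptions.
stays = {1: {"patient": "P001", "admitted": "2012-01-02"}}
orders = {
    10: {"stay_id": 1, "test": "GLUCOSE"},
    11: {"stay_id": 1, "test": "TSH"},
}
results = [
    {"order_id": 10, "value": 6.1, "date": "2012-01-03"},
    {"order_id": 11, "value": 2.3, "date": "2012-01-09"},
    {"order_id": 10, "value": 5.8, "date": "2012-01-04"},
]

def prejoin(stays, orders, results):
    """Denormalize stay -> order -> result into one wide row per result."""
    wide_rows = []
    for result in results:
        order = orders[result["order_id"]]
        stay = stays[order["stay_id"]]
        wide_rows.append({
            **stay,
            "test": order["test"],
            "value": result["value"],
            "date": result["date"],
        })
    return wide_rows

stay_order_results = prejoin(stays, orders, results)
for row in stay_order_results:
    print(row)
```

Once the wide table exists, analytical queries no longer need joins at read time, which is the trade-off the slide describes: join cost is paid once during ETL instead of on every query.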

Comparison: Oracle Solution

• Pros
– Just need to create a slave DB (just?)
– Faster random lookups
– Easier to find expertise
• Cons
– Scales only up to a certain point
– Synchronisation with master DB:
• Rebuilding indexes would take hours

16 / 18

What are expensive queries?

• If possible, avoid these constructs on
large result sets
– SELECT DISTINCT
– ORDER BY
– GROUP BY
– JOIN big table with another big table
• JOIN big table with multiple small tables should be OK

17 / 18
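
One way to see why these constructs cost more is to look at a query plan. The sketch below uses SQLite from the Python standard library purely as a stand-in; it is neither Oracle nor Impala, and the table is hypothetical. On an unindexed table, SQLite reports an extra temporary B-tree step for DISTINCT, ORDER BY, and GROUP BY, which is the sorting or deduplication work that grows with the size of the result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; SQLite stands in for Oracle/Impala for illustration only.
conn.execute("CREATE TABLE results (patient TEXT, test TEXT, value REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [(f"P{i:04d}", "GLUCOSE" if i % 2 else "TSH", i / 10) for i in range(1000)],
)

def plan(sql: str) -> list[str]:
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

queries = {
    "plain scan": "SELECT value FROM results",
    "DISTINCT": "SELECT DISTINCT test FROM results",
    "ORDER BY": "SELECT value FROM results ORDER BY value",
    "GROUP BY": "SELECT test, COUNT(*) FROM results GROUP BY test",
}
plans = {name: plan(sql) for name, sql in queries.items()}
for name, steps in plans.items():
    print(f"{name:>10}: {steps}")
```

The plain scan streams rows straight out of the table, while the other three queries need an intermediate structure before they can return anything. Other engines implement this differently (hashing, distributed shuffles), but the extra pass over the data is the common cost.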

Conclusion

• Recommendation to use a “classic” RDBMS
– The database fits on a single node
– Existing expertise in-house
– Acceptable performance with appropriate
tune-ups
– Stop using MS Access
• Disadvantage : limited scalability

18 / 18

Slides with a title only:

6. Production DB (Oracle)
8. Big Data Solution Overview
9. Hive Table Schema
12. Hadoop Prototype Demo

Editor's Notes

Choose Shaun: smaller scale, immediate need, lets us test the technology.
Database containing the test analysis data for patient specimens, with the results. Running analytical queries on the production database is very slow and can interfere with normal operation with [...]
WILL NOT DISCUSS: requirements elicitation
25% faster with Snappy compression (5.5x compression). Impala is 80% faster than Oracle.
