For most analysts, the pace of analytics and data science can be frustrating. The common waterfall approach works well for fixed reports, but requesting additional data sets, creating new reports, or serving new use cases can be a lengthy process. So it’s no surprise that organizations are looking to shift toward a self-service model that empowers business users to discover and iterate quickly.
However, it’s not just about opening up this access, but also ensuring the results are accurate and trusted. When there are petabytes of data, how does a user know which tables to use and which are most relevant? How do you strike the balance between discovery and agility, while still meeting enterprise governance standards to truly get more value from your data?
During this webinar, you’ll learn how to empower end users to make self-service BI a reality within your organization while fostering collaboration on governance among all data stakeholders. We’ll discuss and demo:
Strategies of consolidating data across silos for fast, flexible access
Enabling easy discovery and exploration, including understanding which data to trust and where to start
New capabilities for intelligent query assistance as well as immediate performance optimizations and recommendations as-you-go
Collaboration and access outside of just SQL for data science and beyond
In addition, we will walk through best practices and considerations when developing your organizational strategy around self-service analytics, and highlight several real-world success stories from a wide range of industries.
Created R&D Information Platform (RDIP)
Consolidate >8PB of data (structured and unstructured) from 2100 disparate databases
38 use cases initially and now 200+ across 10 domains
Results: First-time metrics and monitoring on compliance data; reduced time and cost of identifying diverse groups for clinical trials; accelerated new drug development
Platform for self-service discovery, analysis, and trusted reporting, accessible by all 8,700 users
70% managers/execs (dashboards); 25% bench scientists (basic analytic skills); 5% data scientists (deep technical analytic skills)
Consistent, shared data via preferred BI tools (Spotfire, Excel, Zoomdata), with advanced analytic capabilities (joins, hierarchies, multi-dimensional views) and interactive query speeds
Balance strategic/curated usage with agile discovery
HIPAA compliant
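As a quick illustration of the user mix above, the percentages can be converted into approximate headcounts. This is a minimal sketch, not part of the original slide: it assumes the rounded percentages apply exactly to all 8,700 users, and the names `TOTAL_USERS`, `USER_MIX_PERCENT`, and `headcounts` are hypothetical.

```python
# Rough headcounts implied by the user mix on the slide.
# Assumption: the rounded percentages apply exactly to all 8,700 users.
TOTAL_USERS = 8_700

USER_MIX_PERCENT = {
    "managers/execs (dashboards)": 70,
    "bench scientists (basic analytic skills)": 25,
    "data scientists (deep technical analytic skills)": 5,
}


def headcounts(total: int, mix_percent: dict[str, int]) -> dict[str, int]:
    """Convert whole-number percentages into approximate headcounts."""
    return {group: total * pct // 100 for group, pct in mix_percent.items()}


if __name__ == "__main__":
    for group, count in headcounts(TOTAL_USERS, USER_MIX_PERCENT).items():
        print(f"{group}: ~{count:,}")
```

Under that assumption, the mix works out to roughly 6,090 managers and executives, 2,175 bench scientists, and 435 data scientists.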
Notes: This slide was created based on the DIA APPLICATION submitted in 2017.
Company Background:
GlaxoSmithKline (GSK) is a global pharmaceutical company with commercial operations in more than 150 countries, a network of 87 manufacturing sites, and R&D centers in the United Kingdom, the United States, Belgium, and China.
Use Case:
It can take from six to 12 years to conduct all the steps necessary—from research and testing to clinical trials and regulatory approvals—to bring a new drug or vaccine to market. Once a new product goes to market, pharmaceutical companies have a small window of opportunity to recoup development costs before their patent expires. Adding to the challenge, the cost to produce drugs has remained static in recent years, leading to a considerable reduction in profitability.
To combat these pressures, GSK sought to transform how data is used across Research & Development (R&D). With data in silos, it was difficult to take what researchers learned across the R&D pipeline and build on it. To gain the new levels of efficiency and insight it needed to reduce costs and speed development, GSK had to create a platform that would ingest all unstructured and structured R&D data, and deliver greater analytic capabilities.
The GSK R&D Information Platform uses Cloudera, partner technologies, and homegrown tools to deliver a holistic view of all data within R&D and give researchers an immense analytic advantage. For example, it previously could take months to run analysis across a collection of clinical trials. Today, researchers can complete the analysis in minutes. Going from months to minutes drives significant business value.
The platform combines more than five petabytes (PBs) across 10 different data domains, including discovery, clinical, genomics, regulatory, safety, and commercial data, and more than 2,100 silos.
With privacy and security of vital importance in the healthcare industry, GSK needed to confirm that the platform addressed rigorous industry and internal standards, including the Health Insurance Portability and Accountability Act (HIPAA). By leveraging the Cloudera SDX capabilities, GSK can manage all the metadata and policy information in a centralized fashion.
With its new platform, GSK researchers are gaining insights that help streamline every aspect of the R&D process. For example, scientists can see exactly how long it takes to design, scale, and test a potential molecule. This information is critical in helping accelerate the development of new drugs. GSK researchers can also perform association analysis on public genomic data spanning 500,000 people, work that was impractical on its legacy platform. Clinical trial teams can reduce the time and cost of identifying the optimal mix of participants for clinical trials by harnessing the breadth of data and analytics capabilities. And compliance teams have reduced the time and cost of compiling data for regulatory and compliance activities, with access to all required compliance metrics from across the organization.
As GSK achieves greater efficiency and new insights across its many R&D processes, executives expect to ultimately move the needle in terms of time-to-market, bringing new drugs and vaccines to market more quickly and less expensively to help patients.
Data sources:
● 2,100 databases spanning discovery, clinical, genomics, regulatory, safety, and commercial data
Solution
● Modern Data Platform: Cloudera Enterprise
● Workloads: Analytic Database, Data Science & Engineering
● Key Components: Apache HBase, Apache Impala (incubating), Apache Sentry, Apache Spark, Cloudera Navigator, Cloudera Search, Kerberos
● Data Science Tools: Anaconda Enterprise Notebooks by Continuum Analytics, Anaconda Scale by Continuum Analytics, Anaconda Package Manager by Continuum Analytics, Anaconda Accelerate by Continuum Analytics, IPython, JupyterHub, RStudio
● Databases: MongoDB
● BI & Analytics Tools: AtScale, Kinetica, Spotfire, Trifacta, Zoomdata
● Data Acquisition and Curation Tools: StreamSets, Tamr
● Implementation Partner: Cloudwick
Known Data/Known Business Questions — This represents the foundational core of the business. The business problems are known and the sources of data that support their resolution are well-established.
Unknown Data/Unknown Business Questions — This is the realm of exploration and discovery. The business value of the data, and therefore the questions it can answer, has yet to be established.
Unknown Data/Known Business Questions — In this sector, the value of data has partially been established — in that there is an accepted understanding of the questions it can answer. However, the final sources of insight to support these questions have not yet been established, including the potential for a wide range of emerging external/exogenous data sources (such as open data, data brokers, social data, personal data, and so on). This is where value is established.
Known Data/Unknown Business Questions — This sector represents further exploration of known data — What other uses can this data support?
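The four sectors above form a two-by-two grid: is the data known, and are the business questions known? A minimal sketch of that grid follows, for readers who want to tag their own use cases. The `Quadrant` enum, the `classify` function, and the sample use cases are hypothetical illustrations, not part of any product or of the original framework.

```python
from enum import Enum


class Quadrant(Enum):
    """The four sectors described in the text."""
    KNOWN_DATA_KNOWN_QUESTIONS = "Known Data/Known Business Questions"
    UNKNOWN_DATA_UNKNOWN_QUESTIONS = "Unknown Data/Unknown Business Questions"
    UNKNOWN_DATA_KNOWN_QUESTIONS = "Unknown Data/Known Business Questions"
    KNOWN_DATA_UNKNOWN_QUESTIONS = "Known Data/Unknown Business Questions"


def classify(data_known: bool, questions_known: bool) -> Quadrant:
    """Map the two yes/no answers onto one of the four sectors."""
    if data_known and questions_known:
        return Quadrant.KNOWN_DATA_KNOWN_QUESTIONS
    if data_known:
        return Quadrant.KNOWN_DATA_UNKNOWN_QUESTIONS
    if questions_known:
        return Quadrant.UNKNOWN_DATA_KNOWN_QUESTIONS
    return Quadrant.UNKNOWN_DATA_UNKNOWN_QUESTIONS


if __name__ == "__main__":
    # Hypothetical use cases: (name, data_known, questions_known)
    use_cases = [
        ("Fixed monthly compliance report", True, True),
        ("Exploring a newly acquired external feed", False, False),
        ("Known question awaiting the right source", False, True),
        ("New uses for existing trial data", True, False),
    ]
    for name, data_known, questions_known in use_cases:
        print(f"{name}: {classify(data_known, questions_known).value}")
```

Tagging use cases this way is one possible way to separate curated, strategic reporting (the first sector) from the agile discovery work in the other three.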