Healthcare Considerations for Modern Data Architectures: As modern healthcare creates more and more data, mining and using this data becomes critical for data-driven decision making, offering the potential for drastically improved health outcomes. However, the explosion of data sources and applications creates challenges in how you architect solutions that are not only reliable, but also secure and compliant with HIPAA regulations.
Moving from a traditional single database to distributed data stores with semi-structured or unstructured data involves careful consideration. Which data store to use can’t be the only consideration. Where to operate this data store, its ability to support encryption technologies, to be deployed in a distributed architecture, to create auditable transaction logs, and its scalability and resiliency all must be considered. Not all data store technologies will fit the bill, nor will all deployment methods – is cloud better or on-premises? Which cloud providers can support all my requirements?
Once a well-thought-out data architecture is established, security must also be addressed, right up front in the design phase. Security best practices need to be followed to ensure sensitive data is protected properly as it is gathered, as it is transmitted, and while it is stored and analyzed.
Distributed data can be a great fit for modern cloud-delivered services, but cloud-based applications also require a different approach than traditional applications. Cloud technologies are built to support automated workflows, and automation tools and run books are key not only to ensuring the consistency and reliability of your data and operations, but also to maintaining compliance with regulations throughout the lifecycle of your applications.
All of these new technologies and approaches to data management are rife with pitfalls and challenges, but there IS hope! Understanding these issues and the best practices associated with each consideration can set you up for success.
Data Day Health IT - Data Architecture
1. Healthcare Considerations for Modern Data
Architectures
Pitfalls, Challenges and Best Practices
Data Day Health 2017
Presented by:
Toby Owen, VP Product Development
2. OnRamp
- Industry leading high security and hybrid hosting provider
- Operates multiple enterprise class data centers located in Austin,
Texas and Raleigh, North Carolina
- SSAE 16 SOC 2 and SOC 3 audited, PCI and HIPAA compliant
company
- Specializes in helping organizations meet their rigorous
compliance requirements and keep their data safe
Toby Owen
- Vice President, Product Development, OnRamp
- 20 year IT veteran with operations and engineering
background
- Security, IT ops at scale, hybrid cloud, compliant
workload hosting
3.
4. AGENDA
GOAL: Designing an app for Healthcare… that’s compliant!
Data Stores
App Design
Where to Run It
Dev Lifecycle
Takeaways
Q & A
5. Refresher on (or intro to) databases
CAP theorem
C = Consistency
A = Availability
P = Partition Tolerance
6. Database Reference Guide – at a glance
*Adapted from http://blog.nahurst.com/visual-guide-to-nosql-systems
7. Why do we care?
• Scaling vertically versus horizontally
- Costs of scaling up can grow exponentially
- Costs of scaling horizontally grow roughly linearly
- Vertical scaling has hard limits; horizontal scale is
effectively “indefinite”
• Data sources are increasingly distributed
• Horizontal scaling provides better geo-resiliency
at the same time
• Not all data needs strict ACID compliance
More arguments favor distributed data stores
9. Are scalability and ACID a false tradeoff?
• Scalability and ACID are difficult to satisfy at the same time
• Not all data requires strict ACID compliance
• Relational can be a bottleneck
- Simpler models might simplify operations – easier and more efficient
• New relational DBs can be very fast AND scalable
• Many NoSQL DBs are adding features to look more like RDBMSs
• Take-away: understand your data (shape and use case) and pick the
right solution
10. NoSQL and BASE
• NoSQL Definition
- SOME of the following: non-relational, distributed, open-source, horizontally
scalable, schema free, easy replication support, simple API
• BASE Definition: Basically Available, Soft state, Eventual consistency
- All data reads will eventually yield the same result
• Favors Availability over Consistency
• Let’s focus some time here exploring NoSQL databases/datastores
- Considerations based on scalability, encryption and key management
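To make "eventual consistency" concrete, here is a minimal sketch in which two replicas briefly disagree and then converge after a synchronization round. The `Replica` class, the `anti_entropy` helper, the last-write-wins rule, and the sample key are all illustrative assumptions, not the behavior of any particular datastore.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding key -> (timestamp, value); names are illustrative."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Last-write-wins: keep the value with the highest timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Exchange every entry between all replicas (one full sync round)."""
    merged = {}
    for r in replicas:
        for key, (ts, value) in r.store.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    for r in replicas:
        for key, (ts, value) in merged.items():
            r.write(key, value, ts)


a, b = Replica(), Replica()
a.write("patient:42:status", "admitted", ts=1)
b.write("patient:42:status", "discharged", ts=2)

# Before synchronization the replicas disagree (soft state).
stale_read = a.read("patient:42:status")

anti_entropy([a, b])
converged = (a.read("patient:42:status"), b.read("patient:42:status"))
```

Both replicas stay available for reads and writes the whole time; the price is that a read served before the sync round can return the older value.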
11. MongoDB
• Document-oriented database (JSON). Considered “semi-structured” data
• Scalability - built in via automatic sharding (range, hash, zone)
- EA FIFA game (250+ servers), Yandex (tens of billions of objects, TBs of data, growing at 10MM file uploads/day)
• Security – encryption in-transit
- SSL/TLS client support (data in-transit)
- MongoDB Enterprise Advanced supports FIPS 140-2
- Atlas (Mongo-aaS on Amazon) does NOT support FIPS mode
• Security – encryption at-rest
- App level, external filesystem, disk level, or natively (encrypted storage engine). Native supports FIPS 140-2
• Security – key management
- Each DB has a separate key
- Can be integrated with external KMS
- Supports key rotation without downtime (via rolling restarts of replica set)
- Native encryption is only available via Enterprise Advanced version!
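The difference between two of the sharding strategies named on this slide (range and hash) can be sketched in a few lines. This is a conceptual illustration only: the shard names, the boundary values, and the use of SHA-256 are assumptions, and it does not reproduce how MongoDB computes chunk ownership. Zone sharding is omitted.

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def hash_shard(key: str) -> str:
    """Hashed sharding: spreads keys evenly but loses range locality."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Range sharding: each shard owns a contiguous span of key values.
# Exclusive upper bounds for shard0 and shard1; shard2 takes the rest.
RANGE_BOUNDS = ["h", "q"]


def range_shard(key: str) -> str:
    """Range sharding: neighbouring keys land on the same shard."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


placement = {name: range_shard(name) for name in ["adams", "miller", "zhang"]}
```

Range placement keeps range queries on one shard but can create hot spots when keys arrive in order; hashed placement trades that locality for an even write distribution.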
12. Cassandra
• Row-oriented
• Scalability – peer-to-peer distributed system, data across all nodes
- Each node contains commit log, exchanges data across cluster every second
- All writes are automatically partitioned and replicated throughout cluster
- Apple (75,000 nodes, 10PB); Netflix (2,500 nodes, 420TB, 1 trillion requests/day)
• Security – encryption in-transit
- Supports TLS/SSL, separate configs for client-server and server-server
- FIPS compliance supported
• Security – encryption at-rest
- Open-source Cassandra relies on filesystem encryption
- DataStax (commercial version) supports at-rest encryption
• Security – key management
- Open-source Cassandra relies on filesystem encryption’s key management tools (can be complex)
- DataStax (commercial version) has native KMIP support
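The "automatically partitioned and replicated" behavior of a peer-to-peer cluster can be pictured as a hash ring: each key hashes to a position, and the next few nodes clockwise hold its replicas. The sketch below is a simplified model under stated assumptions (the `Ring` class, the node names, SHA-256 as the hash, and one token per node are all hypothetical); Cassandra's actual partitioner and token layout are not reproduced here.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = min(replication_factor, len(nodes))
        # Sort nodes by their position on the ring.
        self.positions = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.positions]

    def replicas(self, partition_key: str):
        """Return the rf nodes that own this key, walking clockwise."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        return [
            self.positions[(start + i) % len(self.positions)][1]
            for i in range(self.rf)
        ]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("patient:42")
```

Because every node can compute the same owner list from the key alone, any node can accept a write and forward it, which is what removes the single-master bottleneck.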
13. Hadoop
• Not really a database – distributed filesystem (HDFS) plus application interface (MapReduce)
• Scalability – designed for large file distribution across 100’s and 1000’s of servers, streaming
access and large data sets
- (moving compute is cheaper than moving data)
- Facebook (21PB, 2000 machines), Spotify (1300 nodes, 42PB storage, 20TB a day ingested, 200TB a day
generated by Hadoop)
• Security – encryption in-transit
- HDFS supports transparent encryption
• Security – encryption at-rest
- Supported by HDFS, application, database, or disk-level
- Lots of options for commercial support and tools to simplify management
• Security – key management
- Natively supports its own KMS
- Again, more commercial options exist to simplify
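For readers new to the MapReduce interface mentioned on this slide, the three phases (map, shuffle, reduce) can be sketched in plain Python on a single machine. The function names and the sample records are made up for illustration; a real Hadoop job distributes the same phases across the nodes that already hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str):
    """Emit (key, 1) for every code found in one input record."""
    return [(code, 1) for code in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


records = ["E11 I10", "I10 J45", "I10"]  # made-up sample records
mapped = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Each `map_phase` call only needs its own record, which is why the map work can run wherever that block of the file is stored.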
14. LOTS of others
• Key-Value
- Redis
- DynamoDB
• Document Oriented
- CouchDB
- DocumentDB
• Time Series
• Graph
• + 225 more! (nosql-database.org for basic info and comparisons)
15. So you’ve chosen your datastore(s)
Now what?
Application architecture!
16. Application design
SOME Considerations for HIPAA and HITECH
• HITECH – each app zone requires firewall isolation
- Web, app, database
• Key Management
- Key Management System (KMS)
- Hardware Security Module (HSM)
- Keys database
- Key splitting – for transferring clear-text cipher keys
18. And more
• Many other security considerations around compliant application
architecture
- Shared storage resources and shared IaaS
Supporting encryption at-rest may not be enough to achieve HIPAA or HITRUST
compliance.
- Verifiable (compliant) destruction of data in a shared environment
- Encryption keys need to be managed in accordance with shared secrets or
‘key splitting’ schemes (e.g. Shamir’s secret sharing)
19. Next?
We’ve chosen the right datastores…
We’ve designed our application to
support HITRUST or HIPAA…
Where will the app run?
20. Hybrid is the likely reality
• Consuming 3rd party data sources
• Capabilities of each data or app
component provider
• BAA with each provider
• Peril of failing to plan
21. How to keep all this compliant?
• Lots to consider to get it right
• Start at the beginning – your development
lifecycle
• Automate everything
• Dev/Test/Staging/Production should all account
for secure design
• Use containers?
• Maybe get some help
22. Key Takeaways
• Distributed data is becoming the new norm
• Data is different – data usage should dictate data technology
- (no one-size-fits-all)
• Application Architecture is key to achieving compliance
• Must consider all locations where app is running
• Consider compliance in all phases of app development (starting with
design)
• Automation in development pipeline is key to building-in and maintaining
compliance throughout app lifecycle
• Final consideration – are you now a service provider?
23.
24. Toby Owen
VP, Product Development
OnRamp
towen@onr.com
@tobydowen
linkedin.com/in/tobyowen
C = Consistency – a read request gets the most recent write (or an error)
A = Availability – every request receives a response, which may or may not be the most recent version of the record
P = Partition Tolerance – the system operates despite a network partition (messages being dropped or delayed)
In the presence of a network partition, one has to choose between consistency and availability.
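The choice described above can be shown with a toy two-node register. In the sketch, a node configured as "CP" returns an error during a partition, while an "AP" node keeps answering with whatever it has. The `Node` class, the `Unavailable` exception, and the mode labels are hypothetical names for illustration, not a model of any specific product.

```python
class Unavailable(Exception):
    """Raised by a CP node when it cannot guarantee consistency."""


class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the link to the peer is down

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than diverge.
                raise Unavailable("cannot replicate during partition")
            self.value = value    # AP: accept locally, reconcile later
            return
        self.value = value
        self.peer.value = value   # replicate synchronously when healthy

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm latest write during partition")
        return self.value


def make_pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    a.partitioned = b.partitioned = True


ap_a, ap_b = make_pair("AP")
ap_a.write("v1")
partition(ap_a, ap_b)
ap_a.write("v2")
ap_stale = ap_b.read()  # still answers, but with the older value

cp_a, cp_b = make_pair("CP")
cp_a.write("v1")
partition(cp_a, cp_b)
try:
    cp_a.write("v2")
    cp_result = "accepted"
except Unavailable:
    cp_result = "rejected"
```

Neither behavior is wrong; which one is acceptable depends on whether a stale answer or no answer does more harm for the data in question.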
ACID – a transaction is all or nothing (atomicity), the DB stays in a valid state (consistency), transactions are separate and non-interfering (isolation), and commits stay committed (durability)
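The "all or nothing" and "valid state" guarantees are easy to see with Python's built-in `sqlite3` module. In this sketch the table, account names, and amounts are made up; the second transfer violates a CHECK constraint halfway through, and the database rolls back the half that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; any failure rolls back both updates."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls the balances reflect only the first transfer: the credit to `bob` from the failed attempt never becomes visible.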
Lots of databases are built on top of Hadoop, or are building integrated connectivity to Hadoop
Key Management System (KMS) – software used to store and provide access to encryption keys. Utilizes Local Master Key to encrypt keys stored locally. Must be under dual control.
Hardware Security Module (HSM) – tamper-resistant hardware specialized for key management – generation, export, ciphering and storage of encryption keys. The HSM must be at a minimum certified and configured at FIPS 140-2 Level 3.
Keys database – key management database - must be encrypted and able to support all attributes needed for the key lifecycle management. Databases managing Live and Test keys must be physically separate and utilize different HSMs.
Key splitting – usually in 3 parts, each part has its own custodian (on both sides of the transaction)
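One common way to split a key into components is XOR: generate random components and make the last one the XOR of the key with all the others, so that every component is required to rebuild it. The sketch below assumes that approach; the notes do not say which splitting method was intended, and the helper names are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, parts: int = 3):
    """Split key into `parts` components; ALL are needed to rebuild it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    # The final component is the key XORed with every random component.
    shares.append(reduce(xor_bytes, shares, key))
    return shares


def combine_key(shares):
    """XOR all components together to recover the original key."""
    return reduce(xor_bytes, shares)


key = secrets.token_bytes(32)         # e.g. a 256-bit data-encryption key
components = split_key(key, parts=3)  # one component per custodian
```

Any single component is indistinguishable from random bytes, so no one custodian learns anything about the key on their own.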
Shared storage resources that support encryption at rest may not be enough to support claims of HIPAA or HITRUST compliance. Shared IaaS means that the traditional method of managing data remanence through storage media destruction is not possible. Highly efficient storage arrays write data across many disks.
Verifiable (compliant) destruction of data in a shared environment requires that each tenant of a shared storage resource be utilizing unique encryption keys.
Furthermore, encryption keys need to be managed in accordance with shared secrets or ‘key splitting’ schemes in order to meet HITRUST requirements (e.g. Shamir’s secret sharing). Ex: https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing
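A minimal sketch of Shamir's secret sharing follows, for intuition only: the secret becomes the constant term of a random polynomial over a prime field, each custodian gets one point on the curve, and any `threshold` points recover the secret by interpolation. The field prime, function names, and the 2-of-3 parameters are illustrative assumptions; production key management should rely on an HSM or a vetted library rather than code like this.

```python
import secrets

# A Mersenne prime larger than any 256-bit secret (illustrative choice).
PRIME = 2**521 - 1


def make_shares(secret: int, threshold: int, count: int):
    """Split `secret` into `count` shares; any `threshold` recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree threshold-1, secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbits(256)
shares = make_shares(key, threshold=2, count=3)  # 2-of-3 custodians
```

Unlike a scheme where every part is mandatory, a threshold scheme tolerates the loss or absence of a custodian while still preventing any one person from holding the whole key.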
Likely you are consuming and/or writing to external (legacy) data sources that you don’t control – means you will be hybrid
All of the above considerations need to be evaluated for each location where these components are hosted
You will need a BAA with each provider
Example – you can write an app in dev on a single AMI, then you add a few data sources based on new requirements, then you need to scale to multiple compute instances. Is it auditable? You are already broken – you need to rewrite the app stack to account for proxied DB connections, etc.
Consider your providers’ capabilities: HSM, KMS, encryption (FIPS certified?), key management, auditability, IAM, interoperability of all of the above
Ignore these considerations at your peril – not just an app rewrite, you may need to replatform or migrate
Attempting to bolt on component after component on top of public cloud resources becomes a very costly proposition. Add to that the management of audit activities related to the deployed infrastructure and you’ve pretty well sapped all your investment dollars in just purchasing a ticket to the dance.
Retrofitting an IT environment to address this LATER is a problem – you need to design it in from the start.
App tier and DB tier separate – changes to how data is retrieved can be a big lift once you are in production
Consider from the design phase, include compliance requirements in your entire dev lifecycle tool chain and process
Automation gets you repeatability and auditability, removes human error – also forces understanding and analysis on every process
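As one way to picture compliance built into the pipeline, the sketch below runs the same set of checks against every environment and reports violations that a build step could fail on. The control names, the environment descriptors, and the rule set are hypothetical examples, not a HIPAA or HITRUST checklist.

```python
# Hypothetical controls every environment must declare.
REQUIRED_CONTROLS = {
    "encryption_at_rest": True,
    "tls_in_transit": True,
    "audit_logging": True,
}


def check_environment(name: str, config: dict):
    """Return a list of human-readable violations for one environment."""
    violations = []
    for control, expected in REQUIRED_CONTROLS.items():
        if config.get(control) != expected:
            violations.append(f"{name}: {control} must be {expected}")
    return violations


def check_all(environments: dict):
    """Apply the same rules to every stage, not only production."""
    return [
        violation
        for name, config in environments.items()
        for violation in check_environment(name, config)
    ]


environments = {
    "dev": {"encryption_at_rest": True, "tls_in_transit": True, "audit_logging": True},
    "staging": {"encryption_at_rest": True, "tls_in_transit": False, "audit_logging": True},
    "production": {"encryption_at_rest": True, "tls_in_transit": True, "audit_logging": True},
}
violations = check_all(environments)
```

Running a check like this on every commit gives a repeatable record of what was verified and when, which is the auditability benefit the note describes.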
Containers – give you portability and atomic version control = predictability, an audit trail, and versioning
The agile healthcare SaaS provider needs a managed IT partner who knows more than they do about how to run compliant infrastructure and can expose easy to consume services that accelerate scale, not hamstring it with regulatory process.
am I consuming a shared resource?
am I providing a shared resource?