A mind-bending way of dealing with time syncing when aggregating data from many disparate sources. Talk by Jasmine Tsai and Alyssa Kwan, Clover Health. To hear about future conferences go to http://dataengconf.com
3. Time | tīm |
noun
The indefinite continued progression of existence and events that occur in apparently irreversible succession from the past through the present to the future.
13. Typical lifecycle of data
App
1459671246251
User clicks on a button
1459671246251
Life event = publish event
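As a minimal sketch of the "life event = publish event" case: the number on the slide is an epoch timestamp in milliseconds, and the app records the click as it happens, so event time and publish time are the same instant. The helper name `from_epoch_ms` is an assumption for illustration, not something from the talk.

```python
from datetime import datetime, timezone

# Epoch-millisecond value shown on the slide.
CLICK_MS = 1459671246251


def from_epoch_ms(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# The app writes the event at the moment it occurs, so the time the
# event happened and the time it was published coincide.
event_time = from_epoch_ms(CLICK_MS)
publish_time = from_epoch_ms(CLICK_MS)
lag = publish_time - event_time  # zero in the typical lifecycle
```

With a lag of zero, a single timestamp column is enough, which is why most pipelines never need a second time axis.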
14. Clover’s lifecycle of data
Happy path
• Member goes to doctor (Jan 10, 2016)
• Billing enters claims (Apr 1, 2016)
• Transaction clearinghouse systems and third-party claims processor
• Data entry human at claims processor
• Our pipelines
• Clover publishes claim data in our DWH and knows a member went to the doctor on Jan 10, 2016 as of Apr 10, 2016
Unreliable path
• Oops, processed the claim wrong (restatement Apr 11, 2016)
• Oops, there was a data entry error (restatement Jun 11, 2016)
• Oops, the pipeline is broken (breakage Jun 12, 2016)
What did Clover really know about someone’s health event that happened on Jan 10, 2016 as of April 2016, May 2016, vs Jun 2016? What did the claims processor know?
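The "what did we know, and when" question can be sketched with a small in-memory model. This is an illustration under stated assumptions, not Clover's implementation: the class `ClaimVersion`, the function `known_as_of`, and the `state` labels are hypothetical; only the dates come from the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ClaimVersion:
    service_date: date            # when the health event happened
    published_from: date          # when this version became known to us
    published_to: Optional[date]  # None means "still the current version"
    state: str                    # hypothetical label, for illustration only


# Dates follow the slide; the state labels are invented.
HISTORY = [
    ClaimVersion(date(2016, 1, 10), date(2016, 4, 10), date(2016, 4, 11), "as first published"),
    ClaimVersion(date(2016, 1, 10), date(2016, 4, 11), date(2016, 6, 11), "after processing fix"),
    ClaimVersion(date(2016, 1, 10), date(2016, 6, 11), None, "after data entry fix"),
]


def known_as_of(history: List[ClaimVersion], service_date: date,
                as_of: date) -> Optional[ClaimVersion]:
    """Return the version of the event that was published on `as_of`."""
    for version in history:
        if version.service_date != service_date:
            continue
        # Publish ranges are half-open: [published_from, published_to).
        still_open = version.published_to is None or as_of < version.published_to
        if version.published_from <= as_of and still_open:
            return version
    return None  # nothing had been published yet
```

Asking about the same Jan 10 visit as of March, April 30, and June 30 gives three different answers (nothing, the first restatement, the second restatement), which is the distinction a single timestamp cannot express.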
28. How this helps us
• Uniform treatment of event logs and snapshots
• Reproduce event and snapshot views from one structure
• Relatively simple data access patterns
31. Why we use relational (PostgreSQL)
• Industry standard
• Wide adoption
• Robust
• Approachable
• Not constrained by scale
• Distributed / sharding
• Transactions!
• Global clock!
PostgreSQL
• Not limited to scalar types
• GiST indexes!
• Exclusion constraints!
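To make the exclusion-constraint bullet concrete, here is a rough Python sketch of the rule such a constraint enforces: two rows with the same identity may not have overlapping ranges. In PostgreSQL this check is done by the database with a GiST index; the function names below are hypothetical and only mimic the behaviour for non-empty half-open ranges.

```python
from datetime import date
from typing import List, Optional, Tuple

# Half-open range [start, end); end=None means unbounded above.
Range = Tuple[date, Optional[date]]


def overlaps(a: Range, b: Range) -> bool:
    """True if two non-empty half-open ranges share at least one point."""
    a_start, a_end = a
    b_start, b_end = b
    return (b_end is None or a_start < b_end) and (a_end is None or b_start < a_end)


def violates_exclusion(existing: List[Tuple[int, Range]],
                       new_id: int, new_range: Range) -> bool:
    """Mimic an exclusion rule of the form (id WITH =, range WITH &&)."""
    return any(row_id == new_id and overlaps(row_range, new_range)
               for row_id, row_range in existing)
```

Because the ranges are half-open, a row ending on Feb 1 and a row starting on Feb 1 do not conflict, which is what lets consecutive versions sit back to back.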
32. An example of bitemporal merge in SQL
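-- Backticked names (`id`, `publish_ts`, ...) are bind parameters; "table" is a placeholder name.
INSERT INTO table
SELECT id AS id
     , LOWER(publish_tr) AS publish_tb
     , TSRANGE(LOWER(publish_tr), `publish_ts`, '[)') AS publish_tr
     , effective_tr AS effective_tr
     , state AS state
  FROM table
 WHERE id = `id`
   AND publish_tr @> `publish_ts`
UNION ALL
SELECT `id` AS id
     , `publish_ts` AS publish_tb
     , TSRANGE(`publish_ts`, NULL, '[)') AS publish_tr
     , TSRANGE(`effective_tb`, `effective_te`, '[)') AS effective_tr
     , `state` AS state  -- assumed to be a parameter; this SELECT has no FROM
-- The conflict target is an assumption; the slide omits it.
ON CONFLICT (id, publish_tb) DO UPDATE
   SET publish_tr = EXCLUDED.publish_tr
     , effective_tr = EXCLUDED.effective_tr
     , state = EXCLUDED.state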
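The same merge can be sketched in plain Python to show what the statement does: close the publish range of the row that is current at `publish_ts`, then append a new open-ended row. This is an in-memory illustration under assumptions (the `Row` class and `bitemporal_merge` function are hypothetical), not the talk's code.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    id: int
    publish_from: datetime
    publish_to: Optional[datetime]  # None = open-ended, i.e. current
    effective: Tuple[datetime, Optional[datetime]]
    state: str


def bitemporal_merge(rows: List[Row], row_id: int, publish_ts: datetime,
                     effective: Tuple[datetime, Optional[datetime]],
                     state: str) -> List[Row]:
    """Return a new list with the old version closed and the new one added."""
    merged = []
    for row in rows:
        # Mirrors "publish_tr @> publish_ts" for the matching id.
        is_current = (
            row.id == row_id
            and row.publish_from <= publish_ts
            and (row.publish_to is None or publish_ts < row.publish_to)
        )
        # Close the old publish range at publish_ts; never delete it.
        merged.append(replace(row, publish_to=publish_ts) if is_current else row)
    # The restated facts become the new current version.
    merged.append(Row(row_id, publish_ts, None, effective, state))
    return merged
```

Nothing is overwritten in place: the earlier version stays queryable with a closed publish range, which is what makes later "as of" questions answerable.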
35. Abstracting that away: Alembic
import sqlalchemy as sa
import sqlalchemy.dialects.postgresql
from alembic.operations import MigrateOperation, Operations

@Operations.register_operation('create_bitemporal_table')
class CreateBitemporalTableOp(MigrateOperation):
    """Create a bitemporal src table"""
        # (enclosing method is not shown on the slide)
        identities = identities or {}
        # One equality clause per identity expression.
        identity_constraints = [(expr, '=') for identity, expr in identities.items()]
        additional_exclusions = additional_exclusions or []
        exclusion_constraints = identity_constraints + additional_exclusions
        # No two rows for the same identity may overlap on both time axes.
        exclusion = sa.dialects.postgresql.ExcludeConstraint(
            ('published_as_of', '&&'),
            ('{}'.format(self.as_on_name), '&&'),
            *exclusion_constraints)
        current_publish_ixes = []
        current_publish_current_as_on_ixes = []
        …..
36. Temporality as a concept
import sqlalchemy as sa
import clover_web.models.temporal as temporal

@temporal.add_clock('prop_a', 'prop_b')
class MyModel(temporal.Clocked, SomeBase):
    prop_a = sa.Column(sa.Integer)
    prop_b = sa.Column(sa.Text)

prop_a_hm = temporal.get_history_model(MyModel.prop_a)
37. Temporality as a concept
effective/valid
published
S3 archives/versions
38. Using the time machine
What was the member’s status according to the claims processor on Dec 1, 2015?
What was the member’s status according to us on Dec 1, 2015?
What is the member’s current full effective history?
What is our latest understanding of the member’s status according to the claims processor?
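These questions differ only in which time axis is pinned and whose view is asked for. A rough sketch, assuming a hypothetical `StatusRow` with both an effective range and a publish range; the sample rows, source names, and status values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class StatusRow:
    source: str                    # whose view this is, e.g. "processor"
    status: str
    effective_from: date
    effective_to: Optional[date]   # None = open-ended
    published_from: date
    published_to: Optional[date]   # None = current understanding


def _contains(start: date, end: Optional[date], point: date) -> bool:
    """Half-open containment: start <= point < end."""
    return start <= point and (end is None or point < end)


def status_at(rows: List[StatusRow], source: str,
              effective_on: date, known_on: date) -> Optional[str]:
    """Status in effect on `effective_on`, as understood on `known_on`."""
    for row in rows:
        if (row.source == source
                and _contains(row.effective_from, row.effective_to, effective_on)
                and _contains(row.published_from, row.published_to, known_on)):
            return row.status
    return None


def current_history(rows: List[StatusRow], source: str) -> List[StatusRow]:
    """Full effective history according to the latest published view."""
    current = [r for r in rows if r.source == source and r.published_to is None]
    return sorted(current, key=lambda r: r.effective_from)


# Invented data: a disenrollment effective Nov 1, 2015, learned on Jan 15, 2016.
ROWS = [
    StatusRow("processor", "enrolled", date(2015, 1, 1), None,
              date(2015, 1, 5), date(2016, 1, 15)),
    StatusRow("processor", "enrolled", date(2015, 1, 1), date(2015, 11, 1),
              date(2016, 1, 15), None),
    StatusRow("processor", "disenrolled", date(2015, 11, 1), None,
              date(2016, 1, 15), None),
]
```

In this sample, the status for Dec 1, 2015 is "enrolled" when asked as of Dec 1, 2015 and "disenrolled" when asked as of Feb 1, 2016; both answers are correct for their point in publish time.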
40. Making meaningful decisions about health outcomes
• How do we know if a call queue campaign was successful?
• How do we know how and where to deploy our nurses?
• How do we know what impact a certain data integration will have on understanding the risk profile of our members?
41. Further resources
• Richard Snodgrass (http://www.cs.arizona.edu/~rts/publications.html)
• Developing Time-Oriented Database Applications in SQL