Data Security Guidelines

               May 2010 – Version 1.0




Gary Waldrom
Candidate Selection

Entity Class
A logical model of an identifiable party
• Each instance of an entity defined within the system should be identified and marked for drill down investigation

Domains
A logical structure of attributes represented within a single entity
• Each instance of a domain structure as listed within the spreadsheet (slides 5-8) and being contained within an identified Entity should be marked for further drill down investigation

Attributes
Individual data fields under data type constraints and associated business and integrity rules
• Each attribute type as listed within the spreadsheet (slides 5-8) and being contained within an identified domain is a candidate for data obfuscation based on the data obfuscation rules




      Data Security Guidelines 2010· Page 2
Data Sensitivity


Level 1
• Sensitivity level 1 is a unique identifier by which a party can be identified
  without further reference to other sensitive information (High Cardinality).
  All instances should be obfuscated or masked

Level 2
• Sensitivity level 2 is information which collectively (i.e. more than one
  instance) may form a positive identification of a party. In isolation this
  data, although deemed sensitive, does not directly and uniquely identify the
  party; however, the more attributes supplied, the closer the combination
  comes to a sensitivity level 1 identifier without a level 1 attribute being
  involved (Normal Cardinality). All combined instances must be obfuscated

Level 3
• Sensitivity level 3 is data with a Low Cardinality ratio. All combined
  instances should be obfuscated, although individual instances will not
  identify a party




Risk of Identification of Parties




Unique Identifier – Sensitivity Level 1
• ∞
• High risk, these identifiers will uniquely identify party and are traceable through various public domain based systems

Composite Identifiers – Sensitivity Level 2
• n exponent
• Becomes an identifier as multiple instances increase cardinality, exponent based on cardinality

Low Cardinality Identifiers – Sensitivity Level 3
• n + Composite
• Multiple composites increase identification, cardinality increases as instances are added




Attribute Identification
Entity  Domain   Attribute                             Data Type (Generic)      Classification
Client  Name     Firstname(s)                          Character                2
Client  Name     Surnames / Family Name                Character                2
Client  Name     Title / Prefix                        Denormalised: Character  3
Client  Name     Suffix                                Denormalised: Character  3
Client  Name     Salutation                            Denormalised: Character  2

Client  Address  House Number/Name                     Character                2
Client  Address  Address Line 1                        Character                2
Client  Address  Address Line 2                        Character                2
Client  Address  Address Line 3                        Character                2
Client  Address  Address Line 4                        Character                2
Client  Address  State / County / Canton / Region etc  Denormalised: Character  3
Client  Address  Zip / Post Code                       Character                2
Client  Address  Country                               Denormalised: Character  3

Client  Contact  Home Telephone Number                 Character                1
Client  Contact  Work Telephone Number                 Character                1
Client  Contact  Cell/Mobile Number                    Character                1
Client  Contact  Additional Telephone Numbers          Character                1
Client  Contact  Email1                                Character                1
Client  Contact  Email2                                Character                1
Client  Contact  Additional Email Accts                Character                1



         Data Security Guidelines 2010· Page 5
Attribute Identification
Entity  Domain            Attribute                                    Data Type (Generic)                                      Classification
Client  Personal Details  Date of Birth                                Date                                                     3
Client  Personal Details  Gender                                       Denormalised: Character                                  3
Client  Personal Details  Political Persuasion                         Denormalised: Character                                  3
Client  Personal Details  Religious or Philosophical Beliefs           Denormalised: Character                                  3
Client  Personal Details  Sexual Persuasion                            Denormalised: Character                                  3
Client  Personal Details  Race or Ethnic Origin                        Denormalised: Character                                  3
Client  Personal Details  Accusations or Suspicions                    Denormalised: Character                                  3
Client  Personal Details  Convictions / Judgements / Criminal Records  Denormalised: Character                                  3
Client  Personal Details  Notes                                        Long Character (Free text could hold sensitive details)  1
Client  Personal Details  Internet usage & web tracking information    Character / W3C Logs                                     2
Client  Personal Details  Physical and/or Mental Health                Character                                                3
Client  Personal Details  Source of Wealth                             Long Character (Free text could hold sensitive details)  1
Client  Personal Details  Nationality                                  Denormalised: Character                                  3
Client  Personal Details  Domicile                                     Denormalised: Character                                  3
Client  Personal Details  Spouse                                       Name Domain                                              2
Client  Personal Details  Children                                     Name Domain                                              2




         Data Security Guidelines 2010· Page 6
Attribute Identification
Entity  Domain        Attribute                          Data Type (Generic)  Classification
Client  Natural Keys  SSN / Tax ID / NI Number           Character            1
Client  Natural Keys  Passport Number                    Character            1
Client  Natural Keys  Login ID's & Passwords             Character            1
Client  Natural Keys  Union / Club / Society Membership  Character            1
Client  Natural Keys  Bank Account Number(s)             Number               1
Client  Natural Keys  Sort Code(s)                       Number               2
Client  Natural Keys  Account Name(s)                    Character            1
Client  Natural Keys  Residential Address                Address Domain       2

Client  Linked Data   Beneficiary                        Beneficiary Entity   1
Client  Linked Data   IFA                                IFA Entity           2
Client  Linked Data   Intermediary                       Intermediary Entity  2
Client  Linked Data   Sub Account                        Sub Account Entity   1
Client  Linked Data   Accountant                         Accountant Entity    2




         Data Security Guidelines 2010· Page 7
Attribute Identification
Entity        Domain                     Attribute  Data Type (Generic)  Classification
Beneficiary   All Client Entity Domains                                  2
IFA           All Client Entity Domains                                  3
Intermediary  All Client Entity Domains                                  3
Sub Account   All Client Entity Domains                                  1


Classification Key

Sensitivity Level 1
Sensitivity level 1 is a unique identifier by which a party can be identified without further reference to other sensitive information (High Cardinality). All instances should be obfuscated

Sensitivity Level 2
Sensitivity level 2 is information which collectively (i.e. more than one instance) may form a positive identification of a party. In isolation this data, although deemed sensitive, does not directly and uniquely identify the party; however, the more attributes supplied, the closer the combination comes to a sensitivity level 1 identifier without a level 1 attribute being involved (Normal Cardinality). All combined instances must be obfuscated

Sensitivity Level 3
Sensitivity level 3 is data with a Low Cardinality ratio. All combined instances should be obfuscated, although individual instances will not identify a party




 Note: For normalised data types, the obfuscation layer is applied at the reference table level




           Data Security Guidelines 2010· Page 8
Use-Case Example of Composite Identifiers (Sensitivity Level 2)
Data is purely for reference

Increase of positive identification by a cumulative of sensitivity 2 attributes held within the same domain:

Attribute added   Cardinality
First Name        >= 1,000,000
Surname           >= 100,000
Country           >= 10,000
Region            >= 100
Post Code         >= 5           Obfuscation point
House Number      <= 2           Point of probability
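
The narrowing shown above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample records and the threshold are hypothetical, and the slide does not prescribe how cardinality is measured. The sketch counts how many records still match a target party as each attribute is added, then reports the first attribute at which the count falls to the threshold or below.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


def cumulative_cardinality(
    records: List[Record], attributes: List[str], target: Record
) -> List[Tuple[str, int]]:
    """Count the records still matching `target` as each attribute is added."""
    candidates = list(records)
    steps = []
    for attr in attributes:
        # Keep only the records that agree with the target on this attribute.
        candidates = [r for r in candidates if r[attr] == target[attr]]
        steps.append((attr, len(candidates)))
    return steps


def obfuscation_point(steps: List[Tuple[str, int]], threshold: int) -> Optional[str]:
    """Return the first attribute at which the match count is <= threshold."""
    for attr, count in steps:
        if count <= threshold:
            return attr
    return None


# Hypothetical sample data, purely for reference.
records = [
    {"first_name": "Ann", "surname": "Lee", "post_code": "A1"},
    {"first_name": "Ann", "surname": "Lee", "post_code": "B2"},
    {"first_name": "Ann", "surname": "Ray", "post_code": "A1"},
    {"first_name": "Bob", "surname": "Lee", "post_code": "A1"},
]
steps = cumulative_cardinality(
    records, ["first_name", "surname", "post_code"], records[0]
)
print(steps)
print(obfuscation_point(steps, threshold=2))
```

With real data the threshold would be set from the policy (the slide marks a cardinality of about 5 as the obfuscation point).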


Use-Case Example of Composite Identifiers (Sensitivity Level 3)
Data is purely for reference

Little increase of positive identification by a cumulative of sensitivity 3 attributes until the addition of a sensitivity level 2 attribute:

Attribute added   Cardinality
Gender            >= 100,000,000
Country           >= 10,000,000
Region            >= 1,000,000
Date of Birth     >= 3,000
Surname           >= 5             Obfuscation point
Post Code         <= 2             Point of probability


Numeric Obfuscation

 Numbers used in aggregate functions and checked to provide accuracy
 (i.e. holdings, values, transactions) should not be obfuscated if all other
 attributes within the domain/entity structure have been obfuscated and
 there is no method of reversing the obfuscation layer to identify sensitive
 data against the values. Barring that:

 • Integers should be obfuscated to a length equal to or less than the length of the original number, but still conform to any specific business rules
 • Fixed point numbers should be obfuscated to a precision equal to or less than the original precision and retain the original scale, but still conform to any specific business rules
 • Floating point numbers should be obfuscated to a precision and scale equal to or less than the original, but still conform to any specific business rules
 • Currency/percentage formatting over numeric values should be retained for verification purposes
 • Ordinal numbers should have the alphabetic element obfuscated in the same way as an alpha data element, retaining the same two character format
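
A minimal Python sketch of the integer and fixed point rules follows. The function names are hypothetical, seeded random replacement is an assumed technique rather than one named in the guideline, and no business rules are applied. Integers keep their digit count and sign; fixed point values keep their scale and never exceed the original precision.

```python
import random
from decimal import Decimal


def obfuscate_integer(value: int, rng: random.Random) -> int:
    """Replace an integer with a random one of the same digit count and sign."""
    digits = len(str(abs(value)))
    low = 10 ** (digits - 1) if digits > 1 else 0
    high = 10 ** digits - 1
    result = rng.randint(low, high)
    return -result if value < 0 else result


def obfuscate_fixed_point(value: Decimal, rng: random.Random) -> Decimal:
    """Replace a fixed point number, keeping its scale and at most its precision."""
    sign, digits, exponent = value.as_tuple()
    precision = len(digits)
    # Draw a random unscaled integer with no more digits than the original.
    unscaled = rng.randint(0, 10 ** precision - 1)
    # scaleb shifts the decimal point, so the original scale is retained.
    result = Decimal(unscaled).scaleb(exponent)
    return -result if sign else result


rng = random.Random(2010)  # a fixed seed makes the run repeatable
print(obfuscate_integer(483920, rng))
print(obfuscate_fixed_point(Decimal("1234.50"), rng))
```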




Alpha Obfuscation

 Alphabetic and Alphanumeric data types should be obfuscated retaining the
 original structure of the underlying data, however certain exceptions exist
 for search/view criteria

 • SGML/XML/HTML/XHTML/RSS data formats must retain XML reserved characters in order for them to be used in native views, DTD, XLS, Web based formats etc.
 • Embedded Java Code must be retained but underlying attributes obfuscated
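
As an illustration only, the Python sketch below replaces letters and digits character by character so the structure of a value survives, and applies that to XML element text while leaving the markup untouched. The function names are hypothetical, attribute values are not handled, and random replacement is an assumed technique.

```python
import random
import string
import xml.etree.ElementTree as ET


def obfuscate_alpha(text: str, rng: random.Random) -> str:
    """Swap each ASCII letter or digit for a random one of the same kind."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.isascii() and ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        elif ch.isascii() and ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # punctuation, spaces and reserved characters stay
    return "".join(out)


def obfuscate_xml_text(xml: str, rng: random.Random) -> str:
    """Obfuscate element text only, so tags and reserved characters survive."""
    root = ET.fromstring(xml)
    for element in root.iter():
        if element.text:
            element.text = obfuscate_alpha(element.text, rng)
        if element.tail:
            element.tail = obfuscate_alpha(element.tail, rng)
    return ET.tostring(root, encoding="unicode")


rng = random.Random(2010)
print(obfuscate_alpha("AB-12 cd", rng))
print(obfuscate_xml_text("<client><name>Smith</name></client>", rng))
```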




Key Obfuscation

 Obfuscation of keys gives rise to the challenge of failure of Declarative
 Referential Integrity when presented to certain applications that rely upon
 them, thus:

 • Natural keys that are identified as sensitive data can only be anonymised/masked
 • Natural keys that are identified as non-sensitive are out of scope and may be retained
 • Surrogate keys are out of scope and should be retained




Date Obfuscation 1

 Dates should retain the original date format of the National Character set
 of the underlying data

 • Day numbers should be obfuscated but retain the 1-31 format
 • Day of the week numbers should be obfuscated but retain the 0-6 or 1-7 formatting dependent on platform
 • Day names should be obfuscated as per the alpha data element, however the length of the day must be changed to a length between 6 and 9 but not the same length as the original day
 • Month numbers should be obfuscated but retain the 1-12 format
 • Ordinal numbers should have the alphabetic element obfuscated in the same way as an alpha data element retaining the same two character format
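
The day name rule can be sketched as follows. This is a hypothetical helper, not a prescribed method: it picks a new length inside the allowed range that differs from the original length and fills it with random letters.

```python
import random
import string


def obfuscate_name(name: str, min_len: int, max_len: int, rng: random.Random) -> str:
    """Return random letters whose length is in range but differs from the original."""
    lengths = [n for n in range(min_len, max_len + 1) if n != len(name)]
    length = rng.choice(lengths)
    letters = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return letters.capitalize()


rng = random.Random(2010)
for day in ["Monday", "Tuesday", "Wednesday", "Thursday"]:
    # Day names: length between 6 and 9, never the original length.
    print(day, "->", obfuscate_name(day, 6, 9, rng))
```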




Date Obfuscation 2

 Dates should retain the original date format of the National Character set
 of the underlying data

 • Month names should be obfuscated as per the alpha data element, however the length of the month must be changed to a length between 3 and 12 but not the same length as the original month. Abbreviated month names should be obfuscated retaining the 3-character format
 • Year numbers should always retain the century 4-number format, in the range (current year - any validation criteria) to (current year - 1) for years in the past, and (current year + 1) to (current year + any validation criteria) for projected ranges. (This could potentially cause problems with date verification functions; any function code which performs these verifications must utilise the same seed value as the date value and must fully enclose all other dates within the same block)
 • Decision support systems relying on "roll-forward"/"roll-back" date scenarios and date range queries must retain the requested period change between two dates
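
One way to keep the period between two dates intact is to shift every date in a block by the same offset derived from a shared seed. The Python sketch below assumes that approach; the function name and the `max_days` parameter are hypothetical, and the year range rule above is not enforced here.

```python
import datetime as dt
import random
from typing import Callable


def make_date_shifter(seed: int, max_days: int = 365) -> Callable[[dt.date], dt.date]:
    """Build a function that moves dates back by one seed-derived offset."""
    rng = random.Random(seed)
    offset = dt.timedelta(days=rng.randint(1, max_days))

    def shift(value: dt.date) -> dt.date:
        # Every date handled by this shifter moves by the same amount,
        # so the interval between any two of them is unchanged.
        return value - offset

    return shift


shift = make_date_shifter(seed=2010)
start, end = dt.date(2009, 1, 15), dt.date(2009, 4, 15)
print(shift(start), shift(end), shift(end) - shift(start))
```

Any verification code that needs to compare obfuscated dates would have to build its shifter from the same seed.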




Granularity of Access to Sensitive Data
Development (Development & Support only)
• Development environments must be fully obfuscated at the data level (not obfuscated views) as developers usually hold higher privileges in these environments

UAT (Business Users, Development & Support)
• UAT environments must be fully obfuscated to all Development, Support, and Non-Authorised users
• Business users may see sensitive data based on their individual levels of authorisation
• Access to data by Support users should be disallowed if possible
• If access is allowed for "fix-on-fail" functionality this must be keystroke logged through an auditing application

Production (Business Users only)
• Production environments must be fully obfuscated to all Development, Support, and Non-Authorised users
• Business users may see sensitive data based on their individual levels of authorisation
• Access to data by Support users should be disallowed if possible
• If access is allowed for "fix-on-fail" functionality this must be keystroke logged through an auditing application
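
The visibility rules above can be expressed as a small decision function. This Python sketch is an assumption-laden simplification: the enum and function names are hypothetical, authorisation is reduced to a single flag, and the keystroke-logged "fix-on-fail" exception for Support users is left out.

```python
from enum import Enum


class Environment(Enum):
    DEVELOPMENT = "development"
    UAT = "uat"
    PRODUCTION = "production"


class UserType(Enum):
    BUSINESS = "business"
    DEVELOPMENT = "development"
    SUPPORT = "support"


def may_see_sensitive_data(
    env: Environment, user: UserType, authorised: bool = False
) -> bool:
    """Return True only where the rules allow unobfuscated data to be seen."""
    if env is Environment.DEVELOPMENT:
        # Development data is obfuscated at the data level for everyone.
        return False
    # In UAT and Production only authorised business users see clear data.
    return user is UserType.BUSINESS and authorised


print(may_see_sensitive_data(Environment.UAT, UserType.BUSINESS, authorised=True))
print(may_see_sensitive_data(Environment.PRODUCTION, UserType.SUPPORT, authorised=True))
```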




Deployment Methods



Data Security Guideline Policy

Full Environment Access Control
• Prod, UAT, SIT & Dev environments are fully segregated by user type, or privilege level.
• Data obfuscation/anonymisation/masking is performed through ETL tools from one environment to the next

Shared Environment Access
• Prod, UAT, SIT & Dev environment may share different user types i.e. business, developers, support. The level of granularity must be defined on a per-user type or privilege level basis.
• Data is obfuscated/anonymised/masked based on the authority level of the user type or privilege level

Hybrid Environments
• Prod, UAT, & SIT environments may be obfuscated at a user type level but transfers of data into Dev environments may be performed through ETL utilities
• Data is obfuscated to the same rules but the deployment method uses both technical methods




Benefits & Drawbacks of Deployment Methods

Full Environment Access Control
  Benefits
  • Leverage existing tools capabilities and vendor support
  • Guaranteed obfuscation contained within the environment
  • User access managed at different layer to data access
  • Access to environment determines visibility
  Drawbacks
  • ETL tool license/platform costs
  • Load window issues
  • Metadata & cipher security concerns

Shared Environment Access Control
  Benefits
  • Higher level of access granularity, greater flexibility
  • Define the level of encryption to conform to national regulatory controls
  • No load window issues, all users share same data instance
  Drawbacks
  • Development costs
  • Requires clear delineation of user roles and role management
  • Proprietary technology solutions

Hybrid Environments
  Benefits
  • All prior mentioned
  • Greater flexibility in defining a solution which fits with a current "modus operandi"
  Drawbacks
  • All prior mentioned
  • Potential support complexity issues




Data Obfuscation Methodology




Full Environmental Access Control
• No data obfuscation, non-authorised users have no access

Hybrid environment
• No access to PROD, obfuscation in UAT based on roles and rules, ETL obfuscation into DEV

Shared Environment
• Data obfuscation based on roles and rules of sensitivity




Environmental Control (Access Method)
[Diagram]
PROD (Instance 1) -> Informatica ETL (Apply Obfuscation Rules) -> UAT (Instance 2, Obfuscated) -> Informatica ETL (Apply Obfuscation Rules) -> DEV (Instance 3, Obfuscated)
User groups shown: Business Users; Development & Support Users


Environmental Control (Hybrid Method)
[Diagram]
PROD (Instance 1) -> Periodic Refresh or Duplex Feed -> UAT (Instance 1 or 2) -> Informatica ETL (Apply Obfuscation Rules) -> DEV (Instance 3, Totally Obfuscated)
Obfuscation Layer
User groups shown: Business Users; Development & Support Users


Appendix




 Terms of Reference
 • Lingual Reference
 • Risk Impact/Probability
 • Non-Deterministic Obfuscation
 • Monte Carlo Method
 • Dynamic Obfuscation Function Methods




Lingual Reference

Anonymous/Anonymised
• To remain unidentified or nameless, i.e. NULL. A field that is anonymous shows no data at all, so the structure of the data cannot be verified.

Obfuscate/Obfuscated
• To confuse or scramble, i.e. encrypt. The structure of the data can still be verified: a date is still a date (albeit the wrong one), a number is still a number (albeit the wrong one) and alpha data is still alpha in the same structure. The structure is visible but the sensitive data is indecipherable.

Mask/Masked
• To cover or hide. This is normally used in password protection, where an asterisk is displayed as each character is typed.

"Anonymous" and "obfuscate" are both used in literature: an anonymous writer is unknown, whereas writing under a nom de plume is obfuscated.

Risk impact/Probability


Probability - A risk is an event that "may"
occur. The probability of it occurring can
range anywhere from just above 0% to just
below 100%. (Note: It can't be exactly
100%, because then it would be a certainty,
not a risk. And it can't be exactly 0%, or it
wouldn't be a risk.)


Impact - A risk, by its very nature, always
has a negative impact. However, the size of
the impact varies in terms of cost and
impact on some other critical factor.

We apply these rules to determine when to
obfuscate data and when not to.
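
The slide does not say how probability and impact are combined into a decision, so the following is only a minimal sketch under stated assumptions. The probability bounds come from the text above; the 0-10 impact scale, the `probability × impact` score and the threshold of 2.0 are hypothetical and would need to be replaced by whatever rule the organisation actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A risk as described on the slide: uncertain, and always harmful."""
    probability: float  # strictly between 0 and 1 (0% and 100% are excluded)
    impact: float       # ASSUMPTION: severity on a hypothetical 0-10 scale

    def __post_init__(self) -> None:
        # Exactly 0 or exactly 1 is not a risk, per the definition above.
        if not 0.0 < self.probability < 1.0:
            raise ValueError("probability must be strictly between 0 and 1")
        if self.impact <= 0:
            raise ValueError("impact must be positive")

    @property
    def score(self) -> float:
        # ASSUMPTION: score is probability multiplied by impact.
        return self.probability * self.impact


def should_obfuscate(risk: Risk, threshold: float = 2.0) -> bool:
    """Return True when the assumed score reaches the hypothetical threshold."""
    return risk.score >= threshold
```

With these assumed values, `should_obfuscate(Risk(probability=0.9, impact=8.0))` is true, while a risk with probability 0.05 and impact 3.0 falls below the threshold.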


Non-Deterministic Obfuscation

A variety of factors can cause an algorithm to behave in a way which is not deterministic, or non-deterministic:
• If it uses external state other than the input, such as user input, a global variable, a hardware timer value, a random value, or stored disk data.
• If it operates in a way that is timing-sensitive, for example if it has multiple processors writing to the same data at the same time. In this case, the precise order in which each processor writes its data will affect the result.
• If a hardware error causes its state to change in an unexpected way.

A major problem with deterministic algorithms is that sometimes, we don't want the results to be predictable. For example, if you are playing an on-line game of blackjack that shuffles its deck using a pseudorandom number generator, a clever gambler might guess precisely the numbers the generator will choose and so determine the entire contents of the deck ahead of time, allowing him to cheat. Similar problems arise in cryptography, where private keys are often generated using such a generator. This sort of problem is generally avoided using a cryptographically secure pseudo-random number generator.
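
The blackjack example can be sketched in Python as a contrast between a seeded pseudorandom generator and the operating system's cryptographically secure source. This is an illustration only, not part of the original guidelines; the function names are hypothetical, while `random.Random` and `secrets.SystemRandom` are standard-library classes.

```python
import random
import secrets


def predictable_shuffle(seed: int) -> list:
    """Shuffle a 52-card deck with a seeded PRNG.

    Anyone who learns or guesses the seed can reproduce the exact order.
    """
    deck = list(range(52))
    random.Random(seed).shuffle(deck)
    return deck


def secure_shuffle() -> list:
    """Shuffle a 52-card deck using the OS-backed CSPRNG.

    SystemRandom ignores seeding, so the order cannot be replayed from a seed.
    """
    deck = list(range(52))
    secrets.SystemRandom().shuffle(deck)
    return deck
```

Calling `predictable_shuffle` twice with the same seed returns the same deck order, which is the weakness the slide describes. `secure_shuffle` offers no seed to guess.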




The Monte Carlo Methods

Monte Carlo methods are computational algorithms that rely on repeated random sampling to compute their results. One such method is a stochastic function used to create an obfuscation layer.

Stochastic programming is a framework for modelling optimization problems that involve uncertainty.

Because they rely on repeated computation of random or pseudo-random numbers, these methods are best suited to, and tend to be used for, cases where it is infeasible or impossible to compute an exact result with a deterministic algorithm. That unpredictability is what the guidelines rely on to ensure data obfuscation.

These are the building blocks for secure obfuscation of highly sensitive data within the banking environment and will satisfy an external audit.
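
As a generic illustration of "repeated random sampling to compute a result", the textbook Monte Carlo estimate of pi is sketched below. It is not an obfuscation routine and does not come from the original guidelines; it only shows the sampling pattern the definition refers to. The function name and the choice of passing in a `random.Random` instance are assumptions made for the example.

```python
import random


def estimate_pi(samples: int, rng: random.Random) -> float:
    """Estimate pi by sampling points uniformly in the unit square.

    The fraction of points landing inside the quarter circle of radius 1
    approximates pi / 4.
    """
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples
```

For example, `estimate_pi(100_000, random.Random())` typically lands within a few hundredths of pi. The estimate tightens only slowly as the sample count grows, which is the usual trade-off with these methods.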




Dynamic Obfuscation Function Methods

This is an example of a high-level data obfuscation function in which a decision is made, based on the previous criteria, on when to obfuscate, followed by the process of obfuscation for an alpha data type (simplest form).

Data is obfuscated at the decision point using the underlying technology's info-gap, non-probabilistic theory methods of random number generation, which create seed data for the ASCII conversion of real data.
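
The flow described above (decide, then obfuscate an alpha value while keeping its structure) might be sketched as follows. This is a minimal illustration under assumptions, not the function shown on the original slide: the function names and the boolean `sensitive` flag are hypothetical, and Python's `secrets` module stands in for whatever random number source the underlying technology provides.

```python
import secrets
import string


def obfuscate_alpha(value: str) -> str:
    """Replace each ASCII letter or digit with a random one of the same class."""
    out = []
    for ch in value:
        if ch in string.ascii_uppercase:
            out.append(secrets.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(secrets.choice(string.ascii_lowercase))
        elif ch in string.digits:
            out.append(secrets.choice(string.digits))
        else:
            # Keep spaces, punctuation and other characters so the structure survives.
            out.append(ch)
    return "".join(out)


def obfuscate_if_sensitive(value: str, sensitive: bool) -> str:
    """Decision point: obfuscate only values flagged as sensitive (assumed flag)."""
    return obfuscate_alpha(value) if sensitive else value
```

A value such as `"Smith-Jones 42"` keeps its length, case pattern, hyphen and space, but the letters and digits are replaced. Because each replacement is drawn independently, an individual character can occasionally come out unchanged; a production routine would need to decide whether that is acceptable.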




