Unleashing the Power of Apache Atlas with Apache Ranger

Published in: Technology

  1. Unleashing the Power of Apache Atlas with Apache Ranger: Virtual Data Connector Project
     Nigel Jones, jonesn@uk.ibm.com
     DataWorks, Munich, April 2017
     Apache®, Apache Atlas, Apache Ranger & other Apache project names referenced are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
  2. About Me: Nigel Jones
     • https://www.linkedin.com/in/nigelljones/
     • jonesn@uk.ibm.com (Anyone still use email?)
     • @planetf1: noisy, F1, electric vehicles, food & drink... A split of work/life accounts didn't work for me!
     • And of course the Apache Atlas & Ranger mailing lists & JIRA!
     • Science fan at school/uni. It was cloud chambers back then... now just the cloud :-)
     • IBM Hursley, UK since 1990
     • Last 3 years: focus on Data Lake, Information Governance, Open Metadata
  3. The Problem... Why are we here?
  4. Data?
     • What data do I have?
     • What does it mean?
     • Where is it?
     • Who has access to it?
     • Who owns it?
     • What quality is it?
     • How does it relate to other data?
     • How do I control, audit & understand access?
  5. Regulatory needs
     • Adhere to regulations like BCBS-239 and GDPR
     • Need to know meaning, value of the data
     • Demonstrate processes in place to govern access
     • Audit
     • Significant fines if rules breached
     • Whilst ensuring easy, ready access to appropriate data for data professionals to support an agile business
  6. So what do we need to address this?
  7. Metadata
     • Metadata enables data to be used outside of the application that created it
       • Analytics and decision making
       • New business applications
       • Reporting and compliance
     • Metadata describes the format and content of data, allowing people to judge which data set to use for a new project
       • Structure
       • Meaning
       • Origin
       • Valid values and quality
       • Usage and ownership
       • Regulations and classifications that apply
  8. Which can support...
     • An enterprise data catalogue that lists all data, including where it is, what it is, who owns it, its meaning, quality, where it came from, and can fully describe its business context & how the data should be governed
     • Subject matter experts searching, collaborating, feeding back about their data needs and use
     • Automated governance actions to protect and manage, including auditing, monitoring, quality control, rights management
  9. But easily...
     • Open frameworks & APIs
     • Automatic collection & discovery of metadata in a dynamic heterogeneous environment
     • Using predefined standards for glossaries, schemas, rules, regulations to reduce cost
     • Cheap to integrate new tools
     • No proprietary lock-in & assumptions that all tools are from one suite or vendor
     • Avoiding silos
     • Distributed and open
  10. The vision: Open and Unified Metadata
  11. Virtualization Data Connector project
  12. Data virtualization project
     • Collaboration: IBM, several banks & open community
     • A Data Lake environment
     • Not just Hadoop, but other sources too
     • Business Terms, Classifications, metadata rich
     • Offer virtualized views. Expose relational data with business terms
     • Manage access to resources (permit, deny, log, filter/mask...) THROUGH METADATA
     • Open, pluggable
     • Working through use cases, design, initial MVP (this year)
     • Critique, feedback is welcomed. We're looking for guidance and support from the Atlas & Ranger communities as well as to contribute our ideas
     • Proposed changes all go through mailing list and JIRA for feedback
  13. Apache Atlas
     • "Atlas is a scalable and extensible set of core foundational governance services - enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem." ... http://www.apache.org
     • Open Community: Apache Incubator since May 2015
     • Type agnostic metadata store
     • REST API & UI
     • Supports many Hadoop components including HBase, Hive, Sqoop, Storm & others
  14. Apache Ranger
     • Centralized security administration to manage all security related tasks in a central UI or using REST APIs
     • Fine grained authorization to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool
     • Standardize authorization method across all Hadoop components
     • Enhanced support for different authorization methods: role based access control, attribute based access control etc.
     • Centralize auditing of user access and administrative actions (security related) within all the components of Hadoop
     • ... from http://ranger.apache.org
  15. Project Interactions (diagram)
     • Components: Search/Report, GaianDB, Apache Atlas, Apache Ranger, Apache Solr, RDBMS, Hadoop
     • Search/Report
       • Search for list of assets by metadata
       • Search for data
       • Reporting tool obtains data to draw report
     • Underlying data: SQL, Hive, HDFS, Oracle, Netezza etc.
     • Annotations on the flows: manages logical views; deploys rules, pushes classifications, source for user roles (not users); + Ranger plugin to permit/deny, mask etc.; pulls rules, classifications
  16. Why Atlas and Ranger?
     • Open Source essential to forming an active ecosystem
     • Vision, active community & evolving: ability to contribute & work with others to provide the best solution
     • Already have good core capabilities
       • Atlas type system is very flexible
       • Ranger offers a range of policy types and provides a pluggable framework
     • Already cross project integration
       • Use of tag based policies in Ranger sourced from Atlas
     • Can be used independently of full Hadoop stack
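The idea of tag based policies (access decided by the classification attached to a column rather than by the column itself) can be illustrated with a small sketch. This is not Ranger's evaluation algorithm: the `TagPolicy` type, the `decide` function, the deny-by-default rule and the "most restrictive match wins" ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: what a role may do with data carrying a tag."""
    tag: str      # classification, e.g. "SPI"
    role: str     # role the policy applies to
    action: str   # "permit", "mask" or "deny"


# Assumption: lower number = more restrictive, and the most restrictive match wins.
RESTRICTIVENESS = {"deny": 0, "mask": 1, "permit": 2}


def decide(column_tags, user_roles, policies):
    """Return the action for a column, given its tags and the caller's roles."""
    matches = [
        p.action
        for p in policies
        if p.tag in column_tags and p.role in user_roles
    ]
    if not matches:
        return "deny"  # assumption: nothing matched, so access is refused
    return min(matches, key=RESTRICTIVENESS.__getitem__)


policies = [
    TagPolicy(tag="SPI", role="hr", action="permit"),
    TagPolicy(tag="SPI", role="analyst", action="mask"),
]
print(decide({"SPI"}, {"analyst"}, policies))
```

Because the policy names only the tag, a newly classified column is covered without anyone writing a policy for that column.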
  17. Refined virtual connector scope (diagram)
     • Data Lake Virtualization: GaianDB with Ranger plugin, Virtualizer, Mapper, Pre/Post, Create View
     • Atlas: Titan (graph DB, metadata repository), OMAS, OMRS, IGC
     • Ranger: Ranger Server, Ranger Config, poll policies, tag-sync, rule-sync, Config (e.g. policies, audit log location), LDAP, audit log
     • Data Lake Repositories: Oracle, Netezza, Hive tables
     • Other tools with their own metadata: Navigator, Datameer
     • Flows: extract physical metadata; manage logical tables; retrieve metadata; push metadata; push and query metadata; search for data/reporting
  18. GaianDB & Virtualizer
     • GaianDB
       • Open Source
       • Federated, self learning, dynamic configuration
       • Based on Apache Derby
       • Already had "policy" support; we're plugging in Ranger for this project
     • Virtualizer
       • Listens to event notifications on assets etc.
       • Creates view definitions in GaianDB, and uses new Atlas APIs to store metadata. Could use a different virtual engine
       • Designed to be open to other virtualization technologies
     • Diagram: logical tables LT1, LT2 over data sources DS1, DS2, DS3; policy plugin (Ranger); Virtualizer; Atlas
     • GaianDB supports federation (not used for MVP)
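The Virtualizer step of turning asset metadata into a view that exposes business terms might be sketched as below. It emits generic SQL only; it is not GaianDB's actual mechanism for defining logical tables, and `build_view_sql`, the table name and the term mapping are hypothetical names chosen for illustration.

```python
def quote_identifier(identifier):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def build_view_sql(view_name, source_table, column_terms):
    """Build a CREATE VIEW statement that renames physical columns to business terms.

    column_terms maps physical column name -> business term (assumed to come
    from the metadata repository).
    """
    select_list = ", ".join(
        f"{quote_identifier(column)} AS {quote_identifier(term)}"
        for column, term in column_terms.items()
    )
    return (
        f"CREATE VIEW {quote_identifier(view_name)} AS "
        f"SELECT {select_list} FROM {quote_identifier(source_table)}"
    )


print(build_view_sql("EMPLOYEE_PAY", "EMP", {"EMP_ID": "Employee Id", "EMP_SALARY": "Salary"}))
```

Keeping the generator independent of the engine is in the spirit of the slide's point that a different virtualization technology could be substituted.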
  19. Atlas: glossary enhancements
     • Get Atlas closer to parity with commercial offerings
     • Business Terms: categories, category hierarchies
       • Has-a, is-a, type-of, synonym, antonym, arbitrary relationships
       • Assets mapped to Business Terms
     • Classifications
       • Hierarchy
       • Navigable mappings to retain ability to flatten tags to Ranger
       • Instead of Hive column EMP_SALARY -> SPI, now can be EMP_SALARY -> SALARY -> SPI...
     • Used to drive governance
     • ATLAS-1410
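Flattening a chain such as EMP_SALARY -> SALARY -> SPI into a plain list of tags could look like the sketch below. The `parents` mapping and the `flatten_tags` function are illustrative assumptions, not the API proposed in ATLAS-1410.

```python
def flatten_tags(asset, parents):
    """Collect every term or classification reachable from an asset.

    parents maps a node (asset or term) to the terms it is mapped to.
    Each reachable term is returned once, in discovery order.
    """
    seen = set()
    ordered = []
    stack = list(parents.get(asset, []))
    while stack:
        term = stack.pop()
        if term in seen:
            continue  # guards against cycles and repeated paths
        seen.add(term)
        ordered.append(term)
        stack.extend(parents.get(term, []))
    return ordered


# Mapping taken from the slide's example: column -> business term -> classification.
parents = {"EMP_SALARY": ["SALARY"], "SALARY": ["SPI"]}
print(flatten_tags("EMP_SALARY", parents))
```

An enforcement engine that only understands flat tags then sees both SALARY and SPI on the column, while the glossary keeps the richer structure.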
  20. Atlas: other enhancements
     • Consumer centric APIs
       • Open Metadata Access Services (OMAS)
       • REST & more Kafka notifications
       • Asset, Catalog, Connector, Glossary, Governance Action, Governance Definitions, Information View, Roles and Access
     • Repository level APIs
       • Open Metadata Repository Services (OMRS)
       • REST & more Kafka notifications
       • Pluggability through an Open Connector Framework to other metadata repositories: distributed and open
     • Standard data model/core
       • Enhancement to core model: versioning, external linkage etc.
       • More standard types, i.e. for all relational databases, to ease sharing
  21. Ranger areas being looked at
     • Building a plugin for GaianDB
       • Access control, simple masking. More later
     • User synchronization (large # users, role of Atlas)
     • Changes to tag sync process for new glossary proposal
     • As more metadata goes into Atlas, it becomes source for generation of some kinds of policies. Where is the master?
       • Generating Ranger rules from governance definitions
     • How about control of access to Atlas itself?
     • Aside: interfaces used by enforcement engines (such as to get classification data) need to be efficient; these should work for projects like Apache Sentry as well as Atlas
  22. Beyond the MVP
     • Open Discovery Framework
     • Consider other security enforcement engines, such as Apache Sentry, & driving more capability around rules & governance actions from Atlas metadata
     • Work on standard models to support different domains
     • Lineage
       • From high level design lineage through to operational detail. Logs vs graph...
     • API metadata
     • Infrastructure: JanusGraph...
       • Abstraction added by IBM in last few months for Titan 1
  23. The vision
     • An enterprise data catalog that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
       • Spanning systems both on premise and cloud providers
       • Hosted locally to your data platforms but integrated to provide the enterprise view
     • New data tools (from any vendor) connect to your data catalog out of the box
       • No vendor lock-in; nor expensive population of yet another proprietary siloed metadata repository
     • Metadata is added automatically to the catalog as new data is created
       • Extensible discovery processes characterise and classify the data
       • Interested parties and processes are notified
     • Subject matter experts collaborating around the data
       • Locate the data they need, quickly and efficiently
       • Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data
     • Automated governance processes protect and manage your data
       • Metadata-driven access control
  24. Summary
     • Atlas can help us have an industry wide common metadata platform around which a vibrant ecosystem can evolve
       • Not only in Hadoop but more broadly
     • Metadata driven governance can be scalable & enable us to manage our data better, and be compliant with regulations
     • The ideas presented here resonate with many people we've spoken to
     • Get involved! I'd love to hear the feedback on this approach!
       • Comment on the JIRAs, ask questions, contribute, disagree... ;-)
       • Look at JIRA tag "VirtualDataConnector" or start at ATLAS-1689
       • Atlas wiki
     • "Innovation happens best not in isolation but in collaboration" (keynote)
     • THANKS!
  25. Questions?
     • After this talk: jonesn@uk.ibm.com
     • 17:50, Room 4: Security & Governance BOF
  26. Backup charts
  27. ATLAS Virtualizer Architecture (diagram)
     • Data sources: Oracle data, HDFS data, Netezza data, each reached through P-JDBC
     • "gaiandb" with GAF Pre and GAF Post
     • Virtualizer, Connector Framework*
     • Atlas: graph DB, OMRS, Virtual Asset OMAS, Search OMAS, Catalog OMAS, GAF OMAS
     • IGC with IGC REST API
     • Search/Explore UI
     • Legend: Atlas boundaries; developed in POC; may not be in POC initially; * may be hardcoded at first
  28. Metadata areas and types (diagram, areas numbered 1 to 7)
     • Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics); Governance Actions and Processes
     • Business Objects and Relationships, Taxonomies and Ontologies; Business Attributes; Classification Schemes; Reference Data; Subject Area Definition
     • Models and Schemas; Information Views; Rights Management
     • Physical Asset Descriptions (Data stores, APIs, models and components); Asset Collections (Sets, Typed Sets, Type Organized Sets); Connector Directories; Infrastructure and systems
     • Teaming Metadata (people profiles, communities, projects, notebooks, ...); Organization; Strategy; Campaigns and Projects; Rollout
     • Feedback Metadata (tags, comments, ratings, ...)
     • Discovery Metadata (profile data, technical classification, data classification, data quality assessment, ...)
     • Information Process Instrumentation (design lineage)
     • Relationship labels: Augmentation, Mapping, Implementation, Access, Classification, Instrument, Association
     • Roles: Information Auditor, Integration Developer, Business Analyst, Data Scientist, Information Worker, Information Owner, Information Governor, Information Steward, Data Quality Analyst, Information Curator
  29. User & Group/Role synchronization
     • Diagram: UserSync2 connecting LDAP, Apache Ranger and Apache Atlas; LDAP lookup -> group:member; Governance Action OMAS - getRoles
     • LDAP holds role membership (LDAP groups); could also be Active Directory
     • Atlas manages the definitive list of roles <that are used for Atlas managed sources>
     • Corporate LDAP has a huge number of users/groups
     • Ranger currently needs to sync all
     • In future perhaps we establish group/role membership during authentication
     • Capability for alternative source could be merged into base UserSync
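The motivation here (sync only the groups that correspond to roles Atlas knows about, rather than the whole corporate directory) can be sketched as a simple filter. The data shapes and the `select_memberships` function are assumptions for illustration; no real LDAP, Atlas or Ranger call is made.

```python
def select_memberships(ldap_groups, atlas_roles):
    """Keep only the LDAP groups whose name matches a role managed by Atlas.

    ldap_groups: mapping of group name -> iterable of member user ids
                 (stands in for the result of an LDAP group:member lookup).
    atlas_roles: iterable of role names (stands in for a getRoles response).
    """
    wanted = set(atlas_roles)
    return {
        group: sorted(members)
        for group, members in ldap_groups.items()
        if group in wanted
    }


ldap_groups = {
    "hr": ["alice", "bob"],
    "analyst": ["carol"],
    "facilities": ["dave", "erin"],  # never referenced by any Atlas role
}
memberships = select_memberships(ldap_groups, ["hr", "analyst"])
users_to_sync = sorted({user for members in memberships.values() for user in members})
print(memberships, users_to_sync)
```

In this toy data two of the five directory users are never sent to the enforcement engine, which is the saving the slide is after at corporate scale.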
  30. Atlas Glossary v2: Tag Sync to Ranger
     • Atlas glossary manages a sophisticated enterprise glossary structure
     • Atlas Glossary v2 proposed in ATLAS-1410 (David Radley)
     • Sync builds on existing tag sync approach
     • New API in Atlas will flatten classification structure
     • No changes to Ranger, but exposing richer classification could be area of future work
     • Diagram: TagSync2 via Governance Action OMAS. In Apache Atlas: Hive column emp_renum -> business term Salary -> business term Confidential. In Apache Ranger: Hive column emp_renum -> tag Confidential
  31. Policy (Rule) synchronization
     • Generate policies in Ranger based off entities in Atlas
     • Currently designing how this works
     • Scoped by policy service so existing Ranger UI approach still works
     • Diagram: RuleSync via Governance Action OMAS - getRules; Atlas entities Role, Classifications, Asset, Action; Ranger Rule
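Since the slide says the design was still in progress, the following is only one possible reading of "generate policies from entities": combine a governance rule (role, classification, action) with the assets carrying that classification. The `GovernanceRule` type, the `generate_policies` function and the output dictionary shape are hypothetical and do not follow Ranger's policy model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceRule:
    """Hypothetical governance definition held in the metadata repository."""
    role: str
    classification: str
    action: str  # e.g. "permit", "mask", "deny"


def generate_policies(rules, asset_classifications):
    """Expand each rule into a policy naming the concrete assets it covers.

    asset_classifications maps asset name -> set of classifications.
    Rules that match no asset produce no policy.
    """
    policies = []
    for rule in rules:
        resources = sorted(
            asset
            for asset, classifications in asset_classifications.items()
            if rule.classification in classifications
        )
        if resources:
            policies.append({
                "name": f"{rule.role}-{rule.classification}-{rule.action}",
                "roles": [rule.role],
                "resources": resources,
                "action": rule.action,
            })
    return policies


rules = [GovernanceRule("analyst", "Confidential", "mask")]
assets = {"hr.emp.emp_renum": {"Confidential"}, "hr.emp.emp_id": set()}
print(generate_policies(rules, assets))
```

Regenerating from the metadata each time keeps Atlas as the master copy, which is one answer to the "where is the master?" question raised earlier in the deck.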
  32. Virtual Data Connector JIRAs 20170402
     • JIRA IDs: RANGER-1488, RANGER-1487, RANGER-1486, RANGER-1485, RANGER-1464, RANGER-1454, RANGER-1234, RANGER-
     • Titles:
       • Create Ranger plugin for gaiandb
       • generate rules from Governance definitions in Atlas
       • New usersync alternative for Atlas (vdc)
       • Ranger support for Virtual Data Connector Project (ATLAS)
       • Support Atlas v2 glossary in Atlas plugin (for access control to terms etc)
       • Support of Atlas v2 glossary API proposal for tag source
       • Post-evaluation phase user extensions
       • Ranger Source: eclipse
       • Add data masking for tag based policies
       • Governance Action Framework OMAS
       • Sample assets to support Virtual Connector Project
       • OMAS Interfaces for Atlas
       • Build ATLAS using Docker
  33. References
     • Apache Atlas: http://atlas.apache.org/
       • Top level JIRA for this activity: https://issues.apache.org/jira/browse/ATLAS-1689
     • Apache Ranger: http://ranger.apache.org/
     • GaianDB
       • https://github.com/gaiandb/gaiandb
       • https://developer.ibm.com/open/openprojects/gaian-database/
     • The case for open metadata, A. M. Chessell
       • http://www.ibmbigdatahub.com/blog/case-open-metadata
