Next Generation Data Warehouse Development with Lambda and Redshift


This document discusses using Lambda and Redshift for next generation data warehouse development. It summarizes that Redshift provides a petabyte-scale data warehouse that is fast, horizontally scalable, and has massive storage capacity with attractive pricing. It then proposes using Lambda triggers and a CI/CD pipeline to empower database developers and DBAs by automating deployments, improving code quality and skills, and enhancing data security, uptime, and reputation.


Andras Gombosi


Next Generation Data Warehouse Development with Lambda and Redshift

1. Next Generation Data Warehouse Development with Lambda and Redshift
Andras Gombosi

2. AMAZON REDSHIFT
• Petabyte scale Massively Parallel Data Warehouse
• Exceptionally fast *
• Horizontally Scalable
• Massive Storage capacity
• Attractive and transparent pricing
• SQL interface
• AWS ecosystem
• Secure *

3. Challenges – the cost of greatness
• SLAs
• Security
• Audit & Compliance

4. Solution: Empower the Database Developer and DBA communities with DevOps methodologies!

5. CD Pipeline for DB code (diagram labels): AWS Cloud; Lambda Trigger; Developer and DBA Communities; Git Push; Task Router; DEPLOYER APP

6. Data Modeller SDLC: Forward Engineering (diagram labels): Data Modeling Tool; CD PIPELINE

7. Edge ETL Use Cases: AWS Billing Data Load (diagram labels): SQL Forwarder; CD PIPELINE

8. Deployer functionality (diagram labels): Email

9. SDLC changes and effects
• UPTIME and REPUTATION
• CODE QUALITY AND SKILLS
• AUTOMATIC, PROCESS ENFORCED DATA SECURITY
• Push changes to repo instead of execute!

10. Well Architected : Security Pillar (diagram labels): Endpoints

11. Well Architected : Other Pillars (AWS Well Architected Framework)
• Security
• Reliability
• Performance Efficiency
• Cost Optimisation
• Operational Excellence

12. Conclusion & Questions
• AWS Ecosystem
• Cloud-native
• Not just for Redshift
• Not “outside the box”, use multiple boxes!

Editor's Notes

Hello, my name is Andras Gombosi. I am a senior Data and Database Engineer at TerraAlto. We are a well-established, Dublin-based technical consultancy focusing solely on AWS. We are an AWS Advanced Consulting Partner and also an AWS Managed Services Provider, one of an elite group of 126 companies worldwide with this competency. We serve clients of all sizes, from start-ups to truly global enterprises, and we have countless migrations under our belt in Europe, Asia and AWS China. We also work on projects in Big Data, IoT, Data Lakes and Blockchain-based track-and-trace solutions. One of our core operating principles is automation. The topic I have brought today is automation in a place where it is not yet widespread: Data Warehousing and BI development.
The ongoing rumour is that Redshift was named to mark a move away from something Red… I worked with those Red technologies for nearly a decade, but I have since made a shift as well. In physics, redshift happens when light undergoes an increase in wavelength. This phenomenon is directly related to the expansion of space, the expansion of the universe. Redshift is an exceptionally good service for corporate data warehousing, both as a standalone DWH and as a SQL-compatible extension of a corporate data lake. As usual, AWS does most of the heavy lifting, but data security and cluster performance still require great care and attention from the customer side as well.
Redshift's capacity to grow lets an organisation run a single, true enterprise Data Warehouse. It is typically queried, developed and modified by multiple, often geographically distributed teams and processes; in some cases hundreds have some sort of access to it:

- Developers and Data Engineers modify data and change structure in Data Marts.
- Data Analysts query data directly.
- DBAs change Data Security (grants and revokes) and do housekeeping (VACUUM, ANALYZE).
- ETL processes (Glue, EMR, Matillion, Informatica) constantly insert and update.
- Front-end BI tools (QuickSight, Tableau, MicroStrategy, Spotfire) query data through data marts.

Control? The challenges are not new, just a bit amplified, again because of the size and because of the open source origins of Redshift: open source solutions are typically surrounded by a tooling ecosystem, which is not present on Redshift out of the box right now.

Challenges:

- SLAs on Data Availability and on Uptime of data marts or other data sources for the upstream consumers. That means ETL/ELT jobs run in a timely and performant manner, and BI teams and other upstream consumer tools can connect and query without disruption.
- Security, in this case security of the Data itself. Who has access to what?
- Audit and Compliance. Who changed what exactly, and when?

In a complicated environment it is vital to have formal, automated processes without human intervention; otherwise, due to the sheer scale, properly managing these challenges becomes very time-consuming, and sometimes nearly impossible.
One possible solution is a “DevOps”-style governance framework. Yes, bringing database changes under the DevOps umbrella is an increasingly popular topic. There are many tools and many ways to build a pipeline: some are pricey, some are complicated, some only work with specific DB engines, and some are all three. Nevertheless, the principles are the same for a Redshift CD pipeline too. A code repository for version control and audit is the entry point, triggering an event-driven, automatic, intelligent Continuous Deployment capability. Ideally this is accompanied by an in-cluster, database- and schema-based User and Privilege Management Framework, which controls access via user groups, dedicated service users and default privileges. The solution I have brought today is a BASIC, practically free, cloud- and AWS-native way to get going. It uses nothing but AWS and Python.
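
To make the idea of group-based access and default privileges concrete, here is a minimal sketch that generates the kind of Redshift statements such a framework might deploy through the pipeline. The function name, the read-only privilege set, and the schema, group and user names are assumptions for illustration; the talk does not show the actual framework.

```python
from __future__ import annotations

import re

# Conservative identifier check so generated SQL cannot be hijacked by odd names.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def group_privilege_statements(schema: str, group: str, owner: str) -> list[str]:
    """Build statements giving a user group read-only access to one schema.

    `owner` is the (hypothetical) dedicated service user that creates tables,
    so default privileges also cover tables it creates in the future.
    """
    for name in (schema, group, owner):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"CREATE GROUP {group};",
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};",
        f"ALTER DEFAULT PRIVILEGES FOR USER {owner} IN SCHEMA {schema} "
        f"GRANT SELECT ON TABLES TO GROUP {group};",
    ]


if __name__ == "__main__":
    for statement in group_privilege_statements("sales_mart", "bi_readers", "svc_deployer"):
        print(statement)
```

Generating the statements from a small function, rather than typing them by hand, keeps the grants reviewable in a Pull Request like any other database change.
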
CodeCommit is the starting point. Multiple communities use separate repositories, and different branches are set up. Some branches are protected: they cannot be pushed to directly, only via Pull Requests and merging. Code pushed or merged to the appropriate branch triggers a task router.

Task Router: This can be CodePipeline with a Lambda as a custom action for the Build stage to execute anything on a database. For most organizations Lambda might be more suitable. Your mileage may vary. We use Lambda for this step too. The task router reads information about the commit and evaluates requests. The commit message can carry, for example, the order in which you want to run your SQL files, or routing information: flag your commit if it contains a big task, to route it towards a container instead of Lambda (the limitation here is 15 minutes of execution time).

There are two major types of long-running executions:

- ETL: COPYs, UNLOADs and CREATE TABLE AS statements are usually done by an ETL/ELT tool (Glue, Matillion, Informatica).
- HOUSEKEEPING: VACUUM / ANALYZE. Some ETL tools are also capable of scheduling these operations.

The Big Job deployer is entirely optional in most cases, depending on what other tools are already available in-house.
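
The routing decision described above could be sketched roughly as follows. The `[big-job]` flag and the `order:` line are hypothetical commit-message conventions invented for this example; the talk only says that the commit message carries ordering and routing information.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BIG_JOB_FLAG = "[big-job]"  # hypothetical flag: send the work to a container
ORDER_PREFIX = "order:"     # hypothetical prefix: comma-separated SQL file order


@dataclass
class Route:
    target: str                                   # "lambda" or "container"
    sql_files: list[str] = field(default_factory=list)


def route_commit(message: str) -> Route:
    """Decide where a commit should run and in which order its files execute."""
    # Flagged commits go to a container, since Lambda stops after 15 minutes.
    target = "container" if BIG_JOB_FLAG in message.lower() else "lambda"
    files: list[str] = []
    for line in message.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(ORDER_PREFIX):
            names = stripped[len(ORDER_PREFIX):].split(",")
            files.extend(name.strip() for name in names if name.strip())
    return Route(target, files)


if __name__ == "__main__":
    print(route_commit("Rebuild sales mart [big-job]\norder: 01_tables.sql, 02_views.sql"))
```

An empty `sql_files` list simply means the commit gave no explicit order, so the deployer would need a fallback of its own, such as alphabetical order.
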
A few examples of possible use cases apart from normal development work. Anything that can commit a SQL file to a repository can use the framework. One example is the automatic, controlled, central deployment of generated scripts forward-engineered from a database modelling tool, be it a full new schema deployment or incremental deltas to structure. No more “Oops” situations where someone accidentally dropped a few views and broke a few others on Production instead of Dev, just because they started work before the third coffee. DbSchema, Aqua, Aginity, whatever your weapon of choice is: if the tool has Git integration, it will work seamlessly with the pipeline. If a database team reaches a higher capability maturity level and the company can justify purchasing more complicated and potentially pricey Database Release Management software, the SDLC might change again, but up until then…
What we frequently see is that more and more customers want almost real-time visibility of their AWS costs. AWS provides a neat extract mechanism which dumps the billing data into an S3 bucket, hourly if required. But good guy AWS not only dumps the raw data, it also dumps the SQL commands and manifest files for loading the raw CSVs into Redshift. A trigger on the appropriate S3 put can start a function which picks up the event, makes minor changes to the loader SQL file (adding the Redshift target schema, for example), and commits the modified SQL to a repository monitored by a pipeline. Similar approaches can work very well even in certain Data Lake scenarios, or if you make the loader SQL and manifest part of your interface contract between systems, a deliverable alongside the pieces of data.
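
A minimal sketch of the SQL forwarder step, assuming the loader file refers to its table by a bare name. The function names, the sample SQL and the commit message are made up for illustration. The `codecommit` argument is expected to be a `boto3.client("codecommit")`; it is passed in so the rewriting logic can be tried without AWS access.

```python
import re


def add_target_schema(loader_sql: str, table: str, schema: str) -> str:
    """Prefix bare occurrences of `table` with `schema`, leaving quoted strings alone."""
    # First alternative swallows single-quoted literals (e.g. S3 paths) unchanged;
    # the second matches the table name unless it is already schema-qualified.
    pattern = re.compile(rf"'[^']*'|(?<![\w.]){re.escape(table)}\b", re.IGNORECASE)

    def qualify(match: re.Match) -> str:
        text = match.group(0)
        return text if text.startswith("'") else f"{schema}.{text}"

    return pattern.sub(qualify, loader_sql)


def commit_loader_sql(codecommit, repo: str, branch: str, path: str, sql: str) -> str:
    """Commit the modified SQL to the branch watched by the pipeline."""
    # put_file needs the current tip of the branch as the parent commit.
    parent = codecommit.get_branch(repositoryName=repo, branchName=branch)["branch"]["commitId"]
    response = codecommit.put_file(
        repositoryName=repo,
        branchName=branch,
        fileContent=sql.encode("utf-8"),
        filePath=path,
        parentCommitId=parent,
        commitMessage=f"Load billing data: {path}",
    )
    return response["commitId"]


if __name__ == "__main__":
    sample = (
        "create table AWSBilling201801(cost varchar);\n"
        "copy AWSBilling201801 from 's3://bucket/AWSBilling201801-manifest.json';"
    )
    print(add_target_schema(sample, "AWSBilling201801", "billing"))
```

Because already-qualified names are skipped, running the rewrite twice on the same file gives the same result, which is convenient if the S3 event is delivered more than once.
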
TRIGGER: For a newly pushed commit, the following information is automatically forwarded to Lambda in the trigger event:

- WHO
- WHEN
- Which repo
- Which branch
- Commit ID

EXECUTOR: Most of the work is done by a Lambda function written in Python. Boto is an incredibly convenient and elegant tool for integrating AWS services. The function:

- Retrieves commit details and code from CodeCommit based on the Commit ID.
- Retrieves additional config from DynamoDB, such as hooks for Slack or Teams, the Redshift host, and the target database and schema.
- Retrieves the appropriate secrets from Secrets Manager. You will need a naming convention in place; a [repo-branch] combo works fine.
- Executes the code against the database.
- Initiates notifications: Slack hook, MS Teams hook, basically anything supporting cURL / HTTP hooks, or email. The exact setup depends on the networking setup (including Lambda networking), client preferences, and existing messaging platform usage and integration capabilities.
- Logs everything in CloudWatch.
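
The first and last executor steps can be sketched with the standard library alone. This is an illustration under assumptions, not the deployer's actual code: the function names are invented, the sample values are made up, and the CodeCommit, DynamoDB, Secrets Manager and Redshift calls in between are left out. The event fields follow the shape of a CodeCommit trigger event, and the notification body uses the plain `text` field that Slack incoming webhooks accept.

```python
import json
import urllib.request


def parse_trigger_event(event: dict) -> dict:
    """Extract who, when, repo, branch and commit ID from a CodeCommit trigger event."""
    record = event["Records"][0]
    reference = record["codecommit"]["references"][0]
    return {
        "who": record["userIdentityARN"],
        "when": record["eventTime"],
        # The repository name is the last segment of the repository ARN.
        "repo": record["eventSourceARN"].split(":")[-1],
        # "refs/heads/dev" -> "dev"; branch names may themselves contain "/".
        "branch": reference["ref"].split("/", 2)[-1],
        "commit_id": reference["commit"],
    }


def secret_name(repo: str, branch: str) -> str:
    """The [repo-branch] naming convention for the Secrets Manager entry."""
    return f"{repo}-{branch}"


def build_notification(hook_url: str, info: dict, succeeded: bool) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP hook request describing the deployment."""
    status = "succeeded" if succeeded else "FAILED"
    text = (
        f"Deployment {status}: {info['repo']}/{info['branch']} "
        f"at {info['commit_id'][:7]} by {info['who']}"
    )
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        hook_url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

Sending the prepared request is then a single `urllib.request.urlopen(request)` call, provided the Lambda's networking allows outbound access to the hook endpoint.
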
It will be a change, especially for teams at the low end of the Capability Maturity Model, but a crucial change, and that is exactly the point!

Improved Code Quality: a lot of tools try to differentiate themselves with automatic code review capability. In the real, complicated world, things are not always simple enough to be codified; otherwise DBA work would not have to be black magic! And there are other options, such as the new Redshift Recommendations, or clever monitoring of certain STL and STV views, sometimes in combination with alerting on a Kibana dashboard.

Skills: Pull Requests and protected branches give 4-eye checks. The Console provides easy access to relatively advanced Git features, which is important, as database development teams are traditionally a little behind in DevOps experience. Human-to-human knowledge transfer is built into the deployment process, which automatically encourages growth in team maturity, individual developer skills and Redshift performance.

Quality: Many SQL statements can be scripted in an IDEMPOTENT way, so many scripts will be re-runnable.

UPTIME: The main effect is much improved, undisturbed availability of data for end-user-facing BI tools. Breaking a Data Mart via an incorrect VIEW definition is now much harder. This leads to increased customer satisfaction and trust in the IT team.

Overall Data Security is enforced by automatic processes on every level, including auditing and traceability.
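
As a small illustration of the idempotency point, the sketch below runs two scripts twice each. It uses SQLite from the Python standard library purely as a local stand-in so the example can run anywhere; the table and view names are invented. Redshift supports the same `CREATE TABLE IF NOT EXISTS` and `DROP VIEW IF EXISTS` forms.

```python
import sqlite3

# Re-runnable: each statement tolerates the object already existing.
IDEMPOTENT_SCRIPT = """
CREATE TABLE IF NOT EXISTS sales (id INTEGER, amount REAL);
DROP VIEW IF EXISTS v_sales_total;
CREATE VIEW v_sales_total AS SELECT SUM(amount) AS total FROM sales;
"""

# Not re-runnable: the second execution fails because the table exists.
NAIVE_SCRIPT = "CREATE TABLE sales (id INTEGER, amount REAL);"


def survives_rerun(script: str) -> bool:
    """Run the script twice against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(script)
        conn.executescript(script)
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("idempotent script re-runnable:", survives_rerun(IDEMPOTENT_SCRIPT))
    print("naive script re-runnable:", survives_rerun(NAIVE_SCRIPT))
```

Re-runnable scripts matter in a pipeline because a retried or repeated deployment then leaves the database in the same state instead of failing halfway.
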
Multi-layer security is present.

VPC (this was yesterday: re:Invent happened while I was sleeping): a closed VPC with Service Endpoints wherever possible (an ENI or NAT setup might be required, but a seasoned SA should breeze through these). The executor Lambda runs in the closed VPC, which has S3 and Secrets Manager endpoints; Redshift enhanced VPC routing is also on. CodeCommit has no VPC endpoints available yet, and in AWS China there is no CodeCommit. Companies with very strict security requirements, such as data (including code) not being allowed to travel over the open internet even if encrypted, still have choices, such as hosting Git on an EC2 instance within the closed VPC. Triggering the executors might then require manual setup of the hooks.

IAM: IAM provides full lock-down capabilities at the infrastructure, service and resource level:

- Bespoke Lambda and any other service execution / resource roles
- Bespoke CodeCommit users and groups for engineers and a senior / approver group

Directory Service: works in federation with IAM for Single Sign-On Console access, for example to facilitate Pull Request reviews and merges. The client controls access levels via AD groups.

Redshift: an in-database user management framework with the service users the Lambda executors use, and pre-configured upstream user groups. Many of our clients DO NOT even have credentials for any Redshift user accounts with elevated privileges, such as schema owners or superusers.
Reliability: Lambda scales horizontally, with an automatic burst of 500 to 3000 depending on the region (the higher end in bigger regions, such as Ireland). Scaling on CodeCommit and Secrets Manager is managed by AWS, just as on Fargate and ECS.

Operational Excellence:

- Deployment of CD pipelines via parameterized CloudFormation templates: infrastructure as code.
- Lambda: retry functionality and Dead Letter Queues, optionally AWS Step Functions for an extra layer of state management.
- CloudWatch and X-Ray.
- Notifications on functional DB code failures to Dev teams via Slack / Teams.
- Notifications and alerting on infra-level problems to SysOps teams via CloudWatch and DLQ.

Performance Efficiency:

- Right-sized Lambdas and right-sized, properly configured containers for the infra.
- Power users use repositories connected to dedicated Redshift users with access to Superuser / dedicated WLM queues.
- DynamoDB: autoscaling might be overkill, depending on the size of the dev teams and the number of branches to manage, but the main thing is that the load is measurable and the functionality is there to auto-scale if required.

Cost Optimization: The beauty is that this is practically free to run once you build it; the cost, if any, is insignificant. Lambda, CodeCommit, triggering, CloudFormation: all the nice tools are available free of charge or very cheaply. There is a minimal cost associated with the “Big Job”. The SQL code itself runs on the Redshift clusters!
Also, this is not just for Redshift. I believe the power of the AWS ecosystem is evident: multiple cloud-native services working perfectly in concert to create an automated, event-driven, efficient, secure and scalable solution to a challenge. AWS is the perfect place for thinking “outside the box”.
