Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma

•

3 gostaram•1,152 visualizações

Learn about the Big Data Processing ecosystem at Netflix and how Apache Spark sits in this platform. I talk about typical data flows and data pipeline architectures that are used in Netflix and address how Spark is helping us gain efficiency in our processes. As a bonus – i’ll touch on some unconventional use-cases contrary to typical warehousing / analytics solutions that are being served by Apache Spark.

Dados e análise

Sparking up
Data Engineering
Rohan Sharma
Spark Summit East - Feb 9, 2017

Agenda
● Context: Netflix
● Netflix Data Ecosystem
● Spark Development @ Netflix
● Stranger Things with Spark
● Q&A

#NetflixEverywhere
● 93+ Million Members
● 190+ Countries
● 125+ Million streaming hours / day
● 1000 hours of Original content in 2017
● ⅓ of US internet traffic during evenings

Netflix Culture
● Freedom and Responsibility
● Context, not Control
● Highly aligned, loosely coupled
● The Paved road

#NetflixData
Product Experience Streaming Experience Content
Marketing Business Operations Other Functions...

Data Producers
● Member Devices
● CDN Servers
● Application Servers
● Device/Server Telemetry
● Application Data
● Vendors / Partner Data

Data Processing
● Stream Processing - Shriya Arora
● Recommendation Systems - DB Tsai & Gary Yeh
● Batch Processing
● Experimentation Analytics
● Operational Analytics

Data Platform (Batch Processing)
Storage
Compute
Service
Tools
APIBig Data Portal
S3 Parquet
Transport VisualizationQuality Pig Workflow Vis Job/Cluster Vis
Interface
Execution Metadata
Notebooks Tableau Micro Strategy JavaScript Applications

Cloud Warehouse
● 60+ Petabytes
● Hadoop on S3 - separate compute from storage
● Multiple workloads on same cluster
● Tens of thousands of Spark, Pig, Hive, Presto jobs
● Production Cluster : 2600 d2.4xl nodes
● Query Cluster : 400 m4.16xl nodes

Spark Development
● Focus on stakeholders - not process & operations
● Reuse boilerplate code & best practices
● Reuse build & deployment practices
● Make Spark easy-to-use & cruise

Spark Development Template
Abstract Spark Project (Template)
Netflix Data Utils (cross-platform business logic)
Spark Utils (spark boilerplate + interface implementations)
Code Reuse
Registry + Build
Properties
Project Instances
Build & Deploy
Reuse
S3
Scheduler
FRAMEWORK
USER
FRAMEWORK
. . .
Jenkins

Development Lifecycle
Develop
&
Test
Commit, Build,
Deploy & Run
Zeppelin Spark ShellIntelliJ
Integration Test
Bit Bucket Jenkins S3 / Artifactory
Execute

Whats Next...
● Spark added to paved road in early 2016
● Improved query performance / cluster utilization
● Pyspark templates & data utils

Code Refactoring
● Download Source Code
● Semi-Compile to Abstract Syntax Trees
● Check for re-factoring rules
● Codegen refactored lines and update source
● Invoke dockerized CI service to build
● Raise Pull Requests with inferred reviewers
More Details:
Jonathan Schneider
Linked In:
https://www.linkedin.com/in/jonkschneider/
Youtube:
https://www.youtube.com/watch?v=JbcKFKiBU60

Questions
To learn more, please visit:
youtube channel
NetflixData
twitter handle
@NetflixData

Mais conteúdo relacionado

Mais procurados

As enterprises move to cloud-based analytics, the risk of cloud security breaches poses a serious threat. Encrypting data at rest and in transit is a major first step. However, data must still be decrypted in memory for processing, exposing it to an attacker who has compromised the operating system or hypervisor. Trusted hardware such as Intel SGX has recently become available in latest-generation processors. Such hardware enables arbitrary computation on encrypted data while shielding it from a malicious OS or hypervisor. However, it still suffers from a significant side channel: access pattern leakage. We present Opaque, a package for Apache Spark SQL that enables very strong security for SQL queries: data encryption, computation verification, and access pattern leakage protection (a.k.a. obliviousness). Opaque achieves these guarantees by introducing new oblivious distributed relational operators that provide 2000x performance gain over state of the art oblivious systems, as well as novel query planning techniques for these operators implemented using Catalyst.

Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...

Spark Summit

Spark Summit EU talk by Bas Geerdink

Spark Summit

Since mid-2016, Spark-as-a-Service has been available to researchers in Sweden from the Rise SICS ICE Data Center at www.hops.site. In this session, Dowling will discuss the challenges in building multi-tenant Spark structured streaming applications on YARN that are metered and easy-to-debug. The platform, called Hopsworks, is in an entirely UI-driven environment built with only open-source software. Learn how they use the ELK stack (Elasticsearch, Logstash and Kibana) for logging and debugging running Spark streaming applications; how they use Grafana and InfluxDB for monitoring Spark streaming applications; and, finally, how Apache Zeppelin can provide interactive visualizations and charts to end-users. This session will also show how Spark applications are run within a ‘project’ on a YARN cluster with the novel property that Spark applications are metered and charged to projects. Projects are securely isolated from each other and include support for project-specific Kafka topics. That is, Kafka topics are protected from access by users that are not members of the project. In addition, hear about the experiences of their users (over 150 users as of early 2017): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and the novel solutions for helping researchers debug and optimize Spark applications.hear about the experiences of their users (over 150 users as of early 2017): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and the novel solutions for helping researchers debug and optimize Spark applications.afka topics are protected from access by users that are not members of the project. We will also discuss the experiences of our users (over 150 users as of early 2017): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications.

Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling

Databricks

Attestation Legale is a social networking service for companies that alleviates the administrative burden European countries are imposing on client supplier relationships. It helps companies from construction, staffing and transport industries, digitalize, secure and share their legal documents. With clients ranging from one-person businesses to industry leaders such as Orange or Bouygues Construction, they ease business relationships for a social network of companies that would be equivalent to a 34 billion dollar industry. While providing a high quality of service through our SAAS platform, we faced many challenges including refactoring our monolith into microservices, a daunting architectural task a lot of organizations are facing today. Strategies for tackling that problem primarily revolve around extracting business logic from the monolith or building new applications with their own logic that interfaces with the legacy. Sometimes however, especially in companies sustaining an important growth, new business opportunities arise and the required logic from your microservices might greatly differs from the legacy. We will discuss how we used Spark Streaming and Kafka to build a real time business logic translation engine that allows loose technical and business coupling between our microservices and legacy code. You will also hear about how making Apache Spark a part of our consumer facing product also came with technical challenges, especially when it comes to reliability. Finally, we will share the lambda architecture that allowed us to use move data in batch (migrating data from the monolith for initialization) and real time (handling data generated after through use). Key takeaways include: – Breaking down this strategy and its derived technical and business profits – Feedback on how we achieved reliability – Examples of implementations using RabbitMQ (then Kafka) and GraphX – Testing business rules and data transformation.

Building a Business Logic Translation Engine with Spark Streaming for Communi...

Spark Summit

Spark Under the Hood - Meetup @ Data Science London

Databricks

by Anya Bida and Rachel Warren from Alpine Data https://spark-summit.org/east-2016/events/spark-tuning-for-enterprise-system-administrators/ Spark offers the promise of speed, but many enterprises are reluctant to make the leap from Hadoop to Spark. Indeed, System Administrators will face many challenges with tuning Spark performance. This talk is a gentle introduction to Spark Tuning for the Enterprise System Administrator, based on experience assisting two enterprise companies running Spark in yarn-cluster mode. The initial challenges can be categorized in two FAQs. First, with so many Spark Tuning parameters, how do I know which parameters are important for which jobs? Second, once I know which Spark Tuning parameters I need, how do I enforce them for the various users submitting various jobs to my cluster? This introduction to Spark Tuning will enable enterprise system administrators to overcome common issues quickly and focus on more advanced Spark Tuning challenges. The audience will understand the “cheat-sheet” posted here: http://techsuppdiva.github.io/ Key takeaways: FAQ 1: With so many Spark Tuning parameters, how do I know which parameters are important for which jobs? Solution 1: The Spark Tuning cheat-sheet! A visualization that guides the System Administrator to quickly overcome the most common hurdles to algorithm deployment. [1]http://techsuppdiva.github.io/ FAQ 2: Once I know which Spark Tuning parameters I need, how do I enforce them at the user level? job level? algorithm level? project level? cluster level? Solution 2: We’ll approach these challenges using job & cluster configuration, the Spark context, and 3rd party tools – of which Alpine will be one example. We’ll operationalize Spark parameters according to user, job, algorithm, workflow pipeline, or cluster levels.

Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Anya Bida

Spark Summit EU talk by Jim Dowling

Spark Summit

There’s a growing number of data scientists that use R as their primary language. While the SparkR API has made tremendous progress since release 1.6, with major advancements in Apache Spark 2.0 and 2.1, it can be difficult for traditional R programmers to embrace the Spark ecosystem. In this session, Zaidi will discuss the sparklyr package, which is a feature-rich and tidy interface for data science with Spark, and will show how it can be coupled with Microsoft R Server and extended with it’s lower-level API to become a full, first-class citizen of Spark. Learn how easy it is to go from single-threaded, memory-bound R functions to multi-threaded, multi-node, out-of-memory applications that can be deployed in a distributed cluster environment with minimal amount of code changes. You’ll also get best practices for reproducibility and performance by looking at a real-world case study of default risk classification and prediction entirely through R and Spark.

Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...

Databricks

R is the latest language added to Apache Spark, and the SparkR API is slightly different from PySpark. With the release of Spark 2.0, the R API officially supports executing user code on distributed data. This is done through a family of apply() functions. In this talk, Hossein Falaki gives an overview of this new functionality in SparkR. Using this API requires some changes to regular code with dapply(). This talk will focus on how to correctly use this API to parallelize existing R packages. Most important topics of consideration will be performance and correctness when using the apply family of functions in SparkR. Speaker: Hossein Falaki This talk was originally presented at Spark Summit East 2017.

Parallelizing Existing R Packages with SparkR

Databricks

Spark Summit EU talk by Rolf Jagerman

Spark Summit

Nomad is a modern cluster manager by HashiCorp, designed for both long-lived services and short-lived batch processing workloads. The Nomad team has been working to bring a native integration between Nomad and Apache Spark. By running Spark jobs on Nomad, both Spark developers and the engineering organization benefit. Nomad’s architecture allows it to have an incredibly high scheduling throughput. To demonstrate this, HashiCorp scheduled 1 million containers in less than five minutes. That speed means that large Spark workloads can be immediately placed, minimizing job runtime and job start latencies. For an organization, Nomad offers many benefits. Since Nomad was designed for both batch and services, a single cluster can service both an organization’s Spark workload and all service-oriented jobs. That, coupled with the fact that Nomad uses bin-packing to place multiple jobs on each machine, means that organizations can achieve higher density. Which saves money and makes capacity planning easier. In the future, Nomad will also have the ability to enforce quotas and apply chargebacks, allowing multi-tenant clusters to be easily managed. To further increase the performance of Spark on Nomad, HashiCorp would like to ingest HDFS locality information to place the compute by the data.

Homologous Apache Spark Clusters Using Nomad with Alex Dadgar

Databricks

Spark Summit EU talk by Jakub Hava

Spark Summit

Pour ce mois de mars, nous vous proposons une thématique Big Data autour de Spark et du Machine Learning ! Nous attaquerons par une présentation d'Apache Spark 1.5 : son architecture distribuée et ses possibilités n'auront plus de secret pour vous. Nous enchaînerons ensuite avec les fondamentaux du Machine Learning : vocabulaire (pour enfin comprendre ce que raconte les data scientists / dataminer ! ), usages et explication des algorithmes les plus populaires ... Promis la présentation ne comporte pas de formules de maths barbares ;) Puis nous mettrons en pratique ces deux présentations en développant ensemble votre première application prédictive avec Apache Spark et Apache Zeppelin !

NigthClazz Spark - Machine Learning / Introduction à Spark et Zeppelin

Zenika

Spark Summit EU talk by John Musser

Spark Summit

Lyft is on the mission to improve people's lives with the world's best transportation. As part of this mission Lyft invests heavily in open source infrastructure and tooling. At Lyft Kubernetes has emerged as the next generation of cloud native infrastructure to support a wide variety of distributed workloads. Apache Spark at Lyft has evolved to solve both Machine Learning and large scale ETL workloads. By combining the flexibility of Kubernetes with the data processing power of Apache Spark, Lyft is able to drive ETL data processing to a different level. In this talk, Li Gao and Rohit Menon will talk about challenges the Lyft team faced and solutions they developed to support Apache Spark on Kubernetes in production and at scale. Topics Include: - Key traits of Apache Spark on Kubernetes. - Deep dive into Lyft's multi-cluster setup and operationality to handle petabytes of production data. - How Lyft extends and enhances Apache Spark to support capabilities such as Spark pod life cycle metrics and state management, resource prioritization, and queuing and throttling. - Dynamic job scale estimation and runtime dynamic job configuration. - How Lyft powers internal Data Scientists, Business Analysts, and Data Engineers via a multi-cluster setup. Speakers: Li Gao, Rohit Menon

Scaling Apache Spark on Kubernetes at Lyft

Databricks

At Databricks, we have a unique view into hundreds different companies using Apache Spark for development and production use-cases, from their support tickets and forum posts. Having seen so many different workflows and applications, some discernible patterns emerge when looking at common manageability, debugging, and visibility issues that our users run into. This talk will first show some representatives of these common issues. Then, we will show you what we have done and have been working on in Databricks to make Spark clusters easier to manage, monitor, and debug.

Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...

Databricks

We all dread “Lost task” and “Container killed by YARN for exceeding memory limits” messages in our scaled-up spark yarn applications. Even answering the question “How much memory did my application use?” is surprisingly tricky in the distributed yarn environment. Sqrrl has developed a testing framework for observing vital statistics of spark jobs including executor-by-executor memory and CPU usage over time for both the JDK and python portions of pyspark yarn containers. This talk will detail the methods we use to collect, store, and report spark yarn resource usage. This information has proved to be invaluable for performance and regression testing of the spark jobs in Sqrrl Enterprise.

Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...

Spark Summit

With components like Spark SQL, MLlib, and Streaming, Spark is a unified engine for building data applications. In this talk, we will take a look at how we use Spark on our own Databricks platform throughout our data pipeline for use cases such as ETL, data warehousing, and real time analysis. We will demonstrate how these applications empower engineering and data analytics. We will also share some lessons learned from building our data pipeline around security and operations. This talk will include examples on how to use Structured Streaming (a.k.a Streaming DataFrames) for online analysis, SparkR for offline analysis, and how we connect multiple sources to achieve a Just-In-Time Data Warehouse.

A Journey into Databricks' Pipelines: Journey and Lessons Learned

Databricks

Spark Summit EU talk by Heiko Korndorf

Spark Summit

At the end of day, the only thing that data scientists want is tabular data for their analysis. They do not want to spend hours or days preparing data. How does a data engineer handle the massive amount of data that is being streamed at them from IoT devices and apps, and at the same time add structure to it so that data scientists can focus on finding insights and not preparing data? By the way, you need to do this within minutes (sometimes seconds). Oh… and there are a lot of other data sources that you need to ingest, and the current providers of data are changing their structure. GoPro has massive amounts of heterogeneous data being streamed from their consumer devices and applications, and they have developed the concept of “dynamic DDL” to structure their streamed data on the fly using Spark Streaming, Kafka, HBase, Hive and S3. The idea is simple: Add structure (schema) to the data as soon as possible; allow the providers of the data to dictate the structure; and automatically create event-based and state-based tables (DDL) for all data sources to allow data scientists to access the data via their lingua franca, SQL, within minutes.

Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...

Databricks

Mais procurados (20)

Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...

Spark Summit EU talk by Bas Geerdink

Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim Dowling

Building a Business Logic Translation Engine with Spark Streaming for Communi...

Spark Under the Hood - Meetup @ Data Science London

Spark Tuning For Enterprise System Administrators, Spark Summit East 2016

Spark Summit EU talk by Jim Dowling

Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...

Parallelizing Existing R Packages with SparkR

Spark Summit EU talk by Rolf Jagerman

Homologous Apache Spark Clusters Using Nomad with Alex Dadgar

Spark Summit EU talk by Jakub Hava

NigthClazz Spark - Machine Learning / Introduction à Spark et Zeppelin

Spark Summit EU talk by John Musser

Scaling Apache Spark on Kubernetes at Lyft

Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...

Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...

A Journey into Databricks' Pipelines: Journey and Lessons Learned

Spark Summit EU talk by Heiko Korndorf

Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...

Semelhante a Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma

Apache Spark has been gaining steam, with rapidity, both in the headlines and in real-world adoption. Spark was developed in 2009, and open sourced in 2010. Since then, it has grown to become one of the largest open source communities in big data with over 200 contributors from more than 50 organizations. This open source analytics engine stands out for its ability to process large volumes of data significantly faster than contemporaries such as MapReduce, primarily owing to in-memory storage of data on its own processing framework. That being said, one of the top real-world industry use cases for Apache Spark is its ability to process ‘streaming data‘.

Structured Streaming in Spark

Digital Vidya

Session Video: https://youtu.be/7MPH1mknIxE In this talk, we share Devsisters' journey of migrating its internal data platform including Spark to Kubernetes, with its benefits and issues. 데브시스터즈에서 데이터플랫폼 컴포넌트를 쿠버네티스로 옮기면서 얻은 장점들과 이슈들에 대해 공유합니다. Conference session page: - English: https://sched.co/WIRK - Korean: https://sched.co/WYRc

Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes

SeungYong Oh

Talk 1. Scaling Apache Spark on Kubernetes at Lyft As part of this mission Lyft invests heavily in open source infrastructure and tooling. At Lyft Kubernetes has emerged as the next generation of cloud native infrastructure to support a wide variety of distributed workloads. Apache Spark at Lyft has evolved to solve both Machine Learning and large scale ETL workloads. By combining the flexibility of Kubernetes with the data processing power of Apache Spark, Lyft is able to drive ETL data processing to a different level. In this talk, We will talk about challenges the Lyft team faced and solutions they developed to support Apache Spark on Kubernetes in production and at scale. Topics Include: - Key traits of Apache Spark on Kubernetes. - Deep dive into Lyft's multi-cluster setup and operationality to handle petabytes of production data. - How Lyft extends and enhances Apache Spark to support capabilities such as Spark pod life cycle metrics and state management, resource prioritization, and queuing and throttling. - Dynamic job scale estimation and runtime dynamic job configuration. - How Lyft powers internal Data Scientists, Business Analysts, and Data Engineers via a multi-cluster setup. Speaker: Li Gao Li Gao is the tech lead in the cloud native spark compute initiative at Lyft. Prior to Lyft, Li worked at Salesforce, Fitbit, Marin Software, and a few startups etc. on various technical leadership positions on cloud native and hybrid cloud data platforms at scale. Besides Spark, Li has scaled and productionized other open source projects, such as Presto, Apache HBase, Apache Phoenix, Apache Kafka, Apache Airflow, Apache Hive, and Apache Cassandra.

SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft

Chester Chen

Seattle Spark Meetup Mobius CSharp API

shareddatamsft

Scaling spark on kubernetes at Lyft

Li Gao

Yao Yao Mooyoung Lee https://github.com/yaowser/learn-spark/tree/master/Final%20project https://www.youtube.com/watch?v=IVMbSDS4q3A https://www.academia.edu/35646386/Teaching_Apache_Spark_Demonstrations_on_the_Databricks_Cloud_Platform https://www.slideshare.net/YaoYao44/teaching-apache-spark-demonstrations-on-the-databricks-cloud-platform-86063070/ Apache Spark is a fast and general engine for big data analytics processing with libraries for SQL, streaming, and advanced analytics Cloud Computing, Structured Streaming, Unified Analytics Integration, End-to-End Applications

Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform

Yao Yao

Stitch Fix aspires to help you find the style that you will love. Data, the backbone of the business, is used to help with styling recommendations, demand modeling, user acquisition, and merchandise planning and also to influence business decisions throughout the organization. These decisions are backed by algorithms and data collected and interpreted based on client preferences. This talk offers an overview of the compute infrastructure used by the data science team at Stitch Fix, covering the architecture, tools within the larger ecosystem, and the challenges that the team overcame along the way. Apache Spark plays an important role in Stitch Fix’s data platform, and the company’s data scientists use Spark for their ETL and Presto for their ad hoc queries. The goal for the team running the compute infrastructure is to understand and make the data scientists’ lives easier, particularly in terms of usability of Spark, by building tools that make it easier to get started with Spark and transition themselves to a daily workflow. The compute infrastructure is a part of the data platform that is responsible for all the needs of data scientists as Stitch Fix. In this talk, we look at Stitch Fix’s journey, exploring its Spark setup, in-house tools and how they work in synergy with open source frameworks in a cloud environment. There are additional improvements to the infrastructure that help persist information for future use and optimization and we look at how the implementation of Amazon’s EMR FS has helped make it easier for us to read from the S3 source.

A compute infrastructure for data scientists

Stitch Fix Algorithms

Apache Spark is the next big data processing tool for Data Scientist. As seen on the recent StackOverflow analysis, it's the hottest big data technology on their site! In this talk, I'll use the PySpark interface to leverage the speed and performance of Apache Spark. I'll focus on the end to end workflow for getting data into a distributed platform, and leverage Spark to process the data for advanced analytics. I'll discuss the popular Spark APIs used for data preparation, SQL analysis, and ML algorithms. I'll explain the performance differences between Scala and Python, and how Spark has bridged the gap in performance. I'll focus on PySpark as the interface to the platform, and walk through a demo to showcase the APIs. Talk Overview: Spark's Architecture. What's out now and what's in Spark 2.0Spark APIs: Most common APIs used by Spark Common misconceptions and proper techniques for using Spark. Demo: Walk through ETL of the Reddit dataset. SparkSQL Analytics + Visualizations of the Dataset using MatplotLibSentiment Analysis on Reddit Comments

The Nitty Gritty of Advanced Analytics Using Apache Spark in Python

Miklos Christine

As companies adopt data processing technologies and add data-driven features to user-facing products, the need for effective automated test techniques for data processing applications increase. We go through anatomy of scalable data streaming applications, and how to set up test harnesses for reliable integration testing of such applications. We cover a few common anti-patterns that make asynchronous tests fragile, and corresponding patterns for remediation. We will also mention virtualisation components suitable for our testing scenarios.

Testing data streaming applications

Lars Albertsson

Running Apache Spark on Kubernetes: Best Practices and Pitfalls

Databricks

In this in-depth workshop you will gain hands on experience with using Spark and Cassandra inside the DataStax Enterprise Platform. The focus of the workshop will be working through data analytics exercises to understand the major developer developer considerations. You will also gain an understanding of the internals behind the integration that allow for large scale data loading and analysis. It will also review some of the major machine learning libraries in Spark as an example of data analysis. The workshop will start with a review the basics of how Spark and Cassandra are integrated. Then we will work through a series of exercises that will show how to perform large scale Data Analytics with Spark and Cassandra. A major part of the workshop will be to understand effective data modeling techniques in Cassandra that allow for fast parallel loading of the data into Spark to perform large scale analytics on that data. The exercises will also look at how to how to use the open source Spark Notebook to run interactive data analytics with the DataStax Enterprise Platform.

DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...

DataStax Academy

Apache Spark 2.x has laid the foundation for many new features and functionality. Its main three themes—easier, faster, and smarter—are pervasive in its unified and simplified high-level APIs for Structured data. In this introductory part lecture and part hands-on workshop you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas: Apache Spark Fundamentals & Concepts What’s new in Spark 2.x SparkSessions vs SparkContexts Datasets/Dataframes and Spark SQL Introduction to Structured Streaming concepts and APIs

Jump Start with Apache Spark 2.0 on Databricks

Anyscale

Lambda architecture with Spark

Vincent GALOPIN

Databricks Meetup @ Los Angeles Apache Spark User Group

Paco Nathan

Traveloka's journey to no ops streaming analytics

Rendy Bambang Junior

End-to-End Data Pipelines with Apache Spark

Burak Yavuz

Triangle Devops Meetup 10/2015

aspyker

Putting the Spark into Functional Fashion Tech Analystics

Gareth Rogers

Pivotal Real Time Data Stream Analytics

kgshukla

In machine learning projects, the preparation of large datasets is a key phase which can be complex and expensive. It was traditionally done by data engineers before the handover to data scientists or ML engineers. They operated in different environments due to the differences in the tools, frameworks and runtimes required in each phase. Spark's support for different types of workloads brought data engineering closer to the downstream activities like machine learning that depended on the data. Unifying data acquisition, preprocessing, training models and batch inferencing under a single platform enabled by Spark not only provided seamless experience between different phases and helped accelerate the end-to-end ML lifecycle but also lowered the TCO in the building, managing the infrastructure to cover different phases. With that, the needs of a shared infrastructure expanded to include specialized hardware like GPUs and support deep learning workloads as well. Spark can effectively make use of such infrastructure as it integrates with popular deep learning frameworks and supports acceleration of deep learning jobs using GPUs. In this talk, we share learnings and experiences in supporting different types of workloads in shared clusters equipped for doing deep learning as well as data engineering. We will cover the following topics: * Considerations for sharing the infrastructure for big data and deep learning in Spark * Deep learning in Spark in clusters with and without GPUs * Differences between distributed data processing and distributed machine learning * Multitenancy and isolation in shared infrastructure

Infrastructure for Deep Learning in Apache Spark

Databricks

Semelhante a Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma (20)

Structured Streaming in Spark

Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes

SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft

Seattle Spark Meetup Mobius CSharp API

Scaling spark on kubernetes at Lyft

Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform

A compute infrastructure for data scientists

The Nitty Gritty of Advanced Analytics Using Apache Spark in Python

Testing data streaming applications

Running Apache Spark on Kubernetes: Best Practices and Pitfalls

DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...

Jump Start with Apache Spark 2.0 on Databricks

Lambda architecture with Spark

Databricks Meetup @ Los Angeles Apache Spark User Group

Traveloka's journey to no ops streaming analytics

End-to-End Data Pipelines with Apache Spark

Triangle Devops Meetup 10/2015

Putting the Spark into Functional Fashion Tech Analystics

Pivotal Real Time Data Stream Analytics

Infrastructure for Deep Learning in Apache Spark

Mais de Spark Summit

In this session we will present a Configurable FPGA-Based Spark SQL Acceleration Architecture. It is target to leverage FPGA highly parallel computing capability to accelerate Spark SQL Query and for FPGA’s higher power efficiency than CPU we can lower the power consumption at the same time. The Architecture consists of SQL query decomposition algorithms, fine-grained FPGA based Engine Units which perform basic computation of sub string, arithmetic and logic operations. Using SQL query decomposition algorithm, we are able to decompose a complex SQL query into basic operations and according to their patterns each is fed into an Engine Unit. SQL Engine Units are highly configurable and can be chained together to perform complex Spark SQL queries, finally one SQL query is transformed into a Hardware Pipeline. We will present the performance benchmark results comparing the queries with FGPA-Based Spark SQL Acceleration Architecture on XEON E5 and FPGA to the ones with Spark SQL Query on XEON E5 with 10X ~ 100X improvement and we will demonstrate one SQL query workload from a real customer.

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

Spark Summit

In this talk, we’ll present techniques for visualizing large scale machine learning systems in Spark. These are techniques that are employed by Netflix to understand and refine the machine learning models behind Netflix’s famous recommender systems that are used to personalize the Netflix experience for their 99 millions members around the world. Essential to these techniques is Vegas, a new OSS Scala library that aims to be the “missing MatPlotLib” for Spark/Scala. We’ll talk about the design of Vegas and its usage in Scala notebooks to visualize Machine Learning Models.

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...

Spark Summit

This presentation introduces how we design and implement a real-time processing platform using latest Spark Structured Streaming framework to intelligently transform the production lines in the manufacturing industry. In the traditional production line there are a variety of isolated structured, semi-structured and unstructured data, such as sensor data, machine screen output, log output, database records etc. There are two main data scenarios: 1) Picture and video data with low frequency but a large amount; 2) Continuous data with high frequency. They are not a large amount of data per unit. However the total amount of them is very large, such as vibration data used to detect the quality of the equipment. These data have the characteristics of streaming data: real-time, volatile, burst, disorder and infinity. Making effective real-time decisions to retrieve values from these data is critical to smart manufacturing. The latest Spark Structured Streaming framework greatly lowers the bar for building highly scalable and fault-tolerant streaming applications. Thanks to the Spark we are able to build a low-latency, high-throughput and reliable operation system involving data acquisition, transmission, analysis and storage. The actual user case proved that the system meets the needs of real-time decision-making. The system greatly enhance the production process of predictive fault repair and production line material tracking efficiency, and can reduce about half of the labor force for the production lines.

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu

Spark Summit

As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly impact how driver’s plan their day by alerting users before they travel, find the best times to travel, and over time, learn from new IoT data such as road conditions, incidents, etc. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As a part of this work, we conducted a case study over five large metropolitans in the US, 2.58 billion traffic records and 262 million weather records, to quantify the boost in accuracy of traffic prediction using weather data. We will provide an overview of our lambda architecture with Apache Spark being used to build prediction models with weather and traffic data, and Spark Streaming used to score the model and provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark to analyze geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms that were used with Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and there is work to scale the system to provide predictions in over a 100 cities. Audience will learn about our experience scaling using Spark in offline and streaming mode, building statistical and deep-learning pipelines with Spark, and techniques to work with geospatial and time-series data.

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

Spark Summit

Graph is on the rise and it’s time to start learning about scalable graph analytics! In this session we will go over two Spark-based Graph Analytics frameworks: Tinkerpop and GraphFrames. While both frameworks can express very similar traversals, they have different performance characteristics and APIs. In this Deep-Dive by example presentation, we will demonstrate some common traversals and explain how, at a Spark level, each traversal is actually computed under the hood! Learn both the fluent Gremlin API as well as the powerful GraphFrame Motif api as we show examples of both simultaneously. No need to be familiar with Graphs or Spark for this presentation as we’ll be explaining everything from the ground up!

A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...

Spark Summit

Building accurate machine learning models has been an art of data scientists, i.e., algorithm selection, hyper parameter tuning, feature selection and so on. Recently, challenges to breakthrough this “black-arts” have got started. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches the best algorithm, parameters and features without any manual work. In this talk, we will share how the automation system is designed to exploit attractive advantages of Spark. The evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discovers the most accurate ones in minutes on a Ultra High Density Server, which employs 272 CPU cores, 2TB memory and 17TB SSD in 3U chassis. We will also share open challenges to learn such a massive amount of models on Spark, particularly from reliability and stability standpoints. This talk will cover the presentation already shown on Spark Summit SF’17 (#SFds5) but from more technical perspective.

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...

Spark Summit

In Sweden, from the Rise ICE Data Center at www.hops.site, we are providing to reseachers both Spark-as-a-Service and, more recently, Tensorflow-as-a-Service as part of the Hops platform. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows, from batch to streaming to structured streaming applications. We will analyse the different frameworks for integrating Spark with Tensorflow, from Tensorframes to TensorflowOnSpark to Databrick’s Deep Learning Pipelines. We introduce the different programming models supported and highlight the importance of cluster support for managing different versions of python libraries on behalf of users. We will also present cluster management support for sharing GPUs, including Mesos and YARN (in Hops Hadoop). Finally, we will perform a live demonstration of training and inference for a TensorflowOnSpark application written on Jupyter that can read data from either HDFS or Kafka, transform the data in Spark, and train a deep neural network on Tensorflow. We will show how to debug the application using both Spark UI and Tensorboard, and how to examine logs and monitor training.

Apache Spark and Tensorflow as a Service with Jim Dowling

Spark Summit

Apache Spark and Tensorflow as a Service with Jim Dowling

Spark Summit

With the rapid growth of available datasets, it is imperative to have good tools for extracting insight from big data. The Spark ML library has excellent support for performing at-scale data processing and machine learning experiments, but more often than not, Data Scientists find themselves struggling with issues such as: low level data manipulation, lack of support for image processing, text analytics and deep learning, as well as the inability to use Spark alongside other popular machine learning libraries. To address these pain points, Microsoft recently released The Microsoft Machine Learning Library for Apache Spark (MMLSpark), an open-source machine learning library built on top of SparkML that seeks to simplify the data science process and integrate SparkML Pipelines with deep learning and computer vision libraries such as the Microsoft Cognitive Toolkit (CNTK) and OpenCV. With MMLSpark, Data Scientists can build models with 1/10th of the code through Pipeline objects that compose seamlessly with other parts of the SparkML ecosystem. In this session, we explore some of the main lessons learned from building MMLSpark. Join us if you would like to know how to extend Pipelines to ensure seamless integration with SparkML, how to auto-generate Python and R wrappers from Scala Transformers and Estimators, how to integrate and use previously non-distributed libraries in a distributed manner and how to efficiently deploy a Spark library across multiple platforms.

MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...

Spark Summit

The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service. The main purpose of the system is to store and present Controls/Infrastructure related data gathered from thousands of devices in the whole accelerator complex. The data is used to operate the machines, improve their performance and conduct studies for new beam types or future experiments. During this talk, Jakub will speak about NXCALS requirements and design choices that lead to the selected architecture based on Hadoop and Spark. He will present the Ingestion API, the abstractions behind the Meta-data Service and the Spark-based Extraction API where simple changes to the schema handling greatly improved the overall usability of the system. The system itself is not CERN specific and can be of interest to other companies or institutes confronted with similar Big Data problems.

Next CERN Accelerator Logging Service with Jakub Wozniak

Spark Summit

In Between (A mobile App for couples, downloaded 20M in Global), from daily batch for extracting metrics, analysis and dashboard. Spark is widely used by engineers and data analysts in Between, thanks to the performance and expendability of Spark, data operating has become extremely efficient. Entire team including Biz Dev, Global Operation, Designers are enjoying data results so Spark is empowering entire company for data driven operation and thinking. Kevin, Co-founder and Data Team leader of Between will be presenting how things are going in Between. Listeners will know how small and agile team is living with data (how we build organization, culture and technical base) after this presentation.

Powering a Startup with Apache Spark with Kevin Kim

Spark Summit

Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra

Spark Summit

In many cases, Big Data becomes just another buzzword because of the lack of tools that can support both the technological requirements for developing and deploying of the projects and/or the fluency of communication between the different profiles of people involved in the projects. In this talk, we will present Moriarty, a set of tools for fast prototyping of Big Data applications that can be deployed in an Apache Spark environment. These tools support the creation of Big Data workflows using the already existing functional blocks or supporting the creation of new functional blocks. The created workflow can then be deployed in a Spark infrastructure and used through a REST API. For better understanding of Moriarty, the prototyping process and the way it hides the Spark environment to the Big Data users and developers, we will present it together with a couple of examples based on a Industry 4.0 success cases and other on a logistic success case.

Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...

Spark Summit

Large-scale testing of new data products or enhancements to existing products in a research and development environment can be a technical challenge for data scientists. In some cases, tools available to data scientists lack production-level capacity, whereas other tools do not provide the algorithms needed to run the methodology. At Nielsen, the Databricks platform provided a solution to both of these challenges. This breakout session will cover a specific Nielsen business case where two methodology enhancements were developed and tested at large-scale using the Databricks platform. Development and large-scale testing of these enhancements would not have been possible using standard database tools.

How Nielsen Utilized Databricks for Large-Scale Research and Development with...

Spark Summit

Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...

Spark Summit

Since the invention of SQL and relational databases, data production has been about specifying how data is transformed through queries. While Apache Spark can certainly be used as a general distributed query engine, the power and granularity of Spark’s APIs enables a revolutionary increase in data engineering productivity: goal-based data production. Goal-based data production concerns itself with specifying WHAT the desired result is, leaving the details of HOW the result is achieved to a smart data warehouse running on top of Spark. That not only substantially increases productivity, but also significantly expands the audience that can work directly with Spark: from developers and data scientists to technical business users. With specific data and architecture patterns spanning the range from ETL to machine learning data prep and with live demos, this session will demonstrate how Spark users can gain the benefits of goal-based data production.

Goal Based Data Production with Sim Simeonov

Spark Summit

Have you imagined a simple machine learning solution able to prevent revenue leakage and monitor your distributed application? To answer this question, we offer a practical and a simple machine learning solution to create an intelligent monitoring application based on simple data analysis using Apache Spark MLlib. Our application uses linear regression models to make predictions and check if the platform is experiencing any operational problems that can impact in revenue losses. The application monitor distributed systems and provides notifications stating the problem detected, that way users can operate quickly to avoid serious problems which directly impact the company’s revenue and reduce the time for action. We will present an architecture for not only a monitoring system, but also an active actor for our outages recoveries. At the end of the presentation you will have access to our training program source code and you will be able to adapt and implement in your company. This solution already helped to prevent about US$3mi in losses last year.

Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...

Spark Summit

Getting Ready to use Redis with Apache Spark is a technical tutorial designed to address integrating Redis with an Apache Spark deployment to increase the performance of serving complex decision models. To set the context for the session, we start with a quick introduction to Redis and the capabilities Redis provides. We cover the basic data types provided by Redis and cover the module system. Using an ad serving use-case, we look at how Redis can improve the performance and reduce the cost of using complex ML-models in production. Attendees will be guided through the key steps of setting up and integrating Redis with Spark, including how to train a model using Spark then load and serve it using Redis, as well as how to work with the Spark Redis module. The capabilities of the Redis Machine Learning Module (redis-ml) will be discussed focusing primarily on decision trees and regression (linear and logistic) with code examples to demonstrate how to use these feature. At the end of the session, developers should feel confident building a prototype/proof-of-concept application using Redis and Spark. Attendees will understand how Redis complements Spark and how to use Redis to serve complex, ML-models with high performance.

Getting Ready to Use Redis with Apache Spark with Dvir Volk

Spark Summit

Here we present a general supervised framework for record deduplication and author-disambiguation via Spark. This work differentiates itself by – Application of Databricks and AWS makes this a scalable implementation. Compute resources are comparably lower than traditional legacy technology using big boxes 24/7. Scalability is crucial as Elsevier’s Scopus data, the biggest scientific abstract repository, covers roughly 250 million authorships from 70 million abstracts covering a few hundred years. – We create a fingerprint for each content by deep learning and/or word2vec algorithms to expedite pairwise similarity calculation. These encoders substantially reduce compute time while maintaining semantic similarity (unlike traditional TFIDF or predefined taxonomies). We will briefly discuss how to optimize word2vec training with high parallelization. Moreover, we show how these encoders can be used to derive a standard representation for all our entities namely such as documents, authors, users, journals, etc. This standard representation can simplify the recommendation problem into a pairwise similarity search and hence it can offer a basic recommender for cross-product applications where we may not have a dedicate recommender engine designed. – Traditional author-disambiguation or record deduplication algorithms are batch-processing with small to no training data. However, we have roughly 25 million authorships that are manually curated or corrected upon user feedback. Hence, it is crucial to maintain historical profiles and hence we have developed a machine learning implementation to deal with data streams and process them in mini batches or one document at a time. We will discuss how to measure the accuracy of such a system, how to tune it and how to process the raw data of pairwise similarity function into final clusters. Lessons learned from this talk can help all sort of companies where they want to integrate their data or deduplicate their user/customer/product databases.

Deduplication and Author-Disambiguation of Streaming Records via Supervised M...

Spark Summit

The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This work presents new efficient and scalable matrix processing and optimization techniques based on Spark. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix techniques inside the Spark SQL, and optimize the matrix execution plan based on Spark SQL Catalyst. We conduct case studies on a series of ML models and matrix computations with special features on different datasets. These are PageRank, GNMF, BFGS, sparse matrix chain multiplications, and a biological data analysis. The open-source library ScaLAPACK and the array-based database SciDB are used for performance evaluation. Our experiments are performed on six real-world datasets are: social network data ( e.g., soc-pokec, cit-Patents, LiveJournal), Twitter2010, Netflix recommendation data, and 1000 Genomes Project sample. Experiments demonstrate that our proposed techniques achieve up to an order-of-magnitude performance.

MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...

Spark Summit

Mais de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang

VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...

Apache Spark and Tensorflow as a Service with Jim Dowling

MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...

Next CERN Accelerator Logging Service with Jakub Wozniak

Powering a Startup with Apache Spark with Kevin Kim

Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra

Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...

How Nielsen Utilized Databricks for Large-Scale Research and Development with...

Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...

Goal Based Data Production with Sim Simeonov

Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...

Getting Ready to Use Redis with Apache Spark with Dvir Volk

Deduplication and Author-Disambiguation of Streaming Records via Supervised M...

MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...

Último

Building Real-Time Pipelines With FLaNK Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making. Apache NiFi Apache Kafka Apache Flink Apache Iceberg LLM Generative AI Slack Postgresql

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK

Timothy Spann

+97470301568 Qatar THC Oil and weed in Qatar? Where can I get THC vape in Doha Qatar?WhatsApp +97470301568 Buy Weed, Cocaine, Heroin and Shrooms in France,Germany, Poland Serbia,Romania, Ukraine WhatsApp +97470301568 Buy Weed, Cocaine, Heroin and Shrooms in Dubai UAE Malaysia Oman Kuwait Bahrain Saudi Arabia Qatar Where can I get weed in Qatar? Where can I get THC vape in Doha Qatar?WhatsApp +97470301568 Buy Weed, Cocaine, Heroin and Shrooms in France,Germany, Poland Serbia,Romania, UkraineWhatsApp +97470301568 Buy Weed, Cocaine, Heroin and Shrooms in Dubai UAE Malaysia Oman Kuwait Bahrain Saudi Arabia Qatar WhatsApp+97470301568 WhatsApp +97470301568 Buy Weed, Cocaine, Heroin and Shrooms in Qatar Dubai UAE Malaysia Oman Kuwait Bahrain Saudi Arabia #Singapore #Jordan #Ireland, #Belgium, #United Kingdom, #Iceland, #*Portugal, Spain, China, Japan, Turkey, Canada United States, Morocco, France,Germany, Poland Serbia,Romania, Ukraine, and all countries United Arab Emirates . Our team has succesfully delivered in 26 different countries . All marijuana and Cocaine is double vacuum packed before shipping, making it completely odorless to ensure that it arrives safely to your door. Our distribution crew is expert at making packages that blend in with the rest of the mail. We have also put into place many other security measures to ensure the security of our customers. buy weed Dubai +97470301568buy Weed Qatar #Buy Weed Kuwait #Buy Weed Bahrain #Buy #Weed #Oman #Buy Weed UAE #Buy Weed Abu Dhabi @Buy Weed Doha Qatar #@Buy Weed Ajman #@Buy Weed Online #@Buy Weed UK #@Buy Weed Iceland #*@Buy Weed All Countries Below are the various strains of kush available ; buy hash and Weed in dubai,abu dhabi,sharjah where to buy weed in doha,where can i find weed in jeddah,Can I get weed delivered to Riyadh?,Buy weed Online Jeddah Saudi Arabia,Buy Weed and THC Cannabis Oil online ,QATAR , DOHA buy kush in DOHA , buy kush in DOHAWeed in QATAR # DOHA Buy Weed and THC Cannabis Oil online who delivers at your own location in Qatar Doha ,Kuwait ,Dubai including cannabis / weed,Where can I find weed in Dubai as a tourist?,Is marijuana allowed in Dubai ? How much is medical marijuana in Dubai ?Is weed legal in Dubai ?How to get marijuana in Saudi Arabia Do people in Saudi Arabia smoke weed ? Is Hash legal in Saudi Arabia ? Where is marijuana the most illegal?Can you get weed in Baku? Dubai, United Arab Emirates Canabis smokers - Dubai, Buy Marijuana Products Online in UAE Desertcart ships the Marijuana products in Dubai ,Abu Dhabi, Sharjah, Al Ain, Ajman and more cities in UAE. Get unlimited free shipping in 164+ countries Buy Weed Products Online in Saudi Arabia Order thc Weed in Saudi Arabia Order thc Weed in Saudi Arabia Order thc Weed in Saudi Arabia Order thc Weed in Saudi Arabia Buy Marijuana Saudi Arabia weed in jeddahis cbd legal in saudi arabia cali tins smoke in saudi arabia drug use saudi arabia Buy weed marijuana. White Widow OG #Kush Sensi Star x ak 47 Afghan Ku

+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...

Health

As electricity is difficult to store, it is crucial to strictly maintain the balance between production and consumption. The integration of intermittent renewable energies into the production mix has made the management of the balance more complex. However, access to near real-time data and communication with consumers via smart meters suggest demand response. Specifically, sending signals would encourage users to adjust their consumption according to the production of electricity. The algorithms used to select these signals must learn consumer reactions and optimize them while balancing exploration and exploitation. Various sequential or reinforcement learning approaches are being considered.

Sequential and reinforcement learning for demand side management by Margaux B...

Paris Women in Machine Learning and Data Science

Abortion pills in Kuwait city !🌆+918761049707^) Where to get cytotec pills in salmiyah, WhatsApp '''(+918761049707)''' Explore Discreet and Safe Abortion Pill Options in Dubai, Empowering Women With Confidential and Medically Supervised Choices For Reproductive Health" Abortion Pills In Dubai, Abu Dhabi/Alain/Sharjah/RAK City Satwa-Mifepristone and Misoprostol Available in Dubai/Abu Dhabi - i Pills in Dubai and Cost Of Cytotec in WhatsApp (+918761049707) Abortion Pills in Dubai/Abu Dubai/ Abu Dhabi. Are you stranded with unwanted pregnancy in Dubai, Abu Dhabi , the United Arab Emirates (U.A.E), Qatar, Oman, Saudi Arabia or Kuwait? you can now contact us now on Whatsapp Dr AJ:+918761049707to buy safe abortion pills In Dubai, Abu Dhabi, Sharjah, Al Ain, Ajman, RAK City, Ras Al Khaimah and Fujairah to terminate an unwanted pregnancy in Dubai and the United Arab Emirates. Get your Discreet 100% Safe (+918761049707 )*Effective Abortion Pills For Sale in Dubai, Abu Dhabi, KUWAIT, QATAR, BAHRAIN, DOHA, SALMIYA, Sharjah, Ajman, Alain, Fujairah, Ras Al Khaimah, Umm Al Quwain, UAE. BUY Mifepristone and Misoprostol (Cytotec), Mtp Kit In UAE. Abortion pills available in UAE (United Arab Emirates), Saudi Arabia, Kuwait, Oman, Bahrain and Qatar. Contact us today. +918761049707 -The UAE’s leading abortion care service in Dubai. Abortion Treatment. Medical Abortion. Surgical Abortion. Find A Clinic like Dr Maria Abortion clinic in Dubai We have Abortion Pills / Cytotec Tablets Available in Dubai, Abu Dhabi, KUWAIT, QATAR, BAHRAIN, DOHA, SALMIYA, Sharjah, Ajman, Alain, Fujairah, Ras Al Khaimah, Umm Al Quwain, UAE., buy cytotec in Dubai abortion Pills Cytotec also available Oman Qatar Doha Saudi Arabia Bahrain We sell original abortion medicine which includes: Cytotec 200mcg (Misoprostol), Mifepristone, Mifegest-kit, Misoclear, Emergency contraceptive pills, Morning after sex pills, ipills, pills to prevent pregnancy 72 hours after sex. All our pills are manufactured by reputable medical manufacturing companies like PFIZER. Medical abortion is easy and effective for everyone to perform in their own privacy. There are very few complications that may arise from medical abortion if one follows the right guidelines as instructed by the obstetrician. Abortion Pills in Dubai Can Now Be Offered at Dr Maria Abortion clinic in Dubai, F.D.A. Mifepristone and Misoprostol, the first of two drugs in medication abortions, previously had to be dispensed only by clinics, doctors or a few mail-order pharmacies like Dr Maria Abortion clinic in Dubai . Now, We can provide it. For the first time, retail pharmacies, like Dr Maria Abortion clinic in Dubai, will be Able to offer abortion pills in Dubai under a regulatory change made Tuesday by the Food and Drug Administration. The action could significantly expand access to abortion through medication. Cytotec Abortion Pills are Available In Dubai / UAE, you will be very happy to do abortion in Dubai we are providing cytotec 200mg

In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia

ahmedjiabur940

Gartner's Data Analytics Maturity Model.pptx

chadhar227

This PowerPoint presentation likely explores the journey and evolution of boAt as a brand known for its innovative approach in the consumer electronics industry. The presentation may cover several key aspects: Introduction to boAt: Overview of the company's founding, vision, and initial products. Innovation Highlights: Exploration of boAt's innovative strategies in product design, technology integration, and market positioning. Market Impact: Discussion on how boAt has disrupted the audio electronics market with its unique offerings and consumer-centric approach. Brand Building: Insights into the marketing and branding strategies employed by boAt to create a strong and recognizable brand identity. Growth Trajectory: Analysis of boAt's growth story, including milestones, challenges faced, and strategies for expansion. Future Outlook: Speculation or plans for the future of boAt, including potential areas of innovation and growth opportunities. Lessons Learned: Takeaways and lessons that can be gleaned from boAt's success story, particularly in the context of entrepreneurship and innovation.

The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx

Vivek487417

Saudi Arabia [ Abortion pills) Jeddah/riaydh/dammam/+966572737505☎️] cytotec tablets uses abortion pills 💊💊 How effective is the abortion pill? 💊💊 +966572737505) "Abortion pills in Jeddah" how to get cytotec tablets in Riyadh " Abortion pills in dammam*💊💊 The abortion pill is very effective. If you’re taking mifepristone and misoprostol, it depends on how far along the pregnancy is, and how many doses of medicine you take:💊💊 +966572737505) how to buy cytotec pills At 8 weeks pregnant or less, it works about 94-98% of the time. +966572737505[ 💊💊💊 At 8-9 weeks pregnant, it works about 94-96% of the time. +966572737505) At 9-10 weeks pregnant, it works about 91-93% of the time. +966572737505)💊💊 If you take an extra dose of misoprostol, it works about 99% of the time. At 10-11 weeks pregnant, it works about 87% of the time. +966572737505) If you take an extra dose of misoprostol, it works about 98% of the time. In general, taking both mifepristone and+966572737505 misoprostol works a bit better than taking misoprostol only. +966572737505 Taking misoprostol alone works to end the+966572737505 pregnancy about 85-95% of the time — depending on how far along the+966572737505 pregnancy is and how you take the medicine. +966572737505 The abortion pill usually works, but if it doesn’t, you can take more medicine or have an in-clinic abortion. +966572737505 When can I take the abortion pill?+966572737505 In general, you can have a medication abortion up to 77 days (11 weeks)+966572737505 after the first day of your last period. If it’s been 78 days or more since the first day of your last+966572737505 period, you can have an in-clinic abortion to end your pregnancy.+966572737505 Why do people choose the abortion pill? Which kind of abortion you choose all depends on your personal+966572737505 preference and situation. With+966572737505 medication+966572737505 abortion, some people like that you don’t need to have a procedure in a doctor’s office. You can have your medication abortion on your own+966572737505 schedule, at home or in another comfortable place that you choose.+966572737505 You get to decide who you want to be with during your abortion, or you can go it alone. Because+966572737505 medication abortion is similar to a miscarriage, many people feel like it’s more “natural” and less invasive. And some+966572737505 people may not have an in-clinic abortion provider close by, so abortion pills are more available to+966572737505 them. +966572737505 Your doctor, nurse, or health center staff can help you decide which kind of abortion is best for you. +966572737505 More questions from patients: Saudi Arabia+966572737505 CYTOTEC Misoprostol Tablets. Misoprostol is a medication that can prevent stomach ulcers if you also take NSAID medications. It reduces the amount of acid in your stomach, which protects your stomach lining. The brand name of this medication is Cytotec®.+966573737505) Unwanted Kit is a combination of two medicin

Abortion pills in Jeddah | +966572737505 | Get Cytotec

Abortion pills in Riyadh +966572737505 get cytotec

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models We are available 24*7 Booking Contact Details :- WhatsApp Chat :- +91-7014168258 If you're looking for India Call girls you've come to the right place. You'll find some of the most beautiful call girls in our location with. These ladies have pleasing personalities, hot figures, and a passion for physical pleasure. Call girls in India Lucknow Many men have booked them for their erotic and soul-mixing performances, which are sure to leave you with unforgettable memories. #K09 Escort Service India is available in the city for men and women of all ages. They can satisfy your sexual needs and will make your experience even more enjoyable and memorable. Whether you're looking for a blow-job, stripping, lovemaking, or other dirty acts, you'll be able to find a match for your tastes and budget. These highly trained professionals will help you have an unforgettable night. One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7014168258 We are available 24*7 all days of the year. Call us — 7014168258 Thank you for Visiting.

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...

gajnagarg

Aspirational Block Program Block Syaldey District - Almora

GovindSinghDasila

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...

Elaine Werffeli

Data Analyst Tasks to do the internship.pdf

theeltifs

Klinik_ Apotek Onlin 085657271886 Solusi Menggugurkan Masalah Kehamilan Anda Jual Obat Aborsi Asli KLINIK ABORSI TERPEECAYA _ Jual Obat Aborsi Cytotec Misoprostol Asli 100% Ampuh Hanya 3 Jam Langsung Gugur || OBAT PENGGUGUR KANDUNGAN AMPUH MANJUR OBAT ABORSI OLINE" APOTIK Jual Obat Cytotec, Gastrul, Gynecoside Asli Ampuh. JUAL ” Obat Aborsi Tuntas | Obat Aborsi Manjur | Obat Aborsi Ampuh | Obat Penggugur Janin | Obat Pencegah Kehamilan | Obat Pelancar Haid | Obat terlambat Bulan | Ciri Obat Aborsi Asli | Obat Telat Bulan | Pil Aborsi Asli | Cara Menggugurkan Konten | Cara Aborsi Tuntas | Harga Obat Aborsi Asli | Pil Aborsi | Jual Obat Aborsi Cytotec | Cara Aborsi Sendiri | Cara Aborsi Usia 1 Bulan | Cara Aborsi Usia 2 Tahun | Cara Aborsi Usia 3 Bulan | Obat Aborsi Usia 4 Bulan | Cara Abrasi Usia 5 Bulan | Cara Menggugurkan Konten | Kandungan Obat Penggugur | Cara Menghitung Usia Konten | Cara Mengatasi Terlambat Bulan | Penjual Obat Aborsi Asli | Obat Aborsi Garansi | Kandungan Obat Peluntur | Obat Telat Datang Bulan | Obat Telat Haid | Obat Aborsi Paling Murah | Klinik Jual Obat Aborsi | Jual Pil Cytotec | Apotik Jual Obat Aborsi | Kandungan Dokter Abrasi | Cara Aborsi Cepat | Jual Obat Aborsi Bergaransi | Jual Obat Cytotec Asli | Obat Aborsi Aman Manjur | Obat Misoprostol Cytotec Asli. "APA ITU ABORSI" “Aborsi Adalah dengan membendung hormon yang di perlukan untuk mempertahankan kehamilan yaitu hormon progesteron, karena hormon ini dibendung, maka jalur kehamilan mulai membuka dan leher rahim menjadi melunak,sehingga mengeluarkan darah yang merupakan tanda bahwa obat telah bekerja || maksimal 1 jam obat diminum || PENJELASAN OBAT ABORSI USIA 1 _7 BULAN Pada usia kandungan ini, pasien akan merasakan sakit yang sedikit tidak berlebihan || sekitar 1 jam ||. namun hanya akan terjadi pada saatdarah keluar merupakan pertanda menstruasi. Hal ini dikarenakan pada usiakandungan 3 bulan,janin sudah terbentuk sebesar kepalan tangan orang dewasa. Cara kerja obat aborsi : JUAL OBAT ABORSI AMPUH dosis 3 bulan secara umum sama dengan cara kerja || DOSIS OBAT ABORSI 2 bulan”, hanya berbedanya selain mengisolasijanin juga menghancurkan janin dengan formula methotrexate dikandungdidalamnya. Formula methotrexate ini sangat ampuh untuk menghancurkan janinmenjadi serpihan-serpihan kecil akan sangat berguna pada saat dikeluarkan nanti. APA ALASAN WANITA MELAKUKAN ABORSI? Aborsi di lakukan wanita hamil baik yang sudah menikah maupun belum menikah dengan berbagai alasan , akan tetapi alasan yang utama adalah alasan-alasan non medis (termasuk aborsi sendiri / di sengaja/ buatan] MELAYANI PEMESANAN OBAT ABORSI SETIAP HARI, SIAP KIRIM KESELURUH KOTA BESAR DI INDONESIA DAN LUAR NEGERI. HUBUNGI PEMESANAN LEBIH NYAMAN VIA WA/: 085657271886

Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...

Klinik kandungan

PLE-statistics document for primary schs

cnajjemba

SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation

EfruzAsilolu

Discover Why Less is More in B2B Research

michael115558

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We are available 24*7 Booking Contact Details :- WhatsApp Chat :- +91-7014168258 If you're looking for India Call girls you've come to the right place. You'll find some of the most beautiful call girls in our location with. These ladies have pleasing personalities, hot figures, and a passion for physical pleasure. Call girls in India Lucknow Many men have booked them for their erotic and soul-mixing performances, which are sure to leave you with unforgettable memories. #K09 Escort Service India is available in the city for men and women of all ages. They can satisfy your sexual needs and will make your experience even more enjoyable and memorable. Whether you're looking for a blow-job, stripping, lovemaking, or other dirty acts, you'll be able to find a match for your tastes and budget. These highly trained professionals will help you have an unforgettable night. One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7014168258 We are available 24*7 all days of the year. Call us — 7014168258 Thank you for Visiting.

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...

nirzagarg

Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We are available 24*7 Booking Contact Details :- WhatsApp Chat :- +91-7014168258 If you're looking for India Call girls you've come to the right place. You'll find some of the most beautiful call girls in our location with. These ladies have pleasing personalities, hot figures, and a passion for physical pleasure. Call girls in India Lucknow Many men have booked them for their erotic and soul-mixing performances, which are sure to leave you with unforgettable memories. #K09 Escort Service India is available in the city for men and women of all ages. They can satisfy your sexual needs and will make your experience even more enjoyable and memorable. Whether you're looking for a blow-job, stripping, lovemaking, or other dirty acts, you'll be able to find a match for your tastes and budget. These highly trained professionals will help you have an unforgettable night. One Shot — 5000/in call (time 1 hour), 6000/out call Two shot with one girl — 8000/in call (time 2 hour), 10000/out call Body to body massage with sex- 8000/in call (time 1 hour) Full night Service for one person– 12000/in call, 13000/out call (shot limit 3-4 shots) Full night Service for more than 1 person — please contact Us —7014168258 We are available 24*7 all days of the year. Call us — 7014168258 Thank you for Visiting.

Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...

nirzagarg

Digital Transformation Playbook by Graham Ware

Graham Ware

原版定制【微信:153539019】《(UCD毕业证书）加州大学戴维斯分校毕业证》【微信:153539019】（留信学历认证永久存档查询）采用学校原版纸张、特殊工艺完全按照原版一比一制作（包括：隐形水印，阴影底纹，钢印LOGO烫金烫银，LOGO烫金烫银复合重叠，文字图案浮雕，激光镭射，紫外荧光，温感，复印防伪）行业标杆！精益求精，诚心合作，真诚制作！多年品质 ,按需精细制作，24小时接单,全套进口原装设备，十五年致力于帮助留学生解决难题，业务范围有加拿大、英国、澳洲、韩国、美国、新加坡，新西兰等学历材料，包您满意。【业务选择办理准则】一、工作未确定，回国需先给父母、亲戚朋友看下文凭的情况，办理一份就读学校的毕业证【微信153539019】文凭即可二、回国进私企、外企、自己做生意的情况，这些单位是不查询毕业证真伪的，而且国内没有渠道去查询国外文凭的真假，也不需要提供真实教育部认证。鉴于此，办理一份毕业证【微信153539019】即可三、进国企，银行，事业单位，考公务员等等，这些单位是必需要提供真实教育部认证的，办理教育部认证所需资料众多且烦琐，所有材料您都必须提供原件，我们凭借丰富的经验，快捷的绿色通道帮您快速整合材料，让您少走弯路。留信网认证的作用: 1:该专业认证可证明留学生真实身份 2:同时对留学生所学专业登记给予评定 3:国家专业人才认证中心颁发入库证书 4:这个认证书并且可以归档倒地方 5:凡事获得留信网入网的信息将会逐步更新到个人身份内，将在公安局网内查询个人身份证信息后，同步读取人才网入库信息 6:个人职称评审加20分 7:个人信誉贷款加10分 8:在国家人才网主办的国家网络招聘大会中纳入资料，供国家高端企业选择人才【关于价格问题（保证一手价格）】我们所定的价格是非常合理的，而且我们现在做得单子大多数都是代理和回头客户介绍的所以一般现在有新的单子我给客户的都是第一手的代理价格，因为我想坦诚对待大家不想跟大家在价格方面浪费时间对于老客户或者被老客户介绍过来的朋友，我们都会适当给一些优惠。

一比一原版(UCD毕业证书）加州大学戴维斯分校毕业证成绩单原件一模一样

wsppdmt

Digital advertising, or paid media, encompasses the strategic deployment of online advertisements to reach target audiences efficiently and effectively. This includes any digital platform that supports advertising to deliver unique messages for any objective. Understanding the mechanics of digital advertising platforms, along with insights into audience behaviors and preferences, allows marketers to optimize their ad spend and achieve significant engagement and conversion rates. This lecture is for Advanced Digital & Social Media Strategy (MGMTX 466.05) at UCLA Extension.

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...

Valters Lauzums

Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma

1. Sparking up Data Engineering Rohan Sharma Spark Summit East - Feb 9, 2017

2. Agenda ● Context: Netflix ● Netflix Data Ecosystem ● Spark Development @ Netflix ● Stranger Things with Spark ● Q&A

3. #NetflixEverywhere ● 93+ Million Members ● 190+ Countries ● 125+ Million streaming hours / day ● 1000 hours of Original content in 2017 ● ⅓ of US internet traffic during evenings

4. Netflix Culture ● Freedom and Responsibility ● Context, not Control ● Highly aligned, loosely coupled ● The Paved road

5. #NetflixData

6. #NetflixData Product Experience Streaming Experience Content Marketing Business Operations Other Functions...

7. Data Producers ● Member Devices ● CDN Servers ● Application Servers ● Device/Server Telemetry ● Application Data ● Vendors / Partner Data

8. Data Processing ● Stream Processing - Shriya Arora ● Recommendation Systems - DB Tsai & Gary Yeh ● Batch Processing ● Experimentation Analytics ● Operational Analytics

9. Data Platform (Batch Processing) Storage Compute Service Tools APIBig Data Portal S3 Parquet Transport VisualizationQuality Pig Workflow Vis Job/Cluster Vis Interface Execution Metadata Notebooks Tableau Micro Strategy JavaScript Applications

10. Cloud Warehouse ● 60+ Petabytes ● Hadoop on S3 - separate compute from storage ● Multiple workloads on same cluster ● Tens of thousands of Spark, Pig, Hive, Presto jobs ● Production Cluster : 2600 d2.4xl nodes ● Query Cluster : 400 m4.16xl nodes

11. Spark Development ● Focus on stakeholders - not process & operations ● Reuse boilerplate code & best practices ● Reuse build & deployment practices ● Make Spark easy-to-use & cruise

12. Spark Development Template Abstract Spark Project (Template) Netflix Data Utils (cross-platform business logic) Spark Utils (spark boilerplate + interface implementations) Code Reuse Registry + Build Properties Project Instances Build & Deploy Reuse S3 Scheduler FRAMEWORK USER FRAMEWORK . . . Jenkins

13. Development Lifecycle Develop & Test Commit, Build, Deploy & Run Zeppelin Spark ShellIntelliJ Integration Test Bit Bucket Jenkins S3 / Artifactory Execute

14. Whats Next... ● Spark added to paved road in early 2016 ● Improved query performance / cluster utilization ● Pyspark templates & data utils

15.

16. Code Refactoring ● Download Source Code ● Semi-Compile to Abstract Syntax Trees ● Check for re-factoring rules ● Codegen refactored lines and update source ● Invoke dockerized CI service to build ● Raise Pull Requests with inferred reviewers More Details: Jonathan Schneider Linked In: https://www.linkedin.com/in/jonkschneider/ Youtube: https://www.youtube.com/watch?v=JbcKFKiBU60

17. Questions To learn more, please visit: youtube channel NetflixData twitter handle @NetflixData

Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Semelhante a Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma

Semelhante a Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma (20)

Mais de Spark Summit

Mais de Spark Summit (20)

Último

Último (20)

Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma