In this talk, AWeber's Michael Becker describes how to deploy a predictive model in a production environment using RabbitMQ and scikit-learn. He demonstrates the design with a realtime content classification system.
2. Who is this guy?
Software Engineer @ AWeber
Founder of the DataPhilly Meetup group
@beckerfuffle
beckerfuffle.com
These slides and more @ github.com/mdbecker
10. 38 top Wikipedias
Arabic العربية
Bulgarian Български
Catalan Català
Czech Čeština
Danish Dansk
German Deutsch
English English
Spanish Español
Estonian Eesti
Basque Euskara
Persian فارسی
Finnish Suomi
French Français
Hebrew עברית
Hindi हिन्दी
Croatian Hrvatski
Hungarian Magyar
Indonesian Bahasa Indonesia
Italian Italiano
Japanese 日本語
Kazakh Қазақша
Korean 한국어
Lithuanian Lietuvių
Malay Bahasa Melayu
Dutch Nederlands
Norwegian (Bokmål) Norsk (Bokmål)
Polish Polski
Portuguese Português
Romanian Română
Russian Русский
Slovak Slovenčina
Slovenian Slovenščina
Serbian Српски / Srpski
Swedish Svenska
Turkish Türkçe
Ukrainian Українська
Vietnamese Tiếng Việt
Waray-Waray Winaray
29. Thank you
API & Worker: Kelly O’Brien (linkedin.com/in/kellyobie)
UI: Matt Parke (ordinaryrobot.com)
Classifier: Michael Becker (github.com/mdbecker)
Images: Wikipedia
30. My info
Tweet me @beckerfuffle
Find me at beckerfuffle.com
These slides and more @ github.com/mdbecker
Editor's Notes
Good morning everyone. My name is Michael Becker. I work on the Data Analysis and Management team at AWeber, an email marketing company in Chalfont, PA. I'm also the founder of the DataPhilly Meetup group. You can find me online @beckerfuffle on Twitter and at beckerfuffle.com, and I'm mdbecker on GitHub. I'll be posting the materials for this talk on my GitHub.
This talk will cover the logistics of using a trained scikit-learn model in a real-life production environment. In this talk I'll cover: how to distribute your model.
I’ll discuss how to get new data to your model for prediction.
I’ll introduce RabbitMQ, what it is and why you should care.
I’ll demonstrate how we can put all this together into a finished product
I’ll discuss how to scale your model
Finally, I'll cover some additional things to consider when using scikit-learn models in a realtime production environment.
To start off, let's recap what the supervised model training process looks like. 1) You have your training data and labels 2) You vectorize your data, you train your machine learning algorithm. 3) ??? 4) Make predictions with new data 5) Profit
In this case I'm going to talk about one of the first models I created. A model that predicts the language of input text. To create this model, I used 38 of the top Wikipedias based on number of articles. I then dumped several of the most popular articles as defined by their number of hits.
I converted the wiki markup to plain text. I trained a LinearSVC (Support Vector Classifier) model using a bi/trigram (n-gram) approach I had read worked well for language classification. This approach involves counting all two-character (bigram) and three-character (trigram) sequences in your dataset. I tested the model and saw ~99% accuracy. Here I've defined a pipeline combining a text feature extractor with a simple classifier. A pipeline is a utility used to build a composite classifier. To extract features, I'm using a TfidfVectorizer. The vectorizer first counts the number of occurrences of each n-gram in each document to "vectorize" the text. It then applies TF-IDF (term frequency–inverse document frequency) weighting. TF-IDF reflects how important a term is to a document within a collection of documents. The TF-IDF value increases with the number of times an n-gram appears in a document, but is offset by the frequency of that n-gram in the rest of the documents. So, for example, a common word like "the" gets down-weighted compared to a less common word like "automobile."
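A minimal sketch of such a pipeline. The toy corpus and the "en"/"fr" labels here are illustrative stand-ins; the real model was trained on article dumps from 38 Wikipedias:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Character bi/trigrams, TF-IDF weighted, feeding a linear SVM.
pipeline = Pipeline([
    ("vect", TfidfVectorizer(analyzer="char", ngram_range=(2, 3))),
    ("clf", LinearSVC()),
])

# Tiny stand-in corpus; the real training data was Wikipedia dumps.
texts = [
    "the quick brown fox jumps over the lazy dog",
    "a lazy dog sleeps in the warm sun",
    "le renard brun rapide saute par-dessus le chien",
    "un chien paresseux dort au soleil",
]
labels = ["en", "en", "fr", "fr"]
pipeline.fit(texts, labels)

print(pipeline.predict(["the dog sleeps in the sun"])[0])
```

Because the vectorizer and classifier are wrapped in one Pipeline object, the whole thing can be fit, pickled, and called with raw strings as a single unit.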
So the first thing you might ask yourself after you've trained your awesome model is "now what?" One of the first problems you'll want to solve is how to distribute your model. The easiest way to do this is to pickle (serialize) the model to disk and distribute it as part of your application. You can also store it in a datastore such as MongoDB's GridFS or Amazon S3. In the case of my model, it took up roughly 400MB in memory. This is pretty big, but easily storable on disk (and, more importantly, in memory).
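The pickle round trip might look like the sketch below. A plain dict stands in for the fitted pipeline (any fitted scikit-learn estimator pickles the same way); scikit-learn's docs also suggest joblib for models dominated by large NumPy arrays:

```python
import os
import pickle
import tempfile

# Stand-in for the fitted scikit-learn pipeline; the real trained
# classifier object serializes the same way.
model = {"name": "language-classifier", "n_languages": 38}

# Train-time: serialize the model once and ship the file with your app.
path = os.path.join(tempfile.gettempdir(), "classifier.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Worker-side: load the model back into memory at startup.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

One caveat worth knowing: a pickled model generally needs to be unpickled with the same (or a compatible) scikit-learn version it was trained with.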
Next let's discuss how we're going to get data into our model. Your data could be coming from many types of sources: a web front-end, a DB trigger, etc. In many cases, you can't easily control the rate of incoming data, and you don't want to hold up the front-end or the database while you wait for a prediction to be made. In these cases, it's useful to be able to process your data asynchronously.
In the example I'm giving today, we created a simple web front-end (similar to google translate) where a user can enter some text to be classified, and get a classification back. We don't want to hold up a thread or process in the client waiting on our classifier to do its thing. Rather the front-end sends the input to a REST API which will record the text input and return a tracking_id that the client can then use to get the result.
Decoupling the UI from the backend in this way solves one design issue. However, another thing to consider is whether you can afford to lose messages. If all of your data needs to be processed, you have two options: you either need a built-in retry mechanism in the front-end, or you need a persistent and durable queue to hold your messages.
Enter RabbitMQ. One of the many features provided by RabbitMQ is highly available queues. By using RabbitMQ, you can ensure that every message is processed without needing to implement a fancy (and likely error-prone) retry mechanism in your front-end.
RabbitMQ uses AMQP (Advanced Message Queuing Protocol) for all client communication. Using AMQP allows clients running on different platforms or written in different languages to easily send messages to each other. From a high level, AMQP enables clients to publish messages, and other clients to consume those messages. It does all this without requiring you to roll your own protocol or library.
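As a hedged sketch of the publishing side, using the pika AMQP client: the queue name ("classify"), the payload field names, and the helper functions are all invented for illustration, and publish() additionally assumes a broker on localhost (the sketch only builds a message; it doesn't call publish()):

```python
import json

def make_message(text, tracking_id):
    """Build the JSON payload the API would publish for the worker."""
    return json.dumps({"tracking_id": tracking_id, "text": text}).encode("utf-8")

def publish(body, queue="classify"):
    """Publish one durable message to RabbitMQ (requires pika and a broker)."""
    import pika  # imported lazily so the rest of the module runs without it
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue=queue, durable=True)  # survive broker restarts
    channel.basic_publish(
        exchange="",            # default exchange routes by queue name
        routing_key=queue,
        body=body,
        properties=pika.BasicProperties(delivery_mode=2),  # persist message
    )
    conn.close()

body = make_message("Bonjour tout le monde", "abc123")
print(body)
```

Declaring the queue durable and marking messages persistent is what lets the queue survive a broker restart without losing work.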
Once you hook your data input source into RabbitMQ and start publishing data, all you need to do is put your model in a persistent worker and start consuming input.
In the case of my language classification model, we implemented a simple worker that unpickles the classifier and subscribes to an input queue. It then runs an event loop (main) that pulls new messages as they become available and passes them to process_event. process_event calls predict on our model and converts the numerical prediction to a human-readable format. This prediction is then stored in our DB for the front-end to retrieve.
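The heart of that worker might look like the sketch below. The LABELS table, StubModel, the in-memory dict standing in for the DB, and the message field names are all illustrative assumptions; the real worker unpickles the trained classifier and receives its messages from RabbitMQ:

```python
import json

# Hypothetical table mapping the model's numeric classes to language names.
LABELS = {0: "English", 1: "French"}

class StubModel:
    """Stands in for the unpickled scikit-learn classifier."""
    def predict(self, texts):
        return [1 if "le" in t.split() else 0 for t in texts]

db = {}  # stand-in for the results store the front-end polls

def process_event(body, model, db):
    """Handle one message: predict, translate the class id, store the result."""
    msg = json.loads(body)
    class_id = model.predict([msg["text"]])[0]
    db[msg["tracking_id"]] = LABELS[class_id]

# One trip through the event loop's body:
process_event(
    json.dumps({"tracking_id": "abc123", "text": "le renard brun"}),
    StubModel(),
    db,
)
print(db["abc123"])  # → French
```

Keeping process_event a plain function of (message, model, store) also makes the worker easy to unit test without a broker.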
So that’s basically it. Our design looks a little something like this: The input comes from the UI where the user enters some text they wish to classify. The UI hits a Flask REST API via a GET request. The API stores the request in the DB. The API sends a message to RabbitMQ with the text to classify and the tracking_id for storing the resulting classification. The API returns a JSON response to the UI with the tracking_id. The worker pulls the message off the queue in RabbitMQ. The worker calls predict on the classifier with the text as input. The classifier returns a prediction. The worker updates the database with the result. The UI displays the result.
Alright so let’s see what this all looks like in action!
Besides the basic design concerns I’ve already covered, there are a few more things worth mentioning. The worst thing that can happen when you're processing data asynchronously is for your queue to back up. A backlog will result in longer processing times, and if unbounded, you'll likely crash RabbitMQ. The easiest way to scale your workers is to start another instance. Using this strategy, processing should scale roughly linearly. In my experience, you can easily handle thousands of messages a second this way.
Another way to scale your worker is to convert it to process requests in batches. For many algorithms, calling predict once on a batch of samples is much faster than calling it once per sample, since the vectorized operations amortize the per-call overhead. The downside is that you will no longer be able to process results in realtime. However, if you're constrained on resources (memory and CPU), this can be a worthwhile alternative.
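A batching worker can be reduced to a loop like the one below; StubModel and the helper name are placeholders for the real classifier, and in practice the messages would be drained from the queue rather than passed in as a list:

```python
def predict_batches(model, messages, batch_size=100):
    """Drain pending messages in fixed-size batches, one predict() per batch."""
    results = []
    for i in range(0, len(messages), batch_size):
        batch = messages[i:i + batch_size]
        results.extend(model.predict(batch))  # one vectorized call per batch
    return results

class StubModel:
    def predict(self, texts):
        # Placeholder for the real classifier; returns one value per input.
        return [len(t) for t in texts]

out = predict_batches(StubModel(), ["ab", "cde", "f"], batch_size=2)
print(out)  # → [2, 3, 1]
```

The batch_size knob is the realtime-versus-throughput trade-off: larger batches amortize more overhead but delay every result in the batch.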
Keep an eye on your queue sizes and alert when they back up. Scale as needed (possibly automatically).
Understand your load requirements. Load test end-to-end to verify you can handle the expected load.
Periodically re-verify your algorithm using new data. Build in a feedback loop so that you can collect new labeled samples to verify performance. Version control your classifier. Keep detailed changelogs and performance metrics/characteristics.
I’d like to thank Kelly O’Brien and Matt Parke for helping me with the front-end and back-end for the demo. Without them, things would be a lot less exciting!
You can find me online @beckerfuffle on Twitter. At beckerfuffle.com, and I'm also mdbecker on github. I'll be posting the materials for this talk on my github.