SlideShare a Scribd company logo
1 of 31
Download to read offline
Indexing and searching NuGet.org
with Azure Functions and Search
Maarten Balliauw
@maartenballiauw
“Find this type on NuGet.org”
“Find this type on NuGet.org”
In ReSharper and Rider
Search for namespaces
& types that are not yet referenced
“Find this type on NuGet.org”
Idea in 2013, introduced in ReSharper 9
(2015 - https://www.jetbrains.com/resharper/whatsnew/whatsnew_9.html)
Consists of
ReSharper functionality
A service that indexes packages and powers search
Azure Cloud Service (Web and Worker role)
Indexer uses NuGet OData feed
https://www.nuget.org/api/v2/Packages?$select=Id,Version,NormalizedVersion,LastEdited,Published&$
orderby=LastEdited%20desc&$filter=LastEdited%20gt%20datetime%272012-01-01%27
NuGet over time...
https://twitter.com/controlflow/status/1067724815958777856 https://github.com/NuGet/Announcements/issues/37
NuGet server-side API
V3 Protocol
JSON based
A “resource provider” of various endpoints per purpose
Catalog (NuGet.org only) – append-only event log
Registrations – materialization of newest state of a package
Flat container – .NET Core package restore (and VS autocompletion)
Report abuse URL template
Statistics
…
https://api.nuget.org/v3/index.json (code in https://github.com/NuGet/NuGet.Services.Metadata)
Catalog seems interesting!
Append-only stream of mutations on NuGet.org
Updates (add/update) and Deletes
Chronological
Can continue where left off (uses a timestamp cursor)
Can restore NuGet.org to a given point in time
Structure
Root https://api.nuget.org/v3/catalog0/index.json
+ Page https://api.nuget.org/v3/catalog0/page0.json
+ Leaf https://api.nuget.org/v3/catalog0/data/2015.02.01.06.22.45/adam.jsgenerator.1.1.0.json
NuGet.org catalog
demo
“Find this type on NuGet.org”
Refactor from using OData to using V3?
Mostly done, one thing missing: download counts (using search now)
https://github.com/NuGet/NuGetGallery/issues/3532
Build a new version?
Welcome to this talk ☺
Building a new version
What do we need?
Watch the NuGet.org catalog for package changes
For every package change
Scan all assemblies
Store relation between package id+version and namespace+type
API compatible with all ReSharper and Rider versions
What do we need?
Watch the NuGet.org catalog for package changes periodic check
For every package change based on a queue
Scan all assemblies
Store relation between package id+version and namespace+type
API compatible with all ReSharper and Rider versions always up, flexible scale
Sounds like functions!
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Collecting from catalog
demo
Functions best practices
@PaulDJohnston https://medium.com/@PaulDJohnston/serverless-best-practices-b3c97d551535
Each function should do only one thing
Easier error handling & scaling
Learn to use messages and queues
Asynchronous means of communicating, helps scale and avoid direct coupling
...
Bindings
Help a function do only one thing
Trigger, provide input/output
Function code bridges those
Build your own!*
SQL Server binding
Dropbox binding
...
NuGet Catalog
*Custom triggers not officially supported (yet?)
Trigger Input Output
Timer ✔
HTTP ✔ ✔
Blob ✔ ✔ ✔
Queue ✔ ✔
Table ✔ ✔
Service Bus ✔ ✔
EventHub ✔ ✔
EventGrid ✔
CosmosDB ✔ ✔ ✔
IoT Hub ✔
SendGrid, Twilio ✔
... ✔
Creating a trigger
binding
demo
We’re making progress!
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Downloading packages
demo
Next up: indexing
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Indexing
Opening up the .nupkg and reflecting on assemblies
System.Reflection.Metadata
Does not load the assembly being reflected into application process
Provides access to Portable Executable (PE) metadata in assembly
Store relation between package id+version and namespace+type
Azure Search? A database? Redis? Other?
Indexing packages
demo
“Do one thing well”
Our function shouldn’t care about creating a search index.
Better: return index operations, have something else handle those
Custom output binding?
Blog post with full story and implementation
https://bit.ly/2lzba32
👀
Almost there…
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Making search work
with ReSharper and Rider
demo
We’re done!
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
We’re done!
Functions
Collect changes from NuGet catalog
Download binaries
Index binaries using PE Header
Make search index available in API
Trigger, input and output bindings
Each function should do only one thing
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
We’re done!
All our functions can scale (and fail)
independently
Full index in May 2019 took ~12h on 2 B1 instances
~ 1.7mio packages (NuGet.org homepage says)
~ 2.1mio packages (the catalog says ☺)
~ 8 400 catalog pages
with ~ 4 200 000 catalog leaves
(hint: repo signing)
January 2020: ~ 2.6 mio packages / 3.5 TB
NuGet.org catalog Watch catalog
Index command
Find type API
Find namespace API
Search index
Index package
Raw .nupkg
Index as JSON
Download packageDownload command
Closing thoughts…
Would deploy in separate function apps for cost
Trigger binding collects all the time so needs dedicated capacity (and thus, cost)
Others can scale within bounds/consumption (think of $$$)
Would deploy in separate function apps for failure boundaries
Trigger, indexing, downloading should not affect health of API
Are bindings portable...?
Avoid them if (framework) lock-in matters to you
They are nice in terms of programming model…
Thank you!
https://blog.maartenballiauw.be
@maartenballiauw
Blog post with full story and implementation
https://bit.ly/2lzba32
👀

More Related Content

What's hot

JIRA REST Client for Python - Atlassian Summit 2012
JIRA REST Client for Python - Atlassian Summit 2012JIRA REST Client for Python - Atlassian Summit 2012
JIRA REST Client for Python - Atlassian Summit 2012
Atlassian
 
Using React with Grails 3
Using React with Grails 3Using React with Grails 3
Using React with Grails 3
Zachary Klein
 

What's hot (20)

Mastering Grails 3 Plugins - GR8Conf EU 2016
Mastering Grails 3 Plugins - GR8Conf EU 2016Mastering Grails 3 Plugins - GR8Conf EU 2016
Mastering Grails 3 Plugins - GR8Conf EU 2016
 
Nordic APIs - Automatic Testing of (RESTful) API Documentation
Nordic APIs - Automatic Testing of (RESTful) API DocumentationNordic APIs - Automatic Testing of (RESTful) API Documentation
Nordic APIs - Automatic Testing of (RESTful) API Documentation
 
Basic Git commands
Basic Git commandsBasic Git commands
Basic Git commands
 
Mastering Grails 3 Plugins - G3 Summit 2016
Mastering Grails 3 Plugins - G3 Summit 2016Mastering Grails 3 Plugins - G3 Summit 2016
Mastering Grails 3 Plugins - G3 Summit 2016
 
WordPress RESTful API & Amazon API Gateway (English version)
WordPress RESTful API & Amazon API Gateway (English version)WordPress RESTful API & Amazon API Gateway (English version)
WordPress RESTful API & Amazon API Gateway (English version)
 
Google Ajax APIs
Google Ajax APIsGoogle Ajax APIs
Google Ajax APIs
 
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
[Kotlin Serverless 工作坊] 單元 3 - 實作 JSON API
 
JIRA REST Client for Python - Atlassian Summit 2012
JIRA REST Client for Python - Atlassian Summit 2012JIRA REST Client for Python - Atlassian Summit 2012
JIRA REST Client for Python - Atlassian Summit 2012
 
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
Docker and serverless Randstad Jan 2019: OpenFaaS Serverless: when functions ...
 
OpenAPI Specification概要
OpenAPI Specification概要OpenAPI Specification概要
OpenAPI Specification概要
 
Building a Serverless company with Node.js, React and the Serverless Framewor...
Building a Serverless company with Node.js, React and the Serverless Framewor...Building a Serverless company with Node.js, React and the Serverless Framewor...
Building a Serverless company with Node.js, React and the Serverless Framewor...
 
Build REST API clients for AngularJS
Build REST API clients for AngularJSBuild REST API clients for AngularJS
Build REST API clients for AngularJS
 
Intro to Github Actions @likecoin
Intro to Github Actions @likecoinIntro to Github Actions @likecoin
Intro to Github Actions @likecoin
 
Using React with Grails 3
Using React with Grails 3Using React with Grails 3
Using React with Grails 3
 
用 Kotlin 打造讀書會小幫手
用 Kotlin 打造讀書會小幫手用 Kotlin 打造讀書會小幫手
用 Kotlin 打造讀書會小幫手
 
Consuming REST services with ActiveResource
Consuming REST services with ActiveResourceConsuming REST services with ActiveResource
Consuming REST services with ActiveResource
 
DevOps with GitHub Actions
DevOps with GitHub ActionsDevOps with GitHub Actions
DevOps with GitHub Actions
 
Introduction to bower
Introduction to bowerIntroduction to bower
Introduction to bower
 
goaを使った開発TIPS@六本木一丁目
goaを使った開発TIPS@六本木一丁目goaを使った開発TIPS@六本木一丁目
goaを使った開発TIPS@六本木一丁目
 
Alfresco javascript api - Alfresco Devcon 2018
Alfresco javascript api - Alfresco Devcon 2018Alfresco javascript api - Alfresco Devcon 2018
Alfresco javascript api - Alfresco Devcon 2018
 

Similar to Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and Search"

China Science Challenge
China Science ChallengeChina Science Challenge
China Science Challenge
remko caprio
 

Similar to Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and Search" (20)

CloudBurst 2019 - Indexing and searching NuGet.org with Azure Functions and S...
CloudBurst 2019 - Indexing and searching NuGet.org with Azure Functions and S...CloudBurst 2019 - Indexing and searching NuGet.org with Azure Functions and S...
CloudBurst 2019 - Indexing and searching NuGet.org with Azure Functions and S...
 
Indexing and searching NuGet.org with Azure Functions and Search - Cloud Deve...
Indexing and searching NuGet.org with Azure Functions and Search - Cloud Deve...Indexing and searching NuGet.org with Azure Functions and Search - Cloud Deve...
Indexing and searching NuGet.org with Azure Functions and Search - Cloud Deve...
 
NDC Oslo 2019 - Indexing and searching NuGet.org with Azure Functions and Search
NDC Oslo 2019 - Indexing and searching NuGet.org with Azure Functions and SearchNDC Oslo 2019 - Indexing and searching NuGet.org with Azure Functions and Search
NDC Oslo 2019 - Indexing and searching NuGet.org with Azure Functions and Search
 
Let's build Developer Portal with Backstage
Let's build Developer Portal with BackstageLet's build Developer Portal with Backstage
Let's build Developer Portal with Backstage
 
NuGet beyond Hello World - DotNext Piter 2017
NuGet beyond Hello World - DotNext Piter 2017NuGet beyond Hello World - DotNext Piter 2017
NuGet beyond Hello World - DotNext Piter 2017
 
Plugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGemsPlugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGems
 
Relevance trilogy may dream be with you! (dec17)
Relevance trilogy  may dream be with you! (dec17)Relevance trilogy  may dream be with you! (dec17)
Relevance trilogy may dream be with you! (dec17)
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
Bapi jco
Bapi jcoBapi jco
Bapi jco
 
RESTful OSGi middleware for NoSQL databases with Docker
RESTful OSGi middleware for NoSQL databases with DockerRESTful OSGi middleware for NoSQL databases with Docker
RESTful OSGi middleware for NoSQL databases with Docker
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
Akash rajguru project report sem v
Akash rajguru project report sem vAkash rajguru project report sem v
Akash rajguru project report sem v
 
OGCE Project Overview
OGCE Project OverviewOGCE Project Overview
OGCE Project Overview
 
Release 8.1 - Breakfast Paris
Release 8.1 - Breakfast ParisRelease 8.1 - Breakfast Paris
Release 8.1 - Breakfast Paris
 
China Science Challenge
China Science ChallengeChina Science Challenge
China Science Challenge
 
SgCodeJam24 Workshop
SgCodeJam24 WorkshopSgCodeJam24 Workshop
SgCodeJam24 Workshop
 
Continuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:InventContinuous Deployment @ AWS Re:Invent
Continuous Deployment @ AWS Re:Invent
 
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re...
 
Django deployment with PaaS
Django deployment with PaaSDjango deployment with PaaS
Django deployment with PaaS
 
Serverless Framework Workshop - Tyler Hendrickson, Chicago/burbs
 Serverless Framework Workshop - Tyler Hendrickson, Chicago/burbs Serverless Framework Workshop - Tyler Hendrickson, Chicago/burbs
Serverless Framework Workshop - Tyler Hendrickson, Chicago/burbs
 

More from Fwdays

More from Fwdays (20)

"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y..."How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
 
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
 
"Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl..."Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl...
 
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
 
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ..."The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
 
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu..."[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
 
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care..."[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
 
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"..."4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
 
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast..."Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
 
"Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others..."Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others...
 
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
 
"Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv..."Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv...
 
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin..."How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Maarten Balliauw "Indexing and searching NuGet.org with Azure Functions and Search"

  • 1. Indexing and searching NuGet.org with Azure Functions and Search Maarten Balliauw @maartenballiauw
  • 2. “Find this type on NuGet.org”
  • 3. “Find this type on NuGet.org” In ReSharper and Rider Search for namespaces & types that are not yet referenced
  • 4. “Find this type on NuGet.org” Idea in 2013, introduced in ReSharper 9 (2015 - https://www.jetbrains.com/resharper/whatsnew/whatsnew_9.html) Consists of ReSharper functionality A service that indexes packages and powers search Azure Cloud Service (Web and Worker role) Indexer uses NuGet OData feed https://www.nuget.org/api/v2/Packages?$select=Id,Version,NormalizedVersion,LastEdited,Published&$ orderby=LastEdited%20desc&$filter=LastEdited%20gt%20datetime%272012-01-01%27
  • 5. NuGet over time... https://twitter.com/controlflow/status/1067724815958777856 https://github.com/NuGet/Announcements/issues/37
  • 7. V3 Protocol JSON based A “resource provider” of various endpoints per purpose Catalog (NuGet.org only) – append-only event log Registrations – materialization of newest state of a package Flat container – .NET Core package restore (and VS autocompletion) Report abuse URL template Statistics … https://api.nuget.org/v3/index.json (code in https://github.com/NuGet/NuGet.Services.Metadata)
  • 8. Catalog seems interesting! Append-only stream of mutations on NuGet.org Updates (add/update) and Deletes Chronological Can continue where left off (uses a timestamp cursor) Can restore NuGet.org to a given point in time Structure Root https://api.nuget.org/v3/catalog0/index.json + Page https://api.nuget.org/v3/catalog0/page0.json + Leaf https://api.nuget.org/v3/catalog0/data/2015.02.01.06.22.45/adam.jsgenerator.1.1.0.json
  • 10. “Find this type on NuGet.org” Refactor from using OData to using V3? Mostly done, one thing missing: download counts (using search now) https://github.com/NuGet/NuGetGallery/issues/3532 Build a new version? Welcome to this talk ☺
  • 11. Building a new version
  • 12. What do we need? Watch the NuGet.org catalog for package changes For every package change Scan all assemblies Store relation between package id+version and namespace+type API compatible with all ReSharper and Rider versions
  • 13. What do we need? Watch the NuGet.org catalog for package changes periodic check For every package change based on a queue Scan all assemblies Store relation between package id+version and namespace+type API compatible with all ReSharper and Rider versions always up, flexible scale
  • 14. Sounds like functions! NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 16. Functions best practices @PaulDJohnston https://medium.com/@PaulDJohnston/serverless-best-practices-b3c97d551535 Each function should do only one thing Easier error handling & scaling Learn to use messages and queues Asynchronous means of communicating, helps scale and avoid direct coupling ...
  • 17. Bindings Help a function do only one thing Trigger, provide input/output Function code bridges those Build your own!* SQL Server binding Dropbox binding ... NuGet Catalog *Custom triggers not officially supported (yet?) Trigger Input Output Timer ✔ HTTP ✔ ✔ Blob ✔ ✔ ✔ Queue ✔ ✔ Table ✔ ✔ Service Bus ✔ ✔ EventHub ✔ ✔ EventGrid ✔ CosmosDB ✔ ✔ ✔ IoT Hub ✔ SendGrid, Twilio ✔ ... ✔
  • 19. We’re making progress! NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 21. Next up: indexing NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 22. Indexing Opening up the .nupkg and reflecting on assemblies System.Reflection.Metadata Does not load the assembly being reflected into application process Provides access to Portable Executable (PE) metadata in assembly Store relation between package id+version and namespace+type Azure Search? A database? Redis? Other?
  • 24. “Do one thing well” Our function shouldn’t care about creating a search index. Better: return index operations, have something else handle those Custom output binding? Blog post with full story and implementation https://bit.ly/2lzba32 👀
  • 25. Almost there… NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 26. Making search work with ReSharper and Rider demo
  • 27. We’re done! NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 28. We’re done! Functions Collect changes from NuGet catalog Download binaries Index binaries using PE Header Make search index available in API Trigger, input and output bindings Each function should do only one thing NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 29. We’re done! All our functions can scale (and fail) independently Full index in May 2019 took ~12h on 2 B1 instances ~ 1.7mio packages (NuGet.org homepage says) ~ 2.1mio packages (the catalog says ☺) ~ 8 400 catalog pages with ~ 4 200 000 catalog leaves (hint: repo signing) January 2020: ~ 2.6 mio packages / 3.5 TB NuGet.org catalog Watch catalog Index command Find type API Find namespace API Search index Index package Raw .nupkg Index as JSON Download packageDownload command
  • 30. Closing thoughts… Would deploy in separate function apps for cost Trigger binding collects all the time so needs dedicated capacity (and thus, cost) Others can scale within bounds/consumption (think of $$$) Would deploy in separate function apps for failure boundaries Trigger, indexing, downloading should not affect health of API Are bindings portable...? Avoid them if (framework) lock-in matters to you They are nice in terms of programming model…
  • 31. Thank you! https://blog.maartenballiauw.be @maartenballiauw Blog post with full story and implementation https://bit.ly/2lzba32 👀