So you think you can crawl?
Stretching the Boundaries of SharePoint 2013!
Petter Skodvin-Hvammen
AD-Gruppen, Norway
Who am I?
Petter Skodvin-Hvammen
Oseberg ship - Discovered 1904 in Tønsberg, Norway. Buried by Vikings in 834 AD
• Solutions Architect
• SharePoint Consultant
• Search Enthusiast
• Community Lead
@pettersh - psh@adgruppen.no
www.adgruppen.no
Enterprise Search
• Index thousands of sources
• Automate index management
• Infrastructure sizing
Challenges and Solutions
Not included: code/scripts, user experience, relevancy, governance
www.sharepointeurope.com
Enterprise Search using SharePoint Server 2013
• 30,000 users
• 85 locations in 30 countries
• 15,000 daily searches
• 100,000,000 documents(?)
• 60 core systems, 2,000 applications
The Mission…
What do we index?
• 100,000,000 documents
• 3,000 file shares
• 500 servers
Where is the data?
• Datacenters
• Time zones
• Bandwidth
How can we get it?
• Limit bandwidth usage for specific server locations
• Limit crawler impact during local business hours
• Grant the crawler read access per file share
• Avoid token bloat when an account is a member of more than 1,015* groups
* http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx
How do we operate it?
• File shares are created, changed, and deleted every day through a custom self-service solution
• File shares are moved between servers every day by automation rules
• Indexing and crawling of each file share must be managed with minimal manual effort
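Because shares appear, disappear and move daily, the link-management job has to converge what exists on disk towards the list kept in the self-service portal. A minimal sketch of that reconciliation step, assuming both sides can be read as a mapping from link name to target UNC path (the function and the data shapes are hypothetical):

```python
def reconcile(wanted: dict[str, str],
              existing: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare the desired link -> target mapping with the links that exist today.

    Returns (to_create, to_remove). A link whose target changed (for example,
    because its share moved to another server) appears in both lists: remove
    the stale link first, then recreate it with the new target.
    """
    to_remove = sorted(
        link for link, target in existing.items() if wanted.get(link) != target
    )
    to_create = sorted(
        link for link, target in wanted.items() if existing.get(link) != target
    )
    return to_create, to_remove


wanted = {"e5dc12a41d": r"\\file01\share01", "7b0c9e2f11": r"\\file02\share03"}
existing = {"e5dc12a41d": r"\\file09\share01", "a1b2c3d4e5": r"\\file03\share03"}
print(reconcile(wanted, existing))
```

Keeping the link name stable while only its target changes means the path the crawler sees does not change when a share moves to another server; the slides do not say whether the production solution relied on this.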
What can SharePoint do?
• Max 50 content sources per service application
– Max 500 with October 2013 CU installed
• Max 100 start addresses per content source
– Max 500 with October 2013 CU installed
• Max 20 concurrent crawls per service application
– Limitation has been removed
http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
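A back-of-the-envelope check using the 500 servers and 3,000 file shares from the earlier slides shows why a naive layout with one content source per server does not fit. The `layout_fits` helper is hypothetical and assumes start addresses can be spread evenly over the content sources:

```python
import math


def layout_fits(content_sources: int, start_addresses: int,
                max_sources: int, max_per_source: int) -> bool:
    """True if the start addresses can be spread over the content sources
    without exceeding either boundary."""
    if content_sources > max_sources:
        return False
    # Best case: start addresses are distributed evenly across content sources.
    return math.ceil(start_addresses / content_sources) <= max_per_source


SERVERS, SHARES = 500, 3_000  # figures from "What do we index?"

# Naive layout: one content source per server, one start address per file share.
print(layout_fits(SERVERS, SHARES, max_sources=50, max_per_source=100))   # before the CU
print(layout_fits(SERVERS, SHARES, max_sources=500, max_per_source=500))  # with the October 2013 CU
```

Without the CU the naive layout needs ten times the permitted number of content sources; with it, the layout lands exactly on the 500 limit and leaves no room for growth.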
It’s complicated
• More data than we have space for
• It’s located all over the place
• Everything changes all of the time
• There are limitations in SharePoint
• Someone’s gotta maintain this
• It has to be secure and relevant
What did we do?
• Created logical groups of file shares
• Used symbolic links
Diagram: fewer content sources. The file shares \\file01\share01, \\file02\share03 and \\file03\share03 are reached through the symbolic links \\file00\share\sym01, \\file00\share\sym02 and \\file00\share\sym03, so the single start address \\file00\share covers them all.
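Creating the links can be scripted. A minimal sketch in Python, assuming a hypothetical scheme that names each link after a short hash of its target (similar in form to the `e5dc12a41d` link in the custom-list example, although the slides do not say how that name was derived). For example, `create_share_link(r"\\file00\share", r"\\file01\share01")` would expose `\\file01\share01` under the single start address `\\file00\share`.

```python
import hashlib
import os
from pathlib import Path


def link_name(unc_path: str, length: int = 10) -> str:
    """Hypothetical naming scheme: a short, stable hash of the target path.

    The path is lower-cased first because Windows paths are case-insensitive.
    """
    digest = hashlib.sha1(unc_path.lower().encode("utf-8")).hexdigest()
    return digest[:length]


def create_share_link(link_root: str, unc_path: str) -> Path:
    """Create a directory symlink under link_root pointing at unc_path.

    Returns the link path; an existing link with the same name is left as is.
    """
    root = Path(link_root)
    root.mkdir(parents=True, exist_ok=True)
    link = root / link_name(unc_path)
    if not link.is_symlink():
        # target_is_directory matters on Windows, where the caller also
        # needs the privilege to create symbolic links.
        os.symlink(unc_path, link, target_is_directory=True)
    return link
```

Hashing the target keeps names short and stable, so rerunning the job for the same share is harmless.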
What did we do?
• Grouped file shares by region
• One content source per region
• Incremental crawls every night
Crawling based on time zones
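Scheduling one content source per region is mostly time-zone arithmetic. A sketch, assuming hypothetical fixed UTC offsets per region and a hypothetical 01:00 local start (real schedules would also need daylight-saving rules, for example via `zoneinfo`), that converts each region's local start time to UTC:

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical fixed offsets; these ignore daylight saving time.
REGION_OFFSETS = {
    "europe": timedelta(hours=1),
    "asia": timedelta(hours=8),
}


def nightly_start_utc(region: str, local_start: time = time(1, 0)) -> time:
    """UTC wall-clock time at which a region's incremental crawl should start
    so that it begins at local_start in that region."""
    local_tz = timezone(REGION_OFFSETS[region])
    # Any fixed date works because the offsets above are constant.
    local = datetime.combine(date(2014, 1, 1), local_start, tzinfo=local_tz)
    return local.astimezone(timezone.utc).time()


for region in REGION_OFFSETS:
    print(region, nightly_start_utc(region).strftime("%H:%M"))
```

If the crawl servers run in another time zone, the UTC value would still need converting to their local time before it is entered as a crawl schedule.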
What did we do?
• Created a DNS alias per crawler impact rule in the hosts file (etc/hosts) on the crawl servers
Reduced crawler impact
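Crawler impact rules are defined per host name, so pointing several alias names at the same server lets the crawler treat each alias as a separate site with its own impact rule. A sketch that renders the hosts-file entries and extracts the host name an impact rule would apply to; the IP address is a hypothetical documentation address, and the aliases `default` and `wait-60` are taken from the start addresses in "The bigger picture":

```python
from urllib.parse import urlparse

LINK_SERVER_IP = "192.0.2.10"            # hypothetical address of the server holding the links
IMPACT_ALIASES = ["default", "wait-60"]  # one alias per crawler impact rule


def hosts_entries(server_ip: str, aliases: list[str]) -> str:
    """Hosts-file lines that point every impact-rule alias at the same server."""
    return "\n".join(f"{server_ip}\t{alias}" for alias in aliases)


def impact_site(start_address: str) -> str:
    """Host name of a start address, i.e. the site a crawler impact rule matches."""
    host = urlparse(start_address).hostname
    if host is None:
        raise ValueError(f"no host name in {start_address!r}")
    return host


print(hosts_entries(LINK_SERVER_IP, IMPACT_ALIASES))
print(impact_site("file://wait-60/europe/wait-60"))
```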
What did we do?
• Granted file share access to the crawl account that is a member of the fewest groups
• Monitored group memberships
• Grouped file shares by crawl account
• Crawl rules matched the folder structure
Managed pool of crawl accounts
file://.*/spcrwl01/.*   Include   SP\spcrwl01
file://.*/spcrwl02/.*   Include   SP\spcrwl02
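Choosing an account and routing a crawled URL to it can be sketched as follows, assuming the group-membership counts are already available (for example from Active Directory), that granting access to a share adds one group to the account, and that a crawl rule must match the whole URL. The 1,015 figure is the token limit cited earlier; the helper names are hypothetical.

```python
import re
from typing import Optional

TOKEN_GROUP_LIMIT = 1_015  # maximum groups per account before token bloat

# Include rules from the slide: each crawl account owns one folder name.
CRAWL_RULES = [
    (re.compile(r"file://.*/spcrwl01/.*", re.IGNORECASE), r"SP\spcrwl01"),
    (re.compile(r"file://.*/spcrwl02/.*", re.IGNORECASE), r"SP\spcrwl02"),
]


def pick_account(group_counts: dict[str, int]) -> str:
    """Return the crawl account that is a member of the fewest groups, as long as
    one more share group keeps it within the limit."""
    account, groups = min(group_counts.items(), key=lambda item: item[1])
    if groups + 1 > TOKEN_GROUP_LIMIT:
        raise ValueError(f"every crawl account is at the group limit ({account}: {groups})")
    return account


def account_for(url: str) -> Optional[str]:
    """Return the account of the first crawl rule whose pattern matches the whole URL."""
    for pattern, account in CRAWL_RULES:
        if pattern.fullmatch(url):
            return account
    return None


print(pick_account({r"SP\spcrwl01": 870, r"SP\spcrwl02": 412}))
print(account_for("file://default/europe/default/spcrwl02/e5dc12a41d/plan.docx"))
```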
The bigger picture
• Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link>
• Start addresses: file://<crawler impact>/<content source>/<crawler impact>

Source  Start address                  Folder                   Crawl rule             Impact rule
Europe  file://default/europe/default  europe/default/spcrwl01  file://.*/spcrwl01/.*  Default
                                       europe/default/spcrwl02  file://.*/spcrwl02/.*  Default
        file://wait-60/europe/wait-60  europe/wait-60/spcrwl01  file://.*/spcrwl01/.*  Wait-60
                                       europe/wait-60/spcrwl02  file://.*/spcrwl02/.*  Wait-60
Asia    file://default/asia/default    asia/default/spcrwl01    file://.*/spcrwl01/.*  Default
                                       asia/default/spcrwl02    file://.*/spcrwl02/.*  Default
        file://wait-60/asia/wait-60    asia/wait-60/spcrwl01    file://.*/spcrwl01/.*  Wait-60
                                       asia/wait-60/spcrwl02    file://.*/spcrwl02/.*  Wait-60
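Because every start address, folder and crawl rule follows the naming convention above, the whole table can be generated rather than maintained by hand. A sketch that reproduces the rows from the lists of content sources, impact aliases and crawl accounts (the generator itself is hypothetical; the names come from the table):

```python
from itertools import product
from typing import Iterator

CONTENT_SOURCES = ["europe", "asia"]
IMPACT_ALIASES = ["default", "wait-60"]
CRAWL_ACCOUNTS = ["spcrwl01", "spcrwl02"]


def layout(sources: list[str], impacts: list[str],
           accounts: list[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (start address, folder, crawl rule) for every combination."""
    for source, impact in product(sources, impacts):
        start_address = f"file://{impact}/{source}/{impact}"
        for account in accounts:
            folder = f"{source}/{impact}/{account}"
            crawl_rule = f"file://.*/{account}/.*"
            yield start_address, folder, crawl_rule


for row in layout(CONTENT_SOURCES, IMPACT_ALIASES, CRAWL_ACCOUNTS):
    print("  ".join(row))
```

The number of start addresses per content source equals the number of impact rules, however many file shares sit behind the symbolic links, which is what keeps this layout well inside the start-address boundary.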
How did we manage this?
• Self-service portal for enabling indexing of file shares
• Custom web service integration in the self-service portal
• Custom solution for granting access to crawl accounts
• Custom timer job to get the list of file shares to crawl from the self-service portal
• Custom timer job for creating and removing symbolic links
• Custom lists mapping servers to content source, schedule and impact; shares to crawl accounts and metadata; UNC paths to symlinks
• Content enrichment service for replacing symlinks in paths with actual file paths
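The enrichment step only has to undo the indirection: a crawled path that starts with a symlink is rewritten to the real UNC path from the custom list. A sketch of that rewrite, assuming paths arrive in UNC form and the mapping has already been loaded from the custom list; it shows only the string logic, not the Content Enrichment Web Service contract that SharePoint actually calls:

```python
# Hypothetical symlink -> UNC mapping, as maintained in the custom lists.
SYMLINK_TO_UNC = {
    r"\\default\europe\default\spcrwl01\e5dc12a41d": r"\\file01\share01",
}


def to_actual_path(crawled_path: str, mapping: dict[str, str]) -> str:
    """Replace the longest matching symlink prefix with the real UNC path.

    Matching is case-insensitive, like Windows paths; unmatched paths are
    returned unchanged.
    """
    for link in sorted(mapping, key=len, reverse=True):
        head, rest = crawled_path[:len(link)], crawled_path[len(link):]
        # Only match whole path segments, so "...\e5dc12a41dx" is not a hit.
        if head.lower() == link.lower() and (rest == "" or rest.startswith("\\")):
            return mapping[link] + rest
    return crawled_path


print(to_actual_path(r"\\default\europe\default\spcrwl01\e5dc12a41d\Plans\q1.docx",
                     SYMLINK_TO_UNC))
```

The longest prefix is tried first so that nested links, if any, resolve to the most specific mapping.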
Example: Self-Service Portal
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: Assigned automatically
Crawl Account: Assigned automatically

Example: Custom Lists
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: \\file01\share01
Crawl Account: SP\spcrwl01
Symlink: \\default\europe\default\spcrwl01\e5dc12a41d
Location: europe (server file01 is located in Oslo DC)
Bandwidth: 5 Mbps
Diagram: search farm topology sized for 40 million documents and 10 queries/second. Each component appears twice: Query and WFE; Crawling, Central Admin, Analytics and Admin; Index-0 to Index-3; with Doc Proc (document processing) and Enrichment spread across most servers. The farm is backed by two SQL Servers (Admin DB, Analytics DB, Crawl DB, Link DB, other SP DBs) and two Caching servers.
Capacity testing
Purpose
• Crawling of symbolic links
• Scaling of virtual machines
• Sizing of disk space
• Verify Microsoft's recommendations
Approach
• 4-server farm with 2 index partitions
• 8 vCPU, 16 GB RAM, 850 GB disk
• Crawl 10 file shares (3.7M files)
• Replay the top 300 queries
• Apache JMeter
Capacity testing – findings
• Crawl rate declined by 1% per million items indexed
• Query latency increased exponentially once a partition held more than 12 million items
• Database latency was insignificant during crawling
• Successfully crawled file shares via symbolic directory links
• Disk space usage was significantly lower than expected
– Reduced data volume from 850 GB to 450 GB
– Across 40+ servers, this meant huge cost savings
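The findings translate into simple sizing arithmetic. A sketch, assuming the 1% decline is linear rather than compounding (the slide does not say which) and treating 12 million items as the per-partition ceiling; with the 40 million documents in the farm diagram this gives four partitions, matching Index-0 to Index-3:

```python
import math

ITEMS_PER_PARTITION_CEILING = 12_000_000  # latency rose sharply beyond this in the test


def partitions_needed(documents: int,
                      ceiling: int = ITEMS_PER_PARTITION_CEILING) -> int:
    """Smallest number of index partitions that keeps each one at or below ceiling."""
    return math.ceil(documents / ceiling)


def projected_crawl_rate(initial_rate: float, millions_indexed: float,
                         decline_per_million: float = 0.01) -> float:
    """Crawl rate after N million indexed items, assuming a linear 1% decline."""
    return initial_rate * max(0.0, 1.0 - decline_per_million * millions_indexed)


documents = 40_000_000
partitions = partitions_needed(documents)
print(partitions, documents // partitions)  # 4 partitions of 10,000,000 items each
```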
Infrastructure – VM sizing
Dedicated ESX cluster
• 14 VMs for SharePoint 2013
– 4 physical machines
– 4 x 32 = 128 CPUs
– 4 x 256 = 1,024 GB memory
• HA maximum utilization = ¾
– 3 x 32 = 96 CPUs
– 3 x 256 = 768 GB memory
• CPU and memory can be overcommitted
• CPU overcommit factor 1.34 (1.78 if one physical host fails)
• VMs must wait for physical CPUs: wait time for an 8-vCPU VM = 2 x wait time for a 4-vCPU VM
• Mitigation:
a) Reduce allocated virtual CPUs, or
b) Increase physical CPUs
• Memory overcommit factor 0.44 (0.59 if one physical host fails)
• Reserved and locked memory prevents HA failover
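The figures in parentheses follow from spreading the same allocation over three hosts instead of four, and the 1,024 GB and 768 GB totals imply 256 GB per host. A sketch of that arithmetic (the helper names are illustrative); the CPU ratio comes out at 1.79 and the memory ratio at 0.59, matching the slide's 1.78 and 0.59 up to rounding:

```python
HOSTS = 4
CPUS_PER_HOST = 32
GB_PER_HOST = 256


def capacity(hosts: int, per_host: int) -> int:
    """Total resources provided by a number of identical hosts."""
    return hosts * per_host


def ratio_after_host_failure(ratio: float, hosts: int) -> float:
    """Overcommit ratio once one host is lost: the same allocation on hosts - 1 hosts."""
    return ratio * hosts / (hosts - 1)


print(capacity(HOSTS, CPUS_PER_HOST), capacity(HOSTS - 1, CPUS_PER_HOST))  # 128 96
print(capacity(HOSTS, GB_PER_HOST), capacity(HOSTS - 1, GB_PER_HOST))      # 1024 768
print(round(ratio_after_host_failure(1.34, HOSTS), 2), round(ratio_after_host_failure(0.44, HOSTS), 2))  # 1.79 0.59
```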
Infrastructure – VM tuning
DC  Role                                             vCPU  Peak    Average  Calculated  Recommended  Change
A   Web, Query, Admin                                8     187.55  37.03    2           4            -4
B   Web, Query, Admin                                8     621.88  92.69    8           8            0
A   Crawl, Analytics, Content, CEWS, Central Admin   8     724.35  210.59   8           8            0
B   Crawl, Analytics, Content, CEWS, Symbolic Links  8     724.56  198.44   8           8            0
A   Index 0, Content, CEWS                           8     486.18  62.55    6           6            -2
B   Index 0, Content, CEWS                           8     520.63  63.98    6           6            -2
A   Index 1, Content, CEWS                           8     547.08  69.30    6           6            -2
B   Index 1, Content, CEWS                           8     546.44  91.74    6           6            -2
A   Index 2, Content, CEWS                           8     491.38  65.60    6           6            -2
B   Index 2, Content, CEWS                           8     532.01  77.83    6           6            -2
A   Index 3, Content, CEWS                           8     540.45  78.72    6           6            -2
B   Index 3, Content, CEWS                           8     621.88  92.69    8           8            0
A   Distributed Cache                                4     91.71   5.99     2           2            -2
B   Distributed Cache* (added later)                 -     -       -        -           -            -
    Total                                            100                    78          80           -20
Peak and average CPU usage is calculated over 30 days
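The slide does not explain how the Calculated column was derived. One rule that reproduces every value in it, inferred from the data rather than stated by the author, is to treat the peak as a percentage of one vCPU (an assumption), divide by 100, round up, and then round up again to an even number of vCPUs:

```python
import math

# (allocated vCPU, peak) per row with data, copied from the table above.
ROWS = [
    (8, 187.55), (8, 621.88),                            # Web, Query, Admin (A, B)
    (8, 724.35), (8, 724.56),                            # Crawl, Analytics, ... (A, B)
    (8, 486.18), (8, 520.63), (8, 547.08), (8, 546.44),
    (8, 491.38), (8, 532.01), (8, 540.45), (8, 621.88),  # Index 0-3 (A, B)
    (4, 91.71),                                          # Distributed Cache (A)
]


def calculated_vcpu(peak: float) -> int:
    """Round peak/100 up to a whole vCPU, then up to the next even count."""
    vcpus = math.ceil(peak / 100)
    return vcpus + vcpus % 2


allocated = sum(vcpu for vcpu, _ in ROWS)
calculated = sum(calculated_vcpu(peak) for _, peak in ROWS)
print(allocated, calculated)  # 100 78
```

The Recommended column departs from this rule only for the A-side Web, Query, Admin server (4 instead of 2), which accounts for the difference between the totals of 78 and 80.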
Summary
1. Indexing thousands of content sources
2. Automation for rapidly changing index requirements
3. Sizing the infrastructure for performance and HA
Questions?
petter.skodvin-hvammen@adgruppen.no
http://linkedin.com/in/petterskodvin
@pettersh

Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 

ESPC14 380 So you think you can crawl? Stretching the Boundaries of SharePoint 2013!

  • 1. So you think you can crawl? Stretching the Boundaries of SharePoint 2013! Petter Skodvin-Hvammen AD-Gruppen, Norway
  • 2. Who am I? Petter Skodvin-Hvammen
    • Solutions Architect
    • SharePoint Consultant
    • Search Enthusiast
    • Community Lead
    @pettersh - psh@adgruppen.no - www.adgruppen.no
    Photo: Oseberg ship, discovered 1904 in Tønsberg, Norway; buried by Vikings in 834 AD
  • 3. Enterprise Search: Challenges and Solutions
    • Index thousands of sources
    • Automate index management
    • Infrastructure sizing
    Not included: code/scripts, user experience, relevancy, governance
    www.sharepointeurope.com
  • 4. The Mission… Enterprise Search using SharePoint Server 2013
    • 30,000 users
    • 85 locations in 30 countries
    • 15,000 daily searches
    • 100,000,000 documents(?)
    • 60 core systems, 2,000 applications
  • 5. What do we index?
    • 100,000,000 documents
    • 3,000 file shares
    • 500 servers
  • 6. Where is the data?
    • Datacenters
    • Time zones
    • Bandwidth
  • 7. How can we get it?
    • Limit bandwidth usage for specific server locations
    • Limit crawler impact within local business hours
    • Grant read access to the crawler per file share
    • Avoid token bloat issues with more than 1,015* groups per account
    * http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx
  • 8. How do we operate it?
    • File shares are created, changed, and deleted every day using a custom self-service solution
    • File shares are moved between servers every day by automation rules
    • Manage indexing and crawling of each file share with minimal manual effort
  • 9. What can SharePoint do?
    • Max 50 content sources per service application
      – Max 500 with October 2013 CU installed
    • Max 100 start addresses per content source
      – Max 500 with October 2013 CU installed
    • Max 20 concurrent crawls per service application
      – Limitation has been removed
    Source: http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
  • 10. It’s complicated
    • More data than we have space for
    • It’s located all over the place
    • Everything changes all of the time
    • There are limitations in SharePoint
    • Someone’s gotta maintain this
    • It has to be secure and relevant
  • 11. What did we do? Fewer content sources
    • Created logical groups of file shares
    • Used symbolic linking
    Diagram: the file shares file01share01, file02share03 and file03share03 are linked as file00sharesym01, file00sharesym02 and file00sharesym03 beneath a single start address, file00share
  • 12. What did we do? Crawling based on time zones
    • Grouped file shares based on region
    • One content source per region
    • Incremental crawls every night
  • 13. What did we do? Reduced crawler impact
    • Created a DNS alias per crawler impact rule in the hosts file (etc/hosts) on the crawl servers
  • 14. What did we do? Managed pool of crawl accounts
    • Granted file share access to the crawl account that is a member of the fewest groups
    • Monitored group memberships
    • Grouped file shares by crawl account
    • Crawl rules matched the folder structure:
      – file://.*/spcrwl01/.* (Include, account SPspcrwl01)
      – file://.*/spcrwl02/.* (Include, account SPspcrwl02)
  • 15. The bigger picture
    • Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link>
    • Start addresses: file://<crawler impact>/<content source>/<crawler impact>

    Source | Start address                 | Folder                  | Crawl rule            | Impact rule
    Europe | file://default/europe/default | europe/default/spcrwl01 | file://.*/spcrwl01/.* | Default
           |                               | europe/default/spcrwl02 | file://.*/spcrwl02/.* | Default
           | file://wait-60/europe/wait-60 | europe/wait-60/spcrwl01 | file://.*/spcrwl01/.* | Wait-60
           |                               | europe/wait-60/spcrwl02 | file://.*/spcrwl02/.* | Wait-60
    Asia   | file://default/asia/default   | asia/default/spcrwl01   | file://.*/spcrwl01/.* | Default
           |                               | asia/default/spcrwl02   | file://.*/spcrwl02/.* | Default
           | file://wait-60/asia/wait-60   | asia/wait-60/spcrwl01   | file://.*/spcrwl01/.* | Wait-60
           |                               | asia/wait-60/spcrwl02   | file://.*/spcrwl02/.* | Wait-60
  • 16. How did we manage this?
    • Self-service portal for enabling indexing of file shares
    • Custom web service integration in the self-service portal
    • Custom solution for granting access to crawl accounts
    • Custom timer job to get the list of file shares to crawl from the self-service portal
    • Custom timer job for creating and removing symbolic links
    • Custom lists for mapping server to content source, schedule and impact; shares to crawl accounts and metadata; UNC to symlink
    • Content enrichment service for replacing symlinks in paths with actual file paths
  • 17. Example: Self Service Portal
    • Title: European SharePoint Conference
    • Owner: Petter Skodvin-Hvammen
    • Business Area: Consulting
    • Classification: Internal
    • Type: Project
    • UNC Path: Assigned automatically
    • Crawl Account: Assigned automatically
    • Buttons: Cancel, Save
    Example: Custom Lists
    • Title: European SharePoint Conference
    • Owner: Petter Skodvin-Hvammen
    • Business Area: Consulting
    • Classification: Internal
    • Type: Project
    • UNC Path: file01share01
    • Crawl Account: SPspcrawl01
    • Symlink: defaulteuropedefaultspcrwl01e5dc12a41d
    • Location: europe (server file01 is located in Oslo DC)
    • Bandwidth: 5 Mbps
  • 18. [Diagram: search farm topology sized for 40 million documents and 10 queries/second. Components shown: Query, WFE, Central Admin, Admin, Crawling, Analytics, Doc Proc, Enrichment and Caching, with index partitions Index-0 to Index-3 each appearing twice. SQL Server (shown twice) holds the Admin DB, Analytics DB, Crawl DB, Link DB and other SP DBs.]
  • 19. Capacity testing
    Purpose
    • Crawling of symbolic links
    • Scaling of virtual machines
    • Sizing of disk space
    • Verify Microsoft’s guidance
    Approach
    • 4-server farm with 2 index partitions
    • 8 vCPU, 16 GB RAM, 850 GB disk
    • Crawl 10 file shares (3.7M files)
    • Replay top 300 queries with Apache JMeter
  • 20. Capacity testing – findings
    • Crawl rate declined 1% per million items indexed
    • Query latency increased exponentially beyond 12 million items indexed per partition
    • Database latency was insignificant during crawling
    • Successfully crawled file shares via symbolic directory links
    • Disk space usage was significantly lower than expected
      – Reduced data volume from 850 GB to 450 GB
      – 40+ servers => huge cost savings
  • 21. Infrastructure – VM sizing
    • Dedicated ESX cluster
    • 14 x VM for SharePoint 2013
      – 4 physical machines
      – 4 x 32 = 128 CPUs
      – 4 x 256 = 1,024 GB memory
    • HA max utilization = ¾ (tolerates one host failing)
      – 3 x 32 = 96 CPUs
      – 3 x 256 = 768 GB memory
    • CPU and memory can be over-committed
    • CPU over-commit factor 1.34 (1.78 if one physical host fails)
    • VMs must wait for physical CPUs: wait time for 8 vCPUs = 2 x wait time for 4 vCPUs
    • Mitigation: a) reduce allocated virtual CPUs, or b) increase physical CPUs
    • Memory factor 0.44 (0.59 if one physical host fails)
    • Reserved and locked memory prevents HA failover
  • 22. Infrastructure – VM tuning
    DC | Role                                            | vCPU | Peak   | Average | Calculated | Recommended | Change
    A  | Web, Query, Admin                               |    8 | 187.55 |   37.03 |          2 |           4 |     -4
    B  | Web, Query, Admin                               |    8 | 621.88 |   92.69 |          8 |           8 |      0
    A  | Crawl, Analytics, Content, CEWS, Central Admin  |    8 | 724.35 |  210.59 |          8 |           8 |      0
    B  | Crawl, Analytics, Content, CEWS, Symbolic Links |    8 | 724.56 |  198.44 |          8 |           8 |      0
    A  | Index 0, Content, CEWS                          |    8 | 486.18 |   62.55 |          6 |           6 |     -2
    B  | Index 0, Content, CEWS                          |    8 | 520.63 |   63.98 |          6 |           6 |     -2
    A  | Index 1, Content, CEWS                          |    8 | 547.08 |    69.3 |          6 |           6 |     -2
    B  | Index 1, Content, CEWS                          |    8 | 546.44 |   91.74 |          6 |           6 |     -2
    A  | Index 2, Content, CEWS                          |    8 | 491.38 |    65.6 |          6 |           6 |     -2
    B  | Index 2, Content, CEWS                          |    8 | 532.01 |   77.83 |          6 |           6 |     -2
    A  | Index 3, Content, CEWS                          |    8 | 540.45 |   78.72 |          6 |           6 |     -2
    B  | Index 3, Content, CEWS                          |    8 | 621.88 |   92.69 |          8 |           8 |      0
    A  | Distributed Cache                               |    4 |  91.71 |    5.99 |          2 |           2 |     -2
    B  | Distributed Cache* (added later)                |    - |      - |       - |          - |           - |      -
       | Total                                           |  100 |        |         |         78 |          80 |    -20
    Peak and average CPU usage is calculated over 30 days.
  • 23. Summary
    1. Indexing thousands of content sources
    2. Automation for rapidly changing index requirements
    3. Sizing the infrastructure for performance and HA
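Slides 9 and 11 weigh the share count against SharePoint's search boundaries. As a rough sketch, the two plans can be compared in a few lines of Python: a naive plan with one start address per file share, and the symbolic-link plan, in which the number of start addresses depends only on the number of regions and crawler impact rules. The limit values are the ones quoted on slide 9; the function names and the key names `base` and `oct2013_cu` are illustrative, and the two-region, two-impact example is taken from slide 15.

```python
import math
from typing import List, Tuple

# Search boundaries per search service application, as quoted on slide 9.
LIMITS = {
    "base": {"content_sources": 50, "start_addresses_per_source": 100},
    "oct2013_cu": {"content_sources": 500, "start_addresses_per_source": 500},
}


def naive_plan(shares: int, level: str) -> Tuple[int, int]:
    """One start address per share: return (start addresses, content sources)."""
    per_source = LIMITS[level]["start_addresses_per_source"]
    sources = math.ceil(shares / per_source)
    if sources > LIMITS[level]["content_sources"]:
        raise ValueError(f"{shares} shares need {sources} content sources ({level})")
    return shares, sources


def symlink_plan(regions: List[str], impacts: List[str]) -> Tuple[int, int]:
    """One content source per region, one start address per region and impact rule."""
    return len(regions) * len(impacts), len(regions)


if __name__ == "__main__":
    print(naive_plan(3_000, "base"))                                 # (3000, 30)
    print(symlink_plan(["europe", "asia"], ["default", "wait-60"]))  # (4, 2)
```

At 3,000 shares the naive plan fits within even the pre-CU limits on paper; what the symlink plan adds is a fixed set of start addresses, so the daily share changes described on slide 8 touch only the links, not the content source configuration.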
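Slide 15's naming scheme is regular enough to generate. The sketch below derives the start address, the link folder and the resulting crawl URL from a share's region (content source), crawler impact alias and crawl account. The `LinkedShare` type and the function names are hypothetical; the link name `e5dc12a41d` is read off slide 17's example, and the slides do not say how such names are produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedShare:
    """Hypothetical record for one file share exposed through a symbolic link."""
    region: str   # content source, e.g. "europe"
    impact: str   # crawler impact alias, e.g. "default" or "wait-60"
    account: str  # crawl account folder, e.g. "spcrwl01"
    link: str     # symbolic link name, e.g. "e5dc12a41d"


def start_address(region: str, impact: str) -> str:
    # file://<crawler impact>/<content source>/<crawler impact>
    return f"file://{impact}/{region}/{impact}"


def link_folder(share: LinkedShare) -> str:
    # <content source>/<crawler impact>/<crawl account>/<symbolic link>
    return f"{share.region}/{share.impact}/{share.account}/{share.link}"


def crawled_url(share: LinkedShare) -> str:
    # The impact alias is the host name, so the link sits below its start address.
    return f"file://{share.impact}/{link_folder(share)}"


if __name__ == "__main__":
    for region in ("europe", "asia"):
        for impact in ("default", "wait-60"):
            print(start_address(region, impact))
    share = LinkedShare("europe", "default", "spcrwl01", "e5dc12a41d")
    print(crawled_url(share))  # file://default/europe/default/spcrwl01/e5dc12a41d
```

Because every link lives under a start address built from the same two fields, adding or moving a share only changes which folder its link is created in.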
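Slide 12 schedules one nightly incremental crawl per regional content source. Because all of the schedules live in the same farm, each region's local night has to be expressed in one common time base. A minimal sketch, assuming a 22:00 local start and fixed, hypothetical UTC offsets for each region; a production schedule would use a time zone database so that daylight saving time is handled.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical fixed UTC offsets per region (daylight saving time is ignored).
REGION_UTC_OFFSET_HOURS = {"europe": 1, "asia": 8}


def nightly_start_utc(region: str, day: date, local_start: time = time(22, 0)) -> datetime:
    """UTC instant at which a region's nightly incremental crawl should start."""
    tz = timezone(timedelta(hours=REGION_UTC_OFFSET_HOURS[region]))
    local = datetime.combine(day, local_start, tzinfo=tz)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    for region in REGION_UTC_OFFSET_HOURS:
        start = nightly_start_utc(region, date(2014, 5, 5))
        print(region, start.strftime("%H:%M UTC"))
```

With these assumed offsets the Asian crawl starts at 14:00 UTC and the European one at 21:00 UTC, so the two regional crawls do not overlap.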
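Slide 13 relies on SharePoint crawler impact rules being defined per site name, so giving one server several host names lets different impact rules apply to different parts of it. The sketch renders the hosts-file lines, assuming every alias resolves to the server that holds the symbolic links. The IP address is a placeholder, the alias names come from slide 15, and what the `wait-60` rule configures is not stated on the slides.

```python
from typing import List

# Placeholder address of the server that holds the symbolic-link shares.
SYMLINK_SERVER_IP = "10.0.0.10"

# One alias per crawler impact rule (names from slide 15).
IMPACT_ALIASES = ["default", "wait-60"]


def hosts_entries(ip: str, aliases: List[str]) -> str:
    """Render hosts-file lines that point every impact alias at the same server."""
    lines = ["# Search crawler impact aliases"]
    lines += [f"{ip}\t{alias}" for alias in aliases]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(hosts_entries(SYMLINK_SERVER_IP, IMPACT_ALIASES), end="")
```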
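Slides 7 and 14 combine two rules: each new share is granted to the crawl account that is a member of the fewest groups, so that no account exceeds the 1,015-group token limit, and crawl rules keyed on the account folder decide which account crawls which path. A sketch of both, assuming one extra group membership per granted share and treating the slide's patterns as regular expressions (SharePoint crawl rules can use either wildcard or regular-expression syntax). The membership counts and function names are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Slide 7: avoid token bloat by keeping each account at or below 1,015 groups.
MAX_GROUPS = 1015


def pick_crawl_account(memberships: Dict[str, int]) -> str:
    """Return the crawl account that is a member of the fewest groups.

    Assumes that granting read access to one more share adds one group
    membership, so accounts already at MAX_GROUPS are skipped.
    """
    eligible = {acct: n for acct, n in memberships.items() if n < MAX_GROUPS}
    if not eligible:
        raise ValueError("every crawl account is at the group limit; add one to the pool")
    # Break ties by account name so the choice is deterministic.
    return min(eligible, key=lambda acct: (eligible[acct], acct))


def crawl_rule(account: str) -> str:
    """Path pattern of the include rule that crawls this account's folder (slide 14)."""
    return f"file://.*/{account}/.*"


def account_for_url(url: str, accounts: List[str]) -> Optional[str]:
    """Return the first account whose rule, read as a regular expression, matches the URL."""
    for acct in accounts:
        if re.fullmatch(crawl_rule(acct), url):
            return acct
    return None


if __name__ == "__main__":
    pool = {"spcrwl01": 1012, "spcrwl02": 640}  # hypothetical membership counts
    print(pick_crawl_account(pool))  # spcrwl02
    print(account_for_url("file://wait-60/asia/wait-60/spcrwl02/e5dc12a41d/a.docx", list(pool)))  # spcrwl02
```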
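Slide 16 ends with a content enrichment service that replaces symbolic links in indexed paths with the actual file paths, presumably so that search results point at the original share. Only the path rewrite is sketched here, not the web service contract that SharePoint calls. The dictionary stands in for the "UNC to symlink" custom list, and its single entry, including the UNC path `\\file01\share01`, is hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical contents of the "UNC to symlink" list: link name -> real share.
SYMLINK_TO_UNC = {"e5dc12a41d": r"\\file01\share01"}

# file://<impact>/<content source>/<impact>/<crawl account>/<link>[/rest]
_LINKED_URL = re.compile(
    r"^file://[^/]+/[^/]+/[^/]+/[^/]+/(?P<link>[^/]+)(?P<rest>/.*)?$"
)


def real_path(url: str, mapping: Dict[str, str]) -> Optional[str]:
    """Replace the symbolic-link prefix of a crawled URL with the share's UNC path."""
    match = _LINKED_URL.match(url)
    if match is None or match.group("link") not in mapping:
        return None  # unknown link: leave the item's path untouched
    # Convert the remainder of the URL path to UNC separators.
    rest = (match.group("rest") or "").replace("/", "\\")
    return mapping[match.group("link")] + rest


if __name__ == "__main__":
    url = "file://default/europe/default/spcrwl01/e5dc12a41d/Reports/q1.docx"
    print(real_path(url, SYMLINK_TO_UNC))  # \\file01\share01\Reports\q1.docx
```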
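Slide 20's finding that the crawl rate declined 1% per million items indexed can be read two ways, and the difference grows with index size. The sketch computes both readings for a normalized initial rate of 100 and, treating the 12-million-items-per-partition latency knee as a ceiling, the minimum partition count for slide 18's 40 million documents. That four partitions match slide 18's Index-0 to Index-3 is only a consistency check, not a statement of how the partition count was chosen.

```python
import math


def crawl_rate_linear(initial_rate: float, million_items: float, decline: float = 0.01) -> float:
    """Reading 1: the rate loses 1% of its initial value per million items."""
    return initial_rate * max(0.0, 1.0 - decline * million_items)


def crawl_rate_compound(initial_rate: float, million_items: float, decline: float = 0.01) -> float:
    """Reading 2: each further million items costs 1% of the current rate."""
    return initial_rate * (1.0 - decline) ** million_items


def partitions_needed(total_items: int, max_items_per_partition: int) -> int:
    """Fewest index partitions that keep every partition at or below the ceiling."""
    return math.ceil(total_items / max_items_per_partition)


if __name__ == "__main__":
    print(round(crawl_rate_linear(100, 40), 1))       # 60.0
    print(round(crawl_rate_compound(100, 40), 1))     # 66.9
    print(partitions_needed(40_000_000, 12_000_000))  # 4
```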
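Slide 21's HA figures follow from planning for one of the four hosts failing: capacity drops to three quarters, so any over-commit factor measured across all four hosts grows by 4/3. The per-host memory of 256 GB is derived from the slide's 1,024 GB and 768 GB totals. A small sketch; the function names are illustrative.

```python
HOSTS = 4
CPUS_PER_HOST = 32
MEMORY_PER_HOST_GB = 256  # 1,024 GB across 4 hosts, 768 GB across 3


def capacity(per_host: int, hosts: int = HOSTS, failed: int = 0) -> int:
    """Physical capacity left when `failed` hosts are down."""
    return per_host * (hosts - failed)


def factor_after_failure(factor: float, hosts: int = HOSTS, failed: int = 1) -> float:
    """Rescale an over-commit factor measured on all hosts to the surviving hosts."""
    return factor * hosts / (hosts - failed)


if __name__ == "__main__":
    print(capacity(CPUS_PER_HOST), capacity(CPUS_PER_HOST, failed=1))            # 128 96
    print(capacity(MEMORY_PER_HOST_GB), capacity(MEMORY_PER_HOST_GB, failed=1))  # 1024 768
    print(round(factor_after_failure(0.44), 2))                                  # 0.59
```

Applied to the rounded CPU factor 1.34, the same scaling gives about 1.79, close to the slide's 1.78; the small gap is consistent with rounding.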
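The totals row on slide 22 can be recomputed from the per-VM rows, which is a quick check whenever the table is edited. A sketch, with the rows transcribed from the slide; the Distributed Cache VM added later in data centre B has no figures and is left out.

```python
from typing import List, Tuple

# (data centre, role, allocated vCPU, calculated, recommended), from slide 22.
ROWS: List[Tuple[str, str, int, int, int]] = [
    ("A", "Web, Query, Admin", 8, 2, 4),
    ("B", "Web, Query, Admin", 8, 8, 8),
    ("A", "Crawl, Analytics, Content, CEWS, Central Admin", 8, 8, 8),
    ("B", "Crawl, Analytics, Content, CEWS, Symbolic Links", 8, 8, 8),
    ("A", "Index 0, Content, CEWS", 8, 6, 6),
    ("B", "Index 0, Content, CEWS", 8, 6, 6),
    ("A", "Index 1, Content, CEWS", 8, 6, 6),
    ("B", "Index 1, Content, CEWS", 8, 6, 6),
    ("A", "Index 2, Content, CEWS", 8, 6, 6),
    ("B", "Index 2, Content, CEWS", 8, 6, 6),
    ("A", "Index 3, Content, CEWS", 8, 6, 6),
    ("B", "Index 3, Content, CEWS", 8, 8, 8),
    ("A", "Distributed Cache", 4, 2, 2),
]


def totals(rows: List[Tuple[str, str, int, int, int]]) -> Tuple[int, int, int, int]:
    """Sum allocated, calculated and recommended vCPU, plus the resulting change."""
    allocated = sum(r[2] for r in rows)
    calculated = sum(r[3] for r in rows)
    recommended = sum(r[4] for r in rows)
    return allocated, calculated, recommended, recommended - allocated


if __name__ == "__main__":
    print(totals(ROWS))  # (100, 78, 80, -20)
```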