This document discusses storing binary large objects (BLOBs) in SharePoint and the options for externalizing BLOB storage. It notes that as much as 80% of the data in an enterprise-scale SharePoint deployment typically consists of BLOBs stored in SQL Server content databases. Externalizing BLOBs to other storage can improve performance and reduce storage costs. The document compares the External BLOB Storage (EBS) and Remote BLOB Store (RBS) interfaces, and evaluates factors to consider, such as backup and recovery, along with the benefits of various third-party solutions.
2. Agenda
What are BLOBs?
How BLOBs affect SharePoint storage
Options for externalizing BLOBs
Benefits & drawbacks of externalization
Understanding the Big Picture
FILESTREAM provider vs third party
Q&A
7. The burden of BLOBs
Typically, as much as 80 percent of
data for an enterprise-scale
deployment of SharePoint
Foundation consists of file-based
data streams that are stored as BLOB
data. These BLOB objects comprise
data associated with SharePoint files.
msdn.microsoft.com/en-us/library/bb802976.aspx
8. The SharePoint storage dilemma
Documents, databases, and BLOBs
Storage growth
[Diagram: SharePoint on SQL Server 2008/R2 with multiple content databases; active content vs. actual content]
9. Planning for SharePoint storage
Planning for implementation and future growth
From: Physical Storage Recommendations (SharePoint Server)
http://technet.microsoft.com/en-us/library/cc298801.aspx
10. Content scaling support & guidance
Limit | SharePoint 2010 RTM | July 2011
Content database | 200 GB (collaboration); 1 TB (Records Center) | 200 GB (out-of-box); 4 TB (collaboration)*; Unlimited (archive)*
Site collection | 100 GB; 200 GB (1 site in a CDB) | 100 GB (out-of-box); Up to size of CDB*
Items per CDB: 60 million
* Conditions apply
11. How big are your
biggest content
databases?
0-100 GB
100-200 GB
200-500 GB
More than 500 GB
I have no idea
13. Advantages of keeping BLOBs in SQL
One storage container to
Maintain
Monitor
Recover
Tier I storage
Performance relative to lower tiers of storage
benefits all content access
SQL caching
Performance of reads/writes of small documents
SQL Caching benefits reads
14. Disadvantages of large CDBs & BLOBs
Storage cost
  Content
    Current version
    Previous versions
  Transaction logs
  Backups
SLA timeframes
  Backup
  Recovery
SQL operations
  Locking
  Reindexing
Performance
  Slower reads with large documents
  BLOBs written twice
15. Overview of BLOB externalization
[Diagram: a user uploads through the web front-end; EBS/RBS keeps a pointer (stub) in the database and places the BLOB on disk storage]
16. Options for externalizing
External BLOB Storage (EBS)
Released with SharePoint 2007 SP1
Supported by SharePoint 2010
Remote BLOB Store (RBS)
Released in SQL Server 2008 R2 Feature Pack
Can be installed on SQL Server 2008 SP1
Supported by SharePoint 2010
Both are interface specifications
Need provider to communicate with BLOB store
20. Externalized BLOBs are transparent
Check-in/Check-out
Versioning
Office applications
Search crawling (indexing)
Workflows
Alerts
Anything using the SharePoint API
Third party tools
Custom code
22. Why Are BLOBs a Problem?
Maintaining large quantities of BLOB
data in a SQL Server database is a
suboptimal use of SQL Server
resources. You can achieve equal
benefit at lower cost with equivalent
efficiency by using an external data
store to contain BLOB data.
msdn.microsoft.com/en-us/library/bb802976.aspx
23. Advantages of externalizing
Storage Cost
Performance
Read*
Large documents
Write*
All other activity by users of the CDB and SQL server
SLA timeframes
* Dependent on performance characteristics of BLOB store
24. What’s the performance gain?
Metric | SQL BLOB | RBS | Gain
Database Size - 1 TB | 2292 GB | 26 GB | 98.9%
Database Backup Size - 100 GB | 217 GB | 7 GB | 96.8%
Database Backup Time - 217 GB | 2490 sec | 38 sec | 98.5%
Database Defrag Time - 100 GB | 120 sec | 4 sec | 96.7%
Avg. SharePoint Response Time | 28 msec | 21 msec | 25.0%
Large File Upload - 500 MB | 55 seconds | 29 seconds | 47.6%
Source: SQL Server RBS Performance with SharePoint
Microsoft Download Center
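The Gain column appears to be the relative reduction from the SQL BLOB value to the RBS value. A minimal sketch, assuming that interpretation; the function name `percent_gain` and the dictionary layout are illustrative and not taken from the white paper:

```python
def percent_gain(sql_blob: float, rbs: float) -> float:
    """Relative reduction from the in-database value to the RBS value, in percent."""
    if sql_blob <= 0:
        raise ValueError("baseline must be positive")
    return (sql_blob - rbs) / sql_blob * 100.0


# (value with BLOBs in SQL, value with RBS), copied from the slide
measurements = {
    "Database size (GB)": (2292, 26),
    "Database backup size (GB)": (217, 7),
    "Database backup time (sec)": (2490, 38),
    "Database defrag time (sec)": (120, 4),
    "Avg. SharePoint response time (msec)": (28, 21),
}

for name, (sql_blob, rbs) in measurements.items():
    print(f"{name}: {percent_gain(sql_blob, rbs):.1f}%")
```

The large file upload row is left out on purpose: recomputing from the rounded 55 and 29 seconds gives about 47.3%, so the 47.6% on the slide presumably comes from unrounded timings.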
25. Advantages of externalizing
Storage Cost
Performance
Read*
Write*
All other activity by users of the CDB and SQL server
SLA timeframes
Access to features of BLOB storage platform
Efficient content restructure
26. EBS and RBS
Feature | EBS | RBS
API | SharePoint | SQL + SharePoint client
SharePoint version | WSS v3 or MOSS 2007 SP1 or later | SharePoint 2010, v.Next
SQL version | Any | SQL Server 2008 or R2
Externalization rules | File size and type | File size
Scope | Site collection | Content database
Microsoft support | Deprecated | Likely to continue
Provider | Third party | SQL FILESTREAM, Third party
27. Choosing third-party externalization
Performance
BLOB store platform
File system, SAN, NAS
Shared folder
Cloud
Integration with your storage platform
Externalization rules
Manageability
Backup, recovery
High availability
Disaster recovery
Long term retention, archiving & tiered storage
Cost
28. All things considered
Storage Cost
Performance
Archiving and Disposition
Maintenance, Migration and Administration
Backup & Recovery SLAs (RTO/RPO)
Updates and Upgrades
High Availability & Disaster Recovery
SharePoint Uncensored
30. Content lifecycle meets RBS
RBS determines whether BLOB gets externalized to
another storage platform on upload
Archiving & disposal
Third party utilities can add business rules to
externalization
31. Externalization: Support and storage
FEATURE | FILESTREAM | THIRD PARTY
SharePoint 2010 (Server and Foundation)
SharePoint 2007 (MOSS 2007 and WSS v3)
All SQL versions (2000, 2005, 2008, 2008 R2)
All SQL editions (Express/Standard/Enterprise)
Externalize BLOBs to DAS, iSCSI NAS/SAN
Externalize BLOBs to file share, WORM
Externalize to the Cloud (Azure, Amazon etc)
Native compression and encryption
Externalize to multiple storage providers within 1 CDB
32. Externalization: Backup, recovery, DR
FEATURE | FILESTREAM | THIRD PARTY
Synchronous backup of BLOB store & SharePoint
Backup of content DB independent of BLOB store
Item level Recovery
Platform Level Recovery
Restore without DB staging
33. Externalization: Content lifecycle support
FEATURE | FILESTREAM | THIRD PARTY
Content restructure (shallow copy) across Web apps
Content replication
Connect to and manage file shares through SharePoint
Connect to and manage media shares through SharePoint
Business rule support (content type, metadata, access date)
Externalize to hardware-based HSM
34. Resources
SharePoint Team Blog: Data Storage Changes in SP1
http://tinyurl.com/3rlvfnp
MSDN: External Storage of Binary Large Objects
(BLOBs) in SharePoint Foundation
http://tinyurl.com/yay545y
White paper: SQL Server RBS Performance with
SharePoint Server 2010…
http://tinyurl.com/3ccucww
35. Dan Holme (dan.holme@avepoint.com), Randy Williams (randy.williams@avepoint.com)
Submit your questions!
ABSTRACT: SharePoint and SQL Server offer several methods with which you can externalize BLOBs (the files that are stored in libraries and as attachments) to reduce the size of your content databases. BLOB externalization offers benefits beyond reduced storage cost, benefits that may surprise you! But there's a lot of hype and misunderstanding about BLOB externalization, which is neither a silver bullet nor evil incarnate. Join SharePoint MVPs Dan Holme and Randy Williams for a balanced, intelligent, detailed examination of the technologies and issues surrounding BLOB externalization. Dive deep into the performance impact, scalability implications, and critical considerations for backup and restore, high availability, and disaster recovery. To BLOB or not to BLOB? There are no easy answers, but after this session you will have what it takes to maximize the potential of BLOB externalization in your storage architecture.
IT DEPENDS!!!
Introduce concept of documents being stored as BLOBs in CDB
BUILD: Diagram of architecture
Discuss storage growth
BUILD: Bloat of data, mostly inactive
BUILD: Burden on CDBs
Discuss need to think about storage holistically: lifecycle, compliance, SLAs, cost
SharePoint content databases use a few tables related to documents:
dbo.AllUserData
Contains metadata for all items (list and library) in the content database
Identifiers for any document
dbo.AllDocs
Contains references to a document along with metadata about the document. This metadata is internal SharePoint metadata, not user metadata (which is in AllUserData)
dbo.AllDocVersions
If versioning is enabled, this table contains metadata about the version
Identifier for the stream
dbo.AllDocStreams
Contains the document itself (or a reference to the remoted BLOB if using RBS)
Identifier for the stream
Reference: http://msdn.microsoft.com/en-us/library/dd303586(v=PROT.13).aspx
We can estimate the amount of storage space necessary by doing a few simple calculations.
Let's assume a medium-sized organization with 1,000 users is rolling out SharePoint. Assuming the system is well adopted, each user adds 1 MB per day to the farm.
Assuming roughly 250 working days, that's 1,000 users × 1 MB × 250 days = 250 GB after a year, and we haven't even discussed versioning.
Don't forget that 100 versions of a 10 MB Word document will ultimately take up 1 GB of storage in your content database.
Also, the Central Administration recycle bin settings default to allowing up to 50% of the live site quota for second-stage deleted items.
Then there are the SQL Server transaction logs.
See how SharePoint databases can grow so quickly?
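The arithmetic in these notes can be sketched as a small calculator. This is only a rough model under stated assumptions: 250 working days per year, decimal units (1 GB = 1,000 MB), every version stored as a full copy, and the second-stage recycle bin treated as a worst-case 50% on top of live content. The function names are illustrative.

```python
def yearly_growth_gb(users: int, mb_per_user_per_day: float,
                     working_days: int = 250,
                     recycle_bin_fraction: float = 0.5) -> dict:
    """Rough first-year content growth, in decimal GB (1 GB = 1,000 MB)."""
    live_gb = users * mb_per_user_per_day * working_days / 1000
    # Assumption: the second-stage recycle bin fills to its default ceiling.
    recycle_gb = live_gb * recycle_bin_fraction
    return {
        "live_gb": live_gb,
        "second_stage_recycle_gb": recycle_gb,
        "total_gb": live_gb + recycle_gb,
    }


def versions_gb(version_count: int, file_mb: float) -> float:
    """Storage for a document's history, assuming each version is a full copy."""
    return version_count * file_mb / 1000


estimate = yearly_growth_gb(users=1000, mb_per_user_per_day=1)
print(estimate)              # live content, recycle bin ceiling, and total
print(versions_gb(100, 10))  # 100 versions of a 10 MB document
```

Transaction logs are not modeled here; they depend on recovery model and backup frequency, so they need a separate estimate.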
Discuss the challenges of RTM guidance: what was “guidance” and what was “support”?
CONDITIONS APPLY
Content databases of up to 4 TB are supported when the following requirements are met:
Disk sub-system performance: 0.25 IOPS per GB minimum, 2.00 IOPS per GB recommended for optimal performance
TTFB of 20 ms
Architecture and tools must support performance expectations, future capacity, backup, restore, high availability, disaster recovery
Discussion: Does anyone have more than a terabyte of data in their farm? Does anyone have a database larger than 200 GB? Are there any negative performance impacts? Does anyone have 2 GB / 1 GB / 500 MB files stored in SharePoint? How do they perform? How fast is your SharePoint farm growing? If you haven't deployed SharePoint, how do you know how much storage you'll need?
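To make the disk sub-system condition concrete, here is a minimal sketch that turns the per-GB figures above into absolute IOPS targets for a given content database size. The function names and the three result labels are illustrative assumptions, not part of the Microsoft guidance.

```python
MIN_IOPS_PER_GB = 0.25         # minimum from the guidance quoted above
RECOMMENDED_IOPS_PER_GB = 2.0  # recommended for optimal performance


def required_iops(db_size_gb: float) -> tuple:
    """Return (minimum, recommended) IOPS for a content database of this size."""
    return db_size_gb * MIN_IOPS_PER_GB, db_size_gb * RECOMMENDED_IOPS_PER_GB


def assess(db_size_gb: float, measured_iops: float) -> str:
    """Classify a measured IOPS figure against the two thresholds."""
    minimum, recommended = required_iops(db_size_gb)
    if measured_iops < minimum:
        return "below minimum"
    if measured_iops < recommended:
        return "meets minimum"
    return "meets recommended"


# A 4 TB content database, taken here as 4,096 GB
print(required_iops(4096))
print(assess(4096, measured_iops=2000))
```

By this arithmetic, a 4 TB content database needs about 1,024 IOPS to meet the minimum and about 8,192 IOPS to meet the recommendation.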
Out of the box, RBS monitors file uploads into SharePoint and checks the file size. If a file is over a configurable threshold, the upload is split apart:
The metadata goes to the database
The file goes to a file share
The end user never knows the difference. All normal SharePoint features (site quotas, workflows, etc.) still apply to the content.
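The size-based split described above can be sketched as a toy model. It does not use the real RBS or SharePoint APIs; the classes `ContentDatabase` and `BlobStore`, the functions `upload` and `read`, and the threshold value are all hypothetical and exist only to show the pointer (stub) idea.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ContentDatabase:
    """Stand-in for the content database: document name -> (kind, payload)."""
    rows: dict = field(default_factory=dict)


@dataclass
class BlobStore:
    """Stand-in for the external BLOB store: blob id -> bytes."""
    blobs: dict = field(default_factory=dict)


def upload(name: str, data: bytes, db: ContentDatabase,
           store: BlobStore, threshold_bytes: int) -> None:
    """Externalize files over the threshold; keep smaller files inline."""
    if len(data) > threshold_bytes:
        blob_id = str(uuid.uuid4())
        store.blobs[blob_id] = data            # the file goes to the BLOB store
        db.rows[name] = ("pointer", blob_id)   # the database keeps only a stub
    else:
        db.rows[name] = ("inline", data)       # small files stay in the database


def read(name: str, db: ContentDatabase, store: BlobStore) -> bytes:
    """Callers get the same bytes back wherever the BLOB actually lives."""
    kind, payload = db.rows[name]
    return store.blobs[payload] if kind == "pointer" else payload


db, store = ContentDatabase(), BlobStore()
upload("memo.txt", b"x" * 10, db, store, threshold_bytes=1024)
upload("video.mp4", b"x" * 5000, db, store, threshold_bytes=1024)
print(db.rows["memo.txt"][0], db.rows["video.mp4"][0])
```

The `read` function is the point of the sketch: because retrieval goes through one path, anything built on top of it does not need to know whether a given file was externalized.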
There was a note on this slide: Cannot have both providers in “write” mode.
What's this? I'm not familiar with this, though it does kind of make sense… What's the scope? Per WFE/CDB?
Optimizing SharePoint storage improves performance. Here are the results of a Microsoft white paper on RBS performance that you can download from the Microsoft Download Center. I'll provide the link at the end of this session.
Database Size
Externalizing the BLOBs reduced the database size by over 98%, which reduced the backup size and, consequently, the backup time.
Index Rebuilds
Occasionally SharePoint will rebuild the database indexes to prevent fragmentation; when it does, the farm typically becomes unavailable. Externalizing the BLOBs via RBS significantly helps alleviate this problem because the smaller database requires less time to rebuild the indexes.
Avg. SharePoint Response Time
Enabling the RBS feature results in smaller SharePoint content databases that in turn require fewer resources on the SQL Server database server to execute queries. The saved resources are freed up to process the existing queries faster and to service more queries.
Given that the productivity and satisfaction of SharePoint users often depend on SharePoint transaction response times, a 25% reduction in response times would result in higher levels of productivity and satisfaction.
Large File Uploads
People won't use SharePoint for large files if the uploads take too long. They'll stick to file shares (BAD). RBS significantly boosts the performance of uploads for large files because it doesn't store the BLOB in the database, so neither the database overhead nor the double-write penalty applies.