The Truth Behind Shredded Storage

I’ve had many people ask me about how Microsoft SharePoint 2013 Shredded Storage and Remote BLOB Storage work together, and how AvePoint supports this from a storage optimization perspective. Bill Baer, Senior Product Marketing Manager and Microsoft Certified Master for SharePoint, posted about this topic during SharePoint Conference 2012 and hopefully clarified a lot of the confusion around Shredded Storage. SharePoint MVP Dan Holme, AvePoint Enterprise Trainer & Evangelist Randy Williams, and I put together an article for Dan’s weekly column on Shredded Storage a few weeks ago which is also worth a read.

In my opinion, the name is the most confusing thing about it. The word “storage” implies that this feature is for storage optimization but, as Bill Baer rightly points out, the functionality was focused entirely on file i/o optimization, although it does bring some storage savings, as we’ll discuss below.

Shredded Storage comes in two parts: part one is about getting the document to the Microsoft SQL Server and part two is about storing it in SQL. Part one is an enhancement of the Cobalt feature introduced in SharePoint 2010 and part two is a new feature introduced to store deltas of documents in SQL. The feature set is available only in SharePoint 2013 and works with SQL 2008 R2 (with the required patch) and SQL 2012.

In a nutshell, the Cobalt feature “shreds” the BLOB sent from the client machine (e.g. the user’s desktop PC) to the Web Front End (WFE) server and then directly through to the SQL Server. The SharePoint 2010 Cobalt framework sent the “shred” from the client to the WFE server ONLY. In SharePoint 2013, it continues to support only Office XML document types but sends the shreds all the way to the SQL server. Why is this a good thing? Well, because the shred only went as far as the WFE server in Cobalt v1, the WFE server had to fetch the whole BLOB binary file from the SQL server, merge the shred into it on the WFE, and then send the result back to the SQL server, which meant a lot of file i/o duplication and network hopping.

The confusion begins with the second part, storage in SQL. SQL doesn’t do a merge and create a new whole BLOB binary for the new version of the document; it only stores the shreds or “deltas”, as discussed in the SharePoint Conference 2012 keynote. When you open the latest version of the document, it combines the shreds and returns the result to the WFE, which in turn sends it to the client. The shreds are stored in a new table called DocStreams in the SQL Content Database allocated for the site collection, and a separate table keeps a list of all the pointers that make up the overall BLOB for that version.
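To make the shreds-plus-pointers idea concrete, here is a minimal toy sketch in Python. It is not how SharePoint or SQL Server implement Shredded Storage: the fixed 4-byte chunks, the hash-based shred ids, and the names ToyShredStore, save_version, and open_version are all assumptions made purely for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow (assumption)


class ToyShredStore:
    """Toy model: one table of shreds plus one pointer list per version."""

    def __init__(self):
        self.shreds = {}    # shred id -> bytes (stands in for the shred table)
        self.versions = []  # version index -> ordered list of shred ids

    def save_version(self, blob: bytes) -> int:
        """Store only shreds not already present; return how many were written."""
        pointers, new_shreds = [], 0
        for start in range(0, len(blob), CHUNK_SIZE):
            shred = blob[start:start + CHUNK_SIZE]
            shred_id = hashlib.sha256(shred).hexdigest()
            if shred_id not in self.shreds:
                self.shreds[shred_id] = shred
                new_shreds += 1
            pointers.append(shred_id)
        self.versions.append(pointers)
        return new_shreds

    def open_version(self, index: int) -> bytes:
        """Rebuild a version by following its pointers and joining the shreds."""
        return b"".join(self.shreds[p] for p in self.versions[index])


store = ToyShredStore()
first_written = store.save_version(b"AAAABBBBCCCC")   # three new shreds
second_written = store.save_version(b"AAAAXXXXCCCC")  # only the middle shred changed
```

In this toy model the second save writes a single new shred, yet both versions can still be rebuilt in full from their pointer lists, which is the behaviour described above.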

Supported Formats

The Cobalt aspects of Shredded Storage still work only with Office XML documents. The storage-to-SQL shredding works with all the document types we’ve tested: Office XML documents (2010/2013 format), Office Binary Documents (< 2007 format), PDF, JPEG, etc. But as discussed later, Office XML Documents have some benefits over the others. This is interesting because every non-Office XML document is shredded on the SQL server only after it has received the entire binary BLOB file, so for non-Office XML documents Shredded Storage is purely a storage optimization benefit.

Differentials

For each version of the BLOB saved in SharePoint, it only stores the differential shreds. It does not touch the existing shreds, but “magically” works out the differential shreds to store. Interestingly, it shreds all documents over the defined size regardless of whether versioning is turned on in the library; if versioning is turned on, it stores deltas for each successive version of that library item. This is a huge saving for companies that have lots of document versions in their SharePoint libraries, but there is obviously no storage benefit from shredding a document if versioning is not enabled. What we have found already is that the deltas save significantly less storage than de-duplication does when RBS is enabled.

One thing to point out, though, is that if I have 50 copies of the same document across multiple SharePoint sites, it does not work out the differentials at this level; it only does so at the document (item) scope. So there is no saving in this scenario, either.
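The difference in scope can be shown with a small Python sketch. This is only a toy model under assumed conditions (fixed 4-byte chunks, exact-match reuse of shreds); the function name shreds_stored and its parameters are hypothetical and are not part of any SharePoint or storage vendor API.

```python
def shreds_stored(items, chunk_size, item_scoped):
    """Count shreds kept when reuse is per item versus across all items.

    items is a list of items; each item is a list of version blobs (bytes).
    """
    seen_everywhere = set()
    total = 0
    for versions in items:
        # Item scope: every item starts with no knowledge of other items' shreds.
        seen = set() if item_scoped else seen_everywhere
        for blob in versions:
            for start in range(0, len(blob), chunk_size):
                shred = blob[start:start + chunk_size]
                if shred not in seen:
                    seen.add(shred)
                    total += 1
    return total


doc = b"AAAABBBBCCCC"
fifty_copies = [[doc] for _ in range(50)]  # 50 separate items, one version each

per_item = shreds_stored(fifty_copies, 4, item_scoped=True)
across_all = shreds_stored(fifty_copies, 4, item_scoped=False)
```

With item-scoped reuse every copy stores its own three shreds, so nothing is saved; reuse across all items, which is closer to what storage-level de-duplication offers, keeps just three.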

One nice feature that was not included in prior versions of SharePoint is that if I just edit the metadata in the list item within SharePoint without editing the attached Office XML Document, it doesn’t create a new version of the BLOB in the SQL table. Note this isn’t the case with non-Office XML documents. This will result in tremendous storage capacity savings for some customers.

Shred Size

It appears to shred any BLOB and, to date, our research has shown that the shred size is inconsistent and varies depending on the file format. For example, a 156KB JPEG file had 6 shreds in version 0.1, and a 1MB .docx had 12 shreds, as shown in the screenshot below. Please note that in this example the sum total of the shreds is in fact LARGER than the original 1MB document, so shredding is an inefficient storage optimization in this case.

[Screenshot: Shredded Storage]
There are some variables in the API that can be set at content database level; the default for the maximum size of the shred is 64KB. If the file is less than the maximum size set, then it simply won’t shred the file at all. More details are available in Bill’s post.

Existing Data

A key issue to point out is that if you upgrade your existing SharePoint 2010 Content Databases to SharePoint 2013, they will not benefit from Shredded Storage until a new document version is created.

Turning Off Shredded Storage

Shredded Storage can be turned off at the web application, site collection, and site (web) level; the default setting is AlwaysDirectToShredded. If you turn off Shredded Storage, SharePoint goes back to acting like it did in SharePoint 2010, Cobalt v1 style. This means that you have potentially higher file i/o between the WFE and the SQL server, and no storage savings on deltas of versioned files.

What happens when you enable RBS?

When you turn on RBS with a content database that has Shredded Storage enabled, the real-time RBS provider receives each shredded BLOB individually. These shreds are extremely small and, as the RBS research in our 2010 white paper showed, storing BLOBs of less than 1MB outside of the SQL database is, in general, inefficient. This is why we recommend setting up RBS rules that leave files less than 1MB in the content database.

Our scheduled RBS product (DocAve Storage Manager) will work fine with Shredded Storage, because when Storage Manager calls SharePoint to externalize a document we get the full BLOB. We can also apply more sophisticated business rules to decide whether to externalize it with RBS.
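A size-based rule of this kind can be sketched in a few lines of Python. This is an illustration only, not DocAve’s or any RBS provider’s actual rule engine; the function should_externalize and the example sizes are assumptions.

```python
ONE_MB = 1024 * 1024


def should_externalize(blob_size_bytes: int, threshold_bytes: int = ONE_MB) -> bool:
    """Hypothetical size rule: leave BLOBs under the threshold in the content database."""
    return blob_size_bytes >= threshold_bytes


# A real-time provider is handed individual shreds (a 64KB shred is assumed here).
shred_seen_by_realtime_provider = 64 * 1024
# A scheduled job is handed the full BLOB (a 10MB document is assumed here).
full_blob_seen_by_scheduled_job = 10 * ONE_MB

realtime_decision = should_externalize(shred_seen_by_realtime_provider)
scheduled_decision = should_externalize(full_blob_seen_by_scheduled_job)
```

Under this rule the individual shred stays in the content database while the full BLOB is externalized, which is why it matters whether the provider sees shreds or whole documents.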

By adding the RBS Provider into the mix, when I’m fetching the 69th version of a document, it’s going to get REAL chatty with the RBS provider as it fetches all the individual shreds. The shred size can potentially be increased to 1MB to be more efficient from an RBS perspective, but until we get more data from our labs we have no concrete guidance here. Some preliminary performance stats are available below.
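As a back-of-the-envelope illustration of that chattiness, the Python sketch below estimates how many shreds make up one document at different chunk sizes. It assumes evenly sized chunks, which our own tests suggest is not always what happens in practice, so treat the numbers as a rough guide only; the helper shreds_to_fetch is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def shreds_to_fetch(file_size_bytes: int, chunk_size_bytes: int) -> int:
    """Estimate the shreds making up one document, assuming even chunking."""
    # Integer ceiling division: a partial final chunk still counts as a shred.
    return -(-file_size_bytes // chunk_size_bytes)


document = 10 * MB
with_default_chunks = shreds_to_fetch(document, 64 * KB)
with_1mb_chunks = shreds_to_fetch(document, 1 * MB)
```

For a 10MB document this works out to 160 shreds at a 64KB chunk size against 10 at 1MB, and each shred is a separate object for the RBS provider to fetch.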

Fetch Performance

From a performance perspective, suppose I save a 10MB document 100 times and store each version, randomly changing a few paragraphs all over the document each time. To fetch version #69, or even the latest version, SQL must merge all of the relevant shreds, and it must do so entirely in the SQL software layer. This concerns me A LOT, as it will be a huge performance overhead compared with simply fetching the entire BLOB version as in SharePoint 2010!

The table below illustrates the time it took to perform a full SP-Export on the entire site collection, based on the different configurations with exactly the same content data set:

Shredded Storage        DB size (MB)    RBS size (GB)    Export time (secs)
Off                     24724.88        Off              1477
Off                     54.58           23.40            1882
Default – 64KB chunk    6000.31         Off              2471
Default – 64KB chunk    103.25          6.35             3502
1MB chunk               6749.30         Off              2005
1MB chunk               95.19           6.25             3309
1GB chunk               13349.81        Off              1745
1GB chunk               74.00           12.40            2096

From this you can see that, measured against the 1,477-second baseline, the export takes 67% longer with Shredded Storage switched on at the default chunk size, and in this content sample set there is roughly a 75% saving in storage size. This will differ a lot depending on the type of content you are versioning. You’ll note that with Shredded Storage and Remote BLOB Storage both on, the export takes 137% longer. More notably, it takes only 27% longer if only Remote BLOB Storage is enabled with de-duplication switched on, which dramatically reduces the externalized BLOBs.
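For anyone who wants to check the arithmetic, the short Python sketch below works the percentages out from the table, using the run with Shredded Storage off and RBS off as the baseline. The figures come straight from the table; the variable and function names are just illustrative. It shows the change both relative to the baseline and relative to the slower run, since the two give quite different numbers.

```python
BASELINE_SECS = 1477      # Shredded Storage off, RBS off
BASELINE_DB_MB = 24724.88

export_secs = {
    "shredded (default chunk)": 2471,
    "shredded (default chunk) + RBS": 3502,
    "RBS only": 1882,
}


def pct_longer_than_baseline(secs: float) -> int:
    """Extra export time as a percentage of the baseline time."""
    return round((secs - BASELINE_SECS) / BASELINE_SECS * 100)


def pct_of_slower_run(secs: float) -> int:
    """The same gap expressed as a percentage of the slower run."""
    return round((secs - BASELINE_SECS) / secs * 100)


vs_baseline = {name: pct_longer_than_baseline(s) for name, s in export_secs.items()}
vs_slower = {name: pct_of_slower_run(s) for name, s in export_secs.items()}

# Storage saving for the default chunk size with RBS off.
storage_saving_pct = (BASELINE_DB_MB - 6000.31) / BASELINE_DB_MB * 100
```

Relative to the baseline the three configurations take 67%, 137% and 27% longer; expressed as a share of the slower run, the same gaps are 40%, 58% and 22%.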

These initial performance tests were done on virtualized hardware configured according to the recommendations on TechNet, with NetApp infrastructure for the externalized content.

Our Current Recommendation

The main reason that Microsoft built Shredded Storage was to overcome the file i/o problems in Office 365 – SharePoint Online. This problem may not exist in your farm, and if you are simply looking for storage optimization you should consider the same technique that was common in SharePoint 2010 with a Remote BLOB Storage provider and de-duplication in your attached storage.

De-duplication will also work across all externalized BLOBs, not just at library item scope, and with all document types. De-duplication is also a hardware file i/o operation rather than a software operation, as it’s built into the bare metal of the attached storage devices. From speaking to various infrastructure vendors, a general rule of thumb is that you can realize an 88% storage capacity saving by doing this.

A key point to note here is that if you do externalize your BLOBs using RBS and have de-duplication on, you will immediately realize the storage optimization savings unlike Shredded Storage, which requires you to create a new version of the document before you start receiving the benefits.

The Future

We are working hard in a lab environment right now to produce our own figures taking into account our DocAve platform. A white paper will be published shortly with these findings, so please keep a look out on DocAve.com.

AvePoint are already signed up as Diamond Sponsors for the European SharePoint Conference 2013.

Stay tuned for more SharePoint content by joining our community or by following us on Twitter or Facebook.
