This is a blog post by Wictor Wilen. Wictor was a speaker at the
European SharePoint Conference 2011.
For a couple of weeks (ahem, months) I've been struggling with a
strange Search Service Application (SSA) issue. Some time back I went
to check on some Crawled Properties while building a tool to help
copy settings between SSAs (more on this tool in another post).
Then I noticed that there were tons of Crawled Properties with just
garbled binary data(!) as the property name.
I searched like crazy for a while to find where these came from, but
there was nothing of any kind in the logs related to this. I could
not locate any documents related to the Crawled Properties. I could
not delete them either: the system claims they are connected to some
content, yet there were no document samples. I created a new SSA and
crawled the same corpus, and (almost) the same corrupted junk crawled
properties appeared.
A couple of days back I finally found out where they came from.
With help from Microsoft support I ran some queries on the SSA
databases and found another crawled property that was inserted at
the same time as these junk properties. This property had document
samples! And those samples were e-mail .msg files. Specifically,
they were e-mails with encrypted content! I copied these files onto a
brand new farm without any content and was able to reproduce the
issue.
How to reproduce the corrupted Crawled Properties
To verify that this had to do with encrypted e-mail messages in
general, and not just the ones I found, I encrypted an e-mail, sent
it to myself and exported it as an .msg file. I took this .msg file
and added it to a document library in a hot new VM (yeah, I only have
new ones since I had to rebuild all my VMs last weekend due to a
corruption issue on one of my base images). Then I fired off a full
crawl, with full logging enabled, and watched the Crawled
Properties of the SSA. And as expected, they showed up soon after.
So, be careful about having encrypted e-mail messages in your
farm! Or prevent the issue...
How to prevent the issue
So, how do I get rid of these corrupted properties?
Unfortunately there is no good (supported) way at the time of this
writing, except deleting your SSA, creating a new one and, before
crawling the data again, removing the files or creating a Crawl Rule.
If you already have these corrupted properties, or if you want to
prevent new ones, you can create a Crawl Rule that excludes .msg
files. That will help the situation, but you will not be able to
search the .msg files (if they are encrypted you cannot search them
anyway!).
These corrupted properties do not do any harm. You cannot use them,
and you don't notice anything else on the SSA except that they are
there and annoy you. The only thing I can think of is that if you
have lots of them, you can run into trouble. There is a limit of
500,000 crawled properties per SSA! Sounds like a lot, but for two
.msg files I saw about 1,600 corrupted crawled properties.
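To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that every encrypted .msg file adds about the same number of new corrupted crawled properties as the two files observed above (1,600 / 2 = 800 each). That linear scaling is an assumption, not something measured, and the function name is made up for this example.

```python
# Figures taken from the post.
CRAWLED_PROPERTY_LIMIT = 500_000  # crawled properties allowed per SSA
OBSERVED_FILES = 2                # encrypted .msg files that were crawled
OBSERVED_CORRUPTED = 1_600        # corrupted crawled properties they produced


def files_until_limit(existing_properties=0):
    """Estimate how many encrypted .msg files fit before the limit is hit.

    Assumes (hypothetically) that each file contributes the same number
    of new corrupted crawled properties as in the observed sample.
    """
    per_file = OBSERVED_CORRUPTED / OBSERVED_FILES
    remaining = max(CRAWLED_PROPERTY_LIMIT - existing_properties, 0)
    return int(remaining // per_file)


if __name__ == "__main__":
    print(files_until_limit())
    print(files_until_limit(existing_properties=100_000))
```

Under that assumption, a few hundred encrypted messages in a crawled document library could be enough to exhaust the limit, which is not many for a mailbox archive.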
I hope this helps someone and I hope there will be a fix for
this in the future - if that happens I'll update the post.