Tuning Search Relevance for Precision, Freshness, Keywords, File Types & Authoritative Content

Search relevance tuning is one of my favourite topics. It is often forgotten or neglected, both before launching a new solution and most certainly after the solution has been running for a while.

At my current client we have included findability as a key topic when training our intranet editors. Our new intranet is built on top of SharePoint search, and the number of menus and navigation elements has been reduced to a bare minimum. Our intranet front page has a huge search box across the top, and several web parts rendering content retrieved by contextual search queries. The intranet experience covers a publishing solution, enterprise search, user profiles, team sites and a document management system.

Customization Techniques

Here are some of the techniques we used to tune the search relevance.

Search Schema

Customization of the SharePoint 2013 search schema is the most common and easiest way to customize recall and precision of the search results. You can think of recall as the number of results returned for your search queries, while precision is related to the order of the results.

Crawled properties are the metadata the crawler is able to retrieve and extract from the source content. The crawled properties must be included in the full text index in order to affect the search results. By default, crawled properties only contribute to recall and not to precision (ranking).

For the crawled properties to influence ranking, they must be mapped to managed properties, and the managed properties must be configured with a context weight different from 0.

For details see https://technet.microsoft.com/library/7c8ddec1-c8ff-4a90-afae-387b27a653f1.aspx#Ranking_Schema


Query Templates

Query templates are used by SharePoint 2013 to transform the user query entered in the search box before it is submitted to the search index. Query templates can be configured for:

• Search web parts (results and content by search)

• Query rules

• Result sources

Query templates can be used both to affect recall and precision. You can for instance add a query rule to add a block of results from an additional result source. The rule can be specified to trigger on specific conditions, such as search terms, or it could always trigger. Result sources are usually defined to scope the results.
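As a rough illustration of what a query template does, the sketch below expands the {searchTerms} query variable into a template before the query would be sent to the index. The function name, the template and the example path are made up for illustration; SharePoint performs this expansion itself and supports many more query variables than this sketch handles.

```python
def expand_query_template(template: str, user_query: str) -> str:
    """Replace the {searchTerms} variable with the text typed by the user."""
    return template.replace("{searchTerms}", user_query.strip())


# Hypothetical template for a result source that scopes results to one site.
NEWS_TEMPLATE = '{searchTerms} path:"https://intranet.example.com/news"'

if __name__ == "__main__":
    print(expand_query_template(NEWS_TEMPLATE, "travel policy"))
```

A template like this affects recall, since it narrows the result set, while a template that adds XRANK affects precision.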

Ranking Models

The ranking models define how the search engine calculates the relevance rank using various factors, which are represented in the ranking model as rank features. For details, see
https://technet.microsoft.com/library/7c8ddec1-c8ff-4a90-afae-387b27a653f1.aspx
https://msdn.microsoft.com/EN-US/library/office/dn169052.aspx

You can implement a custom ranking model in SharePoint 2013, but not in Office 365. Microsoft recommends that you base your custom ranking models on the Search Ranking Model with Two Linear Stages.

Our Actual Customizations

Proximity Boosting

One of the weaknesses of the default SharePoint 2013 ranking models is the small effect proximity has on ranking. This is often experienced as documents with complete title matches, or near title matches, being ranked lower than documents where term frequency is high (the search term occurs in many places in the same document).

Only the top 1000 results are re-ranked using proximity boosting in the ranking model, since the built-in proximity features are performance intensive.
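The idea of re-ranking only the top results can be sketched as follows. This is a simplified model, not SharePoint's implementation: the function and parameter names are invented, and the sketch assumes that re-ranked items always stay ahead of the items that were not re-ranked.

```python
def two_stage_rank(items, stage1_score, stage2_score, rerank_limit=1000):
    """Rank all items cheaply, then re-rank only the top `rerank_limit` items."""
    # Stage 1: a cheap score is computed for every item in the result set.
    ranked = sorted(items, key=stage1_score, reverse=True)
    head, tail = ranked[:rerank_limit], ranked[rerank_limit:]
    # Stage 2: expensive features (such as proximity) are added for the head only.
    head = sorted(
        head,
        key=lambda item: stage1_score(item) + stage2_score(item),
        reverse=True,
    )
    return head + tail


if __name__ == "__main__":
    docs = [
        {"name": "A", "base": 3, "proximity": 0},
        {"name": "B", "base": 2, "proximity": 5},
        {"name": "C", "base": 1, "proximity": 10},
    ]
    result = two_stage_rank(
        docs, lambda d: d["base"], lambda d: d["proximity"], rerank_limit=2
    )
    print([d["name"] for d in result])
```

In the example, document C has the strongest proximity score but falls outside the re-rank limit, so it never benefits from it. That is the trade-off of limiting the expensive features to the top of the result set.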

We wanted to rank documents with complete and perfect title matches higher.
This was done by

1. Using XRANK in a query template to boost all items that contain the query phrase in the title. Proximity is not considered. We customized our search result sources for the Everything, Pages and Documents, and Images and Videos verticals.

2. Tuning the weighting of proximity features in the ranking model. Four proximity features in the Search Ranking Model with Two Linear Stages have been customized. Their weights have been increased to 10 times the out-of-the-box (OOB) values. The proximity features are only processed for the top 1000 items.

a. If all of the query terms are found in the item’s content (body), in the same order and with a maximum of one word in between (maximum distance is 1), the item is given extra rank.

b. If all of the query terms are found in the item’s title, in the same order and with a maximum of one word in between (maximum distance is 1), the item is given extra rank.

c. If all of the query terms are found in the item’s title, in the same order and with no words in between (maximum distance is 0), the item is given extra rank.

d. If the query terms match the item’s complete title exactly, the item is given extra rank.
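The two steps above can be sketched in code. The template string shows the general shape of an XRANK boost on the title; the cb value of 100 is an arbitrary example and not the value we used. The functions below it model conditions a to d on whitespace-separated words. They reflect my reading of "maximum distance" as the number of other words allowed between neighbouring query terms, and all names are invented for illustration. The real proximity features are evaluated inside the ranking model, not in code like this.

```python
# Hypothetical template: boost items whose title contains the query phrase.
TITLE_BOOST_TEMPLATE = '{searchTerms} XRANK(cb=100) Title:"{searchTerms}"'


def _match_from(terms, words, pos, max_gap):
    """Check that the remaining terms follow position `pos` within the allowed gap."""
    if not terms:
        return True
    window_end = min(len(words), pos + 2 + max_gap)
    return any(
        words[i] == terms[0] and _match_from(terms[1:], words, i, max_gap)
        for i in range(pos + 1, window_end)
    )


def ordered_match(query_terms, tokens, max_gap):
    """True if all terms occur in order with at most `max_gap` words in between."""
    terms = [t.lower() for t in query_terms]
    words = [w.lower() for w in tokens]
    if not terms:
        return False
    return any(
        words[i] == terms[0] and _match_from(terms[1:], words, i, max_gap)
        for i in range(len(words))
    )


def proximity_matches(query, title, body):
    """Report which of the four conditions (a to d) hold for one item."""
    q, t, b = query.split(), title.split(), body.split()
    return {
        "a_body_gap_1": ordered_match(q, b, 1),
        "b_title_gap_1": ordered_match(q, t, 1),
        "c_title_gap_0": ordered_match(q, t, 0),
        "d_title_exact": [w.lower() for w in q] == [w.lower() for w in t],
    }


if __name__ == "__main__":
    print(TITLE_BOOST_TEMPLATE.replace("{searchTerms}", "travel policy"))
    print(proximity_matches("travel policy", "Travel expense policy", ""))
```

An item can satisfy several conditions at once. A perfect title match satisfies b, c and d, so it collects extra rank from all three features.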

Freshness Boosting

Whether a document was modified recently or years ago does not affect the search results in SharePoint 2013 by default.
We wanted to rank newer documents higher than older documents in general, based on the date when the document was last modified.

Freshness boosting was added to both stages of the custom ranking model, and affects the entire result set.
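To give a feel for how a freshness boost behaves, here is a minimal sketch that turns the age of a document into a boost that decays over time. The inverse-rational shape and the constants are chosen for illustration only and are not the values from our ranking model; the function and parameter names are invented.

```python
from datetime import datetime


def freshness_boost(last_modified: datetime, now: datetime,
                    weight: float = 1.0, k: float = 0.01) -> float:
    """Return a boost that is `weight` for brand-new items and decays with age."""
    age_days = max((now - last_modified).total_seconds() / 86400.0, 0.0)
    # Inverse-rational decay: with k = 0.01 the boost is halved after 100 days.
    return weight / (1.0 + k * age_days)


if __name__ == "__main__":
    today = datetime(2015, 6, 1)
    for modified in (datetime(2015, 6, 1), datetime(2015, 2, 21), datetime(2012, 6, 1)):
        print(modified.date(), round(freshness_boost(modified, today), 3))
```

A curve like this gives a clear advantage to recently modified documents while still leaving old documents a small share of the boost, so that age alone never removes an item from the results.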


Keyword Boosting

By default, metadata columns in SharePoint 2013 don’t contribute to ranking of search results.
We wanted to rank items where the query terms match the keywords used to tag the content higher than items that have not been tagged with those terms.

This was achieved by mapping the crawled keywords to searchable managed properties, with a context weight similar to that of the filename. The relative contribution to ranking is 0.15 for keywords and for filename, 0.36 for title and 0.02 for content (body).
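The effect of these weights can be illustrated with a toy calculation. The sketch below simply adds up the weight of every field that contains all query terms. This is a deliberate simplification of how the ranking model combines fields, and the field names and function are invented; only the four weights come from our configuration.

```python
# Relative weights from the text above; the field names are illustrative.
FIELD_WEIGHTS = {"title": 0.36, "keywords": 0.15, "filename": 0.15, "body": 0.02}


def weighted_match_score(query: str, item: dict) -> float:
    """Sum the weights of all fields in which every query term occurs."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = item.get(field, "").lower().split()
        if all(term in words for term in terms):
            score += weight
    return score


if __name__ == "__main__":
    body = "how to claim travel costs under the policy"
    tagged = {"title": "Expenses guide", "keywords": "travel policy", "body": body}
    untagged = {"title": "Expenses guide", "body": body}
    print(weighted_match_score("travel policy", tagged))
    print(weighted_match_score("travel policy", untagged))
```

With these weights a keyword match counts several times more than a match in the body, which is why consistent tagging by the editors pays off in the ranking.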

File Type Boosting

By default, SharePoint 2013 considers PowerPoint presentations to be more important than Word documents in general, and gives additional rank to items of these types regardless of the query.

We wanted to rank PDF and Word documents higher than PowerPoint presentations in general.
File type boosting was adjusted in both stages of the custom ranking model, and affects the entire result set.

“Trusted Sources” Content Boosting

SharePoint 2013 supports the definition of authoritative sites, which makes content that is only a few clicks away from those sites more relevant than other content. This technique was not sufficient to define authoritative content for my client’s sources.

We wanted the following content to be considered as coming from trusted sources:
• All content crawled from a specific file share

• Content declared as a record in our custom built knowledge portals

• Approved and Published content in our document management system (DMS)

This was achieved by using a custom content processing enrichment service to set a value in a managed property for content that satisfies the requirements above. In addition, XRANK was added to the query templates for the Everything and Pages and Documents result sources to boost content that has this value set.
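The decision logic of such an enrichment step, together with a matching query template, might look like the sketch below. Everything specific in it is an assumption made for illustration: the file share path, the item fields, the managed property name TrustedSource and the cb value are not the ones used in our solution, and the sketch only models the decision logic rather than the enrichment service itself.

```python
# Hypothetical file share prefix; the real share is not named in this article.
TRUSTED_FILE_SHARE = "file://fileserver/trusted/"

# Hypothetical template: boost items where the TrustedSource property is set.
TRUSTED_BOOST_TEMPLATE = "{searchTerms} XRANK(cb=100) TrustedSource:1"


def is_trusted(item: dict) -> bool:
    """Apply the three trusted-source rules to one crawled item."""
    # Rule 1: all content crawled from the specific file share.
    if item.get("path", "").lower().startswith(TRUSTED_FILE_SHARE):
        return True
    # Rule 2: content declared as a record in a knowledge portal.
    if item.get("source") == "knowledge_portal" and item.get("is_record"):
        return True
    # Rule 3: approved and published content in the DMS.
    if item.get("source") == "dms" and item.get("approved") and item.get("published"):
        return True
    return False


def enrich(item: dict) -> dict:
    """Return a copy of the item with the TrustedSource flag set."""
    return {**item, "TrustedSource": is_trusted(item)}


if __name__ == "__main__":
    draft = {"source": "dms", "approved": True, "published": False}
    record = {"source": "knowledge_portal", "is_record": True}
    print(enrich(draft)["TrustedSource"], enrich(record)["TrustedSource"])
    print(TRUSTED_BOOST_TEMPLATE.replace("{searchTerms}", "travel policy"))
```

Setting the flag at crawl time keeps the query side simple: the template only needs a single property check, however complicated the rules behind the flag become.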


About the author Petter Skodvin-Hvammen:

Petter Skodvin-Hvammen is a senior consultant, solutions architect and enterprise search advisor, working for Puzzlepart in Norway. With over 16 years in the industry, and experience from Microsoft, FAST and Accenture, he has architected and built business critical solutions for dozens of clients all over Europe.

Petter has been a passionate community contributor for 10 years. He was a Subject Matter Expert in Microsoft and recognized as «top contributor» in the «Enterprise Search Community» in 2011. Petter was a speaker at last year’s European SharePoint Conference in Barcelona and the Smart Search Conference in Stockholm.


European SharePoint Conference 2015 takes place in Stockholm, Sweden, from 9-12 November 2015.

 
