I'm currently creating an extranet where people will be able to download documents meant specifically for them. Each user will have access to at least one document, but the number will vary a lot; 100+ documents will not be uncommon.

Around 10 GB of documents will need to be uploaded. For me, the sheer number of documents is a good indication that a custom index can help optimize how the extranet is used.

A perfect scenario for a custom Lucene index

The most important reasons for creating a separate index are the following:

  • you can index only the items you need, speeding up (re-)indexation
  • you can index only the fields you really need, speeding up (re-)indexation
  • your index will only contain what you need, making it faster to search what you need
  • you can control when the index gets updated

I do want to mention that it is not always advisable to store these kinds of documents/assets in the Sitecore media library. A CDN will help reduce the load on your web server, provide ample caching options and reduce latency.

Setting up a new index

Some steps are required, of course. Let's go through them!

1. Add the index definition

Your Website\App_Config\Include folder contains several index configurations already. Have a look at the information on the Sitecore documentation website.
An example is the Master index: Sitecore.ContentSearch.Lucene.Index.Master.config

1.1 Create new config file

Let's be lazy: I always copy an existing index definition file and rename it.
Copy the Master index file and rename it appropriately: Sitecore.ContentSearch.Lucene.{your-index-name}.{Database}.config.
Example: Sitecore.ContentSearch.Lucene.ExtranetDocuments.Web.config.

1.2 The very basic XML structure it contains

Each index definition file contains the following basic XML structure:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <!-- Your custom index configuration comes here. -->
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

1.3 Update the index definition itself

You need to be aware of 4 major configuration settings for an index definition. Ask yourself the following questions:
1) Which unique id will I use for my index?
I've chosen: id="extranet_documents_index"

2) Which update strategy do I want to use?
The default for an index on the web database is onPublishEndAsync, which indexes the items after a publish has ended:

<strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />

You can specify more than one strategy, but for performance reasons you shouldn't define more than three. Use the Sitecore documentation to decide which ones are best for you.
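
As a rough sketch of what combining strategies could look like, the list below adds a second strategy next to onPublishEndAsync. The remoteRebuild node is an assumption here: it ships with the Sitecore versions I know of, but check that it exists under indexUpdateStrategies in your own installation before relying on it.

```xml
<strategies hint="list:AddStrategy">
  <!-- Incremental update after each publish (the strategy used in this article) -->
  <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
  <!-- Assumption: lets this instance react to a full rebuild triggered on another instance;
       verify the node name in your Sitecore version -->
  <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/remoteRebuild" />
</strategies>
```

The strategies run in the order they are listed, so keep the list short and deliberate.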

3) Which database do I use for fetching the items that need to be indexed? web or master?
The items you index must come from somewhere. Since I only need this for the public website, I'll use the web database.

<Database>web</Database>

If you need different results on the Content Management server, you'll need to create a 2nd index that uses the master database.
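
A minimal sketch of such a second index, assuming a hypothetical id (extranet_documents_master_index) and the syncMaster strategy that the stock Master index uses, could look like this. It reuses the media library folder this article indexes; adjust the id, strategy and root to your own setup.

```xml
<!-- Hypothetical second index for the Content Management server -->
<index id="extranet_documents_master_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
  <param desc="name">$(id)</param>
  <param desc="folder">$(id)</param>
  <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
  <configuration ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration" />
  <strategies hint="list:AddStrategy">
    <!-- Assumption: syncMaster, as used by the stock Master index -->
    <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/syncMaster" />
  </strategies>
  <locations hint="list:AddCrawler">
    <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <!-- The only essential difference: crawl master instead of web -->
      <Database>master</Database>
      <Root>/sitecore/media library/Extranet</Root>
    </crawler>
  </locations>
</index>
```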

4) Which root item do I want the indexing to start from?
Custom indexes are often used for specific purposes. Specifying a root item speeds up indexing by limiting the number of Sitecore items the crawler will try to process.
All my documents are stored in an extranet folder in the media library.

<Root>/sitecore/media library/Extranet</Root>

Have an answer to those questions? Update your definition with them!

<index id="extranet_documents_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
  <param desc="name">$(id)</param>
  <param desc="folder">$(id)</param>
  <!-- This initializes index property store. Id has to be set to the index id -->
  <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
  <configuration ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration" />
  <strategies hint="list:AddStrategy">
    <!-- NOTE: the order of these controls the execution order -->
    <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
  </strategies>
  <commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
    <policies hint="list:AddCommitPolicy">
      <policy type="Sitecore.ContentSearch.ModificationCountCommitPolicy, Sitecore.ContentSearch">
        <Limit>300</Limit>
      </policy>
    </policies>
  </commitPolicyExecutor>
  <locations hint="list:AddCrawler">
    <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <Database>web</Database>
      <Root>/sitecore/media library/Extranet</Root>
    </crawler>
  </locations>
  <enableItemLanguageFallback>false</enableItemLanguageFallback>
  <enableFieldLanguageFallback>false</enableFieldLanguageFallback>
</index>

With the copied and modified index definition config file in place, you already have a usable index. So what else is there to do?

2. The index configuration

What exactly gets indexed now? Which fields are usable?
The definition file references defaultLuceneIndexConfiguration, a node defined in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file.

<configuration ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration" />

You can create your own configuration so you have more control over what gets indexed.

It is not recommended to blindly copy the DefaultIndexConfiguration.config file.
The file contains more than just the indexConfiguration node, for example the settings nodes.

A great suggestion by Mikkel is to start with a configuration that references sections of the defaultLuceneIndexConfiguration.
Please note that the referenced article is about Sitecore 7.

This will keep your custom configuration small and you can overwrite only the sections you want.
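
As a minimal sketch of that approach, a custom configuration could look like the fragment below. It assumes the element names found in the Sitecore 7/8 Lucene default configuration, so verify each referenced node against your own version (newer versions may need extra nodes, such as documentOptions). The node name extranetDocumentsIndexConfiguration is hypothetical.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <!-- Hypothetical node name; pick one that matches your index -->
        <extranetDocumentsIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <!-- Own settings: only index the fields you explicitly configure -->
          <indexAllFields>false</indexAllFields>
          <initializeOnAdd>true</initializeOnAdd>
          <!-- Everything below is reused from the default configuration (assumed node names) -->
          <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
          <fieldMap ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldMap" />
          <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders" />
          <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter" />
          <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper" />
        </extranetDocumentsIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

The index definition would then point to it with <configuration ref="contentSearch/indexConfigurations/extranetDocumentsIndexConfiguration" /> instead of the default reference. To override a section, replace its ref with your own definition of that node.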

The defaultLuceneIndexConfiguration is around 950 lines long and is full of comments to help you along. Perhaps a future blog post will go into more detail about all the different options.


Some additional resources