Sitecore Search indexing


To have content available in Sitecore Search you add it as a source. There can be multiple sources and you can even filter on the origin source(s) when searching content.

There are various ways to add content to a source. The primary idea is that the website is crawled and the content is extracted from the pages. That is a good and straightforward approach: it removes any requirements on the underlying technology stack of the website, and it solves the struggle we traditionally had in Sitecore CMS when pages are composed of multiple items/components.

Indexing

Sitecore Search has a few different connectors that let you get content into the index:

  • Web Crawler
  • Web Crawler (Advanced)
  • Feed Crawler
  • API Crawler
  • API Push

Often, the fastest and simplest method is to use the Web Crawler. I always use the Advanced variant: the simple one removes options from the UI, while the advanced variant keeps them available and still lets you skip the settings you won’t need.

The short introduction is that you will need to configure:

  • Trigger - this is the starting URL for indexing. Often you already have a Sitemap or Sitemap Index, since that is really important for your website's ranking in public search engines, and it can be used out of the box.

  • Document Extractor - this defines how to find the information on the pages. You configure how the value is found for each relevant attribute in your schema. This can be done with simple XPath-like expressions and fixed values, and you can even combine multiple expressions and a fallback value for each attribute. Alternatively, you can write a JavaScript function that resolves all the relevant attributes for the page.

It is important that a value is provided for all required fields.

We didn’t have any value for og:type on several pages, and as the field is required, those pages were not added to the index. Fixing it was just a matter of specifying a fallback value, but keep an eye on it: the feedback loop can be slow, because you have to wait for reindexing to see the result.
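As a rough sketch, a JavaScript extractor with fallback values could look something like the function below. I am assuming here that the extractor is called with a request and a response, that response.body behaves like a cheerio-style selector function, and that request.url holds the page URL; check the current Sitecore Search documentation before relying on any of that. The attribute names (type, name, description, url), the helper firstOf and the fallback values are only illustrative and must match your own schema.

```javascript
// Sketch of a document extractor with fallback values.
// Assumption: response.body is a cheerio-style selector function and
// request.url is the crawled URL. Attribute names are illustrative.
function extract(request, response) {
  const $ = response.body;

  // Hypothetical helper: return the first non-empty candidate, else the fallback.
  const firstOf = (candidates, fallback) => {
    for (const value of candidates) {
      if (typeof value === 'string' && value.trim() !== '') {
        return value.trim();
      }
    }
    return fallback;
  };

  return [{
    // Required attribute: fall back to a fixed value when og:type is missing.
    type: firstOf([$('meta[property="og:type"]').attr('content')], 'website'),
    name: firstOf([
      $('meta[property="og:title"]').attr('content'),
      $('title').text(),
    ], 'Untitled'),
    description: firstOf([
      $('meta[name="description"]').attr('content'),
      $('meta[property="og:description"]').attr('content'),
    ], ''),
    url: firstOf([$('meta[property="og:url"]').attr('content')], request.url),
  }];
}
```

The point of the helper is that every required attribute always ends up with a value, so a page that lacks og:type is still indexed with the fallback instead of being dropped.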

Adding attributes

Sitecore Search requires attributes to be defined in the schema, and there is no feature like the dynamic fields we had in Lucene/Solr. There is a basic set of attributes out of the box, but I find that I soon need additional custom fields for filtering, facets, boosting, etc. This is fully supported in Sitecore Search.
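Because every attribute has to exist in the schema up front, I find it handy to sanity-check extracted documents locally before waiting for a reindex. The sketch below is not a Sitecore Search API: the schema object is a hand-maintained copy with a hypothetical shape, product_area is a made-up custom attribute, and checkDocument is a helper name I invented for the example.

```javascript
// Local sanity check of an extracted document against a hand-maintained
// copy of the schema. The schema shape is hypothetical, not an API.
const schema = {
  type: { required: true },
  name: { required: true },
  url: { required: true },
  description: { required: false },
  product_area: { required: false }, // made-up custom attribute
};

function checkDocument(doc, schemaDef) {
  const isEmpty = (v) => v === undefined || v === null || String(v).trim() === '';
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

  // Required attributes that have no usable value in the document.
  const missing = Object.keys(schemaDef)
    .filter((key) => schemaDef[key].required && isEmpty(doc[key]));

  // Attributes in the document that are not defined in the schema copy.
  const unknown = Object.keys(doc).filter((key) => !has(schemaDef, key));

  return { missing, unknown, ok: missing.length === 0 && unknown.length === 0 };
}

const result = checkDocument(
  { name: 'Home', url: 'https://example.com/', tags: 'x' },
  schema
);
console.log(result);
```

For the sample document this reports type as missing and tags as unknown, which are the two kinds of mismatch I would want to catch before a crawl.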

A note about permissions

You add attributes to the search schema from the Domain Settings page. I have now experienced several times that, when added to a Sitecore Search account, I was assigned the Admin role and therefore could not see the Domain Settings option.

To access the Domain Settings you need the role called “Tech Admin”. I have been told that Admin is intended as a business admin, while the Tech Admin ensures that data is available on a technical level. Getting the proper access assigned has not been a problem, even though it required another round of access requests. It was more the confusion of not being able to find the expected settings.