Have page content available with Sitecore Content Search


Search is an important part of most websites, allowing visitors to find content across the site structure. With Content Search and its underlying indexing mechanism, Sitecore offers an effective search engine that handles most tasks out of the box. When an item is published, a Solr document is created with all fields on the page.

Indexing can be extended with additional field configuration, and computed fields let you use custom logic to create the content of a field. Several computed fields are already available, e.g. for creating the URL of the page.

What is ContentSearch

ContentSearch is an abstraction that allows Sitecore to support multiple search engines while the website code stays the same. Behind the scenes this is solved with a LINQ query provider, so in code we can create queries with LINQ, and Sitecore ContentSearch handles the translation to actual Solr queries and the required syntax.

I am aware of three providers for ContentSearch. Solr is the most common today and is supported out of the box. Previously there was also a Lucene provider, the traditional search engine in earlier versions of Sitecore. It works with physical index files placed on the web server, which made scaling a challenge: with multiple Content Delivery servers, content must be indexed on each server, and the servers could potentially return different results. Solr is an independent server based on Lucene, so much of the query syntax is similar, while many of the scaling issues are solved. There is also an Azure Search based provider that lets you use this cloud technology from Microsoft, but not everything works exactly the same way as with Solr.

Often we want the editors to be able to create and customize the content of the page, e.g. by inserting components with certain functionality such as carousels, timelines, special heroes, tables, etc. With the superior decoupling of content from presentation in Sitecore, you can have the same content rendered in multiple ways and on multiple pages.

However, as the default indexing works at item level, there are no links to those components, and therefore any text within those components cannot be found when searching for pages.

With computed fields and GlassMapper we have been able to solve this challenge while keeping the logic in each individual component, which decides whether it has any content that should be indexed at page level.

How we solved this

In a computed field we can fetch the layout of the page, find the components inserted on the page, and loop through them. With GlassMapper we register a specific implementation class to handle the data source item (which is often used directly in a view). We have defined an interface that the component data source models can implement to provide content for the computed field.

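using System.Text;
using Glass.Mapper.Sc;

namespace Foundation.Search
{
    // Implemented by component data source models that have text to contribute to the page-level search field.
    public interface ISearchPageContentComponent
    {
        StringBuilder ToSearchPageContentText(ISitecoreService sitecore);
    }
}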

Here is one of the models for a component that provides content for search. Implementing the interface makes it very simple for component developers to understand the requirements and support site-wide search.

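using System.Collections.Generic;
using System.Linq;
using System.Text;
using Foundation.Search;
using Glass.Mapper.Sc;
using Glass.Mapper.Sc.Configuration;
using Glass.Mapper.Sc.Configuration.Attributes;

namespace Feature.Article.Models
{
    // SitecoreItemBase and InfoGraphicModel are the solution's own Glass models (not shown here).
    [SitecoreType(TemplateId = "{A9943086-93F9-44B5-AA0F-C0C4E97620D3}", EnforceTemplate = SitecoreEnforceTemplate.Template)]
    public class InfoGraphicListModel : SitecoreItemBase, ISearchPageContentComponent
    {
        [SitecoreField("Headline")]
        public virtual string Headline { get; set; }

        // Child items of the data source that use the "Info Graphic Datasource" template.
        [SitecoreQuery(".//*[@@templatename='Info Graphic Datasource']", IsRelative = true, EnforceTemplate = SitecoreEnforceTemplate.Template)]
        public virtual IEnumerable<InfoGraphicModel> InfoGraphics { get; set; }

        // Collects the text of this component that should be searchable at page level.
        StringBuilder ISearchPageContentComponent.ToSearchPageContentText(ISitecoreService sitecore)
        {
            var sb = new StringBuilder();
            sb.AppendLine(Headline);

            // Guard against a data source without any info graphic children.
            foreach (var i in InfoGraphics ?? Enumerable.Empty<InfoGraphicModel>())
            {
                sb.AppendLine(i?.Text);
            }

            return sb;
        }
    }
}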

The computed field itself is implemented by inheriting from the abstract Sitecore.ContentSearch.ComputedFields.AbstractComputedIndexField class. To be able to use GlassMapper it is important to set the site context, since a site context is usually not available during indexing.

using System;
using System.Linq;
using System.Net;
using System.Text;
using Foundation.Search;
using Glass.Mapper;
using Glass.Mapper.Sc;
using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;
using Sitecore.Layouts;
using Sitecore.Sites;

namespace Feature.Search.Solr
{
    // Registered as a computed index field in the index configuration.
    // SitecoreItemBase and the item.GetRenderings() extension are the solution's own helpers (not shown here).
    public class PageContentComputed : AbstractComputedIndexField
    {
        private static string _rootPath;

        // Only items below the start path of the site are indexed.
        private static bool IsIndexingAllowed(Item item)
        {
            if (_rootPath == null)
            {
                _rootPath = Sitecore.Context.Site.StartPath;
            }

            return item.Paths.FullPath.StartsWith(_rootPath, StringComparison.OrdinalIgnoreCase);
        }

        public override object ComputeFieldValue(IIndexable indexable)
        {
            Item item = indexable as SitecoreIndexableItem;
            if (item == null)
            {
                // Not a Sitecore item, nothing to compute.
                return null;
            }

            // GlassMapper needs a site context, which is not set during indexing.
            using (new SiteContextSwitcher(SiteContext.GetSite("website")))
            {
                if (!IsIndexingAllowed(item))
                {
                    return null;
                }

                var sitecore = new SitecoreService(item.Database)
                {
                    CacheEnabled = false
                };
                var sb = GetTextForField(item, sitecore);
                var output = sb?.ToString().Trim();
                if (string.IsNullOrEmpty(output)) return null;

                // Replace &aring; with å etc.
                output = WebUtility.HtmlDecode(output);

                // Remove html tags, keeping a line break where a paragraph or <br> was
                output = output.Replace("</p>", "</p>\n").Replace("<br", "\n<br");
                output = StringUtil.RemoveTags(output);

                return output;
            }
        }

        // Loops through the renderings on the page and collects the text of each data source model.
        protected virtual StringBuilder GetTextForField(Item item, SitecoreService sitecore)
        {
            var renderings = item.GetRenderings();
            if (!renderings.Any())
            {
                return null;
            }

            var sb = new StringBuilder();
            foreach (var rendering in renderings)
            {
                var model = GetModel(sitecore, rendering, item);
                if (model == null) continue;
                var txt = GetTextFromModel(model, sitecore);
                sb.Append(txt);
                sb.AppendLine();
            }

            return sb;
        }

        // Resolves the data source of a rendering, which is either an item ID or a path.
        private ISearchPageContentComponent GetModel(ISitecoreService sitecore, RenderingReference rendering, Item item)
        {
            var dataSource = rendering.Settings.DataSource;
            if (string.IsNullOrEmpty(dataSource))
            {
                return null;
            }

            if (Guid.TryParse(dataSource, out var id))
            {
                return GetInferredItem(sitecore, new GetItemByIdOptions(id)
                {
                    Language = item.Language,
                    VersionCount = true,
                });
            }

            // A data source without "/sitecore" is treated as relative to the page item.
            if (!dataSource.ToLower().Contains("/sitecore"))
            {
                dataSource = dataSource.Replace("query:", "").Replace(".", "");
                dataSource = item.Paths.FullPath + dataSource;
            }

            return GetInferredItem(sitecore, new GetItemByPathOptions(dataSource)
            {
                Language = item.Language,
                VersionCount = true
            });
        }

        // Lets Glass infer the registered model type; returns null if the model does not support search content.
        protected ISearchPageContentComponent GetInferredItem(ISitecoreService sitecore, GetItemOptions options)
        {
            options.InferType = true;
            options.Lazy = LazyLoading.OnlyReferenced;
            try
            {
                var itm = sitecore.GetItem<SitecoreItemBase>(options);
                var result = itm as ISearchPageContentComponent;
                return result;
            }
            catch (NullReferenceException)
            {
                // Glass gives null reference exception when no registered type is found for the template
                return null;
            }
        }

        // Can be overridden to pick other content from the model, e.g. secondary related content.
        protected virtual StringBuilder GetTextFromModel(ISearchPageContentComponent model, ISitecoreService sitecore)
        {
            return model.ToSearchPageContentText(sitecore);
        }
    }
}
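
The clean-up at the end of ComputeFieldValue can be tried in isolation. The following is a minimal sketch and not the code from our solution: PageTextCleaner and its regular expression are hypothetical stand-ins for Sitecore's StringUtil.RemoveTags, so the snippet runs without any Sitecore assemblies. It follows the same order as above: decode entities, insert line breaks, remove tags.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PageTextCleaner
{
    // Assumption: a simple pattern standing in for StringUtil.RemoveTags.
    private static readonly Regex Tag = new Regex("<[^>]+>", RegexOptions.Compiled);

    public static string Clean(string html)
    {
        if (string.IsNullOrWhiteSpace(html)) return null;

        // Decode entities such as &aring; into their characters.
        var text = WebUtility.HtmlDecode(html);

        // Keep a line break where a paragraph or <br> ended, so words do not run together.
        text = text.Replace("</p>", "</p>\n").Replace("<br", "\n<br");

        // Remove the tags and trim the result.
        text = Tag.Replace(text, string.Empty).Trim();
        return text.Length == 0 ? null : text;
    }
}
```

Because decoding happens before the tags are removed, encoded markup such as &lt;b&gt; is decoded to a tag and stripped as well. If such text should stay searchable, the two steps would have to be swapped.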

The actual implementation has a few more checks, and we have one implementation for page content and one for secondary related content, so the two can be prioritized/boosted differently when searching.

The computed field is registered with a Sitecore patch file:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:search="http://www.sitecore.net/xmlconfig/search/">
    <sitecore role:require="Standalone or ContentManagement or ContentDelivery" search:require="solr">
        <contentSearch>
            <indexConfigurations>
                <defaultSolrIndexConfiguration>
                    <documentOptions>
                        <fields hint="raw:AddComputedIndexField">
                            <field fieldName="pagecontent"    indexType="tokenized" storageType="no" returnType="text">Feature.Search.Solr.PageContentComputed, Feature</field>
                        </fields>
                    </documentOptions>
                </defaultSolrIndexConfiguration>
            </indexConfigurations>
        </contentSearch>
    </sitecore>
</configuration>

The actual search request

Now we can search the content, e.g.:

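using System;
using System.Linq;
using System.Linq.Expressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.ContentSearch.Utilities;
using Sitecore.Globalization;

// Fragment from the search method. IndexName, SearchCriteriaModel, CustomSearchResultItem,
// IPhraseFields, LanguagePredicate and Map belong to the solution and are not shown here.
using (var context = ContentSearchManager.GetIndex(IndexName).CreateSearchContext())
{
    var query = CreateQuery<CustomSearchResultItem>(context, criteria, language);
    var results = query.Page(criteria.Page - 1, criteria.PageSize).GetResults();

    return Map(results);
}

private static IQueryable<T> CreateQuery<T>(IProviderSearchContext context, SearchCriteriaModel criteria, Language language) where T : CustomSearchResultItem
{
    // Filter restricts the result set (language, exclusions); Where holds the phrase to match.
    return context.GetQueryable<T>(new CultureExecutionContext(language.CultureInfo))
        .Filter(PredicateBuilder.Create(LanguagePredicate.Create<T>(language))
            .And(x => x.ExcludeFromSearch != true))
        .Where(PhrasePredicate.Create<T>(criteria.Phrase));
}

public class PhrasePredicate
{
    public static Expression<Func<T, bool>> Create<T>(string searchPhrase) where T : IPhraseFields
    {
        if (string.IsNullOrWhiteSpace(searchPhrase))
        {
            return PredicateBuilder.True<T>();
        }

        // Start from "false" and OR in one set of boosted matches per word.
        var predicate = PredicateBuilder.False<T>();
        foreach (var phrase in searchPhrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var p = phrase.ToLower();

            // Headline matches weigh more than matches in the computed page content.
            predicate = predicate
                .Or(x => x.Headline == p.Boost(1000))
                .Or(x => x.Headline.Contains(p).Boost(500))
                .Or(x => x.PageContent == p.Boost(60))
                .Or(x => x.PageContent.Contains(p).Boost(50));
        }

        return predicate;
    }
}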
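
To show what the per-word OR predicate amounts to, here is a minimal sketch that runs without Sitecore. It does not use Sitecore's PredicateBuilder or boosting. Instead it builds the same kind of expression by hand with System.Linq.Expressions and evaluates it against an in-memory list. PageDoc, TermPredicate and Has are hypothetical names, and the case-insensitive substring match is only an assumption standing in for what the Solr analyzers do on a tokenized field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical document type with the two fields used in the predicate.
public class PageDoc
{
    public string Headline { get; set; }
    public string PageContent { get; set; }
}

public static class TermPredicate
{
    // Null-safe, case-insensitive substring match used in place of a Solr field query.
    public static bool Has(string field, string term) =>
        field != null && field.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;

    // Builds: doc => false || (Has(Headline, t1) || Has(PageContent, t1)) || ... for every word.
    public static Expression<Func<PageDoc, bool>> Create(string searchPhrase)
    {
        var doc = Expression.Parameter(typeof(PageDoc), "doc");
        var terms = (searchPhrase ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // An empty phrase matches everything, like PredicateBuilder.True<T>().
        if (terms.Length == 0)
        {
            return Expression.Lambda<Func<PageDoc, bool>>(Expression.Constant(true), doc);
        }

        var has = typeof(TermPredicate).GetMethod(nameof(Has));
        Expression body = Expression.Constant(false); // like PredicateBuilder.False<T>()
        foreach (var term in terms)
        {
            var value = Expression.Constant(term);
            var inHeadline = Expression.Call(has, Expression.Property(doc, nameof(PageDoc.Headline)), value);
            var inContent = Expression.Call(has, Expression.Property(doc, nameof(PageDoc.PageContent)), value);
            body = Expression.OrElse(body, Expression.OrElse(inHeadline, inContent));
        }

        return Expression.Lambda<Func<PageDoc, bool>>(body, doc);
    }
}
```

The predicate can be applied with docs.AsQueryable().Where(TermPredicate.Create("solr index")), which returns every document where at least one of the words occurs in the headline or the page content. In the real query the ranking between those hits comes from the Boost values, which this sketch leaves out.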