Thursday, June 16, 2016

Sitecore Index dependencies

I recently stumbled upon a question on how to trigger re-indexing of related content in a Sitecore (Lucene) index. Different answers were given and I got the feeling that not everyone knows about the getDependencies pipeline yet. So here is a blog post...

Re-index related content

As I mentioned, there are other solutions that could do the trick. 
  • Custom update strategy

    You could write your own update strategy and include your dependency logic in there. This approach has the benefit that you can use it in one index only without affecting others.
  • Custom save handler

    With a custom save handler you could detect save actions, get the dependent items and register them for index updating as well. I'm not convinced that this will work in all update strategy scenarios, but if you have working code, feel free to share ;)
These are probably also valid solutions, but I'll leave those to others as I want to show the Sitecore pipeline that looks like the ideal candidate for the job.

getDependencies pipeline

There is a pipeline.. there always is. One drawback I'll mention already is that the pipeline is for all indexes and so far I have not found a way to trigger it for one index only (see update below on disabling). I also tried to get the index (name or anything) in the code but that didn't work out either. We could get the name of the job, but that was only relevant for the first batch of items - after that, multiple jobs were started and the name became meaningless. 

Anyway, the pipeline. In the Sitecore.ContentSearch.config you'll find this:
<!-- INDEXING GET DEPENDENCIES
  This pipeline fetches dependant items when one item is being index. Useful for fetching related or connected items that also
  need to be updated in the indexes.
  Arguments: (IQueryable) Open session to the search index, (Item) The item being indexed.
  Examples: Update clone references.
  Update the data sources that are used in the presentation components for the item being indexed.
-->

<indexing.getDependencies help="Processors should derive from Sitecore.ContentSearch.Pipelines.GetDependencies.BaseProcessor">
  <!-- When indexing an item, make sure its clones get re-indexed as well -->
  <!--<processor type="Sitecore.ContentSearch.Pipelines.GetDependencies.GetCloningDependencies, Sitecore.ContentSearch"/>-->
  <!-- When indexing an item, make sure its datasources that are used in the presentation details gets re-indexed as well -->
  <!--<processor type="Sitecore.ContentSearch.Pipelines.GetDependencies.GetDatasourceDependencies, Sitecore.ContentSearch"/>-->
</indexing.getDependencies>

As you can see, some processors ship out of the box but are commented out. You can simply enable them if you want your clones and/or datasources to be re-indexed along with the main items.

And you can write your own processor of course. An example:
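using System;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Pipelines.GetDependencies;
using Sitecore.Diagnostics;

public class GetPageDependencies : Sitecore.ContentSearch.Pipelines.GetDependencies.BaseProcessor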
{
    public override void Process(GetDependenciesArgs context)
    {
        Assert.IsNotNull(context.IndexedItem, "indexed item");
        Assert.IsNotNull(context.Dependencies, "dependencies");
            
        var scIndexable = context.IndexedItem as SitecoreIndexableItem;
        if (scIndexable == null) return;
            
        var item = scIndexable.Item;
        if (item == null) return;
            
        // optimization to reduce indexing time by skipping this logic for items not in the Web database
        if (!string.Equals(item.Database.Name, "web", StringComparison.OrdinalIgnoreCase)) return;
            
        if (!item.Paths.IsContentItem) return;
            
        if (item.Name.Equals("__Standard Values", StringComparison.OrdinalIgnoreCase)) return;
            
        if (Sitecore.Context.Job == null) return;
            
        // logic here - example = get first child
        if (!item.HasChildren) return;
            
        var dependency = item.Children[0];
        var id = (SitecoreItemUniqueId)dependency.Uri;
        if (!context.Dependencies.Contains(id))
        {
            context.Dependencies.Add(id);
        }
    }
}

In the example here we keep it simple and just add the first child (if any). That logic can contain anything though.

As you can see we try to get out of the processor as fast as possible. You can add even more checks based on template and so on. Getting out fast if you don't want the dependencies is important!
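
To register a processor like the example above, a patch file along these lines could be used. This is only a sketch: the namespace and assembly name (MyProject.Indexing, MyProject) are hypothetical, and it assumes the pipeline sits under sitecore/pipelines as it does in a default Sitecore.ContentSearch.config, so check that against your own configuration.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <indexing.getDependencies>
        <!-- Hypothetical type and assembly names: replace with your own -->
        <processor type="MyProject.Indexing.GetPageDependencies, MyProject" />
      </indexing.getDependencies>
    </pipelines>
  </sitecore>
</configuration>
```

Keeping the registration in a separate patch file also makes it easy to switch the processor off again without touching the Sitecore config.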

The benefit of the solution is that the pipeline is executed when the indexing starts but before the list of items to index is finalized - which is the best moment for this task. All "extra" items are added to the original list, so they are indexed by the same job and we let Sitecore handle them the way it was meant to.

Performance might not seem an issue, but when you have a lot of items and dependencies that get updated frequently, it will be. You might be sending far too many items to the index, so be careful (no matter what solution you go for). Indexing is a background job, but if it goes berserk you will notice.
Note that it is a good thing that your dependencies don't have to go through all kinds of processes before being added; they are just "added to the list".

I found this pipeline solution very useful in scenarios where the number of dependent items that actually got added was not too big. Don't forget you can also disable the pipeline processor temporarily (and perform a rebuild) if needed.

How to Enable/Disable 

(from the Sitecore Search and Indexing on SDN) - thx jammykam for the info

The pipeline is executed from within each crawler if the crawler’s ProcessDependencies property is set to true, which is the default. To disable this feature, add the following parameter to the appropriate index under the <Configuration /> section.
<index id="content" ...>
 ...
 <Configuration type="...">
...
 <ProcessDependencies>false</ProcessDependencies>
Alternatively, if the indexes don’t override default configuration with a local one, you can also globally change this setting in the DefaultIndexConfiguration.

Known issues with the indexing.getDependencies pipeline

https://kb.sitecore.net/articles/116076

Thursday, June 2, 2016

Sitecore WebApi missing context

Sitecore & WebApi


A lot has already been written about Sitecore and WebApi over the last few years, since custom WebApi calls stopped working without a little tweaking. We have used the solution by Patrick Delancy a few times now and it worked fine. Until today..
Well, the issue seemed to be in the WebApi but it turned out to be something else. My journey of the day:

WebApi call is missing all Sitecore context

Our starting point was indeed a web api request that had no Sitecore context. The request looked like this: 
"http://website/api/stores/nearby/50.860421/4.422365"

First thing to do was compare configs and code with other projects where it was working, but that didn't help. All was fine there.. 
When I tried to place a language in between the domain and the path (../en/api/...) I got a 404 error from IIS. Weird. Nothing from Sitecore, although this should work. So I had my first conclusion: Sitecore is rejecting the request. 

Inspect the httpRequestBegin pipeline

I started inspecting the httpRequestBegin pipeline and noticed that it was skipped at the first step, the CheckIgnoreFlag. A custom processor placed before this step got hit, the one right after it didn't. So I had to continue my search in..

The preprocessRequest pipeline

The preprocessRequest pipeline performs several checks in order to determine whether the request is valid for Sitecore. After staring at it for a while my eye fell on the dot. A simple stupid ".". The web api action was expecting 2 doubles as parameters, and while this should all work fine, there is a filter on extensions in this Sitecore pipeline: the FilterUrlExtensions. And of course, Sitecore thinks that our extension is 422365 :)

The fix

Fixing this turned out to be very simple: just add a trailing slash to the request. And there we had our context again!
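
To illustrate why the dot matters and why the trailing slash helps, here is a small sketch. It uses System.IO.Path.GetExtension purely as a stand-in: I have not verified what FilterUrlExtensions uses internally, so treat it as an analogy for "everything after the last dot in the last path segment" and not as Sitecore's actual implementation. The class name ExtensionDemo is made up for the example.

```csharp
using System;
using System.IO;

public static class ExtensionDemo
{
    public static void Main()
    {
        // The failing request path from above: the last segment is a double.
        string withoutSlash = "/api/stores/nearby/50.860421/4.422365";

        // The same path with the trailing slash that fixed the issue.
        string withSlash = withoutSlash + "/";

        // The last segment contains a dot, so the digits after it look like an extension.
        Console.WriteLine("'" + Path.GetExtension(withoutSlash) + "'"); // prints '.422365'

        // Nothing follows the last separator, so there is no extension to find.
        Console.WriteLine("'" + Path.GetExtension(withSlash) + "'");    // prints ''
    }
}
```

With the trailing slash there is no "extension" left for an extension filter to reject, which matches what we saw: the request reached Sitecore and the context was back.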

So remember, when using doubles in a web api request: use a trailing slash if a double is your last parameter...