This is another insightful blog by Wictor Wilen. Wictor was a
speaker at the European SharePoint Conference
2011.
It is always fun to get back on site after a couple of days off
work. SharePoint
2010 is like an annoying little critter, if you're not there to
cuddle with it it will do the most strange things.
I currently have a support case open regarding some issues with
crawled properties (I hope that will be another story to tell
another day) and went into the Search Service Application admin
pages in Central Admin to check some things. When poking around I
noticed that the incremental crawl hasn't been run for a few days -
actually it stopped working on the 31st of December last year
(sounds like ages ago now :-). In this farm we have three Search
Service Applications and only this one hadn't been incrementally
crawled - the other two worked just fine. I fired up an incremental
crawl manually and that worked, waited for the next incremental
crawl to start - and it didn't. Also tried a full crawl manually -
which worked fine, but the scheduled crawls never started.

I searched through the trace logs and event logs and found
nothing that I didn't expect to see.
I tried to Synchronize() the Search Service Instances - which
I've previously done when similar stuff happened by running some
PowerShell snippets on the servers in the farm. Nothing
happened!
Ok, what is actually triggering the incremental crawls then? It
is triggered by timer jobs called Indexing Schedule Manager on XXX,
and there is one job per server. In this case we have two servers
running search and I took a look at them and noticed that one of
these jobs hadn't been run since the last scheduled crawl - while
the other one had been running all the time (every 5 minutes).

I tried starting the timer job both
through the Central Admin web UI and through PowerShell -but it
just refused to start. Now this is strange! Now I took a look at
the Timer Job History on this specific server and noticed that no
timer jobs had been executed on that specific machine since that
time when the scheduled crawl last was executed. Thankfully the
trace logs were there and I could see that just after 2:00 AM on
the 31st of December the Timer Service stopped working (no
references to owstimer.exe at all since then). There's no warning
or error or nothing that indicates that it would have stopped
working (and the sptimerv4 service was still running on the
machine). The only thing in there was a warning about a Forefront
conflict. Nevertheless I needed to get it working and currently
don't have time to spend on why this happened (if it happens again,
then let's dig deeper).

I tried to do a restart-service
sptimerv4 on the failing machine, without luck, it couldn't be
gracefully shut down. So I actually had to kill the owstimer.exe
process and then start the timer service.

Once the timer service was killed and started everything spun up
and the incremental crawled kicked in after a couple of minutes and
have been working perfect since then.
Interesting thing here was that I did not see any early warning
signs, errors in the logs or similar, and the only way we found out
that the Timer process had gone into a stale state was that the
scheduled crawl for ONE of the Search Service Applications was not
running.
Stay tuned for more content from our
speakers by joining our
community or by following us on twitter or facebook!