Monday, March 26, 2012

How can I programmatically tell if agents are running?

Ideally I am looking for a procedure that I can call in T-SQL to verify that the Log Reader, Snapshot, and Sql Agents are all up and running on a given server. I found the undocumented sp_MSrepl_distributionagentstatussummary proc and got it to work for the distribution agent. I also found sp_MSrepl_agentstatusummary, but I've yet to get this to work. Overall, I'd really like to have a better way to do this.

This absolutely must be something callable from the sqlcmd interface. It can not be through a UI nor anthing more complicated than standard T-SQL.

Any suggestions?Query the status column in the history tables in the distribution database. there should be MSsnapshot_history, MSlogreader_history, MSdistribution_history.|||

Thanks for your response Greg.

I've spent some time looking at these tables and can't quite figure how to determine if one of the agents are down or not. I've induced some failures, took some of the agents offline, and generally tried to muck up the environment but couldn't find a reliable way to indicate if any of the darned things were actually running. I suppose I can look for the max timestamp of a 'started' process that doesn't have a 'error' or 'completed' record whose timestamp is more recent, but that doesn't feel to be a reliable method.

|||

Books Online explains each column for each table, as well as the various values for column runstatus.

For example, MSdistribution_history (this is cut/paste from books online):

runstatus

int

The Running status:

1 = Start.

2 = Succeed.

3 = In progress.

4 = Idle.

5 = Retry.

6 = Fail.

|||

Yup, this is a great example of what I was saying about the table not being a reliable indicator of the actual status of the agent. When I manually stopped the Sql Agent, I noticed rows in MSdistribution_history that looked like the following:

agent_id runstatus start_time time duration comments

5 2 <time> <time> 88854 The process was successfully stopped.

It seems that the runstatus 'Success' really means stopped in this case. It was immediately followed by a 'Starting agent' row when I turned it back on again. This led me to wonder if there are any other conditions I'd have to look for in order to see if things are enabled.

Maybe that's the confusion here... I don't care if a particular job is running, just that the agent is started / enabled / turned on / in a state where it can be used, etc. I know I can start looking for the most recent status log that indicates the agent is operable, but I've been in this business too long not to be lazy and just look for a darned enabled / disabled flag.

|||

Replication agents can be invoked numerous ways - from command line, ActiveX, or SQL jobs. The history tables will give you the status of the agent at its current state. For example, if your agent is set to run in continuous mode, and status = 3, then all is good. If status = 6, then your agent has failed. If your agent is scheduled to run at regular intervals, and you query it one hour later, and status = 2, that means the agent job ran successful. IF it's 6, then it failed. If it's 3, then the agent is still running.

Now, how the agent is invoked is totally up to you. HOw you determine if it's in a state where it can be used, well, I'm not sure how to define that for you. YOu need to be more specific with what you're trying to accomplish. Otherwise the replication history tables will give you the status of the current/last agent run.

No comments:

Post a Comment