TLDR: If you have the SQL Browser Service disabled then you must also have Shared Memory enabled, otherwise SQL Server can’t communicate with itself.
While setting up a sandbox Always On Availability Group cluster recently I ran into some unexpected behavior. The first sign of problems was that SQL Agent would not start. The service seemed to start just fine but then would immediately crash. In order to troubleshoot that I went to the SQL Server logs, which failed to load. Never in my career has opening the logs been an issue, so my curiosity was piqued. I decided to try running the stored procedure xp_readerrorlog manually to see if that would work. That returned this error:
When Googling for more information, I came across several posts where DBA’s had encountered this error. However, almost every one of them seemed to have hit it for different reasons, and none helped resolve my situation. The most common cause of this error seems to be the one covered by Pinal Dave’s post, concerning version 13 of the ODBC driver. I experimented with that to no avail. This honestly became one of those situations where I became so frustrated that I just had to walk away from it for the day.
The next morning I was idly sipping coffee and just kind of looking over the event logs, and something about how the instance name was listed in the Windows event logs struck me as odd. It wasn’t explicitly specifying the port, and I was using a non-default port with the SQL Browser Service disabled. On a whim I enabled the service and, voila, everything worked. This struck me as odd because it is very common to disable the browser service for security purposes, and I’d never seen it interfere with other SQL Server functionality. I even reached out to #sqlhelp on Twitter (a very useful tip, btw), but no one seemed to have encountered a situation where disabling the browser service caused that type of behavior.
In the end, it was a conversation with my coworker (Todd Libeau) about the issue that sparked the light to resolution. You see, much of the purpose of this sandbox cluster is to see how far it can be locked down. But, as we all know, on that journey you will eventually hit a point where excess security leads to a loss of functionality. So it was in this case. He mentioned that he’d always assumed the Shared Memory protocol was used for that local server communication. Sure enough, I had disabled every protocol other than TCP/IP unless it was proven they were absolutely necessary (turns out it is). I reconfigured the nodes to have Shared Memory enabled and the SQL Browser Service disabled, and all was right with the world.
It makes sense in the end. If the server can’t reach itself by the default Shared Memory locally, it’s going to try and use TCP/IP. But it will not account for the non-default port on its own, instead relying on the SQL Browser Service to handle that. With that service disabled, there’s simply nowhere for it to go.