Dynamics CRM Install\Import “The SQL Server {sqlserver_name} is unavailable”

My apologies for the significant gap between my last post on our CRM 2016 Upgrade Adventure and this one. My time has been consumed with preparing for go-live, but I’ve been keeping track of the roadblocks and caveats we encounter as I go so that I can post about them at later dates.

One of the challenges with this deployment that we did not encounter with the last is a significant increase in firewall and context configuration. LU has, to their credit, made great efforts over the last several years to ensure our network is as secure as possible. With an increase in security, however, comes an increase in complexity.

While setting up CRM you might expect to open a port for the SQL instance (ie 1433). It might also occur to you that UDP 1434 should be opened for the SQL Browser Service. Now your app server has a clear line open to the SQL instance. Everything should be ready, so you go to create or import your organization only to encounter “The SQL Server {sqlserver_name} is unavailable”.

You might also encounter a message about not being able to verify that the SQL Agent is running. Being a thorough Sys Admin\DBA you check the SQL services for these and confirm both are up. You also use telnet or another utility to confirm that the ports are indeed open, so what on earth could CRM need in order to reach SQL?

TCP 445… that’s right. Because of the unique setup of CRM it requires TCP 445 to do any kind of setup. What is TCP 445 you ask? “Active Directory service required for Active Directory access and authentication.” (https://technet.microsoft.com/en-us/library/hh699823.aspx). Why an app server would need an AD authentication port opened to the SQL server is anybody’s guess, but it cleared our issue right up. All system checks passed and it happily imported our database.

It should be noted, if you’re using an Availability Group setup then this port will need to be opened to the other servers in the AG as well. I have had the most success when opening it to the AG listener name as well as all nodes.

Bonus Round

If none of this helps you, here are some other things I’ve found are necessary to appease the install\import wizards.

  • Make sure you’re in the local Administrators group on the app servers as well as every node in the SQL cluster or AG (added explicitly, not through a group) .
  • Make sure your account has the sysadmin role on the SQL instance.
  • Specify the SQL server name using the backslash notation, even if the AG name doesn’t contain it. For instance, if your AG is normally accessed as SQLAGINSTANCE,50000 you would use SQLAGINSTANCE\SQLAGINSTANCE,50000 in the wizard. It seems to be hard-coded to only accept it in that manner.

Dynamics CRM Import Fails on “Upgrade Indexes”

As I mentioned in the last post, I’m taking you through our adventure in upgrading the existing on-premise Dynamics CRM 2011 environment to 2016 (and eventually 2016 cloud). Previously I discussed the first show-stopper error we received, “Must declare the scalar variable “@table”.” Following that resolution the import continued past the stage “Metadata xml upgrade: pass 1” but then failed at “Upgrade Indexes”.

Through the use of trace logs obtained by using the CRM diagnostic tool, we discovered that the import wizard was marking a number of indexes to be dropped and then recreated. However, as observed through a SQL Profiler deadlock trace, it was trying to drop and add indexes on the same table at the same time. As I mentioned in my previous post, our database is in excess of 1.5TB. One of the largest tables is ActivityPointerBase, and it’s also one on which many index operations were being executed by the import wizard. The result is that some of the index operations would be chosen as the deadlock victim, causing the import wizard to throw an error and exit. Also, if you restarted the import it would process the entire list again, not taking into account any that it had dropped and recreated already.

My coworker, and local wizard, Bret Unbehagen used the trace logs to determine which tables the import wizard was using to store its index operation information. He then created the query below to produce a list of indexes that it was trying to recreate as well as generate a drop statement for each of those.

So, the basic workflow is 1) let the import wizard proceed until it fails at “Upgrade Indexes”, 2) run the script above against your organization database to list the indexes that it wants to rebuild, 3) use the generated drop statements to preemptively drop those indexes, 4) restart the import so that it continues were it left off (feature of 2013+).

In our experience, this allowed the import wizard to continue through the “Upgrade Indexes” section without deadlocking and proceed with the import. Hopefully it can help you achieve success as well. If you have any questions please feel free to comment. Also, if you’d like to see more from Bret his information is listed below.

Bret Unbehagen (Twitter: @dbaunbe; Web: dba.unbe.org)

Dynamics CRM Import Error “Must declare the scalar variable “@table”

This post is the first in a new series I’m going to call “CRM 2016 Upgrade Adventure”. Summary: my organization has taken on the ambitious challenge of not only upgrading our existing Dynamics CRM 2011 environment to the 2016 version but  of moving it to the cloud service as well. Aside from getting the vanilla components through three versions (2013, 2015, 2016) there are all of the custom integrations that have been tied into CRM over the years that must come along too. That is why it is neither a joke nor hyperbole when I label this as an adventure. We are only in the initial months of this effort and I promise you that plenty of adventure has already been had.

Our first headache… I mean, adventure… was encountered while importing the 2011 database to 2013.  (In order to get to 2016 you have to “hop” your database through 2013, 2015, and 2016 by importing it to each version.) Initially we encountered some messages about incompatible components from the CRM 4.0 days, which our database started out in. That was no surprise. The developers quickly updated our current system and we went to import again assuming the path was clear. What I encountered instead was almost instant failure. Within 20 minutes the import had failed. Knowing this was a process that should take several hours (our database exceeds 1.5TB), I took a look at the logs to see what the issue was. During the stage “Metadata xml upgrade: pass 1” the following was listed:

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. —> System.Data.SqlClient.SqlException: Must declare the scalar variable “@table”.

I’m sure you can appreciate my confusion since this is a closed-source application written by Microsoft in which I have no option to declare variables. Googling only returned articles about people writing their own applications. Feeling we had no other recourse, we opened up a support ticket with Microsoft. That in itself was quite an adventure that spanned two weeks, but I’ll give you the short version. (Props to my coworker John Dalton for his endless hours on the phone with Microsoft through nights and weekends.) In the end the culprit was a trigger that was on our database, but not just any trigger. This one is named “tr_MScdc_ddl_event” and is created by Microsoft when you enable CDC on a database. After scripting out that trigger to make sure we could recreate it later, then dropping it, the import continued past “Metadata xml upgrade: pass 1” successfully.

TLDR version

Microsoft’s database level CDC trigger tr_MScdc_ddl_event interferes with the Dynamics CRM import operation. Drop that trigger before the import it then add it back once it’s finished and you shouldn’t have issues with this error.

So that’s the end of the adventure right? Everything worked fine after that? Not even close! Stay tuned…

Using Powershell to Refresh a Test SQL Server Instance from Production

A project I’ve been wanting to work on for a long time is how to automate restores of our Production databases to the Test instance. There are a number of challenges associated with this. First, the restore has to be able to find which backup it needs to use. Secondly, many of our databases do not have their files configured in the same way (for example one may have a single .mdf and log whereas another may have multiple .ndf files). Third, restoring from a different instance causes the SQL Authentication users to be orphaned and therefore connections to the databases to fail. And this is just at the high level. There are many smaller complications along each of those roads. Because of these our restore model has typically been per-request, manually executed, and filled with many small and tedious steps. I wanted to replace it with one process that would restore all of the production databases from the proper backup, configure any “with MOVE” options dynamically, and fix any orphaned users. A secondary goal was also to make it as portable as possible so that I could easily reconfigure it for our other PROD/Test/Dev instances.

The result of this labor of love is copied below. You’ll notice that at the top you can configure the Source and Destination SQL instances to be used, as well as the Data\Log paths for both. This is the section that allows me to reconfigure it for other instances. You can make these parameters that are passed in if you want to call it manually. For my purposes it is a scheduled process that always runs against  specific instance. Following that section is an exclusion list, built to avoid system databases as well as any others you want to skip (use cautiously to avoid angering coworkers in the Development or Testing departments).

The only function is one called “FixOrphans”. If you’ve guessed that’s where the orphaned users are repaired, then you’re today’s winner! This works by pulling a list of available logins from the destination, creating a list of orphans, and using the Source to find which login users should be matched to.  It will also match “dbo” to a different login if that’s the way the Source database was configured. Of course, this breaks down if the logins are not named the same on both instances. This is the case for some of mine as well. In those cases I have a section at the bottom to take care of one-off scenarios by hard coding them. It isn’t ideal, but will have to do until I can change some policies in my organization.

A fun side note about the FixOrphans function. It started as a small section of the script, then quickly grew to become its own function. I became so interested in what could be done with it that I side-tracked for a while and wrote a stand-alone script just for repairing orphans. Being a faithful reader, you will remember it being covered in Fixing Orphaned SQL Users via PowerShell. So, the restore project was actually first and fixing orphans spun out of that. I then took that work and rolled it back into the restore script, so the code will be very similar to that first post.

After that function is declared we move on to the real work. A variable is created to store Source backup information so that we know which backup to use. It then loops through each of the objects stored in that variable. If the database is in the exclusion list it notes that and moves on. Otherwise, it sets a baseline for the restoration query and starts building on that. This is the portion that allows me to account for differing file configurations per database on the instance. For each database it will pull Source information about the files and dynamically build the MOVE statements. At the end it packages them all together and adds any remaining options, such as keep_cdc. After the full statement is built the script will set the database to single user, closing any connections. It will then drop the database and execute the restoration. Dropping isn’t entirely necessary, but our Test environment is often short on space. Dropping first allows me to free up room before starting the restore.

There are two things to note at this section. The first is that, while documentation will tell you that not specifying a timeout value for Invoke-Sqlcmd means that it’s unlimited, that simply isn’t true. Unless you specify one it will die after thirty seconds. Secondly, once the script successfully kicks off that restore command it will truck happily along its way to the next line, whether your restore finishes instantaneously or not (my wager is that it doesn’t). For that reason I built a Do…While loop to monitor for restore commands to finish and then allow the script to proceed. Otherwise it gets awkward trying to set your database to simple when it doesn’t actually exist yet. The commands to set that recovery option and shrink the log file are also both in the interest of saving space.

Once all of that is finished, it’s just a matter of resetting variables so that the next database has a clean run and calling the aforementioned FixOrphans function to give those SQL Auth users a home. After all of the elements in the array have been processed I write an error string that I’ve been compiling to a file and call it a day. Two files will be created. RefreshLog.txt contains all of the operational information, and Errors.txt contains, you guessed it, errors.

As with my other scripts, much of this will be different in your environment than it is in mine. However, I hope that this will be easily adaptable for your use and, if nothing else, that you can find useful elements that can be incorporated into  your own scripts. As always, feel free to send me any questions, and I hope this helps.

P.S. I chose to use Invoke-Sqlcmd so that all versions of SQL, new or old, would be compatible. For newer instances feel free to use SMO.

 

Using PowerShell to Perform Availability Group Failovers

In the past we’ve explored how to use PowerShell to automate failovers on SQL Failover Clusters for Windows Updates and other scheduled maintenance. But what if you’re using Availability Groups instead of the traditional cluster? Fear not, there is still a PowerShell option for that. Usually I would have our night team use SSMS to fail over instances, but recently I transitioned to having them use the PowerShell method below. There are two primary reasons for this. 1) The system administrator does not need to have SQL Server rights in order to carry out the failover and 2) having pre-written commands helps cut down on human error.

For the purposes of this example we have two nodes (AGNODE1 and AGNODE2), each having their own instances (SQLINSTANCE1 and SQLINSTANCE2) that are part of an Availability Group (AGINSTANCE). This will walk through the process of installing Windows Updates on each of those nodes. We will assume that at the outset AGNODE1 is the primary for AGINSTANCE.

  1. Install updates on AGNODE2 and reboot as necessary.
  2. Log into AGNODE2.
  3. Right-click PowerShell and click “Run as Administrator”.
  4. Make AGNODE2 the primary by running the following command: Switch-SqlAvailabilityGroup -Path SQLSERVER:\Sql\AGNODE2\SQLINSTANCE2\AvailabilityGroups\AGINSTANCE
  5. Confirm that the AG instance is now owned by AGNODE2 using this command: Get-ClusterGroup
  6. Install updates on AGNODE1 and reboot as necessary.
  7. Log into AGNODE1.
  8. Right-click PowerShell and click “Run as Administrator”.
  9. Make AGNODE1 the primary by running the following command: Switch-SqlAvailabilityGroup -Path SQLSERVER:\Sql\AGNODE1\SQLINSTANCE1\AvailabilityGroups\AGINSTANCE
  10. Confirm that the AG instance is now owned by AGNODE1 using this command: Get-ClusterGroup

A couple of key points to keep in mind are this: 1) you must run the command from the destination server and 2) all of the confusing syntax is simply specifying the node, instance name, and Availability Group name for that destination server.

I hope this helps, and as always feel free to send me any questions!

Fixing Orphaned SQL Users via PowerShell

In SQL Server, a login is an instance level object that is used for authentication. It is mapped to a database user, which controls permissions at the database level. These objects (login and user) are tied to one another via a SID. If the login is deleted and then recreated, or if you restore your production database to a test environment, those SID’s will no longer match and orphaned users will result.

Information on how to resolve this situation can be found on MSDN. However, if you need to fix more than one user this can be painful. It also does not work for the “dbo” user, requiring you to take additional steps in order to repair both. In the interest of handling refreshes to our development and test instances in a more efficient way I’ve created the script below. It takes two parameters, one for the source instance and another for the destination. It will cycle through the databases on the destination instance and query the source for login information pertaining to any orphaned users. Some informational messages have been built in to help you identify issues, such as a login not existing on your destination instance.

There are a couple of disclaimers. This script assumes that your logins will have the same name on both instances. If your production instance has an account named “sqlsvc” and the test instance equivalent is named “sqlsvc_test”, then it will not sync the user to that login. For the situation I’m working with there is no reliable standard in account names for me to rely on. If your environment is more standardized then please feel free to build in that additional logic.

I hope this will be of help to those out there fatigued by running sp_change_users_login one user at a time. You call the script as “./FixOrphans.ps1 -SourceSQLInstance YourSourceName -DestSQLInstance -YourDestinationName”. If you don’t provide the parameters up front it will prompt you for them.

As always, if you have any questions please let me know.

 

Using PowerShell to Execute SQL Maintenance

It’s an odd truth that laziness leads to better systems administration. That is, so long as it spurs you to automate and thoroughly document a repetitive or tedious task. For instance, I was recently tasked with reducing the excessive size of some system tables in our Microsoft Dynamics CRM environment. To start with, I accomplished this the way you would any one-off task. I RDP’d to each of the app nodes, disabled the service that interacts with the tables we’re performing maintenance on, RDP’d to my utility box, used SSMS to disable each of the SQL Agent jobs that might interfere (on two different nodes), opened the script (provided by Microsoft), tweaked it to my liking, and executed it. The next morning I did all of this in reverse, starting with cancelling the script. For one evening this isn’t really a big deal. However, we soon realized that in order to get the record count down to where we wanted it that several iterations of this maintenance would have to occur over the course of multiple weekends. Reviewing all the steps I’d just performed, my thought was “ain’t nobody got time for that”.

Confronted with performing multiple GUI-based steps during each of these maintenance windows I did what any good/lazy Sys Admin does, I scripted it. Below you’ll find an example of what I used. I run it from PowerShell ISE, executing whichever block is applicable to what I want to do at the moment. This allowed me to go from starting up the maintenance in fifteen minutes to under one minute. (I know, 14 minutes isn’t a big deal. But when you’re tired and it’s late every minute counts.) As I mentioned before, my particular case is CRM maintenance. So basically I disable services on the app nodes, disable SQL Agent Jobs that might interfere (my database is in an Availability Group, so I disable them on both nodes), start the SQL Agent Job containing the Microsoft script referenced above, and then do it all in reverse the next morning at the end of the maintenance window. I included service status checks at the bottom because I’m paranoid and want to confirm the services are actually stopped before starting the SQL script. Also, I did not script the stopping of the job. I always hope (in vain) that the job will have finished, signaling the end of this particular maintenance need. Since both SSMS and the script run from my utilities box I check it in SSMS every morning and simply alt-tab over to ISE after stopping the job to start everything back up.

It’s unlikely that you’ll have the exact situation as me, but hopefully this can give you some ideas for how to incorporate these methods into your own work. In any case I hope this helps, and feel free to contact me with any questions.

 

Scripting Failovers for SQL Server Maintenance

As much as we’d all like to set up our SQL instances and never again let anyone touch them, that’s just not possible in real life. We live in a world where entities, for good or for evil, are constantly exposing vulnerabilities both in the SQL software and Windows itself. For that reason it is the responsibility of good DBA’s to keep their systems updated, and that means failovers.

It is typical in many environments to have a separation of duties, often keeping DBA’s from having control over the Windows Failover Cluster itself. In my particular case I actually have access to the cluster. However, our OCC (Operations Command Center) is tasked with carrying out Windows Updates on a regular basis. Letting them fail instances over means the servers get updated regularly (and also that I have to get up in the middle of the night for one less thing). I needed a way to allow their team members, regardless of experience level, to carry out Windows Updates on our cluster. Enter PowerShell scripting.

The code below is a modified version of what our OCC uses to carry out Windows Updates on our Failover Cluster. (There are options for Availability Groups too, but that’s a post for another day.) In the simplified example below there are six instances (SQL Instance 1-6) balanced across six nodes (sqlnode1-6). To keep things simple, a few assumptions are made in the script:

1. All instances are named with a particular convention. In this case that is “SQL Instance #”.
2. The instance numbers have their permanent homes on nodes of the same number (ie SQL Instance 1 belongs on sqlnode1).
3. The script is run on one of the nodes of the cluster.
4. The script is run in PowerShell ISE section by section, not all at once.

With that in mind I present the code to you below. It is written in a manner such that a Systems Administrator would be reading it as an instruction sheet and carrying out the tasks on your behalf. I am sure there are more elegant and efficient ways to accomplish this task, but this has been successful for us and I hope it can help you in some way. If you have any questions please feel free to comment or contact me on Twitter. Enjoy!

Is Optimizing for AdHoc Queries Worth It?

As you probably know, there is a server configuration option called “optimize for ad hoc workloads”. For more details on that option check out the MSDN article. The real question is, how do I know the extent to which my environment uses AdHoc queries? Well, as usual there is a DMV for that. Use the query below to get an idea of how many plan caches are being stored but only used once. The number changes often, so it’s a good idea to baseline your typical number. For instance, I’ve created a SQL Agent job to record the number to a table every ten minutes.  If you maintain high numbers then it might be a good idea to optimize for ad hoc workloads. As always, however, I would advise testing thoroughly first. Happy hunting.

USE Diagnostics

INSERT INTO Diagnostics.dbo.AdHocTracking
	SELECT COUNT (*) FROM sys.dm_exec_cached_plans
	WHERE objtype = 'Adhoc'
	AND usecounts = '1'

Credit to the Red Gate blog for my initial background information.

Managing CDC Jobs on an Availability Group

So, here’s the deal. You’ve got Change Data Capture going in your environment and want to move your database to an Availability Group on SQL Server 2012. That’s great, lots of advantages there. But what happens to CDC when your database fails over to a different node? Since the capture process runs through SQL Agent jobs you will no longer be getting the CDC information post-failover. This is the scenario I ran into recently. In order to mitigate the problem I’ve created a simple script to handle the enabling and disabling of jobs on both replicas. Of course, the jobs will need to be created on each replica before it will work. But this is an easy way of making sure you can take advantage of the benefits of AG’s while still keeping up your CDC captures. I have this running every five minutes on each node. Feel free to adapt/improve/delete it to your heart’s content. Note that I have hard-coded the replica_id. I know, not the most elegant method. But good enough for a first version.

****Update****

Through painful experience I’ve discovered that the replica id can change if you remove the replica and add it back in. A slightly better method of checking utilizes sys.dm_hadr_availability_group_states. In this case we’re checking against the instance name, which will always remain the same.

IF (select primary_replica from sys.dm_hadr_availability_group_states) = ‘YourReplicaName’

****Update 2****

For SQL Server 2014 and above you are not allowed to even run the USE statement against a database that is on a read-only replica (ie your secondary replica used for reporting). For that reason I’ve altered it to use sp_start_job instead. The updated version below also simplifies other syntax a bit.