Integrity Checks on Availability Group Secondaries

Background

Everyone knows that it’s a good idea to regularly run integrity checks on your databases (you are doing that right?). What may be less obvious is that it’s generally recommended with AlwaysOn Availability Groups to run it on the secondary as well as the primary. That’s because, even though it’s the same database being replicated, these are two different copies physically stored in two different locations on separate sets of disks.

Problem

While trying to implement this best practice I ran into a bit of a hiccup. Like most sane people, we use Ola Hallengren’s maintenance solution to automate integrity checks, index maintenance, and more. The jobs had been implemented across all of our instances and seemed to be running fine. No failures were reported. However, our Redgate Monitor started to report databases going more than 7 days without an integrity check (the threshold we’d configured it to watch for). Looking more deeply, it was apparent that the job would begin processing the first AG database alphabetically but never finish it. It would then quit the job reporting success and never move on to check the other databases. This was all evident from the CommandLog table (thank you Ola).

SELECT DatabaseName,Command,CommandType,StartTime,EndTime,ErrorNumber,ErrorMessage FROM CommandLog
WHERE CommandType = 'DBCC_CHECKDB'
ORDER BY StartTime DESC

Resolution

After a fair bit of Googling and familiarizing myself with the DBCC options, the problem and resolution became apparent. One of the options we had chosen to use was to do extended logical checks. Again, this is best practice to make sure not only the physical but also logical structure of the database has no issues. Given that any server could become primary at one point or another, they all needed to be able to do this check. However, that particular option cannot be used when your secondary is set to Read-Intent Only. This left us with two options:

  • Change all of our AG replicas to “Yes” (full readable secondary) instead of “Read-intent only”.
  • Build logic into the job to only run extended logical checks on the primary.

You can read more about the differences in the two modes of readable secondary on Microsoft’s documentation, but we agreed as a team that Read-intent Only made more sense for our business and that changing this option across the board would open up more potential for unintended consequences than was necessary. Instead, I altered the logic of the job to look like this:

-- If this server is Primary, run full check
If EXISTS (SELECT Primary_replica FROM sys.dm_hadr_availability_group_states WHERE Primary_replica = @@servername)
    BEGIN
        EXECUTE [DBA].[dbo].[DatabaseIntegrityCheck]
        @Databases = 'ALL_DATABASES',
        @LogToTable = 'Y',
        @ExtendedLogicalChecks = 'Y'
    END
 
-- If this server is Secondary, run only physical checks
ELSE
    BEGIN
        EXECUTE [DBA].[dbo].[DatabaseIntegrityCheck]
        @Databases = 'ALL_DATABASES',
        @LogToTable = 'Y'
    END

Taking this approach lets the job run a different level of check based on whether the server is currently serving as the primary or the secondary. That way we can make sure the extended logical checks are always happening on the live database but that the physical integrity of the secondary is maintained as well.

Conclusion

Not only is it possible, but it’s advisable to run integrity checks on both the primary and secondaries of your Availability Groups. I hope this helps you achieve that!

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Proudly powered by WordPress | Theme: Baskerville 2 by Anders Noren.

    Up ↑