Choosing a SQL Server High Availability/Disaster Recover (HA/DR) strategy strongly depends on what the management of your company wants. Therefore you have to sit down with them and determine the Recovery Point Objective and the Recovery Time Objective (RPO/RTO). Once that’s done, you can start planning your HA/DR strategy. Here’s how I decided on our HA/DR strategy.
HA choice based on RPO/RTO
In my first post in this series, I explained how my manager, CTO and I determined the RPO/RTO objectives for our databases. Based on this, there are a few options to choose from. Let’s start with the RTO objective for our HA solution. We chose a RTO objective of 1 minute, meaning that if our primary system goes down, our secondary system in the same location has to be active within one minute. HA technologies which accomplish that are Clustering, AlwaysOn Availability Groups (AGs) and synchronous Database Mirroring (DBM).
DBM is deprecated and has the disadvantage of only synchronizing databases. You have to take care of synchronizing logins and jobs on both instances yourself. You also have to set up DBM for each and every database which, when you have a lot of databases, can be a daunting task and possibly an administrative hell. So DBM is not an option.
AlwaysOn AGs is like DBM, except for groups of databases. You don’t have the disadvantage of having to set up each and every database like with DBM, but you still have to maintain logins and jobs on both instances, i.e. keep them synchronized.
Both DBM and AlwaysOn AGs double your storage needs, because the databases exist on both servers. This is not the case with Clustering, where the database files live on shared storage. Because of this and because of the advantages of not having to maintain logins and jobs on two servers, I chose Clustering as our HA solution. Clustering also happens to be able to meet the RPO objective of 1minute, so with Clustering our RPO/RTO objective is covered.
DR choice based on RPO/RTO
Because we don’t have SAN and/or VM replication implemented, the choices I had for synchronizing our databases in our DR site to meet the RPO and RTO of 1 hour were Log Shipping, asynchronous AlwaysOn AGs and asynchronous DBM. Because of reasons mentioned above, DBM fell off. Also because I need at least 1 extra secondary for reporting purposes and with DBM only one secondary is possible. With Log Shipping you can have more than one subscriber, but every time a transaction log is restored, users will be interrupted. With AlwaysOn AGs, a secondary database will be readable all the time in near real time. With Log Shipping, transaction log backups will be sent over a line to our DR site at regular intervals, which can cause big delays and hence could compromise the RPO/RTO goals. So I chose AlwaysOn AGs as our DR solution.
Overview of our desired HA/DR solution
Combining Clustering with AlwaysOn AGs, our infrastructure on a high level is going to look like this:
To meet our RTO/RPO goals for our future database instances, I decided to choose Clustering as our High Availability technology and for our Disaster Recovery strategy I decided to implement AlwaysOn Availability Groups. I mentioned I also want to use Availability Groups for our reporting solution. How I planned for that, will be part of the next post.