Exchange 2010 Design Principles for High Availability and Site Resiliency
Many of my customers read about the new Exchange 2010 High Availability (HA) and Site Resiliency (SR) capabilities and already have an idea of what their Exchange 2010 environment will look like. In most cases, what they have in mind is technically possible but not realistic. Let’s look at a few examples.
When we talk about Exchange 2010 High Availability, we are talking about running at least two instances of each Exchange role within a single site.
We can do HA two ways: with Windows NLB, which requires splitting the HUB/CAS roles onto separate servers because NLB cannot run on the same server as a DAG member, or with a hardware load balancer (HLB), which lets us use only two multi-role servers.
Here is what my customers would like to do.
Why doesn’t this work?
1. This is Site Resiliency, not High Availability. Theoretically, the database would fail over automatically, assuming the server in the secondary site can still contact the File Share Witness. This type of site failover can force clients to reauthenticate.
2. Windows NLB is not supported across sites, and using an HLB to load balance across sites is not recommended. No high availability for Outlook connections would be possible.
3. If the Exchange server in the primary site fails, all external DNS entries would have to be manually pointed to the new site.
4. If the entire primary site fails, the mailbox server in the secondary site loses quorum, and the failed primary-site server has to be manually evicted from the DAG before the databases can be brought back online.
5. This design is very dependent on the uptime of the WAN link. During a failover scenario, when the active databases are mounted in the secondary site, a WAN hiccup will cause the databases to fail back without warning. If the databases are mounted in the secondary site because the mailbox server in the primary site is unavailable, and the WAN link hiccups, the databases will simply dismount.
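The dismount and failback behavior in the list above follows from majority quorum. Here is a minimal Python sketch of that rule, not how the cluster service is actually implemented. It assumes one DAG member per site and a File Share Witness in the primary site, each holding one vote. The names `MBX-PRIMARY`, `MBX-SECONDARY`, `VOTERS`, and `servers_with_quorum` are made up for illustration.

```python
from typing import Dict, Iterable, Set

WITNESS = "FSW"  # hypothetical name for the File Share Witness

# Assumed layout: voter name -> site it lives in.
VOTERS: Dict[str, str] = {
    "MBX-PRIMARY": "primary",
    "MBX-SECONDARY": "secondary",
    WITNESS: "primary",
}


def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """Majority rule: more than half of all configured votes must be visible."""
    return votes_seen > total_votes // 2


def servers_with_quorum(
    voters: Dict[str, str], wan_up: bool, failed: Iterable[str] = ()
) -> Set[str]:
    """Return the mailbox servers that can still see a majority of votes.

    A server with quorum can keep databases mounted; one without it cannot.
    """
    failed_set = set(failed)
    alive = {name: site for name, site in voters.items() if name not in failed_set}
    total = len(voters)
    result: Set[str] = set()
    for name, site in alive.items():
        if name == WITNESS:
            continue  # the witness votes but does not host databases
        # With the WAN up a server sees every live voter,
        # otherwise only the live voters in its own site.
        seen = sum(1 for s in alive.values() if wan_up or s == site)
        if has_quorum(seen, total):
            result.add(name)
    return result
```

Calling `servers_with_quorum(VOTERS, wan_up=False, failed=["MBX-PRIMARY"])` returns an empty set, which is the "databases simply dismount" case. With both servers healthy and the WAN down, only `MBX-PRIMARY` keeps quorum, which is why databases mounted in the secondary site end up back in the primary site.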
This is a site resilient design. It is very similar to the bad HA design above, except that we do not attempt to load balance the CAS roles and do not expect automatic failover. There are some considerations with this design, however.
1. A two-server, two-site design assumes that the organization is more concerned about a site loss than a single server outage. Organizations are typically at more risk of a hardware or software failure than the loss of an entire site. In most cases, it makes more sense to put the second server in the primary site for high availability than in a secondary site for disaster recovery.
High Availability With Site Resiliency
If you have a requirement for High Availability and Site Resiliency, this is where you would start.
This is what my customers would like to do.
Why doesn’t this work?
For many of the same reasons that the first design is not High Availability:
1. The Active Databases in the secondary site will fail over whenever the WAN link hiccups. If the requirement is to have the Active Database of the user in the site they are closest to, this failover behavior will not work in your favor.
2. Again, Windows NLB is not supported across a WAN. It is not recommended to use an HLB across a WAN. In this scenario, it is impossible to have the Outlook connection to the CAS servers highly available.
3. External client connections have to be failed over manually when a single server fails, so no high availability can be provided for them.
What High Availability and Site Resiliency with Active users in each site really looks like
In this design, we have two separate DAGs providing HA for each site, and SR for each DAG. In this design, each site has a highly available copy of the database, as well as a site resilient copy. This is typically overkill. In most cases a single site with High Availability and a Secondary Site for the Site Resiliency is more economical than the above.
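To make the "overkill" point concrete, here is a rough Python sketch that models the two-DAG layout as data and counts mailbox servers. It assumes two database copies in each DAG's home site (HA) and one in the other site (SR), with each copy on its own server. That is the minimum implied by the description, not necessarily the exact layout in the design. All names (`DAGS`, `SINGLE_DAG`, `SiteA`, `SiteB`, `classify`, `servers_required`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, TypedDict


class Dag(TypedDict):
    home: str          # site where the active users live
    copies: List[str]  # one entry per database copy, naming the hosting site


# Assumed layout: each DAG has two copies at home (HA) and one remote (SR).
DAGS: Dict[str, Dag] = {
    "DAG-A": {"home": "SiteA", "copies": ["SiteA", "SiteA", "SiteB"]},
    "DAG-B": {"home": "SiteB", "copies": ["SiteB", "SiteB", "SiteA"]},
}

# For comparison: one DAG with HA in a single site and an SR copy elsewhere.
SINGLE_DAG: Dict[str, Dag] = {
    "DAG-1": {"home": "SiteA", "copies": ["SiteA", "SiteA", "SiteB"]},
}


def classify(dag: Dag) -> Dict[str, bool]:
    """Report whether a DAG layout gives HA at home and SR in another site."""
    per_site = Counter(dag["copies"])
    home = dag["home"]
    return {
        "high_availability": per_site[home] >= 2,
        "site_resiliency": any(
            count >= 1 for site, count in per_site.items() if site != home
        ),
    }


def servers_required(dags: Dict[str, Dag]) -> int:
    """Count mailbox servers, assuming one server per copy.

    A mailbox server can belong to only one DAG, so servers are not
    shared between DAGs in this count.
    """
    return sum(len(dag["copies"]) for dag in dags.values())
```

Under these assumptions, `servers_required(DAGS)` comes to six mailbox servers, against three for `SINGLE_DAG`, before counting any HUB/CAS servers or load balancers.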
Some organizations have requirements to have active copies of the mailbox database for sites on the other side of the world, or over an unreliable WAN link. In these cases, you have to ask yourself whether it makes sense to have the remote site as part of the Database Availability Group, considering everything above. Does it make more sense to have a single server at the remote site, or to make the remote site part of its own separate DAG for HA but not SR?
I hope that brings to light some common misconceptions about designing a Highly Available and Site Resilient Exchange 2010 Environment.