Exchange 2010 Design Principles for High Availability and Site Resiliency

Many of my customers read about the new Exchange 2010 High Availability (HA) and Site Resiliency (SR) capabilities and form an idea of what their Exchange 2010 environment will look like. In most cases, what they have in mind is technically possible, but not realistic. Let’s look at a few examples.

When we talk about Exchange 2010 High Availability, we mean duplicating the Exchange roles within a single site.

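We can implement HA in two ways: with Windows NLB, which requires moving the Hub Transport/CAS roles onto servers separate from the DAG Mailbox servers (Windows NLB cannot run on the same server as Windows Failover Clustering, which a DAG uses), or with a hardware load balancer (HLB), which lets us use only two servers.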

Here is what my customers would like to do.

Why doesn’t this work?

1. This is Site Resiliency, not High Availability. In theory, the database would fail over automatically, assuming the server in the secondary site can still contact the File Share Witness. This type of site failover can force clients to re-authenticate.

2. Windows NLB is not supported across sites, and using an HLB to load balance across sites is not recommended. As a result, Outlook client connections cannot be made highly available.

3. If the Exchange server in the primary site fails, all external DNS entries would have to be manually pointed to the new site.

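4. If the primary site fails, the databases will not come back online in the secondary site on their own. An administrator has to perform a manual datacenter switchover, which removes the failed primary-site Mailbox server from the DAG and restores quorum, before the secondary Mailbox server can bring the databases back online.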

5. This design is very dependent on the uptime of the WAN link. If the active databases are mounted in the secondary site after a failover, a WAN hiccup will cause them to fail back to the primary site without warning. If the databases are mounted in the secondary site because the Mailbox server in the primary site is unavailable, and the WAN link hiccups, the databases will simply dismount, because with the File Share Witness in the primary site the secondary server cannot keep quorum on its own.
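The quorum behavior behind points 1 and 5 can be sketched as a simple vote count. This is a simplified model, not Exchange's implementation: it assumes each DAG member holds one vote, the File Share Witness adds a vote only when the DAG has an even number of members, and the witness sits in the primary site. The function `sites_with_quorum` and the site names are hypothetical.

```python
# Simplified DAG quorum model, for illustration only. Assumptions: one vote
# per DAG member; the File Share Witness (FSW) adds one vote only when the
# member count is even. Site names are hypothetical.

def sites_with_quorum(members_by_site: dict[str, int], fsw_site: str) -> dict[str, bool]:
    """Which sites keep quorum if the WAN link between them fails."""
    member_count = sum(members_by_site.values())
    fsw_votes = 1 if member_count % 2 == 0 else 0
    total_votes = member_count + fsw_votes
    result = {}
    for site, members in members_by_site.items():
        # A site can only count the FSW vote if the witness is local to it.
        votes = members + (fsw_votes if site == fsw_site else 0)
        result[site] = votes > total_votes // 2  # strict majority required
    return result


# Two-member DAG stretched across two sites, FSW in the primary site.
print(sites_with_quorum({"Primary": 1, "Secondary": 1}, fsw_site="Primary"))
# {'Primary': True, 'Secondary': False}
```

Under these assumptions, with the WAN link down the secondary server holds only 1 of 3 votes, so it cannot keep its databases mounted, whether or not the primary Mailbox server is running.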

Site Resiliency

This is a site-resilient design. It is very similar to the flawed HA design above, except that we do not attempt to load balance the CAS role and do not expect automatic failover. There are some considerations with this design, however.

1. A two-server, two-site design assumes that the organization is more concerned about losing a site than about a single server outage. Organizations are typically at greater risk of a hardware or software failure than of losing an entire site. In most cases, it makes more sense to put the second server in the primary site for high availability instead of in a second site for disaster recovery.

High Availability With Site Resiliency

If you have a requirement for High Availability and Site Resiliency, this is where you would start.

This is what my customers would like to do.

Why doesn’t this work?

For many of the same reasons as the first design, this is not High Availability.

1. Active databases in the secondary site will fail over whenever the WAN link hiccups. If the requirement is to keep each user’s active database in the site closest to them, this failover behavior works against you.

2. Again, Windows NLB is not supported across a WAN, and using an HLB across a WAN is not recommended. In this scenario, Outlook connections to the CAS servers cannot be made highly available.

3. External client connections must be failed over manually if a single server fails, so no high availability can be provided for them.
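Point 1 above can be checked with a simplified vote count for a four-member DAG with two members in each site. This is a hypothetical sketch, not how Active Manager actually selects copies: it assumes one vote per member, a File Share Witness vote for an even member count, the witness in the primary site, and that a database active in a site without quorum is activated on the first listed copy in a site that keeps quorum. Database and site names are made up.

```python
from typing import Optional

# Simplified model, for illustration only (see assumptions above).

def sites_with_quorum(members_by_site: dict[str, int], fsw_site: str) -> dict[str, bool]:
    """Which sites keep quorum if the WAN link between them fails."""
    member_count = sum(members_by_site.values())
    fsw_votes = 1 if member_count % 2 == 0 else 0  # FSW votes only for even DAGs
    total_votes = member_count + fsw_votes
    return {
        site: members + (fsw_votes if site == fsw_site else 0) > total_votes // 2
        for site, members in members_by_site.items()
    }


def active_site_after_wan_failure(
    active_site_by_db: dict[str, str],
    copies_by_db: dict[str, list[str]],
    members_by_site: dict[str, int],
    fsw_site: str,
) -> dict[str, Optional[str]]:
    """Where each database ends up active; None means it is left dismounted."""
    quorum = sites_with_quorum(members_by_site, fsw_site)
    result: dict[str, Optional[str]] = {}
    for db, site in active_site_by_db.items():
        if quorum[site]:
            result[db] = site  # its site kept quorum, so it stays put
        else:
            # Move to the first site holding a copy that still has quorum.
            result[db] = next((s for s in copies_by_db[db] if quorum[s]), None)
    return result


members = {"Primary": 2, "Secondary": 2}
active = {"DB-HQ": "Primary", "DB-Branch": "Secondary"}
copies = {"DB-HQ": ["Primary", "Secondary"], "DB-Branch": ["Secondary", "Primary"]}
print(active_site_after_wan_failure(active, copies, members, fsw_site="Primary"))
# {'DB-HQ': 'Primary', 'DB-Branch': 'Primary'}
```

With the witness in the primary site, the secondary site holds only 2 of 5 votes, so its active databases move to the primary site whenever the WAN link drops.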

What High Availability and Site Resiliency with Active Users in Each Site Really Looks Like

In this design, two separate DAGs provide HA within each site, and each DAG also provides SR by extending into the other site. Each site therefore has a highly available copy of its databases as well as a site-resilient copy. This is typically overkill. In most cases, a single site with High Availability plus a secondary site for Site Resiliency is more economical.
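To see why this design keeps each site's users on local databases during a WAN outage, here is a simplified vote count, assuming each DAG has two members in its home site and one member in the other site. This is a sketch, not Exchange's actual quorum logic: it assumes one vote per member and a File Share Witness vote only when the member count is even. The DAG and site names are hypothetical.

```python
# Simplified quorum model, for illustration only. With three members per DAG
# the File Share Witness (FSW) casts no vote, so each DAG's quorum follows
# whichever site holds two of its members.

def sites_with_quorum(members_by_site: dict[str, int], fsw_site: str) -> dict[str, bool]:
    """Which sites keep quorum for one DAG if the WAN link between sites fails."""
    member_count = sum(members_by_site.values())
    fsw_votes = 1 if member_count % 2 == 0 else 0
    total_votes = member_count + fsw_votes
    return {
        site: members + (fsw_votes if site == fsw_site else 0) > total_votes // 2
        for site, members in members_by_site.items()
    }


# Each DAG: two members in its home site (HA) and one in the other site (SR).
dags = {
    "DAG-SiteA": ({"SiteA": 2, "SiteB": 1}, "SiteA"),
    "DAG-SiteB": ({"SiteA": 1, "SiteB": 2}, "SiteB"),
}

for dag_name, (dag_members, witness_site) in dags.items():
    print(dag_name, sites_with_quorum(dag_members, witness_site))
# DAG-SiteA {'SiteA': True, 'SiteB': False}
# DAG-SiteB {'SiteA': False, 'SiteB': True}
```

Under these assumptions, a WAN outage leaves each DAG with quorum in its home site, so neither site's active databases move, at the cost of six Mailbox servers in total.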

Some organizations require active copies of mailbox databases in sites on the other side of the world, or across an unreliable WAN link. In these cases, ask yourself whether it makes sense to include the remote site in the Database Availability Group at all, considering everything above. Would it make more sense to have a single server at the remote site, or to put the remote site in its own separate DAG for HA, but not SR?

I hope this brings to light some common misconceptions about designing a highly available and site-resilient Exchange 2010 environment.

Categories: Exchange 2010
  1. vinod
    September 30, 2011 at 12:46 pm | #1

    Dear Friend,

    It is a really good article, which helped me clear up many things related to DR site implementation.

    Keep it up.
    Great Work Dude!!

  2. October 22, 2011 at 5:35 am | #2

    Nice article

  3. Ruwantha
    February 15, 2012 at 11:25 pm | #3

    This is perfect. I have been looking for such a document. All my doubts are cleared. Thanks a lot man…

  4. Apoorv Mehrotra
    June 4, 2012 at 3:08 am | #4

    Hi All,

    My current setup is:

    Exchange 2007 CCR (Shared Storage). Approx. 2000 mailboxes, and the DB size is 1.63 TB (SG – I) and 645 GB (SG – II)

    HUB + CAS -> (NLB x 2 Physical Servers) Clustered (1 Virtual Host) (Sun Blade x6250, 2* Intel x5270 dual core, 16 GB RAM, 300 GB HDD)

    Mailbox x2 physical servers -> Failover Cluster (1 Virtual Host) using Sun shared storage (2* Six core AMD 8345, 16 GB RAM, 300 GB Internal HDD, mapped shared storage)

    We are planning to migrate to Exchange 2010 with a completely new set of H/W & storage.

    I would like suggestions on what design we can propose to the client. The client needs high availability (most critical) in all scenarios and is ready to buy 4 to 8 servers. (No DR site requirement for now)
    Thanks in advance

    • June 5, 2012 at 12:12 pm | #5

      Apoorv,
      There are many discussions and decisions that need to happen before proposing the correct design for a client. My design workshops with customers can take up to three full days. I cannot in good conscience recommend anything without knowing the customer’s business requirements, even for your simple question about high availability. What is their uptime requirement? Will they require automatic failover? Does the failover solution itself have to be redundant? Is best-effort failover with Windows NLB good enough? Is some downtime acceptable, so that DNS failover is OK?
      The answers to all of these questions will affect the design and the number of servers required. The HA discussion makes up about 10% of the design. There are a slew of other topics to discuss, such as retention, security, storage, and web publishing. I recommend that you work with a consulting company to assist with a full design.

  5. June 12, 2012 at 3:44 pm | #6

    Hi Scott,

    great writeup. I am actually designing the “site resiliency only/no HA” part as we are covering HA with VMware’s HA feature. I understand that in case of failure at the main site, the databases will go up on the DR site (since the Witness folder is still visible). Hub Transport is taken care of automatically, but the CAS is bugging me a little. Can I simply create a DNS alias (cas.companyname.com for example) and change the URLs manually for all the services (OWA, ECP, etc…)? We are migrating from 2007 to 2010 and changing the design at the same time. Will I have to revisit all the workstations to point them to the new DNS alias I create? Thanks for your time.

    • June 13, 2012 at 7:46 am | #8

      For internal Outlook clients:
      The easiest approach is to create a CAS array in each site (even if you just make it a CNAME for the local CAS server).
      Then, if you need to fail over, change the CNAME record to point to the CAS array/CAS server in the second site. In the event of a failover, all the clients will still be connected to “CASArray1.domain.com”, but that name now resolves to the IP address of the DR CAS server. Once the DNS change replicates, clients will be able to reconnect. They may have to close and reopen Outlook.
      The cmdlets to use are:
      New-ClientAccessArray -Name "CASArraySite1" -Fqdn "CasArray1.domain.com" -Site "ADSiteName"
      Then:
      Set-MailboxDatabase "DBName" -RpcClientAccessServer "CasArray1.domain.com"

      • subnet192
        June 13, 2012 at 7:50 am | #9

        So I simply create an array with a single server in it? I assumed CAS arrays implied NLB involvement. Thanks for the response!

      • June 13, 2012 at 8:49 am | #10

        Correct, create an array with a single CAS server. The presence of NLB does not affect the function of the CAS array object. The CAS array is simply an endpoint that Outlook clients can connect to.

  7. subnet192
    June 13, 2012 at 9:10 am | #11

    Thanks for the help, it’s my first Exchange design so I have a lot of stuff to cover!
