Exchange 2013 – Troubleshooting Exchange

When you’re using RTM Server 2012 or RTM Windows 8 to manage Exchange 2013 via the Exchange Admin Center you’ll likely get a pop-up saying “Internet Explorer has stopped working”. Regardless of what option you choose IE will restart & you’ll be stuck in an endless loop of crashes, cursing, & possibly keyboard smashing.

It will typically show its ugly face when managing recipients but you may notice sporadic behavior elsewhere too.

To resolve this you’ll need to install this Microsoft Update for IE10 on Server 2012/Win8. After an install & a reboot you should be fine.

This update was actually released in December but I’m mentioning it now because I find myself building quite a few 2013 labs for self-study as well as some classes I’ll likely be teaching over the coming months. In a production environment with access to a Windows Update source this would probably go unnoticed since Windows would get updated automatically.

However, in a lab environment (with no internet access) where you’re using RTM bits for Server 2012 & Windows 8 it can become quite annoying. So I suggest either making this part of your prerequisite install list before installing 2013 or building your own OS images with it included if you plan on building lab/test environments until there are 2012/8 bits available with this fix already included.

Of course you could always just install another browser but that’s just as much of a pain in a lab as installing this KB.

Referenced KB
http://www.microsoft.com/en-us/download/details.aspx?id=35870

Microsoft Security Bulletin MS12-077 – Critical
http://technet.microsoft.com/en-us/security/bulletin/ms12-077

Exchange 2013 Gotchas
http://theessentialexchange.com/blogs/michael/archive/2013/01/06/exchange-server-2013-gotchas.aspx

Members of the Exchange 2013 Technology Adoption Program (TAP) have known about this issue for a while & the general public had the potential to figure it out once Exchange 2010 SP3 came out last month which allowed co-existence with 2013 in a lab environment; now the Exchange Team has been very clear about it with this recent blog post today. Actually, a Microsoft Support-led session at MEC was when I first heard about it in detail. So what’s the issue? Basically, you have the potential to experience an organization-wide full Offline Address Book download just as a result of installing the first Exchange 2013 server into your existing Exchange environment.

Background:

Outlook Cached Mode clients use the Offline Address Book (OAB) for offline access to Address Lists, as well as some Group Metrics data, when they aren’t connected to the Exchange Server. For a very detailed explanation, see Neil Hobson’s article on the subject.

Issues can occur when the OAB for an organization grows to a large size, sometimes in the hundreds of MB. Contributors to this size include the number of recipients in AD, the number of distribution groups, populated user attributes, & certificate usage (reference). It’s important to note that GAL Photos are NOT stored in the OAB. The OAB just includes a pointer to AD where the photo is actually stored (reference). Fortunately, Exchange/Outlook is smart enough to only download the changes to the OAB instead of the entire thing every day. There are still some circumstances where the entire OAB will be downloaded again, which makes it very important to understand the size of your OAB so you know just how much of your network’s bandwidth will be used when all Outlook Cached Mode clients perform a full download (reference).

So as you might imagine, whether or not clients will perform a full OAB download becomes a topic of concern during an Exchange migration.

Offline Address Books in Exchange 2003/2007/2010 are associated with Mailbox Databases, specifically on the Properties>Client Settings Tab of the Mailbox Database in the associated Management Console:

However, in the screenshot above you’ll notice that the “Offline Address Book” field is blank on this Mailbox Database. This is the case by default with all Mailbox Databases. This is not an issue because any Mailbox Database that has its “OfflineAddressBook” attribute set to $Null by default will use the Default Offline Address Book in the Exchange Organization. This OAB can be seen below:

This means if you have configured multiple Offline Address Books in your environment then you would need to manually specify the additional OABs on the Databases where you want to use them; otherwise the default OAB would be used. Simply put, if the value for OAB is blank on a Mailbox Database then it will use the Default OAB. If it is hard-set then it will use whichever OAB you hard-set it to. Some customers will hard-set this value if they want Mailboxes on specific Mailbox Databases to use a specific OAB, maybe an OAB that only contains a specific Address List instead of the entire GAL like the example below:

Many organizations just have 1 OAB & as a result have never populated the Properties>Client Settings>Offline Address Book value of their Mailbox Databases. This is where a big issue can come into play during an Exchange 2013 migration, or even if you just want a single Exchange 2013 server in your environment for a test group of users.

Issue:

As the recent Exchange Team Blog post announcing 2013 CU1 states, you need to make sure all of your Exchange 2007/2010 Mailbox Databases have an actual value populated for their Offline Address Book. If they are currently blank, then populate them with your current Default OAB. Nothing will change in the environment as a result of this because they will continue to use their current OAB & continue to only download the OAB changes.

Failure to do this will result in each of these Mailbox Databases switching to use the Exchange 2013 Offline Address Book that gets created during installation of your first Exchange 2013 Mailbox Server. This will result in a Full OAB Download for all of your Outlook Cached Mode clients on these Mailbox Databases; a potentially nasty situation which could bring your network to its knees.
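To make the fallback behavior concrete, here is a small Python model of it. This is only an illustrative sketch, not Exchange code: the function name effective_oab, the database names, & the OAB names are all hypothetical, & it assumes (as described above) that the OAB created by the first Exchange 2013 Mailbox Server becomes the organization default.

```python
from typing import Optional


def effective_oab(database_oab: Optional[str], org_default_oab: str) -> str:
    """Return the OAB that users on a mailbox database will download.

    A database whose OfflineAddressBook attribute is empty (None here)
    follows whatever the organization default is at that moment.
    """
    return database_oab if database_oab is not None else org_default_oab


# Hypothetical names, for illustration only.
OLD_DEFAULT = "Default Offline Address Book"
NEW_DEFAULT = "Default Offline Address Book (Ex2013)"

# DB01 was left blank; DB02 was explicitly pointed at the old default.
databases = {"DB01": None, "DB02": OLD_DEFAULT}

before = {name: effective_oab(oab, OLD_DEFAULT) for name, oab in databases.items()}
after = {name: effective_oab(oab, NEW_DEFAULT) for name, oab in databases.items()}

# Any database whose effective OAB changed would trigger a full download.
switched = [name for name in databases if before[name] != after[name]]
print(switched)
```

Only the database with the blank value switches, which is exactly why populating the attribute ahead of time keeps clients on their current OAB.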

You can see the Exchange Team Blog post in question for steps on how to repoint these databases or just use the following commands which I have taken from the post:

Get-MailboxDatabase | Where {$_.OfflineAddressBook -eq $Null} | FT Name,OfflineAddressBook -AutoSize

This lists all Mailbox Databases in your environment with their OfflineAddressBook attribute set to $null.

Then run:

Get-MailboxDatabase | Where {$_.OfflineAddressBook -eq $Null} | Set-MailboxDatabase -OfflineAddressBook (Get-OfflineAddressBook | Where {$_.IsDefault -eq $True})

This command will grab each of these Mailbox Databases & populate the OfflineAddressBook attribute with the value of your organization’s current Default OAB. This effectively changes nothing in terms of client behavior but ensures that when you install Exchange 2013, each of these Mailbox Databases does not switch over to using the 2013 OAB; at least not until you are ready & can stage this process, maybe one MDB at a time.

Summary:

These steps should be mandatory for any organization considering implementing an Exchange 2013 Server into their existing Exchange 2007/2010 environment.

Background:

I originally discovered this issue back in early Feb & let a couple people on the Exchange Product Team know about it via the TAP but it seems to be affecting more customers than initially thought so I thought I’d share.

In Outlook 2007 through Outlook 2010 all domain-joined Outlook clients would initially query Active Directory for AutoDiscover information & ultimately find a Service Connection Point (SCP) value that would point them to their nearest Client Access Server’s AutoDiscover virtual directory. If that failed then they would revert to using DNS like any non-domain-joined Outlook client. The order of this non-domain-joined lookup is as follows:

https://company.com/autodiscover/autodiscover.xml

https://autodiscover.company.com/autodiscover/autodiscover.xml

Local XML File

http://autodiscover.company.com/autodiscover/autodiscover.xml (looking for a redirect website)

SRV AutoDiscover Record

Why it ever looked to https://company.com/autodiscover/autodiscover.xml I’ll never really know, because honestly I’ve never come across a customer who had it deployed that way; most have https://autodiscover.company.com/autodiscover/autodiscover.xml. I imagine that when Exchange 2007 was first being developed they weren’t exactly sure how customers would be implementing AutoDiscover.
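If you want to see which names your own SMTP domain would be hit on, the two HTTPS lookups at the top of that list are easy to derive. The Python sketch below is just that: the function name is hypothetical & it only builds the URLs in the order described above; it does not query anything.

```python
from typing import List


def https_autodiscover_urls(email_address: str) -> List[str]:
    """Build the two HTTPS AutoDiscover URLs, in the order they are tried."""
    # Everything after the last "@" is treated as the SMTP domain.
    domain = email_address.rsplit("@", 1)[1].lower()
    return [
        # Tried first: the bare SMTP domain.
        f"https://{domain}/autodiscover/autodiscover.xml",
        # Tried second: the autodiscover host most customers actually deploy.
        f"https://autodiscover.{domain}/autodiscover/autodiscover.xml",
    ]


for url in https_autodiscover_urls("user@company.com"):
    print(url)
```

Whatever the first URL resolves to is the host worth checking when this lookup misbehaves.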

Issue:

The above methods have served us well since Exchange 2007 timeframe but for some reason the Outlook team decided to try & implement some giddyup into Outlook & try to speed up the process. They decided to have domain-joined Outlook 2013 clients query both the SCP values in AD as well as the DNS records at the same time. If an SCP record was found it would still be used but in the event it failed then it would already have the DNS response ready to go. Great idea, however there’s one problem in the implementation.

If Outlook 2013 encounters any kind of Certificate error while doing the simultaneous DNS query then you will receive a pop-up in Outlook about the cert.

I actually stumbled upon this while in the middle of the scenario below:

That’s right, I actually get a certificate pop-up for my lab’s domain name (ash15.com) & not autodiscover.ash15.com like one would expect if I were to have a certificate issue on Exchange.

When Outlook 2013 does its simultaneous DNS AutoDiscover query the first URL it tries is https://company.com/autodiscover/autodiscover.xml, which in my lab environment resolved to my Domain Controller, which was also serving as a DNS server as well as a Certificate Authority. Ash15.com resolved to this server because it’s my internal Active Directory domain name & the name server entry resolves to my DC (just ping internaldomainname.local in your AD lab environment & you’ll see the same thing).

Now because I have web enrollment enabled & am listening on 443 in IIS, the server responded. Also, because I did not have a cert installed on the server with ash15.com in the Subject or Subject Alternative Name, it gave the certificate error we see above.
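The name check that fails here can be sketched in a few lines of Python. This is a simplified illustration, not how Outlook or Windows actually validates certificates: the function name & the dc01.ash15.com host name are hypothetical, & the matching is reduced to an exact match or a single-label wildcard.

```python
from typing import Iterable


def cert_covers_host(hostname: str, cert_names: Iterable[str]) -> bool:
    """Return True if any certificate name matches the requested host name.

    Simplified matching: an exact match, or a leading "*." wildcard that
    stands for exactly one left-most label.
    """
    host = hostname.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == host:
            return True
        if name.startswith("*."):
            # "*.ash15.com" covers "mail.ash15.com" but not "ash15.com".
            head, _, rest = host.partition(".")
            if head and rest == name[2:]:
                return True
    return False


# A DC certificate issued to its own host name does not cover the bare domain.
print(cert_covers_host("ash15.com", ["dc01.ash15.com"]))
```

Note that even a wildcard for *.ash15.com would not cover ash15.com itself, so the bare domain has to be listed explicitly to avoid the prompt.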

Resolution:

The error is easy enough to get through & it only occurred on initial profile creation but this can definitely prove painful for some customers. Obviously my lab environment is a corner case but several other customers have reported this issue with Outlook 2013 as well.

Here’s an example scenario.

Imagine you have a public website for andrewswidgets.com hosted by a third-party hosting site & you did not pay for HTTPS/443 services. However if you were to query the website using https then it could respond & obviously not return a certificate with andrewswidgets.com on it (because you haven’t paid for it you cheapskate…). Now imagine you begin deploying users using Outlook 2013 in your internal environment. In the past, they would have found the SCP record that would have pointed them to your internal Exchange 07/10/13 server for AutoDiscover & would have been happy as a clam (one Exchange Product Manager’s favorite way to describe Exchange bliss). However, now they may get a certificate pop-up for andrewswidgets.com when creating a new profile.

There are a couple ways around this. Make sure andrewswidgets.com doesn’t listen on 443, or possibly get a proper cert on your website that is listening on 443. Simply put, just make sure whatever andrewswidgets.com resolves to is something that’s not going to throw a certificate error.

I’ve heard nothing concrete or public but the Outlook team is aware of the issue & listening to customer feedback. I suggest contacting Microsoft Support if your organization is running into this issue.

Also, this KB offers methods to control which AutoDiscover methods are used by your Outlook clients.

When at first I was looking into this the TechNet documentation was extensive and yet not as specific as I would prefer, so here is the quick and dirty DLP classification!

Creating and importing custom Classifications

First you need to create your custom policy XML (example below).
Save it as an XML file with Unicode encoding (C:\MyNewPolicy.xml).
Open the XML in Internet Explorer; if it’s formatted correctly you will see the XML.
Then import it with PowerShell:
New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path C:\MyNewPolicy.xml -Encoding Byte -ReadCount 0))
Once it’s imported you should be able to create a new DLP policy using the EAC.

Creating a custom DLP Rule

Log in to the EAC (e.g. https://mail.domain.com/ecp)
Click Compliance Management, then data loss prevention
Click the Plus, then New custom policy
Name your policy, choose your mode (I like to test with Policy Tips), and click Save
Select the policy and click the edit icon to edit your new policy
Select Rules from the left
Click the Plus to create a new rule
In the Apply this rule if field choose The message contains sensitive information...
Click *Select sensitive information types... (if applicable)
Click the Plus to choose from the list
You should now see your new classification (from the example below it would be Secure Product Codes\ DLP by Exchangemasters.info)

Useful Tools

Regex – http://gskinner.com/RegExr/
GUID creator – http://www.guidgenerator.com/online-guid-generator.aspx
Technet – http://technet.microsoft.com/en-us/library/jj674704(v=exchg.150).aspx
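If you would rather not use a website for the GUIDs, any GUID generator will do. As one option, here is a short Python sketch that produces the three ids a rule package like the example in this post needs; the function name is hypothetical.

```python
import uuid


def new_rule_package_ids() -> dict:
    """Generate fresh random GUIDs for the three id attributes in the XML."""
    return {
        "RulePack": str(uuid.uuid4()),
        "Publisher": str(uuid.uuid4()),
        "Entity": str(uuid.uuid4()),
    }


for element, guid in new_rule_package_ids().items():
    print(f"{element}: {guid}")
```

Remember that the Entity id is used twice: once on the Entity element & again as the idRef of its Resource under LocalizedStrings.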

Example of a Rule Classification XML

<?xml version="1.0" encoding="utf-16"?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">
    <Version major="1" minor="0" build="0" revision="0"/>
    <Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>
    <Details defaultLangCode="en-us">
      <LocalizedDetails langcode="en-us">
        <PublisherName>DLP by Exchangemasters.info</PublisherName>
        <Name>Secure Product Codes</Name>
        <Description>Secure Products</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules>
    <!-- Product Code -->
    <Entity id="acc59528-ff01-433e-aeee-13ca8aaee159" patternsProximity="300" recommendedConfidence="75">
      <Pattern confidenceLevel="75">
        <IdMatch idRef="Regex_Product_Code" />
        <Match idRef="Code" />
      </Pattern>
    </Entity>
    <Regex id="Regex_Product_Code">[A-Z]{3}[0-9]{9}</Regex>
    <Keyword id="Code">
      <Group matchStyle="word">
        <Term>Code</Term>
      </Group>
    </Keyword>
    <LocalizedStrings>
      <Resource idRef="acc59528-ff01-433e-aeee-13ca8aaee159">
        <Name default="true" langcode="en-us">Product Code</Name>
        <Description default="true" langcode="en-us">A custom classification for detecting product codes that have 3 uppercase letters and 9 numbers</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
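Before importing a classification it helps to try the regex & keyword against some sample text. The Python sketch below approximates what the rule above asks for: the product code pattern plus the keyword Code somewhere within 300 characters. It is not the DLP engine, & the exact proximity window & the case-insensitive keyword match are my assumptions.

```python
import re

PRODUCT_CODE = re.compile(r"[A-Z]{3}[0-9]{9}")
# Whole-word keyword match; case-insensitivity is an assumption.
KEYWORD = re.compile(r"\bCode\b", re.IGNORECASE)
PROXIMITY = 300  # characters, mirroring patternsProximity="300"


def find_product_codes(text: str) -> list:
    """Return product codes that have the keyword within PROXIMITY characters."""
    hits = []
    for match in PRODUCT_CODE.finditer(text):
        # Look for the keyword in a window around the matched code.
        window_start = max(0, match.start() - PROXIMITY)
        window_end = match.end() + PROXIMITY
        if KEYWORD.search(text, window_start, window_end):
            hits.append(match.group())
    return hits


print(find_product_codes("Product Code: ABC123456789"))
print(find_product_codes("Reference ABC123456789"))
```

The first sample is reported because the keyword is nearby; the second is not, since the pattern alone has no supporting keyword.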

Background:

Had a co-worker ask for some basic DAG setup instructions in Exchange 2013 so I wrote a quick little guide. This covers the high points around creating the DAG as well as configuring the DAG member NICs & networks.

Step 1 – Pre-Stage DAG Computer Account
Reference. When deploying a DAG on Exchange Servers running Server 2012 you need to pre-stage the DAG computer account. The above link points to the official TechNet article for doing this but here are the basics of it:

Create a Computer Account in AD with the name of the DAG. For example, DAG-A.
Disable the Computer Account.
In Active Directory Users & Computers click View>Advanced Features. Go to the Computer Account & select Properties>Security tab.
From here you have two options; either Grant the Exchange Trusted Subsystem Full Control permissions to the DAG Computer Account or give the Computer Account of the first node you plan to join to the DAG Full Control permissions over the DAG Computer Account Object.
Reference2

Step 2 – Configure DAG NICs
Reference. Exchange 2013 performs automatic DAG network configuration depending on how the NICs are configured. This means if the NICs are configured correctly then you should not have to manually collapse the DAG Networks post DAG Setup. Upon adding the nodes to the DAG, it looks for the following properties on the NICs & makes a decision based on them:

NIC Binding Order
Default Gateway Present
Register DNS Checked

The DAG needs to separate MAPI/Public networks from Replication networks. This enables the DAG to properly utilize a network that the administrator has provisioned for Replication traffic & to only use the MAPI/Public networks for Replication if the Replication networks are down.

You want your MAPI/Public NICs to be top of the binding order in the OS & any Replication, Management, Backup, or iSCSI networks at the bottom of the binding order. This is a Core Windows Networking best practice as well as what the DAG looks for when trying to determine which NICs will be associated with the MAPI/Public DAG Networks.

The DAG also looks for the presence of a Default Gateway on the MAPI/DAG network NIC. Going along with another Windows Networking best practice, you should only have 1 Default Gateway configured in a Windows OS. If you have additional networks with different subnets on the DAG nodes then you would need to add static routes on each of the nodes using NETSH. More on this later.

Finally, NIC Properties>IPv4 Properties>Advanced>DNS>Register this connection’s addresses in DNS should be unchecked on all adapters except for the MAPI/Public NICs. This means all Replication, iSCSI, dedicated backup or management NICs should have this option unchecked. Again, this is a Windows Networking best practice but is vital for proper Automatic DAG Network Configuration in Exchange 2013.
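The three NIC properties above lend themselves to a quick checklist. The Python sketch below encodes the guidance from this step so you can sanity-check a planned configuration on paper. It is not the logic Exchange itself runs, & the Nic class, its field names, & the NIC names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Nic:
    name: str
    role: str                  # "MAPI", or anything else (Replication, iSCSI, ...)
    binding_order: int         # 1 = top of the binding order
    has_default_gateway: bool
    registers_in_dns: bool


def check_dag_nics(nics: List[Nic]) -> List[str]:
    """Return a list of problems; an empty list means the guidance is met."""
    problems = []
    mapi = [n for n in nics if n.role == "MAPI"]
    other = [n for n in nics if n.role != "MAPI"]
    # Only one default gateway should exist on the server.
    if sum(n.has_default_gateway for n in nics) != 1:
        problems.append("exactly one NIC should have a default gateway")
    for n in mapi:
        if not n.has_default_gateway:
            problems.append(f"{n.name}: MAPI NIC should hold the default gateway")
        if not n.registers_in_dns:
            problems.append(f"{n.name}: MAPI NIC should register in DNS")
    for n in other:
        if n.has_default_gateway:
            problems.append(f"{n.name}: non-MAPI NIC should not have a default gateway")
        if n.registers_in_dns:
            problems.append(f"{n.name}: DNS registration should be unchecked")
        if any(n.binding_order < m.binding_order for m in mapi):
            problems.append(f"{n.name}: should sit below the MAPI NIC in the binding order")
    return problems


node = [
    Nic("MAPI", "MAPI", 1, True, True),
    Nic("REPL", "Replication", 2, False, True),  # DNS registration left checked
]
print(check_dag_nics(node))
```

An empty result means the node matches the three properties the DAG looks at.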

Step 3 – Configure Routing if Needed (optional depending on DAG design)
If your DAG stretches subnets & you’re using dedicated Replication networks then they should be on their own subnet isolated from the MAPI/Public network. A common setup for a network such as this might be:

Site-Austin:
MAPI Network 192.168.1.0/24; Default Gateway 192.168.1.254
Replication Network 10.0.1.0/24; Default Gateway $Null

Site-Houston:
MAPI Network 192.168.2.0/24; Default Gateway 192.168.2.254
Replication Network 10.0.2.0/24; Default Gateway $Null

Now with the above configuration you would have some form of routing taking place between the two MAPI subnets. You would also have routing between the two Replication subnets. However, because you should only have 1 Default Gateway configured per server, DAG nodes in each site would be unable to communicate with each other over the Replication networks. This is where static routes come into play. You would run the following commands on the nodes to allow them to ping across to each other between the 10.0.1.x & 10.0.2.x networks (in the below example, REPL is the name of each node’s Replication NIC):

On Nodes in Site-Austin: netsh interface ip add route 10.0.2.0/24 "REPL" 0.0.0.0

On Nodes in Site-Houston: netsh interface ip add route 10.0.1.0/24 "REPL" 0.0.0.0

This is the preferred format for this command. There are some references to using the local interface IP instead of 0.0.0.0 but the format I use above is what is recommended by the Windows Networking Team. Reference.

“According to our Networking Development Groups, the recommendation actually is that on-link routes should be added with a 0.0.0.0 entry for the next hop, not with the local address (particularly because the local address might be deleted) and with the interface specified.”

This all assumes there is physical routing in place between the two subnets, like a Router, layer 3 Switch, or a shared virtual network in Hyper-V/ESX.
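With more than two sites the list of routes grows quickly, so it can be handy to generate the commands rather than type them. Here is a minimal Python sketch that builds the netsh lines for a given site from the example subnets above; the function name is hypothetical & it only prints text, it does not change any routes.

```python
import ipaddress

# Replication subnets from the example above, keyed by site.
REPLICATION_SUBNETS = {
    "Site-Austin": "10.0.1.0/24",
    "Site-Houston": "10.0.2.0/24",
}


def static_route_commands(local_site: str, nic_name: str = "REPL") -> list:
    """Build one netsh command per remote replication subnet."""
    if local_site not in REPLICATION_SUBNETS:
        raise ValueError(f"unknown site: {local_site}")
    commands = []
    for site, subnet in REPLICATION_SUBNETS.items():
        if site == local_site:
            continue  # the local subnet is already on-link
        network = ipaddress.ip_network(subnet)  # raises ValueError on a typo
        commands.append(
            f'netsh interface ip add route {network} "{nic_name}" 0.0.0.0'
        )
    return commands


for command in static_route_commands("Site-Austin"):
    print(command)
```

Parsing each subnet with ipaddress catches a mistyped prefix before it ever reaches a server.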

Verify connectivity between nodes over these 10.0.x.x networks using Tracert or Pathping. Note that these steps are only required if your DAG spans subnets & has replication networks in different subnets. While it technically should work, it is not recommended to stretch subnets for DAG Networks across the WAN.

It should also be noted that there should be no routing between the MAPI Networks & the Replication Networks. They should be on isolated networks that have no contact with each other. Also, Microsoft wants no greater than 500ms round trip latency between DAG nodes when you have DAG members across latent network connections. It’s important for customers to realize that you should not set your expectations around this number alone. You could easily have a connection over 500ms & not experience copy queues if you have only 20 mailboxes with low usage profiles. Alternatively, you could have a connection with only 50ms of round-trip latency but see high copy queues if you have thousands of high-usage mailboxes & a small bandwidth pipe. Just know that this number is not an end all be all.

Step 4 – Create DAG & Add Nodes
This part is pretty straightforward & you can use the EAC to do it. Just remember to give the DAG an IP address in every MAPI subnet where you have DAG nodes. So in our scenario above you would give the DAG 2 IP addresses; one in the 192.168.1.0 subnet & another in the 192.168.2.0 subnet.

Step 5 – Manually configure DAG Networks if needed
Reference. If you have dedicated management networks, dedicated backup networks, or iSCSI NICs then you would actually have to perform some manual steps after your DAG is set up. These networks should be ignored by the DAG & not used by the cluster. In order to do this we must first enable Manual DAG Network Configuration, which is disabled by default. We would then need to configure the iSCSI or similar network to be ignored by the cluster. Perform the following steps:

Get-DatabaseAvailabilityGroup
Set-DatabaseAvailabilityGroup <DAGName> -ManualDagNetworkConfiguration:$True
Get-DatabaseAvailabilityGroupNetwork
Set-DatabaseAvailabilityGroupNetwork <iSCSI/Backup/Mgmt NetworkName> -IgnoreNetwork:$True

Finally, let’s validate everything. Run the below command:

Get-DatabaseAvailabilityGroupNetwork | Format-List Identity,ReplicationEnabled,IgnoreNetwork

Verify that the iSCSI/Backup/Mgmt networks have IgnoreNetwork set to True (the MAPI & Replication networks should have this set to False). Also verify that the Replication Networks have ReplicationEnabled set to True. Finally, verify that the MAPI network has ReplicationEnabled set to False. This prevents the MAPI network from being used for Replication by default. It can still be used for Replication if all other possible replication paths go down.
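The expected end state is really just a small table, so here it is as a Python sketch you could adapt to check exported values. The ReplicationEnabled & IgnoreNetwork names come from the command output above; the Role field, the function name, & the network identities are hypothetical labels I added for illustration.

```python
from typing import Dict, List, Optional, Tuple

# role: (expected ReplicationEnabled, expected IgnoreNetwork)
# None means the value is not checked for that role.
EXPECTED: Dict[str, Tuple[Optional[bool], bool]] = {
    "MAPI": (False, False),
    "Replication": (True, False),
    "Ignored": (None, True),  # iSCSI, backup, or management networks
}


def validate_dag_networks(networks: List[dict]) -> List[str]:
    """Compare each network's settings to the expected state for its role."""
    problems = []
    for net in networks:
        want_repl, want_ignore = EXPECTED[net["Role"]]
        if net["IgnoreNetwork"] != want_ignore:
            problems.append(f'{net["Identity"]}: IgnoreNetwork should be {want_ignore}')
        if want_repl is not None and net["ReplicationEnabled"] != want_repl:
            problems.append(f'{net["Identity"]}: ReplicationEnabled should be {want_repl}')
    return problems


sample = [
    {"Identity": "DAG-A\\MAPI", "Role": "MAPI", "ReplicationEnabled": True, "IgnoreNetwork": False},
    {"Identity": "DAG-A\\REPL", "Role": "Replication", "ReplicationEnabled": True, "IgnoreNetwork": False},
    {"Identity": "DAG-A\\iSCSI", "Role": "Ignored", "ReplicationEnabled": False, "IgnoreNetwork": True},
]
print(validate_dag_networks(sample))
```

In this sample only the MAPI network is flagged, because replication was left enabled on it.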

References:
http://technet.microsoft.com/en-us/library/ff367878.aspx

http://technet.microsoft.com/en-us/library/dd298065(v=exchg.150).aspx

http://blogs.technet.com/b/scottschnoll/archive/2012/10/01/storage-high-availability-and-site-resilience-in-exchange-server-2013-part-2.aspx

http://blogs.technet.com/b/askcore/archive/2009/05/26/active-route-gets-removed-on-windows-2008-failover-cluster-ip-address-offline.aspx

http://technet.microsoft.com/en-us/library/dd298008(v=exchg.141).aspx

Background:

It seems like this sentiment has been preached widely but yet I still see customers do this. In fact I’m writing this today because earlier this week I had a customer whose Information Store Service, as well as the Exchange Transport Services, on Exchange 2013 would not start. Then earlier today a coworker actually did this in a lab which caused the same issue.

Summary:

Let’s start off with this: the Exchange Server Product Team performs zero testing or validation on systems with IPv6 disabled. So that right there should be a good indicator that you’re trailblazing on your own in the land of Exchange (bring a flashlight, it’s dark & scary).

So I’m going to cover two very different things here:

Unchecking IPv6 on the NIC adapter (BAD)
Properly Disabling IPv6 in the registry (Ok but not recommended by MS)

Unchecking Method (BAD):

Let’s first talk about un-checking IPv6 on your NIC adapters. The problem with this is that while the OS still thinks it can & should be using IPv6, the NIC is unable to do so, which leads to communications issues. An easy way to test that your OS is still trying to use IPv6 is to ping localhost after you have unchecked IPv6 on your NIC & rebooted. You’ll see that you still get an IPv6 response. I actually did a write-up about this topic on the Sysadmin community on Reddit awhile back which you can find here. As a side note, check out the Exchange community a colleague & I moderate on reddit here.

While doing this has always caused sporadic issues with Exchange, Exchange 2013 seems to be even more sensitive in this regard. Since RTM, I’ve seen half a dozen Exchange 2013 issues that were resolved by re-checking IPv6 on the NIC adapter & rebooting. Here’s what I’ve seen so far:

Having IPv6 unchecked when performing an Exchange 2013 install will result in a failed/incomplete installation, which means a messy cleanup operation before you can continue.
Microsoft Exchange Active Directory Topology Service may not start if the Exchange 2013 server is also a Domain Controller and IPv6 has been unchecked. The solution is to re-check it & reboot the server.
Microsoft Exchange Transport Service as well as the Microsoft Exchange Frontend Transport, Microsoft Exchange Transport Submission, & Microsoft Exchange Transport Delivery services may not start if IPv6 has been unchecked on the NIC adapter of an Exchange 2013 Server.
Microsoft Exchange Information Store Service may not start if IPv6 has been unchecked on an Exchange 2013 Server.

Disabling IPv6 in the Registry:

I started this post saying that MS does no testing or validation for systems with IPv6 disabled in ANY WAY. However, some customers may actually have reasons for disabling IPv6. I’m actually interested in hearing them but I also know some customers are very adamant about it. There actually was an issue in the past where Outlook Anywhere wouldn’t work in certain scenarios with IPv6 enabled but this should not be a problem with a fully updated Exchange Server (reference).

I’ll also say that I personally have never had any issues with properly disabling IPv6 in the registry using this method. You basically add a DisabledComponents DWORD value under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters with its data set to 8 F’s (0xffffffff) & then reboot the server. After this point IPv6 should be fully disabled. I’ve also spoken with a couple Microsoft Support Engineers who have also said that they have personally never seen any issues with disabling it this way; with Windows or Exchange. However, in my opinion you should have a good reason for doing so (and saying you don’t like IPv6 is NOT a good reason).

Lastly, I’d like to add that if you’re utilizing iSCSI on your Exchange server, there should be no issues with unchecking IPv6 on your iSCSI NICs if you choose to do so. The article was specifically in relation to NICs connected to your production/public/MAPI networks. As usual, follow your SAN vendor’s best practices when configuring iSCSI NICs.

Also, here’s a shameless plug for the ExchangeServer subreddit (http://www.reddit.com/r/exchangeserver) which I help moderate (username=ashdrewness). There are always people such as myself answering questions on there.

This issue comes fresh from a Microsoft Crit-Sit case I was just on for one of our customers.

Issue:

All client access was broken (specifically OWA) on a standalone Multi-Role Exchange 2013 CU2 Server. Users would receive “The website cannot display the page” after authenticating to OWA. This started after the customer installed CU2.

Also, if you look in the HTTPProxy logs (C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\Owa) you would see the following error:

“ServerLocatorError,POST,,,,, The database with ID 5105a9bc-cfcd-4842-baaf-451561550e08 couldn’t be found. —> Microsoft.Exchange.Data.ApplicationLogic.Cafe.MailboxServerLocatorException: The database with ID 5105a9bc-cfcd-4842-baaf-451561550e08 couldn’t be found“

Full Error:

Resolution:

After a night’s worth of troubleshooting we escalated to Tier 3 in Microsoft Support & our resolution came from a setting I would not at all have expected. Before installing CU2 the customer had read a blog stating some maintenance steps he should perform on his Exchange Server beforehand. One of them was running “Set-MailboxServer -Identity EXServerName -DatabaseCopyAutoActivationPolicy Blocked”. This customer did not have a DAG so this command was not needed but nonetheless this command should have absolutely no ill effect on the ability of CAS to proxy requests to the mailbox server components. All this command should do is tell the DAG that no mailbox database copies can be automatically activated on this server. It would take an admin action to override this & activate the database. But again, no DAG so it should not matter.

However in this case it was causing CAS to break as it could not find the mailbox database. I was able to replicate this issue in my own lab by setting my DatabaseCopyAutoActivationPolicy to Blocked on my two Exchange 2013 CU2 Mailbox Servers (also not in a DAG so the setting “should” not matter). After making the change & restarting some services I was greeted with the very same errors when trying to login to OWA. I also received the very same “ServerLocatorError” “The database with ID <GUID> couldn’t be found”.
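If you suspect the same problem, a quick way to confirm it is to scan the HttpProxy logs for that error & see which database GUIDs come up. Below is a rough Python sketch; it assumes nothing about the log layout beyond the text of the error quoted above, & the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the GUID that follows "database with ID" on a ServerLocatorError line.
MISSING_DB = re.compile(
    r"ServerLocatorError.*?database with ID "
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def missing_database_ids(lines: Iterable[str]) -> Counter:
    """Count ServerLocatorError entries per database GUID."""
    counts = Counter()
    for line in lines:
        match = MISSING_DB.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


sample = [
    "ServerLocatorError,POST,,,,, The database with ID "
    "5105a9bc-cfcd-4842-baaf-451561550e08 couldn't be found.",
    "some unrelated log line",
]
print(missing_database_ids(sample))
```

Feed it the lines of any log file from the HttpProxy\Owa folder. The GUIDs it reports can then be compared against the Guid property shown by Get-MailboxDatabase to see which databases CAS cannot locate.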

So the resolution in this case is to just run “Set-MailboxServer -Identity EXServerName -DatabaseCopyAutoActivationPolicy Unrestricted“

I was told Microsoft Support would escalate this internally but I am currently unsure if this affects only CU2 or all Exchange 2013 builds as my lab is only 2013 CU2. I’m also unsure if this only affects multi-role servers or only servers not in a DAG but I hope to test & report the findings.

Update: I’ve been told by others that this setting has this same impact on CU1 systems.

Update#2: We tried asking MS Support to classify this as a bug so it would be fixed (also so our support bill would be compensated as is the case with all bugs). However, they would not agree to classify this as a bug. The answer from Support was “the fact that is was not easy to find is simply due to the complexity/functionality of our product”. We were told that if we wanted to push harder to classify it as a bug then we would first have to write up a business impact statement & then it could be tested/researched internally. However, if the Product Team did not deem it a bug then we would be charged for the hours spent testing. I’m pretty disappointed in this response. I suppose asking someone to fix (or even acknowledge) a product bug simply because something is broken is asking too much.

This is a fairly basic post but it happens enough that I’d like to call out the basics of troubleshooting it. I’ve seen many cases over time where mail flow has either halted or become sluggish due to a third-party transport agent (I actually saw 3 instances of this happening this past month which prompted this post).

Examples of Transport Agents could be Anti-Virus software, Anti-Spam software, DLP software, agents which add disclaimers to email messages, or email archiving solutions. I won’t call out specific vendors as I don’t think there’s necessarily anything wrong with any particular one. Sometimes an install of a piece of software just becomes corrupted or there’s some unforeseen incompatibility between the third-party software & Exchange; or some other software in the environment. However, sometimes the Agent can indeed have a bug which needs to be addressed with the vendor.

Anyways, here are the ways in which I’ve seen these issues manifest themselves:

Messages Stuck in the Submission queue
A delay in SMTP response (when you telnet to the Exchange Server over 25, it takes longer than expected for the server’s SMTP banner to be displayed)
Messages are slow to flow through the transport pipeline (general slow delivery)
Microsoft Exchange Transport Service will not start or repeatedly crashes

To highlight more recent examples, last week I had a colleague come to me saying he had two Exchange 2010 Hub/CAS boxes, with the same config, yet one of them would have a slower connection when he would telnet to it; the banner would take at least 20 seconds to be displayed. This also resulted in the health checks for the hardware load balancer in place to mark the server as down. Each server had the same Anti-V/Anti-SPAM software installed, yet only one was showing the symptoms. For testing purposes he “disabled” the third-party software using its management interface but the issue persisted.
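The banner delay can be measured rather than eyeballed in a telnet session. Here's a minimal Python sketch of that idea; it isn't something from the original troubleshooting, & the function name measure_smtp_banner & the 60-second default timeout are my own assumptions. It opens a TCP connection & times how long the server takes to send its greeting.

```python
import socket
import time


def measure_smtp_banner(host, port=25, timeout=60.0):
    """Return (seconds_until_banner, banner_text) for an SMTP listener."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.settimeout(timeout)
        data = b""
        # Read until the server has sent at least one complete line.
        while not data.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break  # server closed the connection without a full line
            data += chunk
    elapsed = time.monotonic() - start
    return elapsed, data.decode("ascii", errors="replace").strip()
```

Running measure_smtp_banner("hub01.contoso.local") (a hypothetical host name) against each Hub/CAS box in turn would make a 20-second outlier obvious, & the same number can be compared against the load balancer's health check timeout.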

However, after running a “Get-TransportAgent” on the server, the Transport Agent still showed as being “Enabled”. This demonstrates a point I frequently make with customers, that disabling Anti-Virus software rarely serves as a useful troubleshooting step (even file-based Anti-V). This is because the TransportAgent is typically still enabled. For file-based Anti-Virus, even with the Services disabled there is usually still a network filter driver that is sitting on the TCP/IP stack which could be causing issues (only an uninstall of the 3rd-party product removes it).

Bottom-line, an uninstall is still the best method to remove potentially problematic Anti-V/Anti-SPAM/Anti-Malware software. So in this case the issue was a bad/corrupted install of the product on that server.

Another scenario (also Exchange 2010) was where messages were stuck in the Submission Queue for extended periods of time. The Application Logs were filled with Event 1050 MSExchange Extensibility events which were stating the installed agent was taking an unusual amount of time to process an event; thus causing the delay in transport (Reference 1 2 3).

After running Get-TransportAgent I was actually greeted by an error message saying it was unable to access a file located in the “C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\agents” directory. This is where the files associated with your Transport Agents are stored. So again, the issue was a corrupted install of the product. Reinstalling the software resolved the issue.

So nothing fancy about this one. Just check Event Viewer for Transport events or use process of elimination if you’re experiencing any of the symptoms above. Having worked with Microsoft Support many times in the past, they will almost always ask you to remove third-party components such as Anti-V if they are unable to pinpoint the issue to its source; so save yourself some time & rule it out first.

I know some people work for companies where this is like pulling teeth but it’s always going to be a battle between usability & security. If your management requires you spend 40 hours on the phone working with a vendor or Microsoft before finally being told you’re going no further until removing the third-party component then I give you my best & suggest you get the coffee started. We all know the most important acronym in IT is CYA after all

For great reading on Exchange Transport Agents see MCM/MCSM/MVP Brian Reid’s two posts on the topic

Creating a Simple Exchange Server Transport Agent

Exchange 2013 Transport Agents

I feel the concepts surrounding this issue have been mentioned already via other sources (1 2) but I’ve seen at least 5 recent cases where our customers were being adversely impacted by this issue; so it’s worth describing in detail.

Summary:

After creating new Receive Connectors on Multi-Role Exchange 2013 Servers, customers may encounter mail flow/transport issues within a few hours/days. Symptoms include:

Sporadic inability to connect to the server over port 25
Mail stuck in the Transport Queue both on the 2013 servers in question but also on other SMTP servers trying to send to/through it
NDRs being generated due to delayed or failed messages

This happens because the Receive Connector was incorrectly created (which is very easy to do), resulting in two services both trying to listen on port 25 (the Microsoft Exchange FrontEnd Transport Service & the Microsoft Exchange Transport Service). The resolution to this issue is to ensure that you specify the proper “TransportRole” value when creating the Receive Connector either via EAC or Shell. You can also edit the Receive Connector after the fact using Set-ReceiveConnector.

Detailed Description:

Historically, Exchange Servers listen on & send via port 25 for SMTP traffic as it’s the industry standard. However, you can listen/send on any port you choose as long as the parties on each end of the transmission agree upon it.

Exchange 2013 brought a new Transport Architecture & without going into a deep dive, the Client Access Server (CAS) role runs the Microsoft Exchange FrontEnd Transport Service which listens/sends on port 25 for SMTP traffic. The Mailbox Server role has the Microsoft Exchange Transport Service which is similar to the Transport Service in previous versions of Exchange & also listens on port 25. There are two other Transport Services (MSExchange Mailbox Delivery & Mailbox Submission) but they aren’t relevant to this discussion.

So what happens when both of these services reside on the same server (like when deploying Multi-Role; which is my recommendation)? In this scenario, the Microsoft Exchange FrontEnd Transport Service listens on port 25, since it is meant to handle inbound/outbound connections with public SMTP servers (which expect to use port 25). Meanwhile, the Microsoft Exchange Transport Service listens on port 2525. Because this service is used for intra-org communications, all other Exchange 2013 servers in the Organization know to send using 2525 (however, 07/10 servers still use port 25 to send to multi-role 2013 servers, which is why Exchange Server Authentication is enabled by default on your default FrontEndTransport Receive Connectors on a Multi-Role box; in case you were wondering).

So when you create a new Receive Connector on a Multi-Role Server, how do you specify which service will handle it? You do so by using the -TransportRole switch via the Shell or by selecting either “Hub Transport” or “FrontEnd Transport” under “Role” when creating the Receive Connector in the EAC.

The problem is there’s nothing keeping you from creating a Receive Connector of Role “Hub Transport” (which it defaults to) that listens on port 25 on a Multi-Role box. What you then have is two different services trying to listen on port 25. This actually works temporarily, due to some .NET magic that I’m not savvy enough to understand, but regardless, eventually it will cause issues. Let’s go through a demo.

Demo:

Here’s the output of Netstat on a 2013 Multi-Role box with default settings. You’ll see MSExchangeFrontEndTransport.exe is listening on port 25 & EdgeTransport.exe is listening on 2525. These processes correspond to the Microsoft Exchange FrontEnd Transport & Microsoft Exchange Transport Services respectively.

Now let’s create a custom Receive Connector, as if we needed it to allow a network device to Anonymously Relay through Exchange (the most common scenario where I’ve seen this issue arise). Notice in the first screenshot, you’ll see the option to specify which Role should handle this Receive Connector. Also notice how Hub Transport is selected by default, as is port 25.

After adding this Receive Connector, see how the output of Netstat differs. We now have two different processes listening on the same port (25).
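If you'd rather not eyeball the Netstat output, the comparison can be scripted. The following is a rough Python sketch that assumes the five-column layout printed by "netstat -ano" on Windows (Proto, Local Address, Foreign Address, State, PID). The sample text & its PIDs are made up for illustration; they are not taken from the demo server.

```python
from collections import defaultdict


def listeners_by_port(netstat_output):
    """Map each listening TCP port to the set of PIDs bound to it."""
    ports = defaultdict(set)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected shape: TCP  <local addr>  <remote addr>  LISTENING  <PID>
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        # rsplit copes with IPv6 local addresses such as [::]:25
        port = int(fields[1].rsplit(":", 1)[1])
        ports[port].add(int(fields[4]))
    return ports


def shared_ports(netstat_output):
    """Return only the ports that more than one process is listening on."""
    return {
        port: sorted(pids)
        for port, pids in listeners_by_port(netstat_output).items()
        if len(pids) > 1
    }


# Illustrative sample only; the PIDs are invented.
SAMPLE = """\
  TCP    0.0.0.0:25      0.0.0.0:0    LISTENING    4120
  TCP    0.0.0.0:25      0.0.0.0:0    LISTENING    7344
  TCP    0.0.0.0:2525    0.0.0.0:0    LISTENING    7344
  TCP    [::]:25         [::]:0       LISTENING    4120
"""

print(shared_ports(SAMPLE))
```

Any port that comes back from shared_ports is worth a second look; on a healthy Multi-Role box port 25 should map to a single process.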

So there’s a simple fix to this. Just use Shell (there’s no GUI option to edit the setting after it’s been created) to modify the existing Receive Connector to be handled by the MSExchange FrontEndTransport Service instead of the MSExchange Transport Service. Use the following command:

Set-ReceiveConnector Test-Relay –TransportRole FrontEndTransport

I recommend you restart both Transport Services afterwards.
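To catch this before it bites, the rule above can be expressed as a simple check: on a Multi-Role server, no connector handled by the Hub Transport role should be bound to port 25. Below is a small Python sketch of that check. The connector inventory is hypothetical (you would have to export the name, role & bindings of your own connectors into this shape), & the server name EX13 is invented.

```python
def risky_connectors(connectors, frontend_port=25):
    """Return names of Hub Transport connectors bound to the FrontEnd port."""
    flagged = []
    for conn in connectors:
        # Bindings look like "0.0.0.0:25" or "[::]:25"; keep only the port.
        ports = {int(b.rsplit(":", 1)[1]) for b in conn["bindings"]}
        is_hub = conn["transport_role"].lower() == "hubtransport"
        if is_hub and frontend_port in ports:
            flagged.append(conn["name"])
    return flagged


# Hypothetical inventory for one Multi-Role server.
CONNECTORS = [
    {"name": "Default Frontend EX13", "transport_role": "FrontendTransport",
     "bindings": ["0.0.0.0:25", "[::]:25"]},
    {"name": "Default EX13", "transport_role": "HubTransport",
     "bindings": ["0.0.0.0:2525", "[::]:2525"]},
    {"name": "Test-Relay", "transport_role": "HubTransport",
     "bindings": ["0.0.0.0:25"]},
]

print(risky_connectors(CONNECTORS))
```

With the sample data only Test-Relay is flagged, which is the connector created in the demo.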

Overview:

I’ve come across this with customers a few times now & it can be a real head scratcher. However, the resolution is actually pretty simple.

Scenario:

Customer has multiple Exchange servers in the environment, or has just installed a 2nd Exchange server into the environment. Customer is able to send directly out & receive in from the internet just fine but is unable to send email to/through another internal Exchange server.

This issue may also manifest itself as intermittent delays in sending between internal Exchange servers.

In either scenario, messages will be seen queuing & if you run a “Get-Queue –Identity QueueID | Format-List” you will see a “LastError” of “451 4.4.0 DNS query failed. The error was: SMTPSEND.DNS.NonExistentDomain; nonexistent domain”.

Resolution:

This issue can occur because the Properties of the Exchange Server’s NIC have an external DNS server listed in them. Removing the external DNS server/servers, leaving only internal DNS servers (Microsoft DNS/Active Directory Domain Controllers in most customer environments), & then restarting the Microsoft Exchange Transport Service should resolve the issue.

Summary:

The Default Configuration of an Exchange Server is to use the local Network Adapter’s DNS settings for Transport Service lookups.

(FYI: You can alter this in Exchange 07/10 via EMS using the Set-TransportServer command or in EMC>Server Configuration>Hub Transport>Properties of Server. Or in Exchange 2013 via EMS using the Set-TransportService command or via EAC>Servers>Edit Server>DNS Lookups. Using any of these methods, you can have Exchange use a specific DNS Server.)

Because the default behavior is to use the local network adapter’s DNS settings, Exchange was finding itself using external DNS servers for name resolution. This seemed to work fine when it had to resolve external domains/recipients, but a public DNS server would likely have no idea what your internal Exchange servers (e.g. Ex10.contoso.local) resolve to. The error we see is due to the DNS server responding but not having the A record for the internal host that we require. If the DNS server you had configured didn’t exist or wasn’t reachable you would actually see slightly different behavior (like messages sitting in “Ready” status in their respective queues).

An Exchange server, or any Domain-joined server for that matter, should not have its NIC’s DNS settings set to an external/ISP’s DNS server (even as secondary). Instead, they should be set to internal DNS servers which have all the necessary records to discover internal Exchange servers.
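A quick way to sanity-check a list of DNS servers copied from a NIC is to flag any address that isn't in a private range. Here's a minimal Python sketch using the standard ipaddress module. Treat it as a heuristic only: it assumes your internal DNS servers live on private (RFC 1918 style) addresses, which won't be true in every environment, & the addresses used in the usage note are just examples.

```python
import ipaddress


def external_dns_servers(dns_servers):
    """Return the configured DNS servers that are not private or loopback."""
    flagged = []
    for server in dns_servers:
        addr = ipaddress.ip_address(server)
        # Heuristic: anything outside private/loopback space is probably
        # a public or ISP resolver.
        if not (addr.is_private or addr.is_loopback):
            flagged.append(server)
    return flagged
```

For example, external_dns_servers(["10.0.0.10", "8.8.8.8"]) returns only the second entry, which is the kind of secondary DNS server that causes the NonExistentDomain error described above.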

References

http://support.microsoft.com/kb/825036

http://technet.microsoft.com/en-us/library/bb124896(v=EXCHG.80).aspx

“The DNS server address that is configured on the IP properties should be the DNS server that is used to register Active Directory records.”

http://technet.microsoft.com/en-us/library/aa997166(v=exchg.80).aspx

http://exchangeserverpro.com/exchange-2013-manually-configure-dns-lookups/

http://thoughtsofanidlemind.com/2013/03/25/exchange-2013-dns-stuck-messages/

Overview

The title might sound a bit scary but this one was actually a pretty easy fix. It’s a lesson in not digging yourself into a deeper hole than you’re already in during troubleshooting. I wish I would’ve had this lesson 10yrs ago :)

Scenario

The customer was unable to log in to OWA, EAC, or Exchange Management Shell on any Exchange 2013 SP1 server in their environment. The errors varied quite a bit. When logging into OWA they would get:

“Something went wrong…

A mailbox could not be found for NT AUTHORITY\SYSTEM.”

When trying to open EMS you would receive a wall of red text which would essentially be complaining about receiving a 500 internal server error from IIS.

In the Application logs I would see an MsExchange BackEndRehydration Event ID 3002 error stating that “NT AUTHORITY\SYSTEM does not have token serialization permission”.

Something definitely seemed to be wrong with Active Directory as this was occurring on all 3 of the customer’s Exchange 2013 servers; one of which was a DC (more on that later).

Resolution

So one of the 1st questions I like to ask of customers is “when was the last time this was working?” After a bit of investigation I was able to find out that the customer had recently been trying unsuccessfully to create a DAG from his 3 Exchange 2013 SP1 servers. They could get two of the nodes to join but the 3rd would not (the one that was also a DC). The customer thought it was a permissions issue so they had been “making some changes in AD” to try to resolve them. I asked if those changes were documented; the silence was my answer….. :)

However, this current issue was affecting all Exchange 2013 servers & not just the one that’s also a DC so I was a bit perplexed as to what could’ve caused this.

So a bit of time on Bing searching for Token Serialization errors brought me to MS KB2898571. The KB stated that if the Exchange Server computer account was a member of a restricted group then Token Serialization Permissions would be set to Deny for it. These Restricted Groups are:

Domain Admins
Schema Admins
Enterprise Admins
Organization Management

The KB mentioned running gpresult /scope computer /r on the Exchange servers to see if they were showing as members of any of the restricted groups (see article for further detail & screenshots of the commands). I ran this command on all 3 Exchange 2013 servers & it showed their Computer accounts were all members of the Domain Admins group. In Active Directory Users & Computers I looked at each Exchange Server Computer account (on the Member Of tab) & unfortunately none of them was a direct member of Domain Admins, so I had to search the membership chain of each common group that the servers were members of. The common groups that all Exchange Server Computer accounts were members of were:

Domain Computers
Exchange Install Domain Servers
Exchange Servers
Exchange Trusted Subsystem
Managed Availability Servers

Eventually I found that the Exchange Install Domain Servers group had been added as a member of the Domain Admins group during the customer’s troubleshooting efforts to get all their servers added as DAG members. I removed the Exchange Install Domain Servers group as a member of the Domain Admins group & then rebooted all of the Exchange servers. After the reboots the issues went away & the customer was able to access OWA/EMS.
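Walking a membership chain by hand gets tedious, so here's a rough Python sketch of the same search done as a breadth-first walk over nested groups. The restricted group names come from the KB list above & the nesting mirrors what was found in this case, but the computer account name EX01$ is hypothetical & the membership map is typed in by hand rather than read from AD.

```python
from collections import deque

RESTRICTED = {
    "Domain Admins",
    "Schema Admins",
    "Enterprise Admins",
    "Organization Management",
}


def path_to_restricted_group(member_of, start, restricted=RESTRICTED):
    """Breadth-first search through nested groups.

    Returns the first chain found from `start` to a restricted group,
    or None if there is no such chain.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for group in member_of.get(path[-1], []):
            if group in seen:
                continue
            if group in restricted:
                return path + [group]
            seen.add(group)
            queue.append(path + [group])
    return None


# Hand-built map of "object -> groups it is a direct member of".
MEMBER_OF = {
    "EX01$": [
        "Domain Computers",
        "Exchange Install Domain Servers",
        "Exchange Servers",
        "Exchange Trusted Subsystem",
        "Managed Availability Servers",
    ],
    "Exchange Install Domain Servers": ["Domain Admins"],
}

print(" -> ".join(path_to_restricted_group(MEMBER_OF, "EX01$")))
```

The printed chain shows which intermediate group is responsible, which is the piece of information the Member Of tab doesn't give you directly.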

Now this is where I had to explain to the customer that it was not supported to have an Exchange Server that was also a Domain Controller as a member of a Failover Cluster/DAG. This was why they were having such a hard time adding their Exchange server/DC as a member of their DAG.

Conclusion

I have a saying that I came up with called “troubleblasting”, i.e. “John doesn’t troubleshoot, he troubleblasts!” It started out as just a cheesy joke amongst colleagues back in college but I’ve started to realize just how dangerous it can be. It’s that state you can sometimes get into when you’re desperate, past the point of documenting anything you’re doing out of frustration, & just throwing anything you can up against the wall to see what sticks & resolves your issue. Sometimes it can work out for you but sometimes it can leave you in a state where you’re worse off than when you started. Let this be a lesson to take a breath, re-state what you’re trying to accomplish, & ask whether what you’re doing is really the right thing given the situation. In this case, an environment was brought to its knees because a bit of pre-reading on supportability was not done beforehand & a permission change adversely affected all Exchange 2013 servers.

If you can make it to Exchange Connections in Las Vegas this September, I’ll be presenting a session on “Advanced troubleshooting procedures & tools for Exchange 2013”. Hopefully I can share some tips/tools from the field that have proven useful & can keep you from resorting to the “Troubleblasting Cannon of Desperation” :)

Background

I usually refrain from writing posts on issues where I haven’t been able to fully reproduce them in my lab but enough people seem to be having this issue that it would be good to spread the word should another person find themselves afflicted by it. I’ve seen this issue happen in two different environments & then found out via the forums that several other people have run into it as well.

Issue

I was working with a customer who migrated from Exchange 2007 to Exchange 2013. After decommissioning the 2007 servers, all the Exchange 2013 mailboxes started getting the infamous “The Microsoft Exchange Administrator has made a change that requires you quit and restart Outlook” prompt.

This seemed odd because Exchange 2013 was supposed to all but eliminate those prompts. While it did eliminate the prompts when the RPC Endpoint (Server Name field in Outlook) changed, there are still other scenarios that could result in this prompt (please see reference links at bottom of post for a detailed history). One such thing relates to the Public Folder Hierarchy.

In this customer’s scenario, I determined that the “PublicFolderDatabase” attribute on every Exchange 2013 Mailbox Database was set to a value resembling the screenshot below:

In this case, the decommissioning of Exchange 2007 & its Legacy Public Folders was not done correctly (same issue probably would have occurred if it were 2010). The Public Folder Database was showing up as a deleted object in AD. So the result was that the Outlook clients were trying to access Public Folder information but were reacting in a way that resulted in the frequent prompt to restart Outlook.

The resolution in this case was to drill down to the properties of the Mailbox Database in ADSIEDIT & set the value of “msExchHomePublicMDB” to be blank. Afterwards, a restart of the Information Store Service resolved the issue.
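If you have more than a handful of databases, it helps to scan for this condition instead of opening each one. The Python sketch below flags values that look like tombstoned AD objects, based on the fact that a deleted object's name gets a "\0ADEL:" marker appended & the object is moved under a Deleted Objects container. The database names, the all-zero GUID & the contoso.com domain are placeholders, & the dictionary stands in for whatever export of the attribute you have available.

```python
def points_to_deleted_object(dn):
    """True if a distinguished name looks like a tombstoned AD object."""
    if not dn:
        return False
    # Deleted objects are renamed with a \0ADEL:<GUID> suffix and moved
    # under the Deleted Objects container of their partition.
    return "\\0ADEL:" in dn or "cn=deleted objects" in dn.lower()


# Placeholder data: mailbox database -> value of its public folder pointer.
DATABASES = {
    "MDB01": "CN=PFDB01\\0ADEL:00000000-0000-0000-0000-000000000000,"
             "CN=Deleted Objects,CN=Configuration,DC=contoso,DC=com",
    "MDB02": None,
}

stale = [name for name, dn in DATABASES.items() if points_to_deleted_object(dn)]
print(stale)
```

Only databases returned in the stale list need the attribute cleared; a blank value (as on MDB02 in the sample) is already the desired end state.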

Additional Info

Not long after this issue, I was contacted by a Consultant I know who encountered the exact same issue. After an improperly performed Exchange 2007 migration, the Exchange 2013 mailboxes were getting prompted to restart Outlook. That environment also had Mailbox Databases that were pointed to a deleted object for their default Public Folder Database. Clearing the value & restarting the Information Store Service also resolved their issue.

After hearing this I went online to see if any others were encountering this issue. I found the below two forum posts

Reference A

Reference B

I then tried to reproduce this in my own environment but could not. Manually deleting the Exchange 2007 Server object from AD as well as manually deleting the Public Folder Database object did leave the 2013 Mailbox Databases pointing to the ghosted objects, but I did not receive the prompts. It appears there’s a particular chain of events that causes this issue but even though I could not recreate them in my lab, it certainly seems like people are running into the issue in the wild. If you start receiving these prompts then I suggest looking to make sure your attributes are not also pointed to ghosted objects.

Note: I was also informed that you could leave yourself in this scenario by incorrectly performing a migration from Legacy Public Folders to Modern Public Folders.

During the migration, you run the “Set-Mailbox <PublicFolderMailboxName> –PublicFolder –IsExcludedFromServingHierarchy:$True” command to prevent the Modern Public Folders from serving the Hierarchy requests while you’re moving data over; when you eventually complete the migration you should run “Set-Mailbox <PublicFolderMailboxName> –PublicFolder –IsExcludedFromServingHierarchy:$False” to allow it to serve the Hierarchy requests. If you do not run this command then you may receive the same prompts.

Additional References

http://blogs.msdn.com/b/aljackie/archive/2013/11/14/outlook-and-rpc-end-point-the-microsoft-exchange-administrator-has-made-a-change-that-requires-you-quit-and-restart-outlook.aspx

http://blogs.technet.com/b/exchange/archive/2011/01/24/obviating-outlook-client-restarts-after-mailbox-moves.aspx

http://blogs.technet.com/b/exchange/archive/2012/05/30/rpc-client-access-cross-site-connectivity-changes.aspx

Scenario

Customer had a single server Exchange 2013 SP1 environment that had been migrated from Exchange 2003 over time (double hop migration). The customer installed CU5 & about a month later they noticed Out of Office messages were not being sent for users who had it enabled. The environment had only been running a short while before updating to CU5 so at the time the customer didn’t correlate the issue happening with updating to CU5 (more on this later).

Issue

In this scenario, Mailbox-A would enable their Out of Office, Mailbox-B would send Mailbox-A an email, Mailbox-A would receive the email message but Mailbox-B would never receive an OOF message.

In addition to this, if you looked in the queues you would see the original email message (not the OOF message) queued even though it had already been successfully delivered.

The LastError on this message stated “432 4.2.0 STOREDRV.Deliver; Agent transient failure during message resubmission [Agent: Mailbox Rules Agent]” To make matters worse, this message would eventually timeout & the original sender (Mailbox-B in my above scenario) would receive an NDR containing “#< #4.4.7 smtp;550 4.4.7 QUEUE.Expired; message expired> #SMTP#” which would lead them to the false belief that the original message was never delivered. If Mailbox-A turned off their OOF then the message in the queue would eventually be removed without an NDR.

This situation not only prevented the customer from utilizing Out Of Office as it was intended to be used, but it also caused NDRs to be generated to anyone sending them emails while OOF was enabled.

Troubleshooting

This was a really perplexing case as I threw just about everything I had in my bag of tricks at it. Before relenting & eventually escalating to Microsoft I performed the following:

Used Get-TransportAgent to verify no 3rd party agents were causing issues
Tested with new mailboxes on new mailbox databases
Set all Connector & Diagnostic logging to Verbose
Recreated the default receive connectors
Enabled Pipeline Tracing for the Transport as well as the Mailbox Transport services to see the messages at each stage of Transport
Installed a second Exchange server in the environment & tested with new database/users
Recreated all of the System Mailboxes
Re-ran setup /PrepareAD

All tests resulted in the same behavior. In fact, I found that I could recreate the issue without even enabling OOF. If I created a mailbox rule to have the system send an email to the original sender (effectively functioning like an OOF) I would experience the same behavior. That makes sense, since an OOF is really just a mailbox rule, which would explain the error we’re seeing in the queue regarding the “Mailbox Rules Agent”. So obviously something was not allowing the rules to fire & my money was on either permissions or some goofy object in Active Directory (like an invalid character in a LegacyDN or the Organization name) that was causing the Rules Agent to choke when it tried to generate the email message.

I was able to get to Microsoft Tier 3 Support (a benefit of being a Premier Partner) & they were equally perplexed. As was expected, they had me run both an ExTrace as well as an IDNA Trace. In short, each of these tools gives the Microsoft debuggers a deep look at what’s going on within the individual processes. I’ve looked at the output of these traces a few times myself but it tends to be a little too deep for a non-debugger to make use of. Needless to say, if you’ve reached this point with Microsoft Support then you’ve certainly got an interesting case.

Resolution

So what was the eventual resolution? Well it was permissions but it still left me scratching my head. First off, here’s the fix:

We opened ADSIEDIT & navigated to the Default Receive Connector of the Exchange Server, went to the Security tab, & selected the ExchangeLegacyInterop security group. By default, there is a Deny entry (seen below) for the “Accept Organization Headers” permission.

In our case, we unchecked Deny for this permission & after restarting the MSExchange Transport service the issue went away; we then started receiving OOF messages as expected. Microsoft Support explained that this permission is used when the system needs to issue the “MAIL FROM” SMTP verb; which is required when generating OOF messages or similar Rules.

I was excited to figure out the cause of the issue but was also curious why this would be affecting us since as far as I understand, only Exchange 2000/2003 servers were ever members of the ExchangeLegacyInterop security group. This group was used to enable mail flow between the legacy servers & Exchange 07/10.

Upon inspection, this customer’s Exchange 2013 servers had been made members of this group. This security group had zero members in my Greenfield Exchange 2013 environment so I suspected the customer added them there for some odd reason in the past.

Oddly enough, the Microsoft Engineer told me they had a lab environment that was also migrated from Exchange 2003 & it also had this group populated with the 2013 servers. So after asking around to several colleagues, every answer I got back was that their ExchangeLegacyInterop security group was empty. So what does this tell us? Either both environments had someone modify this group manually at some time (for whatever strange reason), or there is a particular set of circumstances that could lead to a 2013 environment having the 2013 servers as members of this group. So if it happens to you then hopefully this article will be of some use to you.

Lastly, the thing I find most bizarre about this situation is that I was unable to reproduce this issue in my SP1 lab. After adding my three 2013 SP1 servers to the ExchangeLegacyInterop security group (forcing replication & rebooting each server to be sure) I could not reproduce the issue. However, after installing CU5 on one of the servers (and thereby updating the Active Directory Schema) the issue occurred on all 3 servers. So apparently something about the CU5 schema change triggers this to be an issue. However, everything I have seen has proven that no 2013 servers should be members of this group to begin with so I’d say it’s not a bug but more a series of unfortunate events :)

Note: MS Support told me that they had another case like this but in that situation, the Accept Organization Headers permission had been denied to the Authenticated Users group on the receive connectors. Not sure how that happened either but it just proves this permission is important for the system to have.

Background

I was working with a customer who had Exchange 2010 & was in the process of migrating to Exchange 2013. As part of their migration process they pointed their Exchange 2010 Outlook Anywhere namespace (let’s call it mail.contoso.com) to Exchange 2013 in DNS. At this point all of their Outlook Anywhere clients should have been connecting to Exchange 2013 & then been proxied to Exchange 2010. While this was somewhat working, they also immediately noticed users were randomly being prompted for credentials, resulting in a negative user experience.

Sometimes the prompts would appear when connecting to Public Folders, while other times they would appear on mail or directory connections from Outlook to Exchange.

Resolution

When I was approached with this issue/symptom it sounded familiar. After a search through my OneNote I realized I previously had a discussion with some people I know from Microsoft Support regarding this issue. Turns out this issue was recently addressed via http://support2.microsoft.com/kb/2990117 “Outlook Anywhere users prompted for credentials when they try to connect to Exchange Server 2013”.

This is actually an IIS issue with Server 2008 R2 (the operating system Exchange 2010 was installed on) that’s resolved by a hotfix. After installing the hotfix & rebooting, the issue was resolved & their users no longer received the prompts.

Overview:
Here’s a quick one regarding an issue I came across not too long ago with a customer. The issue was that some members of a distribution list were not getting emails addressed to it.

Issue:
Consider this scenario:

Exchange 2013 CU7 (though it would also have the same effect on Exchange 2010; have not tested on 2007)
Users: John, Bill, Sam, & Ron

You create a Mail-Enabled Security Group in EAC called TestDL#1 & add John/Bill/Sam/Ron to it. In EAC, as well as when using Get-DistributionGroupMember, John, Bill, Sam, & Ron all show up as members. They can all receive emails sent to this group. You then go to Ron’s user account in AD Users & Computers & on the “Member Of” tab you select the TestDL#1 group & then click the “Set Primary Group” option. Obviously, in ADUC it still shows Ron as being a member of this group. However, in EAC or in shell, Ron is no longer listed as a member of the group.

The biggest problem is that when emailing the group, Ron no longer gets the emails. However, as soon as I change his Primary Group to something else he then shows up & can get the emails. This creates a situation where a user is supposed to be getting emails but isn’t. This issue is easily reproducible in a lab.

Solution:
Nothing advanced or fancy here, just don’t change the Primary Group value in AD to be a Mail-Enabled Security Group. Exchange is unable to query the membership of a user for a group that’s also been set as their Primary Group. This is because modifying this property changes the way the object appears in AD & therefore changes the results of Exchange’s query (when it routes mail to it as well as how it lists membership within its management tools).
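To make the mechanics a little more concrete, here's a toy Python model of the behavior documented in KB 275523 (listed in the references): a user's primary group is tracked on the user object, & the user is dropped from that group's member list. This is a simplified illustration, not Exchange's actual query logic; the Directory class & its method names are invented for the example.

```python
class Directory:
    """Toy model: group -> member list, user -> primary group."""

    def __init__(self):
        self.member = {}          # stands in for each group's member attribute
        self.primary_group = {}   # stands in for each user's primaryGroupID

    def add_member(self, group, user):
        members = self.member.setdefault(group, [])
        if user not in members:
            members.append(user)

    def set_primary_group(self, user, group):
        previous = self.primary_group.get(user)
        # The new primary group no longer lists the user as a member...
        if user in self.member.get(group, []):
            self.member[group].remove(user)
        # ...and the old primary group starts listing the user again.
        if previous is not None and previous != group:
            self.add_member(previous, user)
        self.primary_group[user] = group

    def listed_members(self, group):
        """What a query against the member list alone would return."""
        return list(self.member.get(group, []))

    def effective_members(self, group):
        """Member list plus users who hold the group as primary group."""
        via_primary = [u for u, g in self.primary_group.items() if g == group]
        return self.listed_members(group) + via_primary


ad = Directory()
ad.set_primary_group("Ron", "Domain Users")
for user in ("John", "Bill", "Sam", "Ron"):
    ad.add_member("TestDL#1", user)

ad.set_primary_group("Ron", "TestDL#1")
print(ad.listed_members("TestDL#1"))     # Ron is missing here
print(ad.effective_members("TestDL#1"))  # Ron is still effectively a member
```

The gap between the two printed lists is the gap between what ADUC shows & what the Exchange tools (& mail delivery) see in the scenario above.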

This also brings up another suggested practice which can help you avoid this scenario all together; use Mail-Enabled Distribution Groups instead of Mail-Enabled Security Groups when possible.

References:
Distribution groups in Exchange 2010 are not showing some members
http://www.run-corp.com/distribution-groups-in-exchange-2010-are-not-showing-some-members/

Members in Exchange 2010 Distribution Group and Active Directory differ
https://social.technet.microsoft.com/Forums/exchange/en-US/7ff27688-31e1-43c1-ba0f-fee95299f31f/members-in-exchange-2010-distribution-group-and-active-directory-differ?forum=exchange2010

Setting Primary Group Excludes the User from the Group Membership in Active Directory
http://support.microsoft.com/kb/275523

Overview:

I recently had a customer come to me with a simple issue of mail not being received in his Exchange 2013 environment when sending to a Dynamic Distribution Group he had just created. Well it certainly seemed like an easy issue to track down (which it technically was) but unfortunately I was a little too confident in my abilities & made the age-old mistake of overlooking the basics. Hopefully others can avoid that mistake after giving this a read.

Scenario:

Create a Dynamic Distribution Group named TestDL#1 whose membership is defined by a Universal Security Group named TestSecurityGroup using the following command in shell:

New-DynamicDistributionGroup -Name "TestDL#1" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"}

Note: This command places the Dynamic DL object into the default Users OU & also sets the msExchDynamicDLBaseDN to the Users OU’s Distinguished Name (CN=Users,DC=ASH,DC=NET). This will become important later.

I can verify the membership of this group by running:

$var = Get-DynamicDistributionGroup "TestDL#1"

Get-Recipient -RecipientPreviewFilter $var.RecipientFilter

In my case, the members show up correctly as John, Bob, Sam, & Dave. However, if I send emails to this group nobody gets them. When looking at messagetracking, the recipients show as {} (see below screenshot)

Now here’s the really interesting part. My security group, as well as my users are in the OU=End_Users,OU=Company_Users,DC=ASH,DC=NET Organizational Unit. However (as mentioned before in my Note), my Dynamic DL is in the CN=Users,DC=ASH,DC=NET Organizational Unit. Now if I move my users into the Users OU, then they receive the email & show up as valid recipients.

Now no matter which OU I move my Dynamic Distribution Group (TestDL#1) to, this behavior is the same.

For instance, if I had run the below command instead, I never would have noticed an issue because the Dynamic DL would’ve been created in the same OU as the users & the Security Group.

New-DynamicDistributionGroup -Name "TestDL#1" -OrganizationalUnit "ash.net/Company_Users/End_Users" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"}

The last head scratcher is that if I move the actual AD Security Group (TestSecurityGroup) that I’m filtering against to a different OU, I get the same behavior (no emails).

So it would seem that the solution is to always place the Dynamic Distribution Group in the same OU as the security group itself & ALL of the security group’s members.

This seemed crazy so I had to assume I wasn’t creating the filter correctly. It was at this point I pinged some colleagues of mine to see where I was going wrong.

Tip: Always get your buddies to peer review your work. A second set of eyes on an issue usually goes a long way to figuring things out.

Solution:

As it turned out, there were two things I failed to understand about this issue.

When you create a Dynamic Distribution Group, by default, the RecipientContainer setting for that group is set to the OU where the DDG is placed. This means that because I initially did not specify the OU for the DDG to be placed in, it was placed in the Users OU (CN=Users,DC=ASH,DC=NET). So when Exchange was performing its query to determine membership, it could only see members that were in the Users OU. So the solution in my scenario would be to use the -RecipientContainer parameter when creating the DDG & specify the entire domain.

EX: New-DynamicDistributionGroup -Name "TestDL#1" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"} -RecipientContainer "ASH.NET"

This one was particularly embarrassing because the answer was clearly in the TechNet article for the New-DynamicDistributionGroup cmdlet.
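For a group that already exists, the same setting can be inspected & changed after the fact. The sketch below uses the lab names from this post (TestDL#1, ash.net) & assumes it is run in Exchange Management Shell. Note the preview: passing the group’s RecipientContainer as -OrganizationalUnit scopes the preview the same way the group itself is scoped, which the unscoped preview I ran earlier did not do.

```powershell
# Show where the group lives and which container its filter is scoped to.
$ddg = Get-DynamicDistributionGroup -Identity 'TestDL#1'
$ddg | Format-List Name, DistinguishedName, RecipientContainer, RecipientFilter

# Preview membership using the same filter AND the same scope as the group.
Get-Recipient -RecipientPreviewFilter $ddg.RecipientFilter -OrganizationalUnit $ddg.RecipientContainer

# Widen the scope of the existing group to the whole lab domain.
Set-DynamicDistributionGroup -Identity 'TestDL#1' -RecipientContainer 'ash.net'
```

If the scoped preview comes back empty while the unscoped one lists members, the RecipientContainer is the likely culprit.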

The other thing I didn’t realize was the reason my DDG broke when moving the Security Group I was filtering against. It was breaking because I specified the Security Group using its Distinguished Name, which included the OU it resided in (CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET). So by moving the group I was making my query come up empty. Now the first thing I thought of was whether I could specify the group using the common name or the GUID instead. Unfortunately, you cannot because of an AD limitation:

“MemberOfGroup filtering requires that you supply the full AD distinguished name of the group you’re trying to filter against. This is an AD limitation, and it happens because you’re really filtering this calculated back-link property from AD, not the simple concept of “memberOf” that we expose in Exchange.”

So the important thing to remember here is to either not move the Security Group you’re filtering against, or if you move it, to update your filter.
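Updating the filter after a move can be scripted so the Distinguished Name never has to be typed by hand. This is a rough sketch, assuming the group name is unique in the organization & that the names are the lab names used in this post; it looks up the group’s current DN & rebuilds the filter around it.

```powershell
# Resolve the security group's current distinguished name by name.
$group = Get-Group -Identity 'TestSecurityGroup'

# Rebuild the filter around the current DN and apply it to the Dynamic DL.
# (A DN containing an apostrophe would need escaping before being embedded here.)
$filter = "MemberOfGroup -eq '$($group.DistinguishedName)'"
Set-DynamicDistributionGroup -Identity 'TestDL#1' -RecipientFilter $filter

# Confirm what was stored.
Get-DynamicDistributionGroup -Identity 'TestDL#1' | Format-List RecipientFilter
```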

Thanks go to MVPs Tony Redmond & Tony Murray for pointing these two important facts out to me.

Conclusion:

As I found out, a strong foundational knowledge of Active Directory is key to being a strong Exchange Admin/Consultant/Support Engineer. But even when you feel confident in your abilities for a given topic, don’t be afraid to ask people you trust. You might find out you’re either a bit rusty or not as knowledgeable as you thought you were :)

Overview
Much is made about a healthy Active Directory environment being a prerequisite for a healthy Exchange deployment. This can be especially challenging when there are separate teams managing AD & Exchange; meaning sometimes things can slip through the cracks.

Issue
A colleague of mine recently ran into an issue when preparing to deploy Exchange 2013 into an existing Exchange Organization. While running Setup /PrepareAD, the process would fail at about 14%, stating the domain controller is not available. It was determined that the DC holding all of the FSMO roles was in the process of a reboot. At first the assumption was that this was coincidental; possibly the work of the AD team. After the server came back up, /PrepareAD was run again & had the exact same result! So it appeared that something the /PrepareAD process was doing was the culprit. The event logs on the DC gave the below output:

EVENTID: 1000
Faulting application name: lsass.exe, version: 6.3.9600.16384, time stamp: 0x5215e25f

Faulting module name: ntdsai.dll, version: 6.3.9600.16421, time stamp: 0x524fcaed

Exception code: 0xc0000005

Fault offset: 0x000000000019e45d

Faulting process id: 0x1ec

Faulting application start time: 0x01d0553575d64eb5

Faulting application path: C:\Windows\system32\lsass.exe

Faulting module path: C:\Windows\system32\ntdsai.dll

Report Id: 53c0474e-c12d-11e4-9406-005056890b81

Faulting package full name:

Faulting package-relative application ID:

EVENTID: 1015
A critical system process, C:\Windows\system32\lsass.exe, failed with status code c0000005. The machine must now be restarted.

The logs were saying that the Lsass.exe process was crashing, leading to the Domain Controller restarting (see image below).

The easiest path of troubleshooting led toward moving the FSMO roles to another server & seeing if the issue followed them. Setup /PrepareAD was run again & the issue did in fact follow the FSMO roles.
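Before & after a move like this it helps to confirm which DC actually holds each role. A quick sketch, assuming the ActiveDirectory PowerShell module (part of RSAT) is available on the machine you run it from:

```powershell
Import-Module ActiveDirectory

# Forest-wide roles
Get-ADForest | Format-List SchemaMaster, DomainNamingMaster

# Domain-wide roles
Get-ADDomain | Format-List PDCEmulator, RIDMaster, InfrastructureMaster
```

The older netdom query fsmo command reports the same five role holders in one list.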

Resolution
It was at this point that I was engaged & I had a feeling this was either a performance issue on the domain controllers or something buggy at play. Before too long I was able to find the below MS KB for an issue that seemed to match our symptoms:

“Lsass.exe process and Windows Server 2012 R2-based domain controller crashes when the server runs under low memory”
http://support.microsoft.com/en-us/kb/3025087

The customer was more than willing to install the hotfix, but we soon realized that we also had to install the prerequisite update package below (which was sizeable):

Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2 update: April 2014
http://support.microsoft.com/en-us/kb/2919355

During this time, the domain controller was also updated to .NET 4.5.2. After all of this was done, Setup /PrepareAD completed successfully. My colleague was 90% certain the hotfix was the fix, but also noted that before the patch the DC’s CPU utilization was consistently running at 60%. After the updates, it now sits in the 20-30% range. So regardless, we saw much better performance & stability after updating the Domain Controllers.
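To see which domain controllers already have the two updates, something along these lines should work. It is only a sketch: DC01 & DC02 are hypothetical server names, & it assumes the account running it can query those servers remotely.

```powershell
# Hypothetical DC names; replace with your own.
$domainControllers = 'DC01', 'DC02'

foreach ($dc in $domainControllers) {
    # Get-HotFix raises an error for an ID it cannot find, so errors are suppressed
    # and a missing update simply does not appear in the result.
    $found = Get-HotFix -Id 'KB2919355', 'KB3025087' -ComputerName $dc -ErrorAction SilentlyContinue

    [pscustomobject]@{
        Server    = $dc
        Installed = ($found.HotFixID -join ', ')
    }
}
```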

Conclusions
While I understand we can’t all be up to date on our patching 100% of the time, there is some health checking we can do to the environments we manage.

For all Windows servers, I strongly recommend getting a performance baseline of the big 3: Disk, Memory, & CPU. I like to say that you can’t truly define bad performance if you don’t have a definition of good performance in the first place. Staying up to date with Windows Updates can greatly help with this. Just because a system performed to a certain level at one point in time doesn’t mean any number of other variables couldn’t have changed since then to result in poor performance today; oftentimes, vendor updates can remedy this.
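As a starting point for a baseline, the built-in Get-Counter cmdlet can sample the big 3 without installing anything. The counter choice & the one-minute window below are my own assumptions for illustration (a real baseline should cover hours or days of normal load), & the counter names assume an English-language OS.

```powershell
# The "big 3" expressed as performance counters.
$counters = @(
    '\Processor(_Total)\% Processor Time',
    '\Memory\Available MBytes',
    '\PhysicalDisk(_Total)\Avg. Disk sec/Read',
    '\PhysicalDisk(_Total)\Avg. Disk sec/Write'
)

# 12 samples, 5 seconds apart (one minute in total).
$samples = Get-Counter -Counter $counters -SampleInterval 5 -MaxSamples 12

# Average each counter across all of the samples.
$samples.CounterSamples |
    Group-Object Path |
    ForEach-Object {
        [pscustomobject]@{
            Counter = $_.Name
            Average = ($_.Group | Measure-Object CookedValue -Average).Average
        }
    }
```

Saving that output somewhere with a date on it gives you something to compare against the next time someone says the server feels slow.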

As for Domain Controllers, they’re one of the easiest workloads to test with, since a new DC can be created with relative ease. You can use a test environment (recommended) or simply deploy Windows updates to a select number of domain controllers & then compare the current behavior with your baseline.

In this customer’s case, these performance/stability issues could have caused any number of applications that relied on AD to fail. Some failures may have been silent, while others could’ve been show stoppers like this one.

Scenario

After switching from hosted email to Exchange 2013 on-premises, a customer noticed that when using scan-to-email functionality the .PDF files it created were not showing up as expected. Specifically, instead of an email being received with the .PDF attachment of the scanned document, they were receiving the entire original message as an attachment (which then contained the .PDF).

When the scanner was configured to send to an external recipient (Gmail in this case), the issue did not occur & the message was formatted as expected. The message was still being relayed through Exchange; it was just the recipient that made the difference. See the below screenshots for examples of each:

What the customer was seeing (incorrect format)

What the customer expected to see (correct format)

This may not seem like a big issue but it resulted in users on certain mobile devices not being able to view the attachments properly.

Troubleshooting Steps

There were a couple of references on the MS forums to similar issues with older versions of 2013, but this server was updated. My next path was to see if there were any Transport Agents installed that could’ve been causing these messages to be modified. I used many of the steps in my previous blog post “Common Support Issues with Transport Agents” including disabling two third-party agents & restarting the Transport Service; the issue remained.

My next step was to disable both of the customer’s two Transport Rules (Get-TransportRule | Disable-TransportRule); one was related to managing attachment size while the other appended a disclaimer to all emails. This worked! By process of elimination I was able to determine it was the disclaimer rule causing the messages to be modified.
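With more than a couple of rules, disabling them one at a time narrows things down faster than guessing. A rough sketch of that process of elimination, where “Disclaimer” is a hypothetical rule name:

```powershell
# Record the current state first so the rules can be put back the way they were.
Get-TransportRule | Format-Table Name, State, Priority -AutoSize

# Disable a single rule, then send a test scan and check the result.
Disable-TransportRule -Identity 'Disclaimer' -Confirm:$false

# Re-enable the rule once the test is done.
Enable-TransportRule -Identity 'Disclaimer'
```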

Resolution

Looking through the settings of the rule the first thing that caught my eye was the Fallback Option of “Wrap”. Per this article from fellow MVP Pat Richard, Wrap will cause Exchange to attach the original message & then generate a new message with our disclaimer in it (sounds like our issue).

However, changing the Fallback Option to something other than Wrap did not fix the issue, much to my bewilderment. There seemed to be something about the format of the email that Exchange did not like; probably caused by the formatting/encoding the scanner was using.

Ultimately, the customer was fine with simply adding an exception to the Transport Rule stating to not apply the rule to messages coming from the scanner sender email address.
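In shell, the exception looks roughly like the sketch below. The rule name “Disclaimer” & the address scanner@ash.net are hypothetical. It also assumes the scanner’s address belongs to a recipient object in the organization, since -ExceptIfFrom expects one; if it does not, -ExceptIfFromAddressContainsWords is the parameter to look at instead.

```powershell
# Review the rule's fallback action and any exceptions already in place.
Get-TransportRule -Identity 'Disclaimer' |
    Format-List Name, ApplyHtmlDisclaimerFallbackAction, ExceptIfFrom

# Skip the disclaimer for mail sent by the scanner.
# Note: this replaces any existing ExceptIfFrom list rather than adding to it.
Set-TransportRule -Identity 'Disclaimer' -ExceptIfFrom 'scanner@ash.net'
```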

Scenario
Customer stated that after replacing a certificate for their Exchange 2013 server they were unable to access Exchange Management Shell. The following error was displayed in Exchange Management Shell:

VERBOSE: Connecting to server-a.domain.com.
New-PSSession : [server-a.domain.com] Connecting to remote server server-a.domain.com failed with the following error message
: The WinRM client sent a request to an HTTP server and got a response saying the requested HTTP URL was not
available. This is usually returned by a HTTP server that does not support the WS-Management protocol. For more
information, see the about_Remote_Troubleshooting Help topic.
At line:1 char:1
+ New-PSSession -ConnectionURI "$connectionUri" -ConfigurationName Microsoft.Excha …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OpenError: (System.Manageme….RemoteRunspace:RemoteRunspace) [New-PSSession], PSRemotingTransportException
+ FullyQualifiedErrorId : URLNotAvailable,PSSessionOpenFailed

Resolution
In this case I decided to just refer to my own notes from a previous blog post. Because this error is typically associated with IIS related issues such as improper bindings, stopped web sites, or firewalls I made my way through each of the settings.

After right-clicking each of the web sites & selecting “Edit Bindings” I was greeted by the below image which immediately told me what was wrong.

(These images are actually from my lab where I recreated the issue)

It seems that in their confusion, instead of just using EAC or Exchange Management Shell to replace their certificate they decided to go into the default bindings (which rarely ever need to be modified using the IIS management tools) & add the subject name of their new certificate to the “Host Name” field of each binding. This was done on both the “Default Web Site” as well as the “Exchange Back End” website.

It’s certainly unnecessary, and while it may seem harmless, it actually changed the way IIS handles incoming client connections. Since the Exchange Management Shell module sends its request using the Exchange Server’s internal FQDN, IIS would not answer the request because, as far as IIS was concerned, it was not hosting a site by that name. It was only answering requests for mail.ash.com (my lab’s name for the purpose of issue reproduction in this article). Interestingly enough, we could access OWA/ECP etc. using mail.ash.com but we were unable to access those services using the server’s hostname/FQDN. This makes perfect sense if you consider how IIS treats inbound connections when you use Host Names to define bindings. Simply put, if you don’t leave the Host Name fields blank, IIS will only answer requests for the Host Names you specifically defined.

So the solution was to blank out the Host Names & restart IIS. After doing so EMS connected without issue.
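If you would rather check the bindings from a prompt than click through IIS Manager, the WebAdministration module that ships with IIS can list them. This is a sketch, assuming it is run in an elevated Windows PowerShell session on the Exchange server. Each bindingInformation value has the form IP:port:hostname, so anything after the last colon is a Host Name that someone has set.

```powershell
Import-Module WebAdministration

# List the bindings for both Exchange web sites.
foreach ($site in 'Default Web Site', 'Exchange Back End') {
    Get-WebBinding -Name $site |
        Select-Object @{ Name = 'Site'; Expression = { $site } }, protocol, bindingInformation
}
```

On an untouched server I would expect the http/https entries to end in a colon with nothing after it (for example *:443:). An entry such as *:443:mail.ash.com is the kind of binding that caused the problem described here.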

Overview

I’ve seen this issue a few times over the past months & most recently this past week with a customer. Luckily there’s a fairly simple fix to the issue published by Microsoft, but realizing not everyone remembers every Microsoft KB that gets released I thought I’d shine a spotlight on this one.

Scenario

As part of the migration process, when customers move their namespace from either Exchange 2007 or 2010 to 2013, HTTP connections start proxying through 2013 to the legacy Exchange Servers and some users will experience failures. The potential affected workloads are:
AutoDiscover
Exchange Web Services (Free/Busy)
ActiveSync
OWA
Outlook

Test or new mailboxes may not be affected.

Resolution

The cause of this is the age-old problem of Token Bloat: users being members of too many groups or otherwise having large tokens.

The fix is to implement the changes in the below Microsoft KB article

“HTTP 400 Bad Request” error when proxying HTTP requests from Exchange Server 2013 to a previous version of Exchange Server
https://support.microsoft.com/en-us/kb/2988444

The interesting thing in this scenario is that the issue was not experienced in the legacy version of Exchange & even if you look at the tokens themselves, they may not seem overly large. It seems that the process of proxying Exchange traffic is much more sensitive to this issue. Also, in a recent case that went to Microsoft, we found that simply raising the values above the size of your current headers may not have the desired effect. In our case we had to set the MaxRequestBytes & MaxFieldLength values to exactly match the values in the Microsoft KB (65536 (Decimal)).
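For reference, setting the two values from PowerShell looks something like the sketch below. The registry path is the HTTP.sys parameters key referenced by the IIS article listed further down. Which servers to change & what needs restarting afterwards are covered in the KB, so treat this as an illustration of the registry change only.

```powershell
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\HTTP\Parameters'

# Show any values already present (blank output means they are not set).
Get-ItemProperty -Path $path | Select-Object MaxFieldLength, MaxRequestBytes

# Set both values to 65536 (decimal) as DWORDs, overwriting existing entries.
foreach ($name in 'MaxFieldLength', 'MaxRequestBytes') {
    New-ItemProperty -Path $path -Name $name -PropertyType DWord -Value 65536 -Force | Out-Null
}
```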

For further reading, please see the below articles.

Complementary Articles

“HTTP 400 – Bad Request (Request Header too long)” error in Internet Information Services (IIS)
https://support.microsoft.com/en-us/kb/2020943

How to use Group Policy to add the MaxTokenSize registry entry to multiple computers
https://support.microsoft.com/en-us/kb/938118

Additional Note

As an FYI, another issue I commonly see when namespaces get transitioned to 2013 is authentication popups when connections proxy to the legacy Exchange Servers. Please see the below KB for that issue.

Outlook Anywhere users prompted for credentials when they try to connect to Exchange Server 2013
https://support.microsoft.com/en-us/kb/2990117

I also blogged about it here
https://exchangemaster.wordpress.com/2014/10/30/exchange-2010-outlook-anywhere-users-receiving-prompts-when-proxied-through-exchange-2013/

Exchange 2013 – Exchange Administration Center “Internet Explorer has stopped working” with IE 10

Beware Full OAB Downloads After Installing 1st Exchange 2013 Server in Existing 07/10 Environment

New behavior in Outlook 2013 causing certificate errors in some environments

Creating Custom DLP Classification Rules and Policy

Creating and importing custom Classifications

Creating a custom DLP Rule

Useful Tools

Example of a Rule Classification XML

Quick Exchange 2013 DAG Setup Guide

Once again, Unchecking IPv6 on a NIC Breaks Exchange 2013

DatabaseCopyAutoActivationPolicy Setting Breaks Client Access in Exchange 2013 CU2

Common Support Issues with Transport Agents

Incorrectly Adding New Receive Connector Breaks Exchange 2013 Transport

Bad NIC Settings Cause Internal Messages to Queue with 451 4.4.0 DNS query failed (nonexistent domain)

All Exchange 2013 Servers become unusable with permissions errors

Legacy Public Folder remnants in Exchange 2013 cause “The Microsoft Exchange Administrator has made a change…” prompt

OOF messages not being sent in Exchange 2013 CU5 Environment

Exchange 2010 Outlook Anywhere users receiving prompts when proxied through Exchange 2013

Beware effects to Exchange of setting Primary Group in AD

Remember the basics when working with Dynamic Distribution Groups (I didn’t)

The Importance of Updated Domain Controllers When Deploying Exchange

Emails from scanner to Exchange 2013 being sent as separate attachment

Exchange Shell errors after incorrectly modifying IIS

Failures when proxying HTTP requests from Exchange 2013 to a previous Exchange version