Today something interesting happened: one of our employees could not get in touch with an external company. He told me that mail from one domain (let's call it domain.com) was not reaching him. His own messages got through to the other party, but their replies never arrived, and the senders received no bounce messages stating the reason for the rejection. He asked me to look into the problem, so I will briefly outline how I went about it and what the cause turned out to be.
Message Diagnostics – Exim
Since we use Exim as our MTA, the commands here are for Exim, but the same process works just as well with Postfix.
The first clue is that if messages are not returned to the sender, they are stuck somewhere: they have hit a temporary error, and one of the servers will try to deliver them again. This is standard behaviour and there is nothing strange about it; anyone who has read our article on how mail works will know this. The other important piece of information is that the messages were sent from domain.com.
Since the messages are stuck somewhere, we check our own delivery queue to see whether they are stuck with us or never reached us at all:
So the message has been sitting in our queue for 66 minutes and cannot be delivered. It may seem surprising that the message is already on our server and yet has not reached the recipient, but more on that shortly. The next step is to check the logs with exigrep, a handy utility that searches the log for messages matching a given pattern (for example, a domain) and displays every log line related to each matching message. A plain grep would only show the lines that contain the pattern itself, so we would not get the full picture.
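exigrep ships with Exim and does this properly; the Python sketch below only illustrates the idea behind it: collect log lines by message ID, then return every line of each message where at least one line matches. It assumes Exim's classic message IDs of the form 6-6-2 base-62 characters (1Yrn5T-0002ab-Cd is a made-up example); recent Exim releases may use a longer ID format, which this regex does not cover. The function name exigrep_like is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: classic Exim message IDs, 6-6-2 base-62 characters, e.g. 1Yrn5T-0002ab-Cd.
MSG_ID = re.compile(r"\b([0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2})\b")


def exigrep_like(lines, pattern):
    """Hypothetical helper: return every log line of each message that has
    at least one line matching `pattern`, grouped by message ID."""
    regex = re.compile(pattern)
    by_id = defaultdict(list)  # message ID -> all of its log lines, in order
    matched = set()            # IDs with at least one matching line
    for line in lines:
        m = MSG_ID.search(line)
        if m is None:
            continue  # this sketch ignores lines without a message ID
        msg_id = m.group(1)
        by_id[msg_id].append(line)
        if regex.search(line):
            matched.add(msg_id)
    # dicts keep insertion order, so messages appear in order of their first log line
    return {msg_id: msg_lines for msg_id, msg_lines in by_id.items() if msg_id in matched}
```

The design choice is the whole point: a grep for domain.com would return only the lines that mention the domain, while grouping by ID also pulls in the delivery and deferral lines for the same message, which are usually the ones that carry the error.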
This gives us the reason: the message is not delivered because our server cannot find the DKIM record in the DNS zone. The message is signed, so its signature has to be checked to confirm that it really comes from the claimed source. To make sure this is not a problem with our local DNS server, we check whether the record exists using Google's public DNS server:
And there is the cause: the TXT record for zendesk1._domainkey.domain.com does not exist, and that is where our server looks for the DKIM key it needs to verify the signature. First, why zendesk1._domainkey.domain.com? Earlier in the log, the DKIM data was listed as "d=domain.com s=zendesk1 c=relaxed/relaxed a=rsa-sha256 t=1431354869". The s= tag is zendesk1, and s stands for selector. It tells us under which name to look for the DKIM record, following the pattern <value of s>._domainkey.<value of d>, where d is the domain; here that gives zendesk1._domainkey.domain.com. If there is no TXT record at that name, either the DNS server is having a temporary problem or nobody has added the record, which was the case with this domain.
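The selector rule fits in a few lines. This is a minimal Python sketch, assuming the space-separated tag list that appears in the Exim log; a real DKIM-Signature header separates tags with semicolons, which the parser also accepts. The function names are hypothetical, and tag values containing whitespace, such as a folded b= signature, are not handled.

```python
import re


def parse_dkim_tags(text):
    """Hypothetical helper: parse DKIM tag=value pairs separated by ';'
    (header form) or whitespace (Exim log form). Values containing
    whitespace, such as a folded b= signature, are not supported."""
    tags = {}
    for part in re.split(r"[;\s]+", text.strip()):
        name, sep, value = part.partition("=")
        if sep:  # skip empty pieces and fragments without '='
            tags[name] = value
    return tags


def dkim_key_name(tags):
    """DNS name that holds the public key: <s>._domainkey.<d>."""
    return f"{tags['s']}._domainkey.{tags['d']}"


tags = parse_dkim_tags("d=domain.com s=zendesk1 c=relaxed/relaxed a=rsa-sha256 t=1431354869")
print(dkim_key_name(tags))  # zendesk1._domainkey.domain.com
```

The resulting name is what you would query for a TXT record, for example with `dig txt zendesk1._domainkey.domain.com @8.8.8.8`.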
The dig query returned one more piece of information: the domain's DNS servers are run by Cloudflare. We happen to work with this company and will certainly write more about it. Cloudflare acts as a proxy for the web server, but this requires handing over your DNS as well. What does this mean here? Simply that someone did not enter the DKIM key into the DNS zone, so our server could not check whether the message was properly signed. It held the message, waiting for someone to add the DNS record or for the DNS server to start responding, but this did not happen within 6 hours of the message arriving on our server, so the message was automatically returned to the sender as undeliverable due to a DKIM error. I have, of course, informed the other side of the problem, and we are waiting for them to resolve it.
A good tool for checking your email configuration is the website . Just send an email to the address it provides, click the button, and you will get a full report. Emails sent from our domains score 10/10.
Bonus – What is a DKIM selector?
A selector is nothing more than information about which DKIM key the message was signed with and where to find it. Selectors were introduced so that administrators can publish multiple keys for a single domain, which makes it easy to revoke old or compromised keys. If a key is lost, you can quickly switch to a new selector and start signing with a new key. You can also keep the same selector and simply start signing with the new key, but then messages sent during the changeover may be returned, because the key published in DNS no longer matches the one they were signed with.
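The rotation trade-off can be shown with a toy model. This Python sketch is an illustration only: the zone contents, the selector names sel1 and sel2, and the key placeholders KEY1/KEY2 are hypothetical, and instead of verifying an RSA signature it merely compares the key a message was signed with against the key published under that message's selector.

```python
def parse_txt_tags(record):
    """Split a DKIM TXT record such as 'v=DKIM1; k=rsa; p=...' into a dict."""
    tags = {}
    for part in record.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            tags[name] = value
    return tags


def lookup_key(zone, d, s):
    """Return the p= value published for selector s of domain d, or None."""
    record = zone.get(f"{s}._domainkey.{d}")
    return None if record is None else parse_txt_tags(record).get("p")


def toy_verify(zone, d, s, signing_key):
    """Toy check standing in for real signature verification."""
    return lookup_key(zone, d, s) == signing_key


# Rotation with a new selector: old and new keys are published side by side.
zone_new_selector = {
    "sel1._domainkey.domain.com": "v=DKIM1; k=rsa; p=KEY1",
    "sel2._domainkey.domain.com": "v=DKIM1; k=rsa; p=KEY2",
}

# Rotation reusing the selector: the old key is overwritten.
zone_same_selector = {
    "sel1._domainkey.domain.com": "v=DKIM1; k=rsa; p=KEY2",
}

# A message signed with sel1/KEY1 before the switch, still in transit:
print(toy_verify(zone_new_selector, "domain.com", "sel1", "KEY1"))   # True
print(toy_verify(zone_same_selector, "domain.com", "sel1", "KEY1"))  # False
# A selector with no published record, as with zendesk1 in this case:
print(lookup_key(zone_new_selector, "domain.com", "zendesk1"))       # None
```

Publishing the new selector alongside the old one, and removing the old record only after mail signed with it has had time to be delivered, avoids the failure that the second zone produces.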