What’s the deal with sender authentication? Part 4

2010-09-01

Terry Zink

Microsoft, USA
Editor: Helen Martin

Abstract

Sender authentication is a hot topic in the world of email. It has a number of uses and a number of suggested uses. Which ones work in real life? Which ones don’t quite measure up? Can we use authentication to mitigate spoofing? Can we use it to guarantee authenticity? And how do we authenticate email, anyway? Terry Zink provides the answers to these questions and more, this month focusing on the use of encryption in email.


In the previous articles in this series [1], [2], [3], we’ve seen two relatively simple methods for authenticating email: SPF and SenderID. Both can be used to authenticate a sender and presumably trust a message, and both can also be used to detect spoofing. However, as we have also seen, both have their weaknesses: SPF can be circumvented by not using a domain in the P1 From that has SPF records, while SenderID can be prone to false positives when mail is sent on behalf of another. Neither technology works when mail is forwarded. Furthermore, both technologies tie a domain to a specific set of IP addresses.

To illustrate the problem, suppose my friend Tony has moved to Sacramento. I know that he always sends mail from Sacramento and so when I get the letter in the mail, I check the postmark, verify that it’s from Sacramento and that it has his name on it. But what if Tony moves to St. Louis? He has to update the postal service. And, if he moves to Boston, he has to update the postal service again.

Furthermore, what if Tony gives the letter to Frank to deliver? Frank might have to make a stop in Memphis, Tennessee before he mails the letter. When I receive it, I see that it has Tony’s name on it, but it came from Memphis. I know that Tony always sends mail from Sacramento or St. Louis. What’s it doing coming from Tennessee? If that occurred, I would be tempted to think that it was not actually Tony’s mail. At the very least, I wouldn’t be able to verify that it was from him.

The same thing happens in email. Suppose I have a complex forwarding system set up to deliver mail to my personal domain: if Tony sends mail to my Hotmail account, it forwards to my Gmail account, and I then have all of my Gmail sent to my personal domain.

Figure 1. The original sending IP is IP1, but the SPF check is performed on IP3, which results in a fail.

The problem is that since the SPF or SenderID check is performed on the perimeter, the originating IP looks to my mail servers like Gmail’s IP. I cannot rely on header traversal to walk through IPs and search for the actual originating IP because Received headers can be forged. A SenderID or SPF check will fail in this case, and it should fail; SPF and SenderID checks are only performed on information that you can trust.
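To make the forwarding failure concrete, here is a minimal sketch of an SPF-style check. The domain name `tony.example` and the three addresses (taken from documentation ranges) are made up for illustration, and a real SPF check looks up the authorized addresses in DNS rather than in a dictionary.

```python
# Hypothetical data: the IPs a sending domain has authorized (stands in for an SPF record).
AUTHORIZED_IPS = {"tony.example": {"192.0.2.10"}}  # IP1 only

def spf_style_check(domain, connecting_ip):
    """Pass only if the IP that connected to my server is authorized by the domain."""
    return connecting_ip in AUTHORIZED_IPS.get(domain, set())

# IP1 = Tony's server, IP2 = first forwarder, IP3 = last forwarder (what my server sees).
hops = ["192.0.2.10", "198.51.100.20", "203.0.113.30"]

direct = spf_style_check("tony.example", hops[0])      # mail delivered straight to me
forwarded = spf_style_check("tony.example", hops[-1])  # mail that arrived via forwarding

print(direct, forwarded)
```

The check passes when the mail arrives directly and fails once it has been forwarded, even though Tony really did send it.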

What would be handy would be if Tony put some sort of stamp of authenticity into his letter. What if Tony had a personal seal which he could dip into wax and stamp onto the bottom of his letter, and he was the only one in the world with this stamp? When I got the letter, rather than seeing where it came from, I could instead look for the seal at the bottom of his letter. Since Tony is the only one in the world that has this seal, I could be sure that the letter came from him.

In the postal system, and in letter writing, it would not be all that difficult to forge a seal that looked like Tony’s. Fortunately, when it comes to technology, we can do better.

Encryption

Before we get into the technology used to establish identity, we first need to understand the basics of encryption. In the olden days, people needed ways of communicating messages securely between one another. At first, they simply sent their trusted companions on horseback. For example, a king would send a message with his assistant to the general out on the front lines. As time passed and technology moved on, people began sending messages electronically because this was much quicker and you could push through more data in a shorter period of time. Generals who can communicate with each other and transmit information faster have an advantage over those who can’t. But the problem was security: if a sensitive message was intercepted in transit, its contents would no longer be secret.

The idea behind encryption is to encode the contents of the message such that even if the message is intercepted in transit, the person who intercepted it would be unable to read its contents. Consider the following message:

Ifmmp, J bn bo fodszqufe nfttbhf.

This text appears to be a bunch of gobbledygook but it is actually an example of a substitution cipher. The key is that each letter is actually the subsequent letter in the alphabet. In other words, B is substituted for A, C is switched for B, and so forth. For the above, the decrypted message is the following:

Hello, I am an encrypted message.

Different types of substitutions can be used. Above, I shifted each letter by one character, but other shifts can be used, such as a three-character substitution or an 11-character substitution. A three-character substitution would be the following:

Khoor, L dp dq hqfubswhg phvvdjh.
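The shifts above are easy to express in code. The following is a minimal sketch (the function name `shift_cipher` is my own) that shifts letters by a fixed amount, preserves case, and leaves punctuation alone.

```python
import string

def shift_cipher(text, shift):
    """Replace each letter with the one `shift` places later, wrapping around at Z."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)

message = "Hello, I am an encrypted message."
one = shift_cipher(message, 1)     # one-character substitution
three = shift_cipher(message, 3)   # three-character substitution
print(one)
print(three)
print(shift_cipher(three, -3))     # shifting back recovers the original
```

Decryption is just the same function with a negative shift, which is part of why this kind of cipher offers so little protection.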

While a substitution cipher is easy to implement, it is also very easy to break. The more text you have, the more you can use statistical analysis to break the cipher. For example, in the English language, the most common letter is ‘e’. If you were to intercept a message in transit without knowing the substitution algorithm, you would look for the letter that occurs the most often and that would be pretty likely to be the letter ‘e’. You could then look for a bunch of three-letter-words and make a guess that the first letter is ‘t’ and the second letter is ‘h’. In this way, you’ve guessed the letters for the word ‘the’. Other commonly occurring consonants are r, s, l and n. Small, two-letter words are likely to be words such as in, of, on, at, it, and so forth. Once you start getting the smaller words you can use a process of elimination to work your way backwards in order to find the rest of the letters. Sometimes it is a process of trial and error to find the words that fit, but with enough iterations you can do it.
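As a rough illustration of the statistical attack, the sketch below guesses the shift of a shift cipher by assuming that the most frequent letter in the intercepted text stands for ‘e’. The function names are hypothetical, and on very short or unusual texts the guess can easily be wrong.

```python
from collections import Counter

def guess_shift(ciphertext):
    """Guess the shift by assuming the most frequent letter stands for 'e'."""
    letters = [c for c in ciphertext.lower() if "a" <= c <= "z"]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ord(most_common) - ord("e")) % 26

def undo_shift(ciphertext, shift):
    """Shift every letter back by `shift` places, keeping case and punctuation."""
    out = []
    for ch in ciphertext:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

intercepted = "Ifmmp, J bn bo fodszqufe nfttbhf."
shift = guess_shift(intercepted)
recovered = undo_shift(intercepted, shift)
print(shift, recovered)
```

Even on this one short sentence, the letter count alone is enough to recover the message without being told the key.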

Computers are very good at iterating algorithms to find out patterns like this. Rather than using a simple substitution cipher, you could use a more complicated algorithm – for example by substituting the first letter of the message with the letter that follows it in the alphabet, the second with the letter that appears two letters after it in the alphabet, the third with the letter that appears three letters after it in the alphabet, and then repeating the sequence.

Figure 2. Example of a more complicated substitution algorithm

However, given enough time, a computer could break this algorithm as well. It wouldn’t take very long because substitution ciphers that work by switching one letter for another are not complicated to reverse engineer. An encryption-breaking algorithm works by trying every possible combination and then running the decrypted text against a text recognition program that detects recognizable word patterns in plain text. The swapping around of letters at various fixed points in the example above would be deciphered in a trivial fashion because this is something that computers can do extremely quickly.
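The repeating one-two-three shift described above can be sketched as follows. I am assuming that only letters advance the position in the sequence (spaces and punctuation are skipped); the figure may count positions differently.

```python
def repeating_shift(text, shifts=(1, 2, 3), decrypt=False):
    """Shift the 1st letter by shifts[0], the 2nd by shifts[1], ... then repeat."""
    out = []
    position = 0  # counts letters only (an assumption of this sketch)
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            amount = shifts[position % len(shifts)]
            if decrypt:
                amount = -amount
            out.append(chr((ord(ch) - base + amount) % 26 + base))
            position += 1
        else:
            out.append(ch)
    return "".join(out)

secret = repeating_shift("Hello")
print(secret)
print(repeating_shift(secret, decrypt=True))

# The whole key space for a three-step sequence of shifts:
key_space = 26 ** 3
print(key_space)
```

With only 17,576 possible three-step sequences, trying every one of them is trivial work for a computer.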

Enter the concept of one-way functions. A one-way function is a mathematical function that is easy to calculate one way but very difficult to calculate in the inverse. For example, consider the process of squaring a number. It is easy to calculate x^2 but it is more difficult to calculate the square root of x. The algorithm for a square root is more complicated than squaring a number and takes longer to evaluate. Another example would be a logarithm. It is easier to calculate 10^x than it is to evaluate log10(x).

A good encryption algorithm makes use of these one-way functions. A message sender would encrypt his message using a one-way function and send it to the receiver, and even if somebody intercepted the message in transit, they would have a difficult time decrypting it. The algorithm is computationally intensive, which makes breaking it cost-prohibitive (in terms of time). The unintended recipient could still break the message, since all it takes is time and enough computing power; however, the idea is that by the time they did so, the contents of the message would be stale. In other words, it would no longer be useful to them. For example, if a military commander was going to organize his troops for a surprise attack on the enemy in a week’s time, he might send a message and encrypt it using an algorithm that is breakable but which would take a long time to break, at least two months on average. In transit, the message is intercepted and the enemy proceeds to attempt to break it. The enemy will eventually be successful, but by the time they are, the original commander will have made his attack and the information will be out of date and no longer useful.

Thus, if you wanted to encrypt the contents of an email message such that it was resistant to people who might try to intercept it, you would use an algorithm that takes a long time to decrypt.

This is all well and good for the people who you don’t want reading your message, but what about the person who you do want to read it? What good is it if it takes them forever to read the message? You might as well not send it at all.

This is where secret key encryption comes in. With secret key encryption, you use a mathematical algorithm to encode your message, and use a secret key to do it. So, a message would be scrambled by using the mathematical function that returns a different result each time you use a different key.

Here’s a very basic example. Suppose you wanted to encode the number sequence:

4 8 15 16 23 42

Let’s suppose your secret key, n, is 2 and you are using the algorithm f(x) = x^n. We’d encode the sequence this way:

16 64 225 256 529 1764

If our secret key were 4, we would encode the sequence this way:

256 4096 50625 65536 279841 3111696
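The two encodings above, together with the matching root-based decoding, can be sketched in a few lines. This is only the toy scheme from this article, not a real cipher, and the rounding in `decode` assumes the original values were whole numbers.

```python
def encode(values, key):
    """Toy 'encryption': raise each value to the power of the secret key."""
    return [x ** key for x in values]

def decode(values, key):
    """Toy 'decryption': take the key-th root (rounded, assuming integer plaintext)."""
    return [round(y ** (1 / key)) for y in values]

sequence = [4, 8, 15, 16, 23, 42]

with_key_2 = encode(sequence, 2)
with_key_4 = encode(sequence, 4)
print(with_key_2)
print(with_key_4)
print(decode(with_key_4, 4))
```

The same function produces a different result for each key, and anyone holding the key can reverse it.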

The recipient would receive the encoded message and would also know the algorithm, and therefore the decryption algorithm as well (in our case, the square root of x or the 4th root of x). Secret key encryption works not by keeping the algorithm secret (as is the case with a substitution cipher) but by keeping the key secret. If you don’t know the key, it will take you a long time to figure out the contents of the message, and by the time you do, the data will no longer be useful. My example above doesn’t look too difficult to break, but what would happen if our secret key were 8? Or 16? Or 3.14159? There are still algorithms out there that are computationally expensive for computers to break. With a non-integer key of about 3.14, for example, the sequence becomes:

77.70847 685.0189 4930.904 6038.607 18872.32 125026.7

Someone who intercepted this piece of cipher text wouldn’t know what the algorithm for encryption was so they would start trying every possible combination. They might suspect that it is an exponential function and so start by attempting to decrypt using the square root of x, then the cube root of x, and then the fourth root of x. When none of those worked they might start working on decimal points, and so forth. While modern computers today can blaze through mathematical functions in the blink of an eye, it does take more CPU time to evaluate these mathematical functions. If the mathematical function is more complicated, then even powerful computers can start to slow down and take a long time to process it. (Modern encryption algorithms do not rely exclusively on mathematical functions. They swap bits around and make use of prime numbers. Furthermore, standard algorithms get reviewed by the encryption community looking for weaknesses [back doors that make them easy to reverse engineer].)
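The trial-and-error attack described above can be sketched as a loop over candidate keys. Two things here are assumptions of mine: the attacker recognizes a correct guess because the decoded values come out as whole numbers, and the search over decimal keys is limited to two decimal places between 2 and 5.

```python
cipher = [77.70847, 685.0189, 4930.904, 6038.607, 18872.32, 125026.7]

def looks_like_integers(values, tol=1e-3):
    """Crude plaintext recognizer: every decoded value is (almost) a whole number."""
    return all(abs(v - round(v)) < tol for v in values)

def brute_force_key(ciphertext, candidates):
    """Try each candidate key in turn; return the first one that decodes cleanly."""
    for key in candidates:
        decoded = [y ** (1 / key) for y in ciphertext]
        if looks_like_integers(decoded):
            return key, [round(v) for v in decoded]
    return None

# First attempt: square root, cube root, fourth root.
whole_number_attempt = brute_force_key(cipher, [2, 3, 4])

# Second attempt: start working through decimal keys.
decimal_candidates = [k / 100 for k in range(200, 501)]
decimal_attempt = brute_force_key(cipher, decimal_candidates)

print(whole_number_attempt)
print(decimal_attempt)
```

Each extra decimal place in the key multiplies the number of candidates by ten, which is the toy version of making a key longer.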

This brings me to my next point: in order to increase the security of an encrypted message, you don’t need to change the algorithm; you only need to increase the length of the key. For example, it is easier to calculate the inverse of x^4 than x^4.5, which is easier still than x^4.59, which is easier than x^4.591234, and so forth. Knowing what the secret key is makes it possible to decrypt the secret message in a shorter time frame, but it is always going to take longer than it did to encrypt it. For example, encrypting it might take two seconds but decrypting it takes three seconds. On the other hand, without knowing the secret key, the amount of computational processing time makes decryption infeasible from a usefulness perspective because of stale-dating of information.

Distribution

The basic idea behind secret key encryption is the following:

  1. The encryption algorithm should be secure (i.e. one-way).

  2. You don’t have to keep the algorithm a secret.

  3. It should only be able to be decrypted by use of a secret key.

  4. You do need to keep the key a secret.

  5. To increase the security of the contents, you lengthen the size of the key.

The next question arises: how do you distribute the key to your recipients? And what do you do if you want to update your key? Do you have to send them a letter containing it, talk on the telephone and verbalize it, or maybe send a representative on horseback carrying a new key? That’s a bit of a hassle.

This is where public key encryption comes in. Whereas with secret key encryption, the same key is used to encrypt the message as to decrypt it, with public key encryption, you use two different keys in the process: one to encrypt and one to decrypt. The public key algorithm is similar to secret key encryption except that the keys are pairs and are designed to work together. You cannot decrypt a message encoded with one key without the other (if you lose one, you’re out of luck). The keys are unique (or nearly unique) to each other. Suppose that Bob wanted to send an encrypted message to Alice. Here’s how the process works:

  1. Alice picks two keys and makes one public and keeps the other private.

  2. Bob asks Alice for her public key, and Alice gives it to him.

  3. Bob encrypts the message with Alice’s public key and transmits the message to Alice.

  4. Alice receives the message and decrypts it with her private key. Alice is the only one that can decrypt the message with her private key.

Note that after Bob encodes his message, he can’t decrypt it with the public key to double-check its contents. Once it’s encoded, it’s encoded and he can’t check it over. So, Bob can transmit the message to Alice and, just like secret key encryption, without the secret key to decrypt the message, the message contents are protected if it is intercepted in transit by an unintended party. Eventually, it could be broken but it would be time-prohibitive to do so.
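The exchange between Bob and Alice can be illustrated with a toy version of RSA, one well-known public key algorithm (the article itself does not name a specific algorithm, so this choice is mine). The primes below are tiny textbook values chosen so the arithmetic is easy to follow; real keys are vastly larger and real implementations add padding.

```python
# Step 1: Alice picks her key pair (tiny textbook numbers, for illustration only).
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse; needs Python 3.8+)

alice_public = (e, n)      # Step 2: this is what Alice hands to Bob
alice_private = (d, n)     # this never leaves Alice

def apply_key(number, key):
    """Raise `number` to the key's exponent, modulo n."""
    exponent, modulus = key
    return pow(number, exponent, modulus)

# Step 3: Bob encrypts his message (here, the number 65) with Alice's public key.
message = 65
ciphertext = apply_key(message, alice_public)

# Step 4: Alice decrypts with her private key.
decrypted = apply_key(ciphertext, alice_private)

print(d, ciphertext, decrypted)
```

Applying the public key a second time to the ciphertext does not give the message back, which matches the point that Bob cannot decrypt his own message once it is encoded.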

Public key encryption solves the problem of key distribution. Using public key encryption, you don’t have to worry about distributing your key to others; they simply ask you for your public key, you give it to them and then they send you the message. Note that you can use either key to encrypt or decrypt, but you have to keep one of them secret. Again, the strength of this process is that you don’t have to keep the algorithm or the public key secret, only your private key.

Digital signatures

However, recall that either key can be used to encrypt and decrypt. That is, we can encrypt with the private key and decrypt with the public key. This means that anyone can intercept the message, and anyone (with knowledge of the public key) can then read the message. Why would we want to do this? Don’t we always want to keep the message contents a secret? As it turns out, there are times when we don’t care about keeping the contents a secret; we only care about who encrypted the message.

This brings us to the concept of a digital signature. In real life, a signature is something that does the following:

  1. Provides proof that a person authorized the contents of the document.

  2. Is unique to the individual.

If a document is signed with a person’s signature (such as Tony’s signature), I am not concerned about the contents of the document (his letter to me); I am only concerned that Tony authorized the letter, and that the signature is unique to him.

Public key encryption allows us to digitally sign a document. Here is how the process of authentication works between Bob and Alice:

  1. Bob creates a document and signs it with his signature (i.e. ‘I am Bob and I signed this document’).

  2. Bob encrypts the document with his private key and sends it to Alice.

  3. Alice receives the message, reportedly from Bob, and asks Bob for his public key. Bob sends it to Alice.

  4. Alice takes the public key and decrypts the message. The contents of the message contain Bob’s signature, which verifies that the message came from Bob.

Figure 3. Public key encryption (image from http://www.data-processing.hk/uploads/images/public_key_encryption%281%29.jpg).

What would happen if the message had been encrypted with a key that was not part of Bob’s public/private key pair? Assume someone claims to be Bob and sends Alice a message. Alice asks the real Bob for his public key, and he sends it to her. Alice decrypts the message, but because Bob’s public key only works with his private key, the contents of the message do not decrypt properly. Alice judges that the message did not actually come from Bob.

If the contents of the message did decrypt properly, then Alice could have judged that the message did come from Bob. Since the keys can only work in pairs, only the private key that was used to encrypt the message could have been the one used to create the signature, and only the public key could have decrypted it. In other words, encrypting a message with a private key allows others with a public key to verify (authenticate) the original signer of a message.
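The signing flow can be sketched with the same kind of toy RSA key pair, this time encrypting with the private key and decrypting with the public key. The choice of RSA, the tiny primes and the character-by-character encoding are simplifications of mine; real signature schemes typically sign a hash of the message rather than the text itself.

```python
# Bob's toy key pair (tiny textbook primes; illustration only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # needs Python 3.8+
bob_public, bob_private = (e, n), (d, n)

def sign(text, private_key):
    """Encrypt each character code with the private key."""
    exponent, modulus = private_key
    return [pow(ord(ch), exponent, modulus) for ch in text]

def recover(signature, public_key):
    """Decrypt each number with the public key and rebuild the text."""
    exponent, modulus = public_key
    return "".join(chr(pow(s, exponent, modulus)) for s in signature)

statement = "I am Bob and I signed this document"

# Bob signs; Alice checks with Bob's public key.
signature = sign(statement, bob_private)
genuine = recover(signature, bob_public) == statement

# Someone without Bob's private key can only guess at the numbers.
guessed_signature = [(s + 1) % n for s in signature]
forged = recover(guessed_signature, bob_public) == statement

print(genuine, forged)
```

Only numbers produced with Bob’s private key decrypt to the expected statement under his public key; the guessed values decrypt to something else.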

In the world of email, if Tony were to send a message to me, it might look something like the following (there is no actual protocol that uses this flow of events; it is for illustrative purposes):

  1. Tony decides that he will proceed to sign all of his messages with the following signature: ‘I am Tony and I approve this message’. He uploads the signature to his public DNS at diamond.net as well as his public key. (This means that there are two entries in DNS – a public key and a clear text signature establishing Tony’s identity.)

  2. Tony next wants to send me a message. At the bottom of it, he adds a signature – ‘I am Tony and I approve this message’. He places it between two XML tags which makes it easy for me to parse:

    From the desk of <person>[email protected]</person>
    
    Hey Terry, you’re an awesome person.
    
    <signature>
    I am Tony and I approve this message.
    </signature>
    
  3. Tony encrypts the signature with his private key using TZFA (Terry Zink’s Fictional Algorithm). He does not encrypt the entire message, only his signature. It now looks like the following:

    From the desk of <person>Tony</person>
    
    Hey Terry, you’re an awesome person.
    
    <signature>
    ksxal;q1254naa;lkasdf\a;kz7a890asd\2;
    </signature>
    

    He then proceeds to send me the email.

  4. I receive the message and I don’t bother to do an SPF check. Instead, I see that Tony has placed his name between the <person> XML tags. I extract the signature between the <signature> tags, trimming any leading and trailing white space. I see that the message is purportedly from [email protected] and since there is a signature at the bottom, I attempt to decrypt it.

  5. I retrieve Tony’s public key which is stored in public DNS at diamond.net. I run the TZFA algorithm on the contents of the signature and it reads the following: ‘I am Tony and I approve this message’.

  6. I proceed to retrieve Tony’s clear text signature from DNS. I compare the signature from the email against the one from DNS. The two of them match, and I decide that the message really did come from Tony. My day has just got better because my friend has told me I’m an awesome person.

In this example, nobody could ever send me a message where the signature decrypts to ‘I am Tony and I approve this message’ while claiming to be from the person [email protected]. The public key only works to decrypt messages encrypted with Tony’s private key. If someone attempted to forge the message and encrypt it with a different private key, then when I decrypted the signature it would be a different string of text and it would not match Tony’s signature which he had uploaded to DNS.
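The six steps above can be sketched end to end. Everything here is hypothetical: a dictionary stands in for public DNS, a toy RSA key pair stands in for the fictional TZFA algorithm, and the encrypted signature is written as space-separated numbers rather than the scrambled text shown earlier.

```python
import re

# Toy key pair standing in for Tony's TZFA keys (illustration only).
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # needs Python 3.8+
TONY_PRIVATE_KEY = (D, N)

# Stand-in for public DNS: Tony's public key and his clear text signature.
FAKE_DNS = {
    "diamond.net": {
        "public_key": (E, N),
        "signature": "I am Tony and I approve this message",
    }
}

def tzfa_encrypt(text, key):
    exponent, modulus = key
    return " ".join(str(pow(ord(ch), exponent, modulus)) for ch in text)

def tzfa_decrypt(blob, key):
    exponent, modulus = key
    return "".join(chr(pow(int(tok), exponent, modulus)) for tok in blob.split())

def sign_message(body, domain, private_key):
    """Steps 2-3: append the signature, encrypted with the private key."""
    clear = FAKE_DNS[domain]["signature"]
    return f"{body}\n\n<signature>\n{tzfa_encrypt(clear, private_key)}\n</signature>\n"

def validate(message, domain):
    """Steps 4-6: extract, decrypt with the public key, compare with DNS."""
    match = re.search(r"<signature>(.*?)</signature>", message, re.DOTALL)
    if match is None:
        return False
    record = FAKE_DNS[domain]
    try:
        decrypted = tzfa_decrypt(match.group(1).strip(), record["public_key"])
    except ValueError:          # contents are not even in the expected format
        return False
    return decrypted == record["signature"]

body = "From the desk of <person>Tony</person>\n\nHey Terry, you're an awesome person."
genuine = sign_message(body, "diamond.net", TONY_PRIVATE_KEY)
forged = f"{body}\n\n<signature>\nI am Tony and I approve this message\n</signature>\n"

print(validate(genuine, "diamond.net"), validate(forged, "diamond.net"))
```

Note that nothing in `validate` looks at an IP address, so the result is the same no matter how many servers relayed the message.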

Digital signatures solve the problem of email forwarding; you no longer have to identify the correct source IP address of the mail. So long as the originator of the message always adds their signature and the signature can be extracted (a signing algorithm generally specifies how to extract the signature), you will be able to validate it. Mail can be forwarded any number of times, but as long as the contents of the signature are not modified, it will always be properly validated. The message is validated securely and reliably by the contents, not the sending IP. It is not spoofable in any practical sense.

Similarly, with digital signatures, a domain doesn’t have to tie all of its outbound mail to a particular set of IPs; it only needs to ensure that it signs with the same private key. If IPs change, it won’t matter because it is with the private/public key pair that mail is validated, not a rotating set of IPs. To be sure, keys need to be rotated every so often, but you can add more servers and outbound IPs with less overhead. The receivers of your mail will be able to validate it with a DNS query to the sending domain without worrying about your IP addresses.

A word of caution, however. Digital signatures work to establish identity and trust. They do not necessarily work to prove forgery:

  1. If a message comes to me purportedly from Tony and does not contain a signature, it doesn’t mean that the message didn’t come from him (i.e. is being spoofed). He may not have signed this particular message. Perhaps he forgot, or perhaps he is in the process of rotating keys, or doing server maintenance and didn’t have time to update the keys.

  2. If a message comes to me purportedly from Tony and does contain a signature, but the signature doesn’t validate properly (i.e. doesn’t match what he has uploaded in DNS), it doesn’t mean that the message didn’t come from him. The signature may have the wrong private/public key pair (i.e. a misconfiguration), or it could mean that the message was modified in transit since changing characters in a string affects how it is decrypted. Modifications in transit can be intentional (such as line wrapping by a mail transfer agent) or unintentional (such as line noise that changes the bit stream).
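One way to keep these two caveats straight is to treat validation as having three outcomes rather than two. The sketch below uses names of my own invention and simply compares a decrypted signature against the published one; only the first outcome tells you anything definite.

```python
from enum import Enum

class Verdict(Enum):
    PASS = "signature validates: the sender's identity is established"
    NONE = "no signature: nothing can be concluded either way"
    FAIL = "signature does not validate: forged, misconfigured or modified in transit"

def classify(decrypted_signature, published_signature):
    """Map a validation attempt onto one of three outcomes."""
    if decrypted_signature is None:
        return Verdict.NONE
    if decrypted_signature == published_signature:
        return Verdict.PASS
    return Verdict.FAIL

published = "I am Tony and I approve this message"

signed = classify("I am Tony and I approve this message", published)
unsigned = classify(None, published)
# A relay that re-wraps lines changes the characters being compared.
rewrapped = classify("I am Tony and I\napprove this message", published)

print(signed.name, unsigned.name, rewrapped.name)
```

A receiver that treats NONE or FAIL as proof of spoofing will end up rejecting legitimate mail from Tony.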

This example of digital signature validation is essentially what is done in the actual world of email. The discussion on the main technology used to do it, DomainKeys Identified Mail, or DKIM, will have to wait until next month.

Bibliography

[1] Zink, T. What’s the deal with sender authentication? Part 1. Virus Bulletin, June 2010, p.7. http://www.virusbtn.com/pdf/magazine/2010/201006.pdf.

[2] Zink, T. What’s the deal with sender authentication? Part 2. Virus Bulletin, July 2010, p.16. http://www.virusbtn.com/pdf/magazine/2010/201007.pdf.

[3] Zink, T. What’s the deal with sender authentication? Part 3. Virus Bulletin, August 2010, p.16. http://www.virusbtn.com/pdf/magazine/2010/201008.pdf.
