## Contents

Contents
Disclaimer
I am not a securities professional, so take my words with a pinch of salt.

We live in the digital age, where almost all aspects of our lives are moving online. Security, in this Internet-dependent world, is becoming more and more important.

Passwords are the gate-keeper to almost all online services, thus secure passwords are extremely important for your online securities. There are some well-known rules to creating secure passwords, like using long (more than 12 characters) passwords, including capital letters, including numbers, including special symbols, and avoiding dictionary words. However, these rules usually leads users to create jumbles of nonsense that people tend to forget.

So how can we create passwords that are both secure and easy to remember?

Before answering this question, we must first understand how passwords function and how malicious actors cracks them.

Passwords are strings users input for authorization. In the early days, a password is stored in the and compared to user inputs. One big problem is that when a hacker gets access to the database storing these passwords, he or she would be able to download the database and have access to all the passwords in plain text. To combat this, people use a technique called hashing.

### Hashing

Hashing algorithms like SHA and MD5 aims to create a unique string for each unique input. These algorithms are not perfect, as a hash collision1 is inevitable.

A more secure way of storing passwords is to hash them. Passwords are hashed and saved in the database, and authorization requests are handled by passing the provided password through the hashing algorithm before being compared to the hash saved in the database.

With a hashed password, the hacker would not be able to get access to the plain-text passwords. To recover the password from the database, the hacker would need to reverse engineer the original string from the hash. This is done through either trying out all possible combinations of the password through brute force attacking or trying out likely passwords through a dictionary attack. Compared to hashing, reverse hashing is extremely computation heavy. This ensures that the hackers would not be able to get the passwords within reasonable time.

### Salting

However, hashing can still be exploitable. When two or more users use the same password, cracking one of them would mean the hacker gain access to multiple accounts. A rainbow table attack is a more complicated dictionary-based attack that can crack hashed passwords quickly. Unsalted passwords are also vulnerable to it as rainbow attacks can quickly run through a dictionary of hashes, making common passwords unsafe.

To salt a password, a random string (the salt) is appended to the password. The result is then hashed, and the hash is saved in the database along with salt in plaintext. Note that prepending the salt is unsafe due to length extraction attacks2. A salted password will provide more security in rainbow table attacks, as random salts would jumble the plaintext corresponding to the hash, so the pre-generated hashes in the rainbow table would not match.

This is the standard practice today, and most security breaches leak salted password databases, which are extremely difficult to decipher.

### Brute Force Attack

Brute force attack is the simplest form of attack. The attacker would try out every possible combination of characters to try to match the target hash. This is a fail-safe way to decipher a hashed password, but would take extremely long. To crack a password that is at least 8 characters long and at most 16 characters long, you will need $\sum_{i=8}^{16}94^{i} \approx 3.75 \times 10^{31}$ tries to exhaust all possible combinations, and the expected number of times to guess the password would be $1.88 \times 10^{31}$, assuming we have a randomly generated password (uniform distribution). This would take approximately 6.6 billion years to crack3.

What to Do
Short passwords or passwords containing only numbers can be quickly cracked by brute force attacks, so it is strongly recommended to use longer (more than 8 characters) passwords that has combinations of numbers, letters, and symbols.

### Dictionary Attack

Dictionary attack is also extremely common. Due to the tendency for users to use dictionary words or their combinations as passwords, a dictionary attack reduces the search space using pre-generated dictionaries.

What to Do
Dictionary attacks can crack insecure passwords within a few seconds, so it is strongly recommended to avoid using dictionary words or common password phrases in your password.

### Rainbow Table Attack

Brute force attacks can exhaust every possible but runs very slow, and dictionary attacks is fast but it takes a lot of space to store a large-enough dictionary to approach the capability of brute force attacks. Rainb ow table is a middle ground that uses a clever trick to precompute hashes while not needing to store all of them.

The most important idea in a rainbow table is a hash chain. A hash function $\mathcal{H}: \mathbb{P} \to \mathbb{H}$ can be paired with a reduction function $\mathcal{R}: \mathbb{H} \to \mathbb{P}$ to get a chained function $\mathcal{H} \circ \mathcal{R}: \mathbb{P} \to \mathbb{P}$. The reduction function does not reverse the hash, but rather maps the hash back to its original domain. This allows chains of plaintext, hash pairs to be derived from a single plaintext and two functions. We only save the beginning plaintext and the ending hash as a pair to represent the chain.

$$\text{plaintext} \xrightarrow{\mathcal{H}} \text{hash} \xrightarrow{\mathcal{R}} \text{plaintext} \xrightarrow{\mathcal{H}} \text{hash} \xrightarrow{\mathcal{R}}\dots$$

This is not perfect, as collisions can happen in hashing algorithms1. With collision, we can encounter issues such as multiple chains merging to get the same outcome, or a single chain looping.

We can use more reduction functions to eliminate collision-related issues. If each reduction function is unique for all inputs, only when the same hash appears at the same column in the table does collision happen. This would avoid looping issues and a post-processing step removing duplicate chains can be performed to purge merging chains. Even though this will not ensure the chains to be collision-free, the number of collisions will be greatly reduced.

$$\text{plaintext} \xrightarrow{\mathcal{H}} \text{hash} \xrightarrow{\mathcal{R}_0} \text{plaintext} \xrightarrow{\mathcal{H}} \text{hash} \xrightarrow{\mathcal{R}_1}\dots \xrightarrow{\mathcal{H}} \text{hash}$$

Suppose that the chain has $n$ pairs, then we can compress the rainbow table by $n$ folds by only recording the first plaintext and the last hash. This algorithm can quickly determine if the password is present in these pre-computed chains:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  function rainbowTable(table, targetHash) { for i = 0; i < n; i++ { tempHash = targetHash for j = i; j < n; j++ { tempHash = Hash(Reduce[j](tempHash)) for (text, hash) in table if tempHash == hash { if falseAlarm(text, targetHash) { continue } return extractPassword(text, targetHash) } } return NotFound } } 

Simply speaking, we assume that the hash can be at each position, and when we apply the appropriate hash and reduction functions, we should be able to reconstruct a hash that is supposed to be the end of that chain. When comparing to the actual ending of each chain, we can determine if a chain potentially contains the target hash. A false alarm can be triggered when the target hash can generate the ending hash, but the chain found does not contain our target hash (due to collision). We simply go on when a false alarm is triggered.

What to Do
Salting can help defend rainbow table attacks, but this isn’t under your control. If possible, use services that are popular and actively being maintained. This will increase the possibility that the service provides necessary protection such as salting.

### Credential Stuffing

Credential stuffing is an attack where the hacker tries his or her luck with already-leaked passwords. Many people use the same password for multiple services, and a security breach from one of them may cause your other accounts to be hacked.

Always use a different password for different services/accounts. It’s hard to remember tons of passwords, so use a password manager like LastPass, Dashlane, or if you like open source, BitWarden.

Password spraying is most commonly used in online attacks. This is when the attacker gets hold of a lot of usernames/emails but does not have access to the password database. The attacker would use common passwords and try it on each account. Once the attacker hacks into one account, he or she may have access to more tools to further penetrate the system.

What to Do
There is not much you can do here other than creating a more complex password that is not likely to be sprayed. Even though the likelihood of the service being compromised does not decrease, your personal information would be less likely to be seen directly by hackers.

### Keyloggers and Phishing

Another common attack is through external software. These includes keyloggers and phishing attacks.

A keylogger is software that records your keypresses and sends them to a remote server controlled by hackers. Keyloggers can usually be injected through malware, XSS (cross site scripting), or other techniques.

Phishing is often done through email or other forms of messaging. They usually contain links that have suspicious URLs. Once a user follows these URLs, they would land on a page that looks extremely close to the legit website, but controlled by the attacker. The attacker would be able to intercept your username and password before sending them to the legitimate website and redirecting you there so you would not suspect that you have been attacked.

What to Do
Stay away from suspicious emails and websites to protect against phishing attacks. Use a password manager’s automatic form-filling functionality to avoid leaking passwords over keyloggers.

## Protecting My Online Security

Usually, when considering password strength, we consider them in terms of the amount of computation needed to crack it with a brute-force attack. Brute-force attack is the only attack that can guarantee an eventual success. Other forms of attacks take advantage of human vulnerabilities rather than directly attacking the encryption system.

A common way to measure password strength is to measure the amount of information, or entropy, contained in that password. To calculate the entropy of a random password, we start by finding out the pool of valid characters the password can use. Suppose this number to be $n$. Next, we get the length our random password $L$, so we can derive all possible combinations to be $n^L$. Since we have assumed that the distribution of password is uniform (all possible passwords have the same probability to be generated), the probability of a password to be generated would be $p=\frac{1}{n^L}$. To get the entropy, we calculate the negative base 2 logarithm of the probability. So we get the entropy $H = -log_2(\frac{1}{n^L}) = log_2(n^L)$ bits.

Though it is fairly easy to calculate the entropy of a randomly selected password, estimating the entropy for user-chosen passwords is much more difficult and past efforts to do so have not been particularly accurate4.

As we can see from the above equation, when using a random password, the larger $n$ and $L$ is, the larger the entropy is. The larger the entropy, the less likely the password would be cracked quickly through a brute-force attack. This prompts us to create longer and more complex passwords, but these random passwords are extremely hard to memorize and wee need to get some help from tools like password managers.

It is often extremely difficult to follow all the security guidelines, as it requires you to remember dozens of random strings that have capital letters, numbers, and symbols more than 8 characters long. Thankfully, password managers can help take care of remembering all of these passwords. Services like LastPass, Dashlane, and BitWarden can take care of password generation and remembering. You only need to remember one single master password to get access to all your other passwords. These services are made by securities professionals and are usually fairly robust against attacks.

Even though password managers are not completely secure, they should be more than enough for personal and enterprise use and are still recommended by securities professionals5.

Now we are left with one password to create and remember. How can we generate one that is secure while still being easy to remember?

### Diceware

A method of using natural language words to create passwords is called Diceware created by Arnold Reinhold. Diceware uses a dice to select words from a wordlist. You would roll the dice 5 times to generate a result out of $6^5 = 7776$ possible outcomes, and find the corresponding word in the wordlist. You select a few words this way and now you have a password.

Is this method secure? Let’s calculate the entropy and compare it to the recommendations of NIST.

NIST requires randomly generated passwords to be at least as long as 6 characters4. In ASCII, there are 94 non-space and non-device-control characters. This will give us a total entropy of $-log_2(\frac{1}{94^6}) \approx 39.3$ bits. Using Diceware, we can choose 3 words to get a total entropy of $-log_2(\frac{1}{7776^3}) \approx 38.8$ bits. So if we create more than 3 words in Diceware, we have met the minimum recommendations of NIST.

What if memorizing a couple random words is still difficult? Ray Eads made some minor improvements on Diceware and talked at LinuxFest Northwest 2020.

Ray had the brilliant idea to interlace nouns with adjectives. Now rather than a password with random words, you have random noun phrases. Due to adjectives being much more limited compared to nouns, Ray’s adjectives wordlist is smaller ($6^4 = 1296$ rather than $6^5 = 7776$ words). With Ray’s method, you can get around $46.5$ bits of entropy with a four-word password.

### Further Security Measures

Having a secure password is important, but sometimes it is still not secure enough. You may consider using a hardware authentication method such as fingerprints, auth keys, or two factor authentication methods. These will further increase the security of your account. However, most service providers don’t offer these options, and you will need to specifically look for these features when selecting an online service.

## Final Words

We are living in an age where the internet seeps into all aspects of our lives. Most people overlook the importance of online security. Many simply do not understand the reason behind some of the rules and best practices. It is my belief that anyone who remotely cares about computers and technology should have a basic understanding of cryptography and security. I hope this article can give you some idea of the reasonings behind some of the best practices in online security.

1. When two strings have the same hash, $\mathcal{H}(a) = \mathcal{H}(b)$. See Wikipedia for more information. ↩︎