VBA: How to Code a Letter Frequency Chart

Rolling Stones

I recently picked up a book called “Ghost in the Wires: My Adventures as the World’s Most Wanted Hacker” by Kevin Mitnick. To be totally honest, I’d never heard of Mitnick before. He was a little before my time—when he was arrested in 1995, I was still struggling to understand why the hell I had to learn fractions and everything I knew about computers was based on WarGames.

I haven’t finished it yet, but thus far, I have come to learn that he was accused of disconnecting phone lines, deleting his own arrest records, harassing some actress named Kristy McNichol, hacking into the NSA, and just generally being a nuisance. He maintains that the more outlandish claims are false. He does admit to screwing with a drive-through speaker to make it scream, “What the fuck are you looking at?” at a McDonald’s manager.

Mitnick is a little vague about his approach to writing code. It’s a narrative, not an instruction manual. He focuses more on the social engineering aspect of his activities. The con-artist part. Where you trick people into giving up their user names and passwords and such. Not my cup of tea. You need social skills for that. I was getting ready to put the book back when I noticed that he starts off every chapter with a scramble of letters. Like this:

MAX VHLM HY MAX UNL WKBOXK INGVA B NLXW MH INGVA FR HPG MKTGLYXKL

Except with each chapter, they seem to get more complicated. He introduces numbers, non-alphanumeric characters, then towards the end, it’s only ones and zeros.

The first one appears to be the easiest because it’s just letters and spaces. And the MAX/MAX/MH is a give-away. If you know a cipher is written in proper English (complete sentences) and you suspect that it’s a simple substitution, an easy way to start is to look for a three letter word that appears more than once. Decent chance it’ll be “the.” And if you know M=T, then MH can only be TO. Then you look at the two letter words. HY has to be OF, OR, or ON. And you just keep plugging away.

It’s easy enough to do by hand but if you’ve got a computer handy, why not make it help?

Let’s Talk About Letter Frequency

Some letters come up more frequently than others in English. That’s where a macros that tells you how many times a particular letter occurs can come in handy. Look at the number of times each letter appears in the Gettysburg Address:

Gettysburg Address

Here’s a character count for all the letters in a recent Wired article about the OurMine hacks.

HBO Hacks

And here’s the first 3rd of the script for Deadpool.

Deadpool

E and T frequently end up in the top two slots. The other vowels and S generally follow. X, Z, and Q tend to be on the low end.

So if you plugged the scramble in not knowing anything about it, you could very well guess that M and X are either T or E. L is likely to be a vowel or S. The more characters you have, the more accurate it tends to be, but it can help you get started even with a small sample like this. The letters they turned out to be are below the percentages.

Mitnick

The scramble in the second chapter is slightly harder because there are no “the”s to guide the way. But you look at the character count of it:

ESTD MZZV ESLE ELFRSE XP SZH EZ NCPLEP YPH TOPYETETPD HSPY T HLD L ACP-EPPY

Mitnick 2
…You can guess that E and P are going to be T or E. I figured he wasn’t going to assign E to itself, so E = T and P = E. Then all of a sudden, ACP-EPPY becomes a give-away because what word in English matches this pattern: _ _ E-TEE_?

It helps that the chapter is about Mitnick’s Survival Bookstore excursions in his youth.

So keep in mind—I start with E and T as a rule, but on shorter sentences, it doesn’t always work. That’s when I start defaulting to the other vowels, S’s, R’s, and N’s. That’s how I got the third scramble to cooperate:

PBZFSOBP DKFOBTPKX LQ PBKFI PPBKFPRY AOXTOLC IIXZ LQ ABPR BOBT PBZFSBA CL BMVQ OBAIL BPBEQ

Mitnick 3

The beauty of frequency distribution is it works even if the cipher is backwards. Once you start plugging things in, it’s pretty easy to tell if the entire sentence has been reversed. Easy, that is, if there are spaces that let you know where each word starts. Mitnick’s fourth cipher has no spaces. I’m still working on that one.

The VBA script for the letter counter is below. It’s wired to work with a spreadsheet formatted like the ones above (with A2 as a merged cell for the cipher text).

The Code Explained

Frequency Chart Script

The Copy/Paste Version

Sub FrequencyDistribution()
Dim Punct As Integer
Dim Counter As Integer
Dim dict As Object
Punct = 0
Set dict = CreateObject("Scripting.Dictionary")
Text = Cells(2, 1).Value

For Counter = 1 To Len(Text)
Letter = UCase(Mid(Text, Counter, 1))
If dict.Exists(Letter) Then
dict(Letter) = dict(Letter) + 1
ElseIf Letter = " " Or Letter = "," Or Letter = "'" Or Letter = vbLf Or Letter = "(" Or Letter = ")" Or Letter = "-" Or Letter = "." Or Letter = "—" Or Letter = ":" Then
Punct = Punct + 1
Else
dict.Add Key:=Letter, Item:=1
End If
Next

For i = 0 To dict.Count - 1
Cells(1, i + 3).Value = dict.keys()(i)
Cells(2, i + 3).Value = dict.items()(i)
Cells(5, i + 3).Value = dict.items()(i) / (Len(Text) - Punct)
Next i

Columns("C:AE").Sort key1:=Range("C2"), _
order1:=xlDescending, Header:=xlYes

End Sub 

***Spoiler Alert***
It wasn’t until after I’d written this that I looked at the scrambles again and realized I’m an idiot. Plug these into a Brute Force VBA—it’s not just a simple substitution. The first three Mitnick jumbles are Caesar Shifts (the last one in reverse). The fourth, however, does not appear to be simple shift. I’ll have to look closer at that one later.

 

 

6 Replies to “VBA: How to Code a Letter Frequency Chart”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: