April 1, 2006

On Social Security Numbers

So, the UIUC CS Department made a little oops last December. Someone uploaded an Excel file with personal information about all of the undergraduates in the department, including name, citizenship, sex, and social security number. The file was found in Google's cache by somebody who then alerted the department. Google promptly took down the page. Over the last month, everyone I know in CS, myself included, has gotten a letter alerting them to the incident. None of this is much of a big deal except for the name and SSN matching, though there really isn't anything we can do about that now. It's not like I can go and get a new number.

I've always been curious how SSNs are assigned. I figured there was some structure because I share prefixes with too many of my friends for it to be mere coincidence. I also wanted to know how hard it would be to brute force someone's SSN or, alternatively, how much one can learn about a person given their SSN. So I finally did some research and will share it for those curious.

The SSN is composed of nine digits of the form XXX-XX-XXXX. The first 3 digits are the area number. Prior to 1972, this represented the state your SSN was issued in, which wasn't necessarily the state you lived in because you could apply for an SSN in any state. Since then, SSN assignment has been centralized in Baltimore and the 3 digits are based on the ZIP code of the applicant's mailing address, though again, that's not necessarily where the person resides. The number-to-state assignments are here for reference.

The middle two digits are the group number, which serves no specific purpose other than to break the number up into conveniently sized chunks for ordering. There is a rumor that these are assigned by demographic, but this has been shown to be an urban legend. There are, however, some administrative rules for assigning group numbers. They are assigned in the following order: (1) odd numbers from 01 through 09, (2) even numbers from 10 through 98, (3) even numbers from 02 through 08, (4) odd numbers from 11 through 99. For example, if we know that the highest group assigned for area 666 is 03, we know that 666-10-XXXX is an invalid number because not all of the numbers from the first rule have been assigned yet. (Actually, neither number would be valid since no SSNs were assigned with 666 as the first three digits. Superstitious reasons, I suppose.)
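The assignment order is easier to see written out. Here's a rough Python sketch of it, just to illustrate the four rules; the function names are my own invention, and the "highest group" for an area is something you'd have to look up yourself.

```python
def group_issue_order():
    """Return all 99 group numbers in the order the four rules describe."""
    return (
        list(range(1, 10, 2))      # (1) odd numbers 01 through 09
        + list(range(10, 99, 2))   # (2) even numbers 10 through 98
        + list(range(2, 9, 2))     # (3) even numbers 02 through 08
        + list(range(11, 100, 2))  # (4) odd numbers 11 through 99
    )


def group_already_issued(group, highest_group):
    """True if `group` comes at or before `highest_group` in issue order.

    `highest_group` is the most recently issued group for an area
    (a hypothetical input here, not something this code looks up).
    """
    order = group_issue_order()
    return order.index(group) <= order.index(highest_group)


if __name__ == "__main__":
    # With 03 as the highest group, 10 has not been reached yet.
    print(group_already_issued(10, 3))  # False
    print(group_already_issued(1, 3))   # True
```

Using the example from above: with 03 as the highest group issued, group 10 hasn't come up yet because 05, 07, and 09 are still ahead of it in line.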

The last four boring digits are serial numbers and represent a serial assignment from 0001-9999 within the group. Yawn.
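To make the three-part layout concrete, here's a small Python sketch that splits a number into its area, group, and serial pieces. The names `SSN_PATTERN` and `split_ssn` are made up for this example, and it only checks the XXX-XX-XXXX shape, nothing more.

```python
import re

# Three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"(\d{3})-(\d{2})-(\d{4})")


def split_ssn(text):
    """Split a string of the form XXX-XX-XXXX into its three sections."""
    match = SSN_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("expected the form XXX-XX-XXXX")
    area, group, serial = match.groups()
    return {"area": area, "group": group, "serial": serial}


if __name__ == "__main__":
    # One of the numbers reserved for advertising, so nobody's real SSN.
    print(split_ssn("987-65-4323"))
```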

There are a few other rules and exceptions, such as:
  • No number will ever have all 0's assigned to any one section (No 000-XX-XXXX, XXX-00-XXXX, or XXX-XX-0000).
  • The numbers from 987-65-4320 to 987-65-4329 are reserved for advertising.
  • If an SSN is mistakenly used for advertising, it renders the number invalid. This happened once and the number was claimed by over 40,000 people as their own.
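For anyone who wants to check a number against these rules, here's a minimal Python sketch. It only covers the rules mentioned in this post (all-zero sections, area 666, and the advertising range), so passing it does not mean a number was ever actually issued. The function name `breaks_posted_rules` is my own.

```python
import re


def breaks_posted_rules(ssn):
    """Return a list of the rules from this post that the number breaks."""
    match = re.fullmatch(r"(\d{3})-(\d{2})-(\d{4})", ssn)
    if match is None:
        return ["not in XXX-XX-XXXX form"]
    area, group, serial = match.groups()

    problems = []
    # No section may be all zeros.
    if area == "000" or group == "00" or serial == "0000":
        problems.append("a section is all zeros")
    # Area 666 was never assigned.
    if area == "666":
        problems.append("area 666 was never assigned")
    # 987-65-4320 through 987-65-4329 are set aside for advertising.
    # Comparing fixed-width digit strings works like comparing numbers.
    if area == "987" and group == "65" and "4320" <= serial <= "4329":
        problems.append("reserved for advertising")
    return problems


if __name__ == "__main__":
    for candidate in ["987-65-4323", "000-12-3456", "666-10-0001"]:
        print(candidate, breaks_posted_rules(candidate))
```

An empty list just means none of these particular rules were tripped.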

So all of this is interesting and fun, but it's a bit daunting to realize how much information is available to someone who has the almighty SSN key. I know that you aren't technically required to have an SSN to be a U.S. citizen, but you still need to pay taxes if you are making any form of income, including into social security (though you won't get anything back without a number). That being said, I can't really imagine it being possible to do much without sharing your SSN. It's become a national ID number despite what the Social Security Administration claims. Maybe originally its purpose was only for social security, but when someone (I) can't start an account for power at an (my) apartment without sharing their (my) SSN, you realize it has become more than that. Sure, we are only required to give it to governmental agencies, but it comes down to a matter of convenience and compliance. I don't really want to fight the power companies only to end up lighting candles in a dark home while my SSN is still floating around in a million other places for people to buy, steal, or ask for.

Yours truly,
SSN# 987-65-4323

[4.12.2006] Edit: Sameer pointed out to me that the SSA posts recent assignments of group numbers. Cached copies go back to December of 2003. So, for all assignments after 2003, you might be able to significantly narrow the number of possible SSNs, especially in low population areas.

1 comment:

Anonymous said...

I just wanted to thank you for posting this entry. Since I started working, I've had to test some software that has a lot of personal data entry components, and I use it as a reference for fake SSNs. I've actually got it bookmarked. Good stuff.