Hispanic origin test for family names (.rb)
This is a simple Ruby implementation of the process described in the U.S. Census Bureau's Technical Working Paper No. 13, "Building a Spanish Surname List for the 1990's—A New Approach to An Old Problem" (7.1.3, Orthographic Structure of Surnames).
Download (utf8, Unix) — View source (html)
The "data structures" could use some work (you'll see what I mean), but other than that, it works. It obviously makes some mistakes on individual names, but when applied to a list of names it gives you a decent guess as to what percentage of those names are of Hispanic origin.
Context is key to good results: use this on a list of Italian or Portuguese family names, and you'll get mostly false positives. Use it on a population whose family names you know to be mostly Hispanic or non-Latin, and you'll get a pretty good guess.
Is no data better than (educated) guesswork? You decide. Usage:
    load 'hispanic_origin.rb'
    'Delgado'.hispanic?   # => true
    'Cottman'.hispanic?   # => false
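Since the test is most useful on a whole list rather than on single names, here is a minimal sketch of how the percentage could be computed. The helper name hispanic_share is made up for this example and is not part of hispanic_origin.rb; it only assumes that the file adds hispanic? to String as shown above. You can also pass a block to plug in a different test.

```ruby
# Hypothetical helper, not part of hispanic_origin.rb.
# Returns the percentage (0.0 to 100.0) of names the classifier flags.
def hispanic_share(names, &classifier)
  # Assumption: without a block, fall back to String#hispanic?,
  # which hispanic_origin.rb is expected to define once loaded.
  classifier ||= ->(name) { name.hispanic? }
  return 0.0 if names.empty?

  hits = names.count { |name| classifier.call(name) }
  (100.0 * hits / names.size).round(1)
end

# Usage, once hispanic_origin.rb has been loaded:
#   hispanic_share(%w[Delgado Cottman])
```

Treat the result as an estimate for the list as a whole, not as a verdict on any one name.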
If you'd like to share improvements, do let me know.
Licensed under the MIT License.
