Recipe 6.2. Matching Letters

Problem

You want to see whether a value consists only of alphabetic characters.

Solution

The obvious character class for matching regular letters isn't good enough in the general case:

if ($var =~ /^[A-Za-z]+$/) {
    # it is purely alphabetic
}

That's because it doesn't respect the user's locale settings. If you need to match letters with diacritics as well, use locale and match against a negated character class:

use locale;
if ($var =~ /^[^\W\d_]+$/) {
    print "var is purely alphabetic\n";
}

Discussion

Perl can't directly express "something alphabetic" independent of locale, so we have to be more clever. The \w regular expression notation matches one alphabetic, numeric, or underscore character (an "alphanumunder"), so \W matches any character that is not one of those. The negated character class [^\W\d_] therefore specifies a character that is not a non-alphanumunder, not a digit, and not an underscore. Taking the digits and the underscore away from what \w matches leaves nothing but alphabetics, which is what we were looking for.
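To see the double negative at work, here is a minimal sketch that tests a few single characters against \w, \d, and the negated class, then tests whole strings. The is_alphabetic() helper and the sample strings are assumptions made for illustration, not part of the recipe. No locale is set, so only ASCII letters count as alphabetic here.

```perl
use strict;
use warnings;

# Illustrative sketch only; is_alphabetic() is a hypothetical helper,
# not part of the original recipe. No locale is set here, so for these
# plain ASCII strings \w means [A-Za-z0-9_].
sub is_alphabetic {
    my ($s) = @_;
    # Note: $ also permits one trailing newline, as in the recipe's pattern.
    return $s =~ /^[^\W\d_]+$/ ? 1 : 0;
}

# Check single characters against each piece of the class.
for my $ch ('a', 'Z', '7', '_', '!') {
    printf "%-3s \\w:%-3s \\d:%-3s [^\\W\\d_]:%s\n",
        $ch,
        ($ch =~ /^\w$/       ? 'yes' : 'no'),
        ($ch =~ /^\d$/       ? 'yes' : 'no'),
        ($ch =~ /^[^\W\d_]$/ ? 'yes' : 'no');
}

# Then whole strings.
for my $s ('silly', 'abc123', 'snake_case', 'two words') {
    printf "%-12s %s\n", $s,
        is_alphabetic($s) ? 'alphabetic' : 'not alphabetic';
}
```

Only the characters that \w accepts but that are neither digits nor the underscore get through the negated class, which is why the same pattern picks up accented letters once use locale widens what \w matches.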

Here's how you'd use this in a program:

use locale;
use POSIX 'locale_h';

# the following locale string might be different on your system;
# the data after __END__ must be stored in the locale's encoding (here ISO 8859-1)
unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) {
    die "couldn't set locale to French Canadian\n";
}

while (<DATA>) {
    chomp;
    if (/^[^\W\d_]+$/) {
        print "$_: alphabetic\n";
    } else {
        print "$_: line noise\n";
    }
}

__END__
silly
façade
coöperate
niño
Renée
Molière
hæmoglobin
naïve
tschüß
random!stuff#here
