1.2 Natural and Artificial Languages

Languages were first invented by humans, for the benefit of humans. In the annals of computer science, this fact has occasionally been forgotten.[3] Since Perl was designed (loosely speaking) by an occasional linguist, it was designed to work smoothly in the same ways that natural language works smoothly. Naturally, there are many aspects to this, since natural language works well at many levels simultaneously. We could enumerate many of these linguistic principles here, but the most important principle of language design is simply that easy things should be easy, and hard things should be possible. That may seem obvious, but many computer languages fail at one or the other.

[3] More precisely, this fact has occasionally been remembered.

Natural languages are good at both because people are continually trying to express both easy things and hard things, so the language evolves to handle both. Perl was designed first of all to evolve, and indeed it has evolved. Many people have contributed to the evolution of Perl over the years. We often joke that a camel is a horse designed by a committee, but if you think about it, the camel is pretty well adapted for life in the desert. The camel has evolved to be relatively self-sufficient.[4]

[4] On the other hand, the camel has not evolved to smell good. Neither has Perl.

Now when someone utters the word "linguistics", many people think of one of two things. Either they think of words, or they think of sentences. But words and sentences are just two handy ways to "chunk" speech. Either may be broken down into smaller units of meaning, or combined into larger units of meaning. And the meaning of any unit depends heavily on the syntactic, semantic, and pragmatic context in which the unit is located. Natural language has words of various sorts, nouns and verbs and such. If I say "dog" in isolation, you think of it as a noun, but I can also use the word in other ways. That is, a noun can function as a verb, an adjective or an adverb when the context demands it. If you dog a dog during the dog days of summer, you'll be a dog tired dogcatcher.[5]

[5] And you're probably dog tired of all this linguistics claptrap. But we'd like you to understand why Perl is different from the typical computer language, doggone it!

Perl also evaluates words differently in various contexts. We will see how it does that later. Just remember that Perl is trying to understand what you're saying, like any good listener does. Perl works pretty hard to try to keep up its end of the bargain. Just say what you mean, and Perl will usually "get it". (Unless you're talking nonsense, of course - the Perl parser understands Perl a lot better than either English or Swahili.)

But back to nouns. A noun can name a particular object, or it can name a class of objects generically without specifying which one or ones are currently being referred to. Most computer languages make this distinction, only we call the particular thing a value and the generic one a variable. A value just exists somewhere, who knows where, but a variable gets associated with one or more values over its lifetime. So whoever is interpreting the variable has to keep track of that association. That interpreter may be in your brain, or in your computer.

1.2.1 Nouns

A variable is just a handy place to keep something, a place with a name, so you know where to find your special something when you come back looking for it later. As in real life, there are various kinds of places to store things, some of them rather private, and some of them out in public. Some places are temporary, and other places are more permanent. Computer scientists love to talk about the "scope" of variables, but that's all they mean by it. Perl has various handy ways of dealing with scoping issues, which you'll be happy to learn later when the time is right. Which is not yet. (Look up the adjectives "local" and "my" in Chapter 3, Functions, when you get curious.)

But a more immediately useful way of classifying variables is by what sort of data they can hold. As in English, Perl's primary type distinction is between singular and plural data. Strings and numbers are singular pieces of data, while lists of strings or numbers are plural. (And when we get to object-oriented programming, you'll find that an object looks singular from the outside, but may look plural from the inside, like a class of students.) We call a singular variable a scalar, and a plural variable an array. Since a string can be stored in a scalar variable, we might write a slightly longer (and commented) version of our first example like this:

$phrase = "Howdy, world!\n";          # Set a variable.
print $phrase;                        # Print the variable.

Note that we did not have to predefine what kind of variable $phrase is. The $ character tells Perl that phrase is a scalar variable, that is, one containing a singular value. An array variable, by contrast, would start with an @ character. (It may help you to remember that a $ is a stylized "S", for "scalar", while @ is a stylized "a", for "array".)

Perl has some other variable types, with unlikely names like "hash", "handle", and "typeglob". Like scalars and arrays, these types of variables are also preceded by funny characters.[6] For completeness, Table 1.1 lists all the funny characters you'll encounter.

[6] Some language purists point to these funny characters as a reason to abhor Perl. This is superficial. These characters have many benefits: Variables can be interpolated into strings with no additional syntax. Perl scripts are easy to read (for people who have bothered to learn Perl!) because the nouns stand out from verbs, and new verbs can be added to the language without breaking old scripts. (We told you Perl was designed to evolve.) And the noun analogy is not frivolous - there is ample precedent in various natural languages for requiring grammatical noun markers. It's how we think! (We think.)

Table 1.1: Variable Syntax
Type	Character	Example	Is a name for:
Scalar	`$`	`$cents`	An individual value (number or string)
Array	`@`	`@large`	A list of values, keyed by number
Hash	`%`	`%interest`	A group of values, keyed by string
Subroutine	`&`	`&how`	A callable chunk of Perl code
Typeglob	`*`	`*struck`	Everything named `struck`
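
As a rough illustration of Table 1.1, here is a sketch that puts most of these nouns to work. The variable names come from the table, but the values stored in them are made up for this example. Typeglobs are left out, since they are easier to appreciate once you've met the other types.

```perl
$cents    = 25;                           # scalar: one individual value
@large    = (1000, 1_000_000);            # array: values keyed by number
%interest = ("savings", 2, "loan", 9);    # hash: values keyed by string
sub how { return "carefully" }            # subroutine: a callable chunk of code

print "$cents cents\n";                   # 25 cents
print "$large[1]\n";                      # 1000000
print "$interest{loan}\n";                # 9
print &how(), "\n";                       # carefully
```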

1.2.1.1 Singularities

From our example, you can see that scalars may be assigned a new value with the = operator, just as in many other computer languages. Scalar variables can be assigned any form of scalar value: integers, floating-point numbers, strings, and even esoteric things like references to other variables, or to objects. There are many ways of generating these values for assignment.

As in the UNIX shell, you can use different quoting mechanisms to make different kinds of values. Double quotation marks (double quotes) do variable interpolation[7] and backslash interpretation,[8] while single quotes suppress both interpolation and interpretation. And backquotes (the ones leaning to the left) will execute an external program and return the output of the program, so you can capture it as a single string containing all the lines of output.

[7] Sometimes called "substitution" by shell programmers, but we prefer to reserve that word for something else in Perl. So please call it interpolation. We're using the term in the textual sense ("this passage is a Gnostic interpolation") rather than in the mathematical sense ("this point on the graph is an interpolation between two other points").
[8] Such as turning \t into a tab, \n into a newline, \001 into a CTRL-A, and so on, in the tradition of many UNIX programs.

$answer = 42;               # an integer
$pi = 3.14159265;           # a "real" number
$avocados = 6.02e23;        # scientific notation
$pet = "Camel";             # string
$sign = "I love my $pet";   # string with interpolation
$cost = 'It costs $100';    # string without interpolation
$thence = $whence;          # another variable
$x = $moles * $avocados;    # an expression
$cwd = `pwd`;               # string output from a command
$exit = system("vi $x");    # numeric status of a command
$fido = new Camel "Fido";   # an object

Uninitialized variables automatically spring into existence as needed. Following the principle of least surprise, they are created with a null value, which acts as "" where a string is expected and as 0 where a number is expected. Depending on where you use them, variables will be interpreted automatically as strings, as numbers, or as "true" and "false" values (commonly called Boolean values). Various operators expect certain kinds of values as parameters, so we will speak of those operators as "providing" or "supplying" a scalar context to those parameters. Sometimes we'll be more specific, and say it supplies a numeric context, a string context, or a Boolean context to those parameters. (Later we'll also talk about list context, which is the opposite of scalar context.) Perl will automatically convert the data into the form required by the current context, within reason. For example, suppose you said this:

$camels = '123';
print $camels + 1, "\n";

The original value of $camels is a string, but it is converted to a number to add 1 to it, and then converted back to a string to be printed out as 124. The newline, represented by "\n", is also in string context, but since it's already a string, no conversion is necessary. But notice that we had to use double quotes there - using single quotes to say '\n' would result in a two-character string consisting of a backslash followed by an "n", which is not a newline by anybody's definition.
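
Here is a small sketch of the same conversions, spelled out one step at a time. It assumes only the built-in length function and the . (concatenation) operator. The names $total and $label are invented for illustration.

```perl
$camels = '123';
$total  = $camels + 1;           # numeric context: the string '123' becomes 123
$label  = $total . " camels";    # string context: the number 124 becomes "124"
print $label, "\n";              # 124 camels

print length("\n"), "\n";        # 1: a single newline character
print length('\n'), "\n";        # 2: a backslash followed by an "n"
```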

So, in a sense, double quotes and single quotes are yet another way of specifying context. The interpretation of the innards of a quoted string depends on which quotes you use. Later we'll see some other operators that work like quotes syntactically, but use the string in some special way, such as for pattern matching or substitution. These all work like double-quoted strings too. The double-quote context is the "interpolative" context of Perl, and is supplied by many operators that don't happen to resemble double quotes.
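
To give a taste of that, here is a hedged sketch using a pattern match, one of the quote-like operators covered later. The pattern between the slashes is interpolated much as a double-quoted string would be, so $pet is replaced by its value before any matching happens.

```perl
$pet  = "Camel";
$sign = "I love my $pet";          # double quotes interpolate $pet

if ($sign =~ /my $pet/) {          # the pattern interpolates $pet too
    print "The sign mentions my $pet.\n";
}
```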

1.2.1.2 Pluralities

Some kinds of variables hold multiple values that are logically tied together. Perl has two types of multivalued variables: arrays and hashes. In many ways these behave like scalars. They spring into existence with nothing in them when needed. When you assign to them, they supply a list context to the right side of the assignment.

You'd use an array when you want to look something up by number. You'd use a hash when you want to look something up by name. The two concepts are complementary. You'll often see people using an array to translate month numbers into month names, and a corresponding hash to translate month names back into month numbers. (Though hashes aren't limited to holding only numbers. You could have a hash that translates month names to birthstone names, for instance.)
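
A minimal sketch of that month lookup might go as follows, shortened to three months to save space. The names @month_name and %month_num are our own invention, and the subscript syntax used here is explained in the sections on arrays and hashes that follow.

```perl
@month_name = ("January", "February", "March");            # looked up by number
%month_num  = ("January", 1, "February", 2, "March", 3);   # looked up by name

$n = 2;
print $month_name[$n - 1], "\n";       # February (array positions count from 0)
print $month_num{"February"}, "\n";    # 2
```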

Arrays.

An array is an ordered list of scalars, accessed[9] by the scalar's position in the list. The list may contain numbers, or strings, or a mixture of both. (In fact, it could also contain references to other lists, but we'll get to that in Chapter 4, References and Nested Data Structures, when we're discussing multidimensional arrays.) To assign a list value to an array, you simply group the values together (with a set of parentheses):

@home = ("couch", "chair", "table", "stove");

[9] Or keyed, or indexed, or subscripted, or looked up. Take your pick.

Conversely, if you use @home in a list context, such as on the right side of a list assignment, you get back out the same list you put in. So you could set four scalar variables from the array like this:

($potato, $lift, $tennis, $pipe) = @home;

These are called list assignments. They logically happen in parallel, so you can swap two variables by saying:

($alpha,$omega) = ($omega,$alpha);

As in C, arrays are zero-based, so while you would talk about the first through fourth elements of the array, you would get to them with subscripts 0 through 3.[10] Array subscripts are enclosed in square brackets [like this], so if you want to select an individual array element, you would refer to it as $home[n], where n is the subscript (one less than the element number) you want. See the example below. Since the element you are dealing with is a scalar, you always precede it with a $.

[10] If this seems odd to you, just think of the subscript as an offset, that is, the count of how many array elements come before it. Obviously, the first element doesn't have any elements before it, and so has an offset of 0. This is how computers think. (We think.)

If you want to assign to one array element at a time, you could write the earlier assignment as:

$home[0] = "couch";
$home[1] = "chair";
$home[2] = "table";
$home[3] = "stove";

Since arrays are ordered, there are various useful operations that you can do on them, such as the stack operations, push and pop. A stack is, after all, just an ordered list, with a beginning and an end. Especially an end. Perl regards the end of your list as the top of a stack. (Although most Perl programmers think of a list as horizontal, with the top of the stack on the right.)
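
As a quick sketch of those stack operations, here is the @home array from above with one item pushed onto the end and popped back off. The "lamp" is an addition made up for this example.

```perl
@home = ("couch", "chair", "table", "stove");

push(@home, "lamp");      # add to the end (the top of the stack)
print "@home\n";          # couch chair table stove lamp

$last = pop(@home);       # take it off the end again
print "$last\n";          # lamp
print "$home[3]\n";       # stove is back on top
```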

Hashes.

A hash is an unordered set of scalars, accessed[11] by some string value that is associated with each scalar. For this reason hashes are often called "associative arrays". But that's too long for lazy typists to type, and we talk about them so often that we decided to name them something short and snappy.[12] The other reason we picked the name "hash" is to emphasize the fact that they're disordered. (They are, coincidentally, implemented internally using a hash-table lookup, which is why hashes are so fast, and stay so fast no matter how many values you put into them.) You can't push or pop a hash though, because it doesn't make sense. A hash has no beginning or end. Nevertheless, hashes are extremely powerful and useful. Until you start thinking in terms of hashes, you aren't really thinking in Perl.

[11] Or keyed, or indexed, or subscripted, or looked up. Take your pick.
[12] Presuming for the moment that we can classify any sort of hash as "snappy". Please pass the Tabasco.

Since the keys to a hash are not automatically implied by their position, you must supply the key as well as the value when populating a hash. You can still assign a list to it like an ordinary array, but each pair of items in the list will be interpreted as a key/value pair. Suppose you wanted to translate abbreviated day names to the corresponding full names. You could write the following list assignment:

%longday = ("Sun", "Sunday", "Mon", "Monday", "Tue", "Tuesday",
            "Wed", "Wednesday", "Thu", "Thursday", "Fri",
            "Friday", "Sat", "Saturday");

Because it is sometimes difficult to read a hash that is defined like this, Perl provides the => (equal sign, greater than) sequence as an alternative separator to the comma. Using this syntax (and some creative formatting), it is easier to see which strings are the keys, and which strings are the associated values.

%longday = (
    "Sun" => "Sunday",
    "Mon" => "Monday",
    "Tue" => "Tuesday",
    "Wed" => "Wednesday",
    "Thu" => "Thursday",
    "Fri" => "Friday",
    "Sat" => "Saturday",
);

Not only can you assign a list to a hash, as we did above, but if you use a hash in list context, it'll convert the hash back to a list of key/value pairs, in a weird order. This is occasionally useful. More often people extract a list of just the keys, using the (aptly named) keys function. The key list is also unordered, but can easily be sorted if desired, using the (aptly named) sort function. More on that later.

Because hashes are a fancy kind of array, you select an individual hash element by enclosing the key in braces. So, for example, if you want to find out the value associated with Wed in the hash above, you would use $longday{"Wed"}. Note again that you are dealing with a scalar value, so you use $, not %.
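
Putting those pieces together, here is a sketch that looks up one element and then walks the sorted keys. The hash is cut down to three days for brevity. The foreach loop is a control structure we haven't introduced yet, and @abbrevs and $abbrev are names invented for this example.

```perl
%longday = (
    "Sun" => "Sunday",
    "Mon" => "Monday",
    "Tue" => "Tuesday",
);

print $longday{"Mon"}, "\n";          # Monday

@abbrevs = sort(keys(%longday));      # Mon Sun Tue (alphabetical order)
foreach $abbrev (@abbrevs) {
    print "$abbrev is short for $longday{$abbrev}\n";
}
```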

Linguistically, the relationship encoded in a hash is genitive or possessive, like the word "of" in English, or like "'s". The wife of Adam is Eve, so we write:

$wife{"Adam"} = "Eve";

1.2.2 Verbs

As is typical of your typical imperative computer language, many of the verbs in Perl are commands: they tell the Perl interpreter to do something. On the other hand, as is typical of a natural language, the meanings of Perl verbs tend to mush off in various directions, depending on the context. A statement starting with a verb is generally purely imperative, and evaluated entirely for its side effects. We often call these verbs procedures, especially when they're user-defined. A frequently seen command (in fact, you've seen it already) is the print command:

print "Adam's wife is ", $wife{'Adam'}, ".\n";

This has the side effect of producing the desired output.

But there are other "moods" besides the imperative mood. Some verbs are for asking questions, and are useful in conditional statements. Other verbs translate their input parameters into return values, just as a recipe tells you how to turn raw ingredients into something (hopefully) edible. We tend to call these verbs functions, in deference to generations of mathematicians who don't know what the word "functional" means in natural language.

An example of a built-in function would be the exponential function:

$e = exp(1);   # 2.718281828459, or thereabouts

But Perl doesn't make a hard distinction between procedures and functions. You'll find the terms used interchangeably. Verbs are also sometimes called subroutines (when user-defined) or operators (when built-in). But call them whatever you like - they all return a value, which may or may not be a meaningful value, which you may or may not choose to ignore.
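
To make that concrete, here is a sketch of a user-defined verb used both ways. The subroutine name double is hypothetical, ours rather than Perl's, and the details of defining subroutines come later.

```perl
# A made-up verb: returns twice its first argument.
sub double {
    return $_[0] * 2;     # $_[0] is the first argument passed in
}

$twice = double(21);      # used as a function: keep the return value
print "$twice\n";         # 42

double(21);               # used as a procedure: the return value is ignored
```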

As we go on, you'll see additional examples of how Perl behaves like a natural language. But there are other ways to look at Perl too. We've already sneakily introduced some notions from mathematical language, such as addition and subscripting, not to mention the exponential function. But Perl is also a control language, a glue language, a prototyping language, a text-processing language, a list-processing language, and an object-oriented language. Among other things.

But Perl is also just a plain old computer language. And that's how we'll look at it next.