
Computer languages - Setting the stage

After commenting on how useful my prior familiarity with the C programming language was in learning PHP, and on the fact that C has influenced so many subsequent languages, I kept thinking about the history and evolution of various programming languages.

But wait, you might say, here’s a family tree of programming languages:

... and it’s not THAT complicated, is it?

Yes, well, that’s a simplified version.

Here’s something a bit more realistic, but still far from complete.

(An even more complex analysis and graph is available elsewhere, though that site appears to only go as far as 2004 and may not be actively maintained.)

In any case, the point is that programming languages evolve over time, in complex ways which appear to be analogous to human languages.

But why? And how?

To begin, we need to define some terms and a framework for understanding.

When we talk about “programming” in its most basic sense, we’re usually describing a fairly broad group of methods for recording the instructions needed “to accomplish a specific computing result or to perform a specific task”. Early “programmable devices” included music boxes, “player pianos”, and the “Jacquard loom”, all of which used cards or plates with data physically encoded in some way, such as by holes punched in paper or metal.

The first “computer program” is generally credited to Ada Lovelace who, in the 1840s, published an algorithm to calculate Bernoulli numbers. It was intended to be “run” by Charles Babbage’s Analytical Engine, a “general purpose” mechanical computer which was proposed but never built. (Another fascinating rabbit-hole in the evolution of computer technology, possibly for future investigation.)

With the advent of electronic computers a century later, we start to get into what most people would recognize as “computer programming”.

Without getting into the details of how processors work and how they are designed, programming languages provide us with the instructions we need in order to get computers to perform various tasks, and can be broken into “low-level” and “high-level” languages.

“Low-level” computer languages include so-called “machine code”, which is the only language a computer can directly execute, and “assembly language”. An assembly language is one which corresponds to an actual processor architecture and which allows (more or less) direct translation to machine code.

#TIL about Kathleen Booth, who wrote the first assembly language, designed the assembler for the first computer systems at Birkbeck College, University of London, and helped design three different computers.

In the early days of computers, assembly languages were widely used, but the advent of “higher-level” languages dramatically reduced their popularity over time. A number of things contributed to this change, but one of the biggest is that most higher-level languages are designed to be generic and not dependent on a specific hardware configuration, so the same code can run on a wide variety of machines.

For illustration and comparison, below are some examples of a function to calculate Fibonacci numbers (i.e., the sequence of numbers where each number is the sum of the two preceding ones: 0, 1, 1, 2, 3, 5, 8...) - all examples from Wikipedia.

x86 machine code

NOTE: In general, this will only run on an Intel 8086 or related processor.

x86-64 assembly language

NOTE: In general, this will only run on an x86-64 compatible (64-bit) processor - not the original 8086.

C programming language

NOTE: This will run pretty much anywhere a C compiler is available.

There has long been debate on the usefulness of assembly languages, as compared with higher-level languages. In the past, there might have been significant performance benefits to writing certain code in assembly language, but modern high-level languages have evolved to the point where there is no practical difference in most situations.

Nowadays, assembly languages are used mainly when a developer needs to interact with the processor directly, which is not necessary for most work. According to the TIOBE index, which rates language popularity based on search-engine results rather than actual lines of code, assembly language sits at approximately 2%. (I see a number of challenges in how this sort of thing can be measured, so I would consider this a very rough approximation, rather than definitive in any way - possibly a topic for future investigation.)

So, taking stock, we have defined programming as a set of methods for recording the instructions needed “to accomplish a specific computing result or to perform a specific task”, described “machine code” as the actual instructions executed by the computer, discussed the concept of “assembly languages” as a way for humans to conveniently interact directly with a given processor, and touched on the idea of higher-level languages, which allow us to define instructions that are not directly dependent on the details of where they are run.

You might have noted that, aside from C, I haven’t yet discussed any of the languages on any of the charts above. That’s actually part of the point – for those charts to make any sense, we need to understand that there is a whole framework on which those other languages were built.

Now that we have reached higher-level programming languages that “abstract away” many of the details of the processor architecture, we start getting into variations which depend less on the technology and more on the humans – ie, the languages in the charts above.

I’ve avoided the term so far, but wanted to end this post with a comment on the term “generation”, since most computer science courses include some discussion of them.

Wikipedia describes “first-generation” (1GL) languages as machine-level programming languages used to program “first-generation computers”. Setting aside the somewhat circular definition, “second-generation” (2GL) languages are then assembly languages, which are heavily dependent on the architecture of the computer. It should be noted that neither term was used until the term “third-generation” language (3GL) was coined.

3GL is rather vaguely defined as being more machine-independent and more programmer-friendly, while “fourth-generation” (4GL) is even more vaguely described as an advancement on 3GL, or as specialized languages such as database query languages.

“Fifth-generation” (5GL) is defined as a programming language based on problem-solving constraints given to the program, rather than on an algorithm defined by a programmer. These languages are often associated with artificial intelligence research, and don’t appear to be widely used by most programmers.

So, aside from the fact that these categories are very vaguely defined, and are often defined differently by different researchers, it appears that most of the commonly-used languages fall into the 3GL category, with many having 4GL and/or 5GL features as additions or expansions. Add to that the fact that vendors will use these terms with reckless abandon, and you can see why I prefer to avoid them as much as possible.

In future posts, I plan to look into common programming languages and their histories in a bit more detail, and see how their evolution relates to that of human languages.


