Background and general introduction

With the exception of recursion,
all the concepts in these lecture notes are presented in Aristotelian order,
progressing from known things towards unknown things.
There are no hidden forward references, and no rabbits are pulled out of hats,
a questionable strategy that is known to make hats proliferate at a Fibonacci rate.

—Mimer

What’s in a name?

In the American English term “Computer Science”, the emphasis is on the computer, from its architecture all the way to its use (i.e., on Aristotle’s formal cause). However, as Edsger Dijkstra allegedly put it once, Computer Science is no more [only] about computers than astronomy is [only] about telescopes, and indeed one often sees the term “Computing Science” instead, where the emphasis is on the processing of data on computers (i.e., on Aristotle’s efficient cause). In Denmark, Peter Naur proposed the terms “datalogi” and “Data Science”, where the emphasis is on the representation of information as data on computers for processing (i.e., on Aristotle’s material cause). In Germany, Karl Steinbuch proposed the term “informatik” in 1956, and in France, Philippe Dreyfus proposed the term “informatique” in 1962, terms that are rendered in English as “informatics”, where the emphasis is on the automated treatment of information by processing data on computers (i.e., on Aristotle’s final cause).

To sum up, data are representations of information in a computer, and these representations are processed to carry out a computation. And while the term “Computer Science” is the accepted one, alternative terms exist. For example,

Information and data

Data represent information:

_images/ditaa-63d2e0ed07b3b625d560a0360bf9c2edaabaf19b.png

Data are built using data constructors:

  • A representation is sound when using constructors correctly always yields a piece of data that correctly represents the information it is meant to represent. (Otherwise this representation is unsound.)
  • A representation is complete when all the information we wish to represent can be represented using the data constructors. (Otherwise this representation is incomplete.)

For example, in OCaml (the programming language of discourse for Intro to CS), the default representation of integers has a fixed size, and so only integers between, e.g., -4611686018427387904 and 4611686018427387903 can be represented. This representation is sound in that the representation of any integer between -4611686018427387904 and 4611686018427387903 does correspond to a mathematical integer, but it is incomplete since integers smaller than -4611686018427387904 or greater than 4611686018427387903 cannot be represented.

For another example, in Chez Scheme, the default representation of integers does not have a fixed size: this representation is complete... up to the size of the memory of our computer.

Exercise 02

  1. Install Chez Scheme.

  2. Start Chez Scheme from an Emacs terminal (M-x shell):

    Chez Scheme Version 9.5.1
    Copyright 1984-2017 Cisco Systems, Inc.
    
    >
    
  3. Compute successive exponents of 2 by evaluating (expt 2 0), (expt 2 1), (expt 2 2), (expt 2 3), (expt 2 10), (expt 2 100), (expt 2 1000), (expt 2 10000), etc.:

    > (expt 2 0)
    1
    > (expt 2 1)
    2
    > (expt 2 2)
    4
    > (expt 2 3)
    8
    > (expt 2 10)
    1024
    > (expt 2 100)
    1267650600228229401496703205376
    > (expt 2 200)
    1606938044258990275541962092341162602522202993782792835301376
    > (expt 2 300)
    2037035976334486086268445688409378161051468393665936250636140449354381299763336706183397376
    > (expt 2 500)
    ...
    > (expt 2 1000)
    ...
    >
    
  4. Keep increasing the exponent until the result contains too many digits to fit in the screen.

  5. Go back to the beginning of the buffer (ESC <), resize Emacs’s fonts (C-x - and then C-x - - - -), and contemplate the exponential growth of these successive integers.

  6. Resize Emacs’s fonts back to normal (C-x + 0), go to the end of the buffer (ESC >), and visualize several exponentiations not of 2, but of 10:

    > (expt 10 0)
    ...
    > (expt 10 1)
    ...
    > (expt 10 2)
    ...
    > (expt 10 3)
    ...
    > (expt 10 10)
    ...
    > (expt 10 100)
    ...
    >
    

    Briefly analyze these successive results.

  7. Food for thought:

    If Chez Scheme were printing integers in base 2 instead of in base 10, what would be the result of evaluating (expt 2 0), (expt 2 1), (expt 2 2), (expt 2 3), etc.?

    N.B.: As it happens, the ~b option of the resident string-formatting procedure format provides this conversion in base 2:

    > (format "~b" (expt 2 0))
    ...
    > (format "~b" (expt 2 1))
    ...
    > (format "~b" (expt 2 2))
    ...
    > (format "~b" (expt 2 3))
    ...
    >
    

Epistemology of representation

In a nutshell, a representation and what this representation represents are two distinct things:

  • a name is not what it denotes (e.g., we are not our name);
  • the address of one’s home is not one’s home;
  • a musical note is not its name (e.g., because musical notes have several names: do, ré, mi, fa, sol, etc. or A, B, C, D, etc.);
  • etc.
Harald: So a number is not its name?
Mimer: Right. A number is not its name.
Harald: So much for the multiplication tables.
Mimer (surprised): The multiplication tables?
Harald: Yes. We had to learn them by heart.
Mimer: Well, they are still valid any way we look at them.
Harald: Yes, but suddenly they are less simple.
Mimer: Why? They also work in another language than the one you grew up with.
Harald: Harrumph.

Several of René Magritte‘s paintings illustrate this point, e.g., The Treachery of Images where the painting of a pipe is accompanied by the words “Ceci n’est pas une pipe” (“This is not a pipe”), a point Walter Jon Williams developed eloquently in his novel This is not a game.

Computing as data processing

A computer processes data. Given some input data that represent input information, it can:

  • produce output data that represent output information,
  • issue an error message (which is also a piece of output data), or
  • loop, i.e., not produce any output data.

A “batch” computation is carried out by representing input information as input data and processing this input data in a way that produces output data that represents output information:

_images/ditaa-2cca4a52cc757fb385d3ad40842c83984cb744cf.png

There are of course other forms of computations than batch ones, e.g., reactive ones that process events (e.g., a key being pressed on a keyboard or the movement of a mouse) as these events occur, but batch computations are a good starting point:

  • If the output data of a batch computation always represent the expected output information, then this batch computation is said to be sound.
  • A batch computation is complete if the expected output information is always the one represented by the output data.

Programs

A program is the recipe the computer follows to process data: a program computes a function from input information to output information. A program is written by a programmer.

Here is a comparison. A cooking recipe is a notation that conveys how to cook something. It specifies data (the ingredients, e.g., eggs, butter, salt, pepper, thyme), resources and tools (e.g., a stove and a pan), and an algorithm (a method to operate on the data, e.g., to beat the eggs towards cooking an omelet). To make a dish, a cook uses the kitchen resources and tools to operate over the ingredients according to the algorithm.

Programming languages

A programming language is the language in which a program is written: a notation to express computations. There are many notions of computation (i.e., ways of representing information and ways of processing data), and for each of these notions, there are many notations (i.e., many programming languages).

What we need are notions, not notations.

Carl-Friedrich Gauss (quoted by Paul Lockhart in his Mathematician’s Lament)

General summary

  • Data Science (material cause):

    As put forward by Peter Naur, the topic here is centered on data that represent information.

  • Computing Science (efficient cause):

    The topic here is centered on data processing, i.e., computing.

  • Computer Science (formal cause):

    The topic here is the methodology of computing, i.e., how to carry out computations.

  • Informatics (final cause):

    As put forward by Karl Steinbuch and Philippe Dreyfus, the topic here is the automated treatment of information.

Postlude

Loki: It would indeed be nonsensical to refer to Astronomy as “Telescope Science”.

Mimer: Glad you agree.

Loki: By the token above, at the beginning of this lecture note, Astronomy should be referred to as “Telescoping Science”.

Mimer (who didn’t see this one coming): Telescoping Science.

Loki: Or better, on the ground that Computer Science is about data (cf. Data Science), Astronomy should be called Star Science, based on its etymology.

Mimer (recovering): You know what, Loki, you are right. And by that logic, top astronomers would be nominated to the Star Academy. It would make so much more science, I mean sense.

Loki: Glad you agree. Next step: meteorology, the science of meteors.

Halcyon (to himself): Another invitation for critical reading, methinks.

Data Science in the 21st century

Nowadays, the term “Data Science” has taken a new meaning. This meaning is captured in the following humorous riddle, heard while attending a workshop at the NUS Department of Statistics and Data Science in 2019:

Question: What is the difference between a data scientist and a statistician?

Answer: A data scientist is a useful statistician.

Exercise 03

The goal of this exercise is to pursue the analogy between programs and cooking recipes.

  • The data are the ingredients.
  • The computer at large is the kitchen.
  • The processor, in the computer, is the cook in the kitchen.
  • The program is the cooking recipe.

What is the cooking analogue of a programmer?

Exercise 04

The goal of this exercise is to establish an analogy between computing and knitting.

  • The data are the yarns.
  • The computer at large is the work environment of the person who is knitting, including, e.g., knitting needles.
  • The processor, in the computer, is the person who is knitting.

What is the knitting analogue of a program?

What is the knitting analogue of a programmer?

Exercise 05

The goal of this exercise is to establish an analogy between computing and making music.

  • The data are the sounds.
  • The computer at large is anything that can make a sound.

What is the musical analogue of the processor, in the computer?

What is the musical analogue of a program (improvisation notwithstanding)?

What is the musical analogue of a programmer?

Version

Mentioned the format procedure in Exercise 02 [21 Apr 2022]

Added Exercise 02 [20 Apr 2022]

Added the executive summary and the exercises [13 Mar 2022]

Fine-tuned [15 Jan 2022]

Created [14 Jan 2022]