# What is the theory of data?

### Data

Data is one of the Big Ideas of Computer Science. It is a Big Idea like money in Economics, energy in Physics, molecule in Chemistry, and sex in Biology. Data is the raison d’être of computation. Data and computation originate in the needs of science and engineering, and of financial accounting and administration. Measuring and counting, modelling and calculating, recording and archiving, are truly ancient activities.

Today, data is everywhere. It is collected in the scientific analysis of the universe and the body movements of sports people, in the files of the secret service and the hospital, in the customer records of the supermarket, in the pages distributed across the internet, in the videos of our streets from CCTV cameras and satellites.

Moreover, the conceptual unity of all this data is made tangible as data becomes digital. It is common to listen to digital music and to view digital photographs, television and films. Indeed, it is common to have a computer and create digital artifacts oneself.
Digital data is just data that can be represented by finite strings, normally of 0s and 1s. Digital data is discrete, rather than continuous. Digital data is what people think of when they think of computers and is at the heart of Computer Science.
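
As a small illustration of the claim that digital data can be represented by finite strings of 0s and 1s, the sketch below encodes a piece of text as a bit string and decodes it again. The function names `to_bits` and `from_bits`, and the choice of UTF-8 with 8 bits per byte, are assumptions made for this example only.

```python
def to_bits(text: str) -> str:
    """Encode text as a finite string of 0s and 1s (UTF-8, 8 bits per byte)."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Decode a bit string produced by to_bits back into text."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Read the string eight symbols at a time and rebuild each byte.
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")
```

Any encoding would serve equally well here; the point is only that the representation is finite and discrete, and that the original data can be recovered from it.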

Because digital data is everywhere, Computer Science is everywhere.

However, at this point let me say that I think a central problem for Computer Science is to embrace analogue data in its theories, methods and tools and, simply, unify the computational theories of analogue and digital data.

### An Algebraic Theory of Data

Since the 1970s, I have been interested in the development of a general theory of data. By this I mean a mathematical theory that will help us
• to model and analyse data in any system or situation,
• to specify, implement and reason when computing and communicating with data,
• to understand the scope and limits of data in computation.
Traditionally, such a mathematical theory would find applications in programming technology and software design methods. But to these I would add algorithmic modelling and simulation in the physical and other sciences.

To me the subject has all the makings of a classic theoretical science with a “shelf life” of centuries. What does the mature form of such a "classic theoretical science" look like? Imagine
• a set of twenty (say) theoretical ideas,
• mathematical techniques for modelling,
• a rich list of standard methods for analysis and algorithms,
• deep and surprising theorems,
• classic case studies,
• a wide range of seemingly remote and surprising applications,
and the whole forming something that can be taught and, ultimately, must be taught to all students, along with automata, grammars, propositional logic, first-order logic, ...

Acquiring and archiving data is already a fundamental process that is set to come of age and become an industry of its own. Are we becoming aware of just how important the new paradigm of data-centric computing is, both intellectually and technologically? I expect a significant increase in effort on, and hopefully in understanding of, the theory of data. However, I rather hope this does not produce too much frenetic research activity. Some bright, dedicated people prepared to spend a few decades on the subject are what is needed, of course. I don't want my scientific space invaded by aliens just now.

Many-sorted algebras, equational specifications and term rewriting were introduced into programming theory and software design by Jim Thatcher, Eric Wagner and Joe Goguen, among others, in the 1970s. Today, the theory of abstract data types is an enormous field aimed at questions and problems in the foundations of programming technology and the practice of software engineering.
Throughout this period I have seen the steady progress of the central theory, its applications and software products. But it has not been quick or popular by the standards of computing research and, thankfully for people like me, it has always been too difficult to be a fad.
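
To give a flavour of these ideas, here is a minimal sketch of an equational specification of a stack, executed by term rewriting. It assumes a signature with two sorts (data and stacks), the operations `empty`, `push`, `pop` and `top`, and the two equations `pop(push(x, s)) = s` and `top(push(x, s)) = x`, read from left to right as rewrite rules. The representation of terms as Python tuples and the names `rewrite_step` and `normalise` are choices made for this illustration, not a standard library or notation.

```python
# A term is either a data constant (here an int) or a tuple
# (operation_name, arg1, ..., argn), e.g. ("push", 1, ("empty",)).
EMPTY = ("empty",)


def rewrite_step(term):
    """Apply one equation at the root of the term, or return None if none applies.

    Equations, read left to right:
        pop(push(x, s)) = s
        top(push(x, s)) = x
    """
    if isinstance(term, tuple):
        op, *args = term
        if op in ("pop", "top") and isinstance(args[0], tuple) and args[0][0] == "push":
            _, x, s = args[0]
            return s if op == "pop" else x
    return None


def normalise(term):
    """Innermost rewriting: normalise the arguments first, then the root."""
    if not isinstance(term, tuple):
        return term
    op, *args = term
    reduced = (op, *[normalise(a) for a in args])
    result = rewrite_step(reduced)
    return reduced if result is None else normalise(result)


# top(pop(push(2, push(1, empty)))) rewrites to the data constant 1.
example = ("top", ("pop", ("push", 2, ("push", 1, EMPTY))))
normal_form = normalise(example)
```

Terms such as `pop(empty)` match no equation and are left as they stand; how to treat such cases is a matter for the specification and is not settled by this sketch.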

### My Research Programme

1. To model and analyse the structure of all forms of data types using classes of many-sorted algebras;
2. To study their use in the syntax of programming and specification languages, in hardware design, in dynamical systems, and in graphics;
3. To analyse hierarchical structure;
4. To determine the scope and limits of specification and computation using various computability theories.

I have concentrated on what I think are core problems in the theory of abstract data types concerning discrete and continuous data and their representation, equational specification, term rewriting, and computability.

### My Research

Computability is also my speciality. Together with my collaborators, Jan Bergstra, Viggo Stoltenberg-Hansen and Jeff Zucker, I have developed and studied many models of computation - some abstract, some based on concrete representations - and used them to analyse computations, specifications and verifications with a huge range of data types.

For the last decade I have been concentrating on continuous data, rather than discrete data, developing a computability theory of topological algebras and its applications. Most of my work is theoretical, but I have used algebraic and logical methods and computability theory to analyse practical problems in the design of:

Hardware systems:
• microprocessor verification with Neal Harman;
• special purpose hardware systems with Ben Thompson, Keith Hobley and Steve Eker;
• synchronous concurrent algorithms with Ben Thompson, Karl Meinke, Andy Martin, Keith Hobley, Matthew Poole, and Brian McConnell;
Software systems:
• program verification with Jan Bergstra and Jeff Zucker;
• volume graphics with Min Chen;
• programming language definition and compilation with Karen Stephenson;
• interfaces with Karen Stephenson and Dafydd Rees;
• libraries with Markus Roggenbach;
Physical systems:
• simulation of excitable media and cardiac tissue, and integrative whole heart models with A V Holden, Matthew Poole and Min Chen;
• mechanical systems with Edwin Beggs.

I have also worked on mathematical applications, mainly in algebra.

See my classified list of books and papers.

Note the various Handbook chapters I have written that provide an almost comprehensive introduction to my theoretical interests.