Поделиться через


What is this thing you call a "type"? Part one

(Eric is out camping; this posting is prerecorded. I'll be back in the office after Labour Day.)

The word "type" appears almost five thousand times in the C# 4 specification, and there is an entire chapter, chapter 4, dedicated to nothing but describing types. We start the specification by noting that C# is "type safe" and has "a unified type system" (*). We say that programs "declare" types, and that declared types can be organized by namespace. Clearly types are incredibly important to the design of C#, and incredibly important to C# programmers, so it is a more than a little bit surprising that nowhere in the eight hundred pages of the specification do we ever actually define the word "type".

We sort of assume that the developer reading the specification already has a working understanding of what a "type" is; the spec does not aim to be either a beginner programming tutorial or a mathematically precise formal language description. But if you ask ten working line-of-business developers for a formal definition of "type", you might get ten different answers. So let's consider today the question: what exactly is this thing you call a "type"?

A common answer to that question is that a type consists of:

* A name
* A set (possibly infinite) of values

And possibly also:

* A finite list of rules for associating values not in the type with values in the type (that is, coercions; 123.45 is not a member of the integer type, but it can be coerced to 123.)

Though that is not a terrible definition as a first attempt, it runs into some pretty nasty problems when you look at it more deeply.

The first problem is that of course not every type needs to have a name; C# 3 has anonymous types which have no names by definition. Is "string[][]" the name of a type? What about "List<string[][]>" -- does that name a type? Is the name of the string type "string" or "String", or "System.String", or "global::System.String", or all four? Does a type's name change depending on where in the source code you are looking?

This gets to be a bit of a mess. I prefer to think of types as logically not having names at all. Program fragments "12" and "10 + 2" and "3 * 4" and "0x0C" are not names for the number 12, they are expressions which happen to all evaluate to the number 12. That number is just a number; how you choose to notate it is a fact about your notational system, not a fact about the number itself. Similarly for types; the program fragment "List<string[][]>" might, in its context, evaluate to refer to a particular type, but that type need not have that fragment as its name. It has no name.

The concept of a "set of values" is also problematic, particularly if those sets are potentially infinite "mathematical" sets.

To start with, suppose you have a value: how do you determine what its type is? There are infinitely many sets that can contain that value, and therefore infinitely many types that the value can be! And indeed, if the string "hello" is a member of the string type, clearly it is also a member of the object type. How are we to determine what "the" type of a value is?

Things get even more brain-destroying when you think, oh, I know, I'll just say that every type's set of values is defined by a "predicate" that tells me whether a given value is in the type or not. That seems very plausible, but then you have to think about questions like "are types themselves values?" If so, then "the type whose values are all the types that are not members of themself" is a valid predicate, and hey, we've got Russell's Paradox all over again. (**)

Moreover, the idea of a type being defined by a predicate that identifies whether a given value is of that type or not is quite dissimilar to how we typically conceive of types in a C# program. If it were, then we could have "predicate types" like "x is an even integer" or "x is not a prime number" or "x is a string that is a legal VBScript program" or whatever. We don't have types like that in the C# type system.

So if a type is not a name, and a type is not a set of values, and a type is not a predicate that determines membership, what is a type? We seem to be getting no closer to a definition.

Next time: I'll try to come up with a more workable definition of what "type" means to both compiler writers and developers.

------------

(*) An unfortunate half-truth, as pointer types are not unified into the type system in the version of the C# language which includes the "unsafe" subset of functionality.

(**) When Russell discovered that his paradox undermined the entire arithmetic theory of Frege, he and Whitehead set about inventing what is now known as the "ramified theory of types", an almost impossible-to-understand theory that is immune to these sorts of paradoxes. Though the mathematical underpinnings of type theory/set theory/category theory/etc are fascinating, I do not understand them nearly well enough to go into detail here. Remember, what we're looking for is a definition of "type" that is amenable to doing practical work in a modern programming language.

Comments

  • Anonymous
    August 28, 2011
    I think that in the scope of modeling an application into the programing language, a type means some kind of creation which the programmer  makes.

  • Anonymous
    August 29, 2011
    that's some integer coercion that turns 12.345 into 123. =)

  • Anonymous
    August 29, 2011
    Defining a type as a set of values is not a problem because you couldn't ask what +the+ type of an expression is: after all, indeed, due to subclassing, values may have multiple types. The subtype is just a (smaller) set of values that all belong to the set of values of the base type. An important property of types not mentioned yet is that it describes a way to shape new objects - although that doesn't do for a definition either (since there are types that you cannot create new objects from, e.g. static types)

  • Anonymous
    August 29, 2011
    @yie, the type of 12.345 is DecaDouble. You have to assume that Eric is right... :-)

  • Anonymous
    August 29, 2011
    As an aside to people interested in it, there are in fact languages where things like "x is an even integer" or "x is a string that is a legal VBScript program" are types. This is a consequence of the Curry-Howard correspondence (which actually says that proof systems and programs can be expressed as each other, not types, so I'm glossing over some very necessary details for the full picture). Definitely worth checking out if mathematics don't scare you too much (there's lots of words and not many numbers in formal logic, so don't worry). Not the approach taken by mainstream languages like C#, obviously.

  • Anonymous
    August 29, 2011
    @Jasper Actually, in the .NET/C# type system, for any given value, there is a type that is member specific (i.e., a subset of) all other types that the value or expression is a member of.  That is what is usually what is referred to when one refers to "the" type of a value or expression.  Simply calling a type a set of values is unsufficient, as not all sets of values are types. Instead, I would define a type using something along the lines of "A logical construct that defines a set of possible values and what operations can be performed on those set of values."

  • Anonymous
    August 29, 2011
    Hmm... Types are a tool programmers use to structure data and the valid operations on the data. Defining a type as a set of valid values is too academic to be useful in implementing a compiler. A type is metadata, the actual form of which necessarily depends on the features of the programming language.

  • Anonymous
    August 29, 2011
    Another problem with using the "set of values" definition is structurally identical compound types. Imagine you have two classes, X and Y, whose definition are identical. For example, a class Height and class Width, which are basically wrappers over an Int (single public Int field named "value," and no methods). Should a Height with value 10 be considered the same type as a Width with value 10? The set of values that they take are identical. But logically, they are different classes, and you want your type system to know the difference between Heights and Widths. After all, the main benefit of static type systems is checking program correctness. But in order to do so, you have to take two identical sets of values and somehow separate them.

  • Anonymous
    August 29, 2011
    my best shot: a type is a blueprint for values or references. pointer types seem to be out of this category.

  • Anonymous
    August 29, 2011
    @Logan Gordon: Just because Width and Height have the same structure, it doesn't mean they describe the same set of values. Surely an instance of Height with value=10 is not just of a different type from an instance of Width with value=10, but actually an entirely distinct value. It might have the same in-memory representation, but that's surely an unimportant implementation detail.

  • Anonymous
    August 29, 2011
    A Type is a set of rules for how to tread a region of memory.

  • Anonymous
    August 29, 2011
    I bet it's really called a "Type" because "Class" and "Structure" were already taken, it was late at night, it made sense and it sounded good.

  • Anonymous
    August 29, 2011
    Of course, the definition of a type is dependant on context - I think of types in the CLR and types in C++ and types in Haskell as completely different things, and they are all programming language types! For CLR languages, the obvious answer is: A type is: An optional name, an optional namespace, an accessability, the assembly in which it is defined, an optional type from which it inherits, an optional set of types which it implements, an optional list of array dimensions, an optional type for which this is a pointer to, an optional list of generic arguments and a set of members (did I miss anything?). Obviously, that isn't terribly useful a definition for either semantics or implementation, but there you go :) Semanticaly, a type is the association of a set of values with operations on them: 3 may be a value that is in the types byte, ushort, int, double, BigInteger, (in some languages) string, etc..., but without the association with a type (even if it's just object), you can't actually do anything with it - except perhaps, depending on your interpretation of when a value's existance begins, create a new instance of a type from it. Interestingly, this answers why expressions with the 'same' operations on the same values can have different results: eg 6 / 4 gives 1.5 with double division, but 1 with int division, and 100 * 2 gives 200 with byte multiplication, but -56 with sbyte multiplication. I prefer splitting the type from it's definition, this simplifies anonymous types (a type without any definitions) and typedef/type forwarding (a type with multiple definitions).

  • Anonymous
    August 29, 2011
    I don't find it hard to believe that the "set of values" approach to defining types is not a total description of what we understand by "type", but I'm not yet convinced by Eric's arguments against it. 'suppose you have a value: how do you determine what its type is?' This seems a strange question to ask. What is it to "have a value"? Are we talking about a person looking at marks on a page? A compiler looking at expressions in a program text? A CPU looking at bytes in RAM? In most of the cases we care about, I would have thought it is most unusual to have a value without knowing a type, because in order to have a value you must have had some sort of symbol, be it scribbles on paper or bytes in RAM, and some system for interpreting such symbols into some possible set of conceptual values, and so we already have at least one type that we know the value belongs to. Furthermore, why do we want to determine a single type for the value "hello"? Surely the only reason a type is useful is when we need to work with a value that we don't entirely know in advance, but about which we know something. The type is the information that we have that constricts our unknown value from the set of all conceivable ideas to something useful which can be meaningfully processed. It seems more important to worry about the relationships between the types we care about than to fret over the difficulty of saying anything about the type of an arbitrary value from the set of all possible values.

  • Anonymous
    August 29, 2011
    When I was learning how to program I had a hard time in creating a mental modelof what types were. I finally came up with something simple, when I reversed the logic. I started by thinking what a program actually is to the computer. *) What is a program?   It's a block of memory containing instructions for the cpu. *) What is an (object) instance?   It's a block of memmory containing data those instructions should operate on. *) Since both can occupy (physically, though not at the same time) the same memory address, you need to tell the difference. *) Then what is a type?   It's a block of memory that describes (definitiion) how another block of memory is to be interpreted.   You can read a block of memory with the wrongtype, but that will (moslty) give you garbage. This model has worked fine by me for years now. There is some recursion going on here, (what describes the type?) but that is ok, you need a compiler to build a compiler too. (sorry if this is a double post, butfirst attempt was not accepted after 10 minutes, which weems a bit too long)

  • Anonymous
    August 29, 2011
    The comment has been removed

  • Anonymous
    August 30, 2011
    This has changed so much from the NT daze. Eric, what are your top five fav blogs that you follow? Just curious.  :O)

  • Anonymous
    August 30, 2011
    @JM: Scribbles on paper and bytes in RAM were not given as examples of values, but of symbols. What I was saying is that in order to have a value at all you must first have some context that tells you that those four bytes are an unsigned integer or a single precision float or four ASCII characters or whatever else. If you don't have the context then you don't really have the value. "Good luck writing a compiler that doesn't fret about what type a value is" Oh dear, that is not what I meant at all. I said that you can't have a value without some kind of context. What I was trying to say is that it that it doesn't seem particularly important to try to define (say) some function that takes an utterly arbitrary value and tells you its type because it's the context that tells you what other values it could have had, not the value itself. When talking about the value "hello", perhaps the context tells you that it could have been any sequence of characters (i.e., any string). The useful questions are then whether this type - the set of all strings - is a subset of some other type we are interested in, say the set of all objects, or whether a string can be coerced to another type, and so on. That's what I mean by "the relationships between the types we care about". I don't think anything I said precludes any particular compiler/interpreter implementations. I'm just talking about the question at hand - what is a type; and Eric's reasons for why a "set of possible values" is not a suitable definition of a type. I'd have considered this to be a relatively abstract question about the principles behind programming languages, not a specific mechanism for implementing a compiler or interpreter.

  • Anonymous
    August 30, 2011
    How about this - a type T is:

  1. A set of operations that can create distinct values ("instances of T").
  2. A set of operations that can be performed on instances of T. There's some subtlety in the meaning of the word "distinct" in that first paragraph that I am not quite sure how to get across - something to the effect that the values thus created must be tagged with the identity of T so that other types cannot ever produce the same values. But I think overall this kind of definition applies as a working definition of how the concept of a Type is used in the CLR.
  • Anonymous
    August 30, 2011
    A type is union of:  - data storage specification (at least size);  - agreement about meaning of that data (e.g. "this is character").

  • Anonymous
    August 30, 2011
    ... and optional set of operations defined for this data.

  • Anonymous
    August 30, 2011
    The comment has been removed

  • Anonymous
    August 30, 2011
    (this is a retry, my first attempt seems to be munched, sorry if it shows up doubly after all!) @JM: Indeed there are more sophisticated type systems where you could define types like 'a string representing a valid VB program' - but either you'll have to pack more info with the string to have it be a value of that type (namely data proving that it is, in fact, a representation of a valid VB program) or you'll have to be content with your type judgement being undecidable, i.e. not being able always to tell whether value x belongs to type y or not. An undesirable trait, IMO, especially for compilers wanting to give guarantees of correctness.

  • Anonymous
    August 31, 2011
    I think the type is a combination of rules we put together. A type can be some sort of imaginary we make in mind when we see a value. I agree that the value is should have a type but this "type" is some how semi hidden

  • Anonymous
    August 31, 2011
    My try on the meaning of type without reading through the article: A type is implied set of rules for natrual interpretation and manipulation of data.

  • Anonymous
    August 31, 2011
    @Amin: Value may not necessarily be associated with types if our defination is not bounded to .NET realm. See C++ macro defined values for example, also a value referenced by untyped pointer for another example.

  • Anonymous
    August 31, 2011
    An interesting question, 'are type values themselves?'. Certainly there is the type Type, and the operator typeof() and method object.GetType() yield values in that type. But one cannot say e.g. Type mytype = condition() ? String : Int64; -or- String.GetType(). So there is a difference between the (compile-time) type indications we use to type variables with or use the new operator on, and the (run-time) Type-valued objects representing these types. That said, by reflection we can use the latter to create new instances with, so maybe types are values themselves after all. Russell's paradox is avoided since values seem (implicitly) to have a reference to their types. Defining a new subtype B of A that doesn't extend A in any way doesn't suddenly yield all A-typed objects to be B-objects as well - this in contrast to a set-theoretic view when types are just sets of values. (Hope this post doesn't get lost on the way - chances are 50-50 up to now :-S Nimrand, I tried to reply to you twice, but no luck)

  • Anonymous
    August 31, 2011
    @cheong00 : What I mean is not bounded to a special programming language. Think about my word in real life examples. I think what you say is something we call 'feature' of a programming language or 'specification' or 'model of creation' of a language , but if we don't bound ourselves to some programming platform and focus on the deep meaning of the 'Type' we come to the common rules for  natrual interpretation and manipulation of data as you said yourself. And another interesting thing I think of : sometimes values can make new types... what do you think ?

  • Anonymous
    September 01, 2011
    >> To start with, suppose you have a value: how do you determine what its type is? I think this counterpoint is subtly wrong, for the simple reason that it makes certain assumptions about what a "value" is - and surely, if we're defining "types" in terms of "values", we should first define what a "value" is with similar rigor? On the other hand, if "value" itself is defined as data tagged with a type - which resembles more what we actually see in CLR - then there's no contradiction. A type then is, indeed, just a predicate, and every value is tagged with one (and only one) type, which is how we can determine  the "actual" type of any value.

  • Anonymous
    September 01, 2011
    @Amin: For values that makes new type, that doesn't make many difference if you're using my defination. You still have to obey the function contract (memory size, properties, methods) provided by the new type, and provide a way to interprete the value it holds. Actually I'll agree my defination is a bit dull and not stimulating imagination, just that it helps to fit all essential parts to think about while talking about the term "type" when we talk about programming.

  • Anonymous
    September 02, 2011
    Not that difficult really, something like, "A logical, hierarcichal, and tractable classification of values that can be bound to a function."

  • Anonymous
    September 09, 2011
    > but then you have to think about questions like "are types themselves values?" If so, then "the type whose values are all the types that are not members of themself" is a valid predicate, and hey, we've got Russell's Paradox all over again. (**) Isn't that exactly the difference between naive set theory and type theory? Things that classify values aren't values themselves. Instead you have different levels and only entities from lower levels can be used to compose an entity of a higher level, thus preventing loops and avoiding Russell's Paradox. Going a bit beyond C# and looking at Scala or Haskell you can see it better: "values" are classified by "types" and "types" are classified by "kinds". So you already have three layers. Ωmega (an academic language) even has an infinite count of layers ("kinds" are classified by "sorts", ...).

  • Anonymous
    September 16, 2011
    Why is Russell's Paradox a "Bad Thing"? It makes me think of the man who goes to doctor complaining about how it hurt if he poked himself.. just don't define a "stupid predicate" like that and you wouldn't run into the paradox. Sure that isn't the most elegant solution, but programming is already full of "so just don't do that"'s.

  • Anonymous
    October 25, 2011
    Consider the tuple (x=2, y=4). What function does it belong to? Well, 22 = 4, so it might belong to the function y = 2x. But 2² = 4 as well, so it equally belongs to the function y = x². So how do you determine what the function of (2, 4) is? I think you’ll agree this question makes no sense. There is no reason that a tuple must belong to a single function and no other. In your post the rhetorical question “how do you determine what its type is?” is on the same shelf. I do indeed conceptualise types as sets of possible values, i.e. your option #2, and the fact that a value can belong to any number of conceivable types does not pose a problem to me.