May 29, 2011

## More on Channel Theory

In my last post I introduced a couple of concepts from the channel theory of Jeremy Seligman and Jon Barwise. In this post I would like to continue that introduction.

To review, channel theory is intended to help us understand information flows of the following sort: a’s being F carries the information that b is G. For example, we might want a general framework in which to understand how a piece of fruit’s bitterness may carry the information that it is toxic, or how a mountain side’s having a particular distribution of flora can carry information about the local micro-climate, or how a war leader’s generous gift-giving may carry information about the success of a recent campaign, or how the sighting of a gull can carry the information that land is near. In a previous post, we asked how the positions of various participants in a fono might carry information about the political events of the day. One would hope that such a framework may even illuminate how an incident in which a person gets sick and dies may be perceived to carry the information that there is a sorcerer who is responsible for this misfortune.

In my last post, I introduced a simple sort of data structure called a classification. A classification simply links particulars to types. But as my examples above were intended to show, classifications are not only intended to model ‘categorical’ data, as usually construed.

Def 1. A classification is a triple A = $\langle tok(A), typ(A), \vDash_{A} \rangle$ such that for every token $a \in tok(A)$, and every type $\alpha\in typ(A)$, $a \vDash_{A}\alpha$ if and only if $a$ is of type $\alpha$.
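To make the definition concrete, here is a minimal Python sketch of a classification as a set of tokens, a set of types, and a set of (token, type) pairs. The class name `Classification`, the method `holds`, and the fruit data are my own illustrative assumptions, not anything taken from Barwise and Seligman.

```python
from typing import FrozenSet, Hashable, NamedTuple, Tuple


class Classification(NamedTuple):
    tokens: FrozenSet[Hashable]
    types: FrozenSet[Hashable]
    relation: FrozenSet[Tuple[Hashable, Hashable]]  # pairs (token, type)

    def holds(self, token: Hashable, typ: Hashable) -> bool:
        """True when `token` is classified as being of type `typ`."""
        return (token, typ) in self.relation


# Hypothetical data echoing the fruit example: bitterness and toxicity as types.
fruit = Classification(
    tokens=frozenset({"berry", "apple"}),
    types=frozenset({"bitter", "toxic", "sweet"}),
    relation=frozenset(
        {("berry", "bitter"), ("berry", "toxic"), ("apple", "sweet")}
    ),
)

print(fruit.holds("berry", "toxic"))   # True
print(fruit.holds("apple", "bitter"))  # False
```

Storing the relation as explicit pairs keeps tokens as objects in their own right rather than as rows identified by a key.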

One might remark that a classification is not much more than a table whose attributes have only two possible values, a sort of degenerate relational database. However, unlike a record/row in a relational database, channel theory treats each token as a first-class object. Relational databases require keys to guarantee that each tuple is unique, and key constraints to model relationships between records in tables. By treating tokens as first-class objects, we may model relationships using an infomorphism:

Def 2. Let $A$ and $B$ be two classifications. An infomorphism $f : A \rightleftarrows B$ is a pair of functions $f = \lbrace f^{\wedge}, f^{\vee} \rbrace$ such that $f ^{\wedge} : typ(A) \rightarrow typ(B)$ and $f^{\vee}: tok(B) \rightarrow tok(A)$, satisfying the following property: for every type $\alpha$ in A and every token b in B, $b \vDash_{B} f^{\wedge}(\alpha)$ if and only if $f^{\vee}(b) \vDash_{A} \alpha$.

An infomorphism is more general than an isomorphism between classifications, i.e. an isomorphism is a special case of an infomorphism. For example, an infomorphism $f : A \rightleftarrows B$ between classifications A and B might map two or more types $\alpha$, $\alpha^{\prime}$ in A onto a single type $\beta$ in B, provided that from B’s point of view the two types are indistinguishable, or more precisely that for all tokens b in B, $f^{\vee}(b) \vDash_{A} \alpha$ if and only if $f^{\vee}(b) \vDash_{A} \alpha^{\prime}$. Note that this requirement does not mean that those types in A are not distinguishable in A (or more technically, are co-extensional in A). There may be tokens in A outside the range of $f^{\vee}$ for which, for example, $a \vDash_{A} \alpha$ but not $a \vDash_{A} \alpha^{\prime}$.

A dual observation may be made about the tokens of B. Two tokens of B may be mapped onto the same token in A, provided that those tokens in B are indistinguishable with respect to the set of types $\beta$ in B for which there exists some $\alpha$ such that $f^{\wedge}(\alpha) = \beta$. Again, this does not mean these same tokens in B are wholly indistinguishable in B. For example, there may be types outside the range of $f^{\wedge}$ classifying them differently. Thus, an infomorphism may be thought of as a kind of view or filter into the other classification.
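The infomorphism condition is easy to check mechanically on small finite classifications. The sketch below is only an illustration under my own assumptions: the `Classification` tuple, the function `is_infomorphism`, and the colour data are all hypothetical. It shows two types of A collapsing onto one type of B, even though a token of A outside the range of $f^{\vee}$ tells them apart.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    tokens: frozenset
    types: frozenset
    relation: frozenset  # pairs (token, type)


def is_infomorphism(A, B, f_up, f_down):
    """Check: b |=_B f_up(alpha)  iff  f_down(b) |=_A alpha, for all alpha, b.

    f_up   is a dict from typ(A) to typ(B)  (the type part of f)
    f_down is a dict from tok(B) to tok(A)  (the token part of f)
    """
    return all(
        ((b, f_up[alpha]) in B.relation) == ((f_down[b], alpha) in A.relation)
        for alpha in A.types
        for b in B.tokens
    )


# Hypothetical data: A distinguishes "red" from "crimson", B only has "reddish".
A = Classification(
    tokens=frozenset({"a1", "a2", "a3"}),
    types=frozenset({"red", "crimson"}),
    relation=frozenset({("a1", "red"), ("a1", "crimson"), ("a3", "red")}),
)
B = Classification(
    tokens=frozenset({"b1", "b2"}),
    types=frozenset({"reddish"}),
    relation=frozenset({("b1", "reddish")}),
)

f_up = {"red": "reddish", "crimson": "reddish"}
f_down = {"b1": "a1", "b2": "a2"}  # a3 is outside the range of f_down
print(is_infomorphism(A, B, f_up, f_down))    # True

bad_down = {"b1": "a1", "b2": "a3"}  # a3 is red but b2 is not reddish
print(is_infomorphism(A, B, f_up, bad_down))  # False
```

The token `a3` is red but not crimson, so the two types are not co-extensional in A; the first pair of maps is still an infomorphism because no token of B is sent to `a3`.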

It is actually rather difficult to find infomorphisms between arbitrary classifications. In many cases there will be none. If infomorphisms were too easy to find, the notion would not be particularly meaningful; if the condition were too stringent, it would not be very applicable. However, two classifications may be joined in a fairly standard way. For example, we can add them together:

Def 3. Given two classifications A and B, the sum of A and B is the classification A+B such that:

1.      $tok(A + B)=tok(A)\times tok(B)$,

2.     $typ(A + B)$ is the disjoint union of $typ(A)$ and $typ(B)$ given by $\langle 0,\alpha \rangle$ for each type $\alpha \in typ(A)$ and $\langle 1,\beta \rangle$ for each type $\beta \in typ(B)$, such that

3.      for each token $\langle a,b\rangle \in tok(A+B)$, $\langle a,b\rangle \vDash_{A+B} \langle 0,\alpha \rangle$ iff $a \vDash_{A} \alpha$, and $\langle a,b\rangle \vDash_{A+B} \langle 1,\beta \rangle$ iff $b \vDash_{B} \beta$.

Remark. For any two classifications A and B there exist infomorphisms ${{\varepsilon }_{A}} : A \rightleftarrows A+B$ and ${{\varepsilon }_{B}}:B\rightleftarrows A+B$ defined such that ${{\varepsilon }_{A}}^{\wedge }(\alpha )=\langle 0,\alpha\rangle$ and ${{\varepsilon }_{B}}^{\wedge }(\beta )=\langle 1,\beta\rangle$ for all types $\alpha \in typ(A)$ and $\beta \in typ(B)$, and ${{\varepsilon }_{A}}^{\vee }(\langle a,b\rangle )=a$ and ${{\varepsilon }_{B}}^{\vee }(\langle a,b\rangle )=b$ for each token $\langle a,b\rangle \in tok(A+B)$.
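The sum and its two injections can be sketched in a few lines of Python. Everything named here (`Classification`, `sum_classification`, `is_infomorphism`, and the toy switch and bulb data) is a hypothetical illustration of Def 3 and the Remark, not a reference implementation.

```python
from itertools import product
from typing import NamedTuple


class Classification(NamedTuple):
    tokens: frozenset
    types: frozenset
    relation: frozenset  # pairs (token, type)


def sum_classification(A, B):
    """Def 3: tokens are pairs, types are tagged 0 (from A) or 1 (from B)."""
    tokens = frozenset(product(A.tokens, B.tokens))
    types = frozenset((0, t) for t in A.types) | frozenset((1, t) for t in B.types)
    relation = frozenset(
        [((a, b), (0, t)) for (a, b) in tokens for t in A.types if (a, t) in A.relation]
        + [((a, b), (1, t)) for (a, b) in tokens for t in B.types if (b, t) in B.relation]
    )
    return Classification(tokens, types, relation)


def is_infomorphism(A, B, f_up, f_down):
    """Check: b |=_B f_up(alpha)  iff  f_down(b) |=_A alpha."""
    return all(
        ((b, f_up[alpha]) in B.relation) == ((f_down[b], alpha) in A.relation)
        for alpha in A.types
        for b in B.tokens
    )


# Hypothetical toy data.
switches = Classification(
    frozenset({"s1", "s2"}),
    frozenset({"On", "Off"}),
    frozenset({("s1", "On"), ("s2", "Off")}),
)
bulbs = Classification(
    frozenset({"b1", "b2"}),
    frozenset({"Lit", "Unlit"}),
    frozenset({("b1", "Lit"), ("b2", "Unlit")}),
)

both = sum_classification(switches, bulbs)

# The injections of the Remark: tag the type, project the token pair.
eps_A_up = {t: (0, t) for t in switches.types}
eps_A_down = {(a, b): a for (a, b) in both.tokens}
eps_B_up = {t: (1, t) for t in bulbs.types}
eps_B_down = {(a, b): b for (a, b) in both.tokens}

print(len(both.tokens), len(both.types))                      # 4 4
print(is_infomorphism(switches, both, eps_A_up, eps_A_down))  # True
print(is_infomorphism(bulbs, both, eps_B_up, eps_B_down))     # True
```

Note that the sum pairs every switch with every bulb, which is why it carries no information about which switch belongs with which bulb.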

To see how this is useful, we turn now to Barwise and Seligman’s notion of an information channel.

Def 4. A channel C is an indexed family of infomorphisms $\{ f_{i} : A_{i} \rightleftarrows C \} _{i \in I}$, each having as its co-domain a classification C called the core of the channel.

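As it turns out, in a result known as the Universal Mapping Property of Sums, given a binary channel C = $\{ f : A \rightleftarrows C, g : B \rightleftarrows C \}$, and infomorphisms ${{\varepsilon }_{A}} : A \rightleftarrows A+B$ and ${{\varepsilon }_{B}}:B\rightleftarrows A+B$, there is a unique infomorphism $f+g : A+B \rightleftarrows C$ that makes the corresponding diagram commute, that is, such that $f = (f+g){{\varepsilon }_{A}}$ and $g = (f+g){{\varepsilon }_{B}}$. Concretely, $(f+g)^{\wedge}$ sends $\langle 0,\alpha \rangle$ to $f^{\wedge}(\alpha)$ and $\langle 1,\beta \rangle$ to $g^{\wedge}(\beta)$, while $(f+g)^{\vee}$ sends each token c of C to $\langle f^{\vee}(c), g^{\vee}(c) \rangle$.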

The result is general and can be applied to arbitrary channels and sums.
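As a sanity check on the binary case, the following Python sketch builds the mediating infomorphism $f+g$ for a small flashlight channel and verifies that composing it with the injections gives back f and g. The classifications, the maps, and all the names (`h_up`, `h_down`, and so on) are hypothetical choices of mine for illustration only.

```python
from itertools import product
from typing import NamedTuple


class Classification(NamedTuple):
    tokens: frozenset
    types: frozenset
    relation: frozenset  # pairs (token, type)


def sum_classification(A, B):
    tokens = frozenset(product(A.tokens, B.tokens))
    types = frozenset((0, t) for t in A.types) | frozenset((1, t) for t in B.types)
    relation = frozenset(
        [((a, b), (0, t)) for (a, b) in tokens for t in A.types if (a, t) in A.relation]
        + [((a, b), (1, t)) for (a, b) in tokens for t in B.types if (b, t) in B.relation]
    )
    return Classification(tokens, types, relation)


def is_infomorphism(A, B, f_up, f_down):
    return all(
        ((b, f_up[alpha]) in B.relation) == ((f_down[b], alpha) in A.relation)
        for alpha in A.types
        for b in B.tokens
    )


switches = Classification(
    frozenset({"s1", "s2"}), frozenset({"On", "Off"}),
    frozenset({("s1", "On"), ("s2", "Off")}),
)
bulbs = Classification(
    frozenset({"b1", "b2"}), frozenset({"Lit", "Unlit"}),
    frozenset({("b1", "Lit"), ("b2", "Unlit")}),
)
# Hypothetical core: each flashlight contains one switch and one bulb.
core = Classification(
    frozenset({"fl1", "fl2"}),
    frozenset({"SwitchOn", "SwitchOff", "BulbLit", "BulbUnlit"}),
    frozenset({("fl1", "SwitchOn"), ("fl1", "BulbLit"),
               ("fl2", "SwitchOff"), ("fl2", "BulbUnlit")}),
)

# The binary channel {f, g} into the core.
f_up, f_down = {"On": "SwitchOn", "Off": "SwitchOff"}, {"fl1": "s1", "fl2": "s2"}
g_up, g_down = {"Lit": "BulbLit", "Unlit": "BulbUnlit"}, {"fl1": "b1", "fl2": "b2"}

both = sum_classification(switches, bulbs)
eps_A_up = {t: (0, t) for t in switches.types}
eps_A_down = {(a, b): a for (a, b) in both.tokens}
eps_B_up = {t: (1, t) for t in bulbs.types}
eps_B_down = {(a, b): b for (a, b) in both.tokens}

# The mediating infomorphism f+g from the sum to the core.
h_up = {(0, t): f_up[t] for t in switches.types}
h_up.update({(1, t): g_up[t] for t in bulbs.types})
h_down = {c: (f_down[c], g_down[c]) for c in core.tokens}

print(is_infomorphism(both, core, h_up, h_down))  # True

# Commutation: types compose forwards, tokens compose backwards.
f_commutes = ({t: h_up[eps_A_up[t]] for t in switches.types} == f_up
              and {c: eps_A_down[h_down[c]] for c in core.tokens} == f_down)
g_commutes = ({t: h_up[eps_B_up[t]] for t in bulbs.types} == g_up
              and {c: eps_B_down[h_down[c]] for c in core.tokens} == g_down)
print(f_commutes, g_commutes)  # True True
```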

I still haven’t exactly shown how this is useful. To do that we introduce some inference rules that can be used to reason from the periphery to the core and back again in the channel.

A sequent $\langle \Gamma ,\Delta \rangle$ is a pair of sets of types. A sequent $\langle \Gamma ,\Delta \rangle$ is a sequent of a classification $A$ if all the types in  $\Gamma$ and $\Delta$ are in $typ(A).$

Def 5. Given a classification $A,$ a token $a\in tok(A)$ is said to satisfy a sequent $\langle \Gamma ,\Delta \rangle$ of $A,$ provided that if $a{{\vDash }_{A}}\alpha$ for every type $\alpha \in \Gamma$, then $a{{\vDash }_{A}}\alpha$ for some type $\alpha \in \Delta$. If every $a\in tok(A)$ satisfies $\langle \Gamma ,\Delta \rangle$, then we say that $\Gamma$ entails $\Delta$ in $A$, written $\Gamma {{\vdash }_{A}}\Delta$, and $\langle \Gamma ,\Delta \rangle$ is called a constraint of $A.$
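Satisfaction and entailment are straightforward to compute for a finite classification. The sketch below reads satisfaction conditionally, as Barwise and Seligman do: a token satisfies a sequent when, if it is of every type in $\Gamma$, it is of at least one type in $\Delta$. The names `satisfies` and `entails` and the switch data are hypothetical.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    tokens: frozenset
    types: frozenset
    relation: frozenset  # pairs (token, type)


def satisfies(A, token, gamma, delta):
    """If `token` has every type in gamma, it must have some type in delta."""
    has_all = all((token, t) in A.relation for t in gamma)
    has_some = any((token, t) in A.relation for t in delta)
    return (not has_all) or has_some


def entails(A, gamma, delta):
    """gamma |-_A delta: every token of A satisfies the sequent."""
    return all(satisfies(A, a, gamma, delta) for a in A.tokens)


# Hypothetical toy data.
switches = Classification(
    frozenset({"s1", "s2"}),
    frozenset({"On", "Off"}),
    frozenset({("s1", "On"), ("s2", "Off")}),
)

# No switch is both On and Off: <{On, Off}, {}> is a constraint.
print(entails(switches, {"On", "Off"}, set()))  # True
# Every switch is On or Off: <{}, {On, Off}> is a constraint.
print(entails(switches, set(), {"On", "Off"}))  # True
# Being On does not entail being Off: s1 is a counterexample.
print(entails(switches, {"On"}, {"Off"}))       # False
```

An empty $\Delta$ expresses incompatibility and an empty $\Gamma$ expresses exhaustiveness, which is how a sequent with sets on both sides covers both kinds of constraint.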

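Barwise and Seligman introduce two inference rules: f-Intro and f-Elim. Given an infomorphism from a classification A to a classification C, $f:A\rightleftarrows C$, write ${{\Gamma }^{f}}$ for the set of translations $f^{\wedge}(\alpha)$ of the types $\alpha$ in a set $\Gamma$ of types of A, and ${{\Gamma }^{-f}}$ for the set of types of A whose translations lie in a set $\Gamma$ of types of C: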

$f\text{-Intro: }\frac{{{\Gamma }^{-f}}{{\vdash }_{A}}{{\Delta }^{-f}}}{\Gamma {{\vdash }_{C}}\Delta }$

$f\text{-Elim: }\frac{{{\Gamma }^{f}}{{\vdash }_{C}}{{\Delta }^{f}}}{\Gamma {{\vdash }_{A}}\Delta }$

The two rules have different properties. f-Intro preserves validity, whereas f-Elim does not; f-Intro fails to preserve invalidity, but f-Elim preserves it. f-Elim is however valid precisely for those tokens in A for which there is a token c of C mapped onto them by $f^{\vee}$.

Suppose then that we have a channel. At the core is a classification of flashlights, and at the periphery are classifications of bulbs and switches. We can take the sum of the classifications of bulbs and switches. We know that there are infomorphisms from these classifications to the sum (and so this too makes up a channel), and using f-Intro, we know that any constraints of the classifications of bulbs and switches will still hold in the sum classification bulbs + switches. But note that, since the classification bulbs + switches pairs every bulb token with every switch token, any constraints that might properly hold between bulbs and switches will not hold in the sum classification. Similarly, all the constraints holding in the classification bulbs + switches will hold in the core of the flashlight channel. However, there will be constraints in the core (namely those holding between bulbs and switches) not holding in the sum classification bulbs + switches.

In brief: suppose that we know that a particular switch is in the On position, and that it is a constraint of switches that a switch being in the On position precludes it being in the Off position. We can project this constraint into the core of the flashlight channel reliably. But in the channel additional constraints hold (the ones we are interested in). Suppose that in the core of the channel, there is a constraint that if a switch is On in a flashlight then the bulb is Lit in the flashlight. We would like to know that because *this* switch is in the On position, a particular bulb will be Lit. How can we do it? Using f-Elim we can pull back the constraint of the core to the sum classification. But note that this constraint is *not valid* in the sum classification. It can fail, however, only for those pairings of a switch with a bulb that are not connected in the channel. In this way, we can reason from local information to a distant component of a system, but in so doing, we lose the guarantee that our reasoning is valid, and we lose the guarantee that it is sound.
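The flashlight story can be played out on a toy model. In the Python sketch below, which rests entirely on hypothetical data and names of my own choosing, the constraint "On entails Lit" holds in the core, its pull-back to bulbs + switches fails, and every counterexample turns out to be a switch and bulb pairing that no flashlight connects.

```python
from itertools import product
from typing import NamedTuple


class Classification(NamedTuple):
    tokens: frozenset
    types: frozenset
    relation: frozenset  # pairs (token, type)


def sum_classification(A, B):
    tokens = frozenset(product(A.tokens, B.tokens))
    types = frozenset((0, t) for t in A.types) | frozenset((1, t) for t in B.types)
    relation = frozenset(
        [((a, b), (0, t)) for (a, b) in tokens for t in A.types if (a, t) in A.relation]
        + [((a, b), (1, t)) for (a, b) in tokens for t in B.types if (b, t) in B.relation]
    )
    return Classification(tokens, types, relation)


def satisfies(A, token, gamma, delta):
    has_all = all((token, t) in A.relation for t in gamma)
    has_some = any((token, t) in A.relation for t in delta)
    return (not has_all) or has_some


def entails(A, gamma, delta):
    return all(satisfies(A, a, gamma, delta) for a in A.tokens)


switches = Classification(
    frozenset({"s1", "s2"}), frozenset({"On", "Off"}),
    frozenset({("s1", "On"), ("s2", "Off")}),
)
bulbs = Classification(
    frozenset({"b1", "b2"}), frozenset({"Lit", "Unlit"}),
    frozenset({("b1", "Lit"), ("b2", "Unlit")}),
)
core = Classification(
    frozenset({"fl1", "fl2"}),
    frozenset({"SwitchOn", "SwitchOff", "BulbLit", "BulbUnlit"}),
    frozenset({("fl1", "SwitchOn"), ("fl1", "BulbLit"),
               ("fl2", "SwitchOff"), ("fl2", "BulbUnlit")}),
)
both = sum_classification(switches, bulbs)

# Assumed infomorphism h from bulbs + switches to the core.
h_up = {(0, "On"): "SwitchOn", (0, "Off"): "SwitchOff",
        (1, "Lit"): "BulbLit", (1, "Unlit"): "BulbUnlit"}
h_down = {"fl1": ("s1", "b1"), "fl2": ("s2", "b2")}

# A sequent of the sum and its translation into the core.
gamma, delta = {(0, "On")}, {(1, "Lit")}
gamma_h = {h_up[t] for t in gamma}
delta_h = {h_up[t] for t in delta}

print(entails(core, gamma_h, delta_h))  # True: a constraint of the core
print(entails(both, gamma, delta))      # False: h-Elim's conclusion is not valid

counterexamples = {tok for tok in both.tokens
                   if not satisfies(both, tok, gamma, delta)}
connected = set(h_down.values())
print(counterexamples)                        # {('s1', 'b2')}
print(counterexamples.isdisjoint(connected))  # True
```

The only failing pair is switch `s1` with bulb `b2`, which belong to different flashlights; for the pairs in the range of the token map the pulled-back constraint holds.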

[1] Barwise, Jon, and Jerry Seligman. 1997. Information Flow: The Logic of Distributed Systems. Cambridge tracts in theoretical computer science 44. Cambridge: Cambridge University Press.

May 4, 2011

## Have Hammer Need Nail

In the past month or two here on Dead Voles the notions of instance and of type have come up several times (not always in the same place). I have become more keenly aware of this distinction, particularly in certain discussions of cultural anthropology, but also endemically in discussions of programming. More precisely, I have become more keenly aware of how often we slip between talking about the world in the language of types and in the language of tokens, without really being aware that we are doing it, and how difficult and fruitful it can be to discipline ourselves to maintain the distinction, especially when we are trying to analyze the social world.

The history of situation theory’s struggle to arrive at an adequate notion of information flow is perhaps a testament to the tendency to neglect one or the other. In particular, situation theory was introduced with just such a distinction in mind, with a division between situations, as concrete parts of the world, and infons as items of information (or types). And yet, for some quite defensible reasons, situation theorists chose to model situations as the sets of infons made factual by that situation, treating two situations as being identical (i.e., the same situation) whenever they supported precisely the same information. This move reintroduced an ambiguity between tokens and types so that it becomes difficult sometimes to know whether situation theorists are talking about infons or the concrete situations themselves.

But it may also be evident in how we go about interpreting human artifacts in terms of some presumed system of meaning and ignore the brute actuality of the artifact itself (which is why a sacred object can still be used as a paper-weight). It is not enough, as John McCreery tells us, to look for meanings behind the objects; instead we may well ask, why do the gods look like that?

I have already mentioned the theory of information flow (called channel theory) of Jon Barwise and Jeremy Seligman in their book [1]. Here I would like to briefly introduce two of its main concepts, since not only does it take the distinction between tokens and types as fundamental, but it provides an interesting model of the flow of information.

It is also the hammer, with which I have been looking for a nail.

Let us first define a sort of data structure, that in some ways is not very remarkable. It is merely a kind of attribute table, and is somewhat similar to a formal context in formal concept analysis discussed here. The structure consists of a set of tokens, a set of types classifying those tokens, and a binary classification relation between them.

Def 1. A classification A is a triple $A = \langle tok(A), typ(A), : \rangle$ such that for every token $a \in tok(A)$, and every type $\alpha\in typ(A)$, $a:\alpha$ if and only if $a$ is of type $\alpha$.

The classification distinguishes itself from other similar data structures (and relations in general) by making both types and tokens first class objects of the theory. This allows an interesting morphism between classifications, called an infomorphism (also called a Chu morphism), which we define presently:

Def 2. Let $A$ and $B$ be two classifications. An infomorphism $f : A \rightleftarrows B$ is a pair of contravariant maps $f = \lbrace f^{\wedge}, f^{\vee} \rbrace$ such that $f ^{\wedge} : typ(A) \rightarrow typ(B)$ and $f^{\vee}: tok(B) \rightarrow tok(A)$ satisfying the fundamental property that for every type $\alpha$ in A and every token b in B, $b : f^{\wedge}(\alpha)$ if and only if $f^{\vee}(b) : \alpha$.

The infomorphism defines a curious part-whole relationship that can be used to represent a number of interesting relationships, for example, between points of view or perspectives, between map and terrain, between the parts of distributed systems, and between concepts.

An elaboration must wait for a future post, since I have run out of time.

[1] Barwise, Jon, and Jerry Seligman. 1997. Information Flow: The Logic of Distributed Systems. Cambridge tracts in theoretical computer science 44. Cambridge: Cambridge University Press.