Type Analysis

Types

There are many interpretations or definitions of what a type can mean. Here are some:

Definition

A type is a collection of values (integers, strings, functions, trees, images, etc)

Definition

A type is a way to interpret a sequence of bits

E.g

1001 means 9 as an unsigned integer
1001 means -7 as an s2c integer
1001 means TAB as an ASCII character

Definition

A type is a computable property of programs that garuntees some property of their execution

E.g

If a compiler determines some expression has type int, then the expression always produces integer values when it is evaluated
Programs that can be given a type in some programming languages are guaranteed to avoid certain bad behaviours when they run
Some type systems are sophisticated enough to express functional correctness properties of programs

LACS Types: Int (32 bit), function type , void

Type Systems

Definition

A type system is a set of rules that define a typing relation

Typing Relation

Definition

A typing relation is a set of triples denoted , where:

is the typing context, which is the symbol table that maps each identifier to its meaning
is the term corresponding to a subtree of the parse tree
is the type

and we traditionally write to mean .

The presence of a triple in a typing relation is intended to mean that the expression represented by the tree has type , using the symbol table to look up the declarations of any identifiers used in .

Since contains all possible symbol tables and parse trees , it is infinite, so it's not helpful to think about enumerating all possible triples in it, but it is intuitively helpful to remember it as a set of triples.

Inference Rules

Definition

An inference rule is a collection of zero or more premises and a conclusion.

The premises are written above a horizontal line and the conclusion below the line. The rule asserts that if the premises hold, then the conclusion holds, just like in formal logic!

Example

Please add this new tuple to the set
- Prof. Ondřej Lhoták

is the symbol table, most rules do not change the symbol table (they are the same in the premise and conclusion)

For a better understanding, see some example binary relations. Once you understand how this set of tuples represents a relation, things make more sense: a relation holds if its corresponding tuple is in the set, otherwise it does not hold. This explains why the set is infinitely large.

When we write a relation, we use metavariables like which mean any number or any symbol table (this is probably self-evident). That is to say, they are implicitly universally quantified. For example, in our set, we may have , , , etc

Additional notes on notation:

means a sequence of variables
means we are looking up the identifier in the symbol table and grabbing its return type

Well-Formedness Relations

Some parts of the tree are not expressions but still have types, such as function declarations.

Definition

A well-formedness relation is a set of pairs denoted , where:

is the parsing context (symbol table)
is a term (parse tree)

and we traditionally write to mean .

LACS Type Rules

Metavariable Definitions

Type Derivation Rules

Rule - Literal

this means a numeric literal will always have the type integer.

Rule - Variable

this means, if we use a variable that is in the symbol table with name ID of type , then we add it to the symbol table.

Rule - Procedure

Rule - Parentheses

Rule - Arithmetic

if we tried something goofy ahh like main + main (where main is a procedure), then we would have an error, because there is no rule that takes two functions and adds them together.

A more intuitive way of looking at this is, if and , then the triple containing the tree corresponding to the expression . This is because when we have an arithmetic expression, every expression in the tree must evaluate to an integer.

For example, if we tried something even more goofy, like 5 * 2 + (3 * main), then we have an error, because ONE of the expression subtrees is invalid.

When we write (1 + 3) * (2 + 5), note that , etc.

Rule - Assignment

in LACS, assignment "returns" its value as an expression (like in Javascript).

Rule - Sequencing

means any type (we don't care what type, since it won't be in the rule), but it MUST be SOME type

When we have a block expression, the last thing that shows up is the type that the block evaluates to (in LACS, we must use semicolons). In this way, if we have a function like def func() = {a = 5; 3}, our block {a = 4; 3} evaluates to type Int, regardless of what came before it.

Rule - If Statement

Rule - Procedure Call

this means if we have a procedure with params , then we can call it with the correct parameters (whether they are literals or variables). Note that the second parameter is saying that for every parameter, the types of some value matches the types of the param list.

Well-Formedness Rules

Rule - Program

this rule defines the well-formedness of the root node of the parse tree, which is expanded using production rule defdefs -> defdef defdefs or defdefs -> defdef.
The premise is checking that every procedure is well-formed (checked by the next rule)
The conclusion adds the list of top-level function declarations with an empty symbol table to the set. We can do this because in LACS, there are no global variables.

Rule - Procedure Declaration

What everything means:
- : procedure params
- : local variable declarations
- : nested procedures
The first premise enforces the requirement that parameters, variables, and nested procedure declarations must hav distinct names
The second premise states that the symbol table is extended with the declarations of the parameters, variable declarations, and nested procedure declarations, and that each nested procedure declaration must be recursively well-formed using this extended symbol table
- The commas mean that any new declarations take precedence over (replace) declarations of the same name in . That is, we take the "nearest" (innermost) scope of declaration for a name
The third premise states that the body of the procedure may refer to any of the parameters, variables, and nested procedures declared in that procedure, as well as those declared in any outer enclosing scopes
- Note that this premise is a type-checking premise, not a well-formed premise
- We are type-checking with our new extended symbol table
- In order for a procedure declaration to be well-formed, the body must have a type according to the typing rules
  - I.e
- The rule also checks that the return type is correct
Note that in LACS, you must declare in order: local variables, nested procedures, expressions
This rule corresponds to the grammar rule defdef -> DEF ID LPAREN parmsopt RPAREN COLON type BECOMES LBRACE vardefsopt defdefsopt expras RBRACE

In LACS, only well-formedness rules extend the symbol table (add new declarations). In all other rules, the symbol table is unchanged. This is true because in LACS, only new procedures can create scopes with new declarations. In other languages where you can declare names in other scopes, the type inference rules would also extend the symbol table.

Soundness

See soundness in formal logic

Definition

A type system is sound if when it assigns a type to an expression, the expression evaluates to a value of the same type

Type Checking

We need an algorithm to type check a parse tree.

Given a symbol table and parse tree , determine whether the typing relation contains any triple of the form , and if it does, return the type from that triple as the type for .

Info

The triple only if the conclusion of some inference rule put it there

For LACS, each conclusion has a distinct syntactic form. In other words, for each grammar rule that could be in the production rule, there is only one type inference rule that could possibly apply.

For example:

If is a numeric literal, only the LITERAL rule could give it a type
If is an arithmetic expression, only the ARITHMETIC rule could give it a type
If is an if-statement, only the IF STATEMENT, rule could give it a type

Step

Therefore, the implementation of the type checker can look at the grammar production rule in the root node of the tree and implement the inference rule corresponding to that production rule.

To implement an inference rule, the type checker needs to determine values for the metavariables, from the tree and the symbol table.

Examples:

If the tree is an addition expression, the metavariables and must refer to the subtrees of defining the two operands of addition
If the tree is a use of an identifier, the premise of the identifier inference rule requires that identifier to be looked up in , then the result becomes the value of the metavariable

Step

With values for the metavariables, the type checker checks the premises of the inference rule. In many cases, this involves recursive calls to determine types for subtrees of .

Finally, the type checker returns the type specified by the conclusion of the inference rule.

Remark

The property that the syntactic form of each parse tree node determines which inference rule to apply and suggests values for metavariables in the rule is specific to LACS. Other type systems may require more complex algorithms to search for ways to apply inference rules and instantiate metavariables to find a triple

Summary

For each inference rule, start with the bottom left (syntactic shape of the tree) to the top (to check the premises) to the bottom right (return the type)
Type Analysis.png