Type Analysis
Types
There are many interpretations or definitions of what a type can mean. Here are some:
A type is a collection of values (integers, strings, functions, trees, images, etc)
A type is a way to interpret a sequence of bits
E.g
1001
means 9 as an unsigned integer1001
means -7 as an s2c integer1001
meansTAB
as an ASCII character
A type is a computable property of programs that garuntees some property of their execution
E.g
- If a compiler determines some expression has type
int
, then the expression always produces integer values when it is evaluated - Programs that can be given a type in some programming languages are guaranteed to avoid certain bad behaviours when they run
- Some type systems are sophisticated enough to express functional correctness properties of programs
LACS Types: Int
(32 bit), function type
Type Systems
A type system is a set of rules that define a typing relation
Typing Relation
A typing relation is a set of triples
is the typing context, which is the symbol table that maps each identifier to its meaning is the term corresponding to a subtree of the parse tree is the type
and we traditionally write
The presence of a triple
Since
Inference Rules
An inference rule is a collection of zero or more premises and a conclusion.
The premises are written above a horizontal line and the conclusion below the line. The rule asserts that if the premises hold, then the conclusion holds, just like in formal logic!
Please add this new tuple to the set
- Prof. Ondřej Lhoták
For a better understanding, see some example binary relations. Once you understand how this set of tuples represents a relation, things make more sense: a relation holds if its corresponding tuple is in the set, otherwise it does not hold. This explains why the set
When we write a relation, we use metavariables like
Additional notes on notation:
means a sequence of variables means we are looking up the identifier in the symbol table and grabbing its return type
Well-Formedness Relations
Some parts of the tree are not expressions but still have types, such as function declarations.
A well-formedness relation is a set of pairs
is the parsing context (symbol table) is a term (parse tree)
and we traditionally write
LACS Type Rules
Metavariable Definitions
Type Derivation Rules
this means a numeric literal will always have the type integer.
this means, if we use a variable that is in the symbol table with name ID of type
if we tried something goofy ahh like main + main
(where main
is a procedure), then we would have an error, because there is no rule that takes two functions and adds them together.
A more intuitive way of looking at this is, if
For example, if we tried something even more goofy, like 5 * 2 + (3 * main)
, then we have an error, because ONE of the expression subtrees is invalid.
When we write (1 + 3) * (2 + 5)
, note that
in LACS, assignment "returns" its value as an expression (like in Javascript).
When we have a block expression, the last thing that shows up is the type that the block evaluates to (in LACS, we must use semicolons). In this way, if we have a function like def func() = {a = 5; 3}
, our block {a = 4; 3}
evaluates to type Int
, regardless of what came before it.
this means if we have a procedure with params
Well-Formedness Rules
this rule defines the well-formedness of the root node of the parse tree, which is expanded using production rule defdefs -> defdef defdefs
or defdefs -> defdef
.
The premise is checking that every procedure is well-formed (checked by the next rule)
The conclusion adds the list of top-level function declarations with an empty symbol table to the set. We can do this because in LACS, there are no global variables.
- What everything means:
: procedure params : local variable declarations : nested procedures
- The first premise enforces the requirement that parameters, variables, and nested procedure declarations must hav distinct names
- The second premise states that the symbol table
is extended with the declarations of the parameters, variable declarations, and nested procedure declarations, and that each nested procedure declaration must be recursively well-formed using this extended symbol table- The commas mean that any new declarations take precedence over (replace) declarations of the same name in
. That is, we take the "nearest" (innermost) scope of declaration for a name
- The commas mean that any new declarations take precedence over (replace) declarations of the same name in
- The third premise states that the body
of the procedure may refer to any of the parameters, variables, and nested procedures declared in that procedure, as well as those declared in any outer enclosing scopes- Note that this premise is a type-checking premise, not a well-formed premise
- We are type-checking with our new extended symbol table
- In order for a procedure declaration to be well-formed, the body
must have a type according to the typing rules- I.e
- I.e
- The rule also checks that the return type
is correct
- Note that in LACS, you must declare in order: local variables, nested procedures, expressions
- This rule corresponds to the grammar rule
defdef -> DEF ID LPAREN parmsopt RPAREN COLON type BECOMES LBRACE vardefsopt defdefsopt expras RBRACE
In LACS, only well-formedness rules extend the symbol table (add new declarations). In all other rules, the symbol table
Soundness
A type system is sound if when it assigns a type to an expression, the expression evaluates to a value of the same type
Type Checking
We need an algorithm to type check a parse tree.
Given a symbol table
The triple
For LACS, each conclusion has a distinct syntactic form. In other words, for each grammar rule that could be in the production rule, there is only one type inference rule that could possibly apply.
For example:
- If
is a numeric literal, only theLITERAL
rule could give it a type - If
is an arithmetic expression, only theARITHMETIC
rule could give it a type - If
is an if-statement, only theIF STATEMENT
, rule could give it a type
Therefore, the implementation of the type checker can look at the grammar production rule in the root node of the tree
To implement an inference rule, the type checker needs to determine values for the metavariables, from the tree and the symbol table.
Examples:
- If the tree
is an addition expression, the metavariables and must refer to the subtrees of defining the two operands of addition - If the tree
is a use of an identifier, the premise of the identifier inference rule requires that identifier to be looked up in , then the result becomes the value of the metavariable
With values for the metavariables, the type checker checks the premises of the inference rule. In many cases, this involves recursive calls to determine types for subtrees of
Finally, the type checker returns the type specified by the conclusion of the inference rule.
The property that the syntactic form of each parse tree node determines which inference rule to apply and suggests values for metavariables in the rule is specific to LACS. Other type systems may require more complex algorithms to search for ways to apply inference rules and instantiate metavariables to find a triple
For each inference rule, start with the bottom left (syntactic shape of the tree) to the top (to check the premises) to the bottom right (return the type)