Context-Free Grammars

Consider the expression a + b * c - d, we would have a DFA:

but if we wrote (a + b) * (c - d), our DFA would reject it. We could make this DFA:

But this DFA would reject ((a + b) * (c - d)). The problem is, we can have an infinite number of parentheses, so we cannot define this language with a definite finite automaton. Instead, we need to use a context-free language.

Production Rules

Definition

A production rule is a rewrite rule specifying a symbol substitution that can be recursively performed to generate new symbol sequences

Example

Example production rules for a CFG are:

Explanation:

  • An expression is either an identifier, or an expression followed by an operator and anothe expression, or an expression in parentheses
  • An operator is one of the four listed arithmetic operators

Parse Trees

Definition

In a CFG, we can construct parse trees for words:

  • Leaves are symbols in the word
  • Internal nodes are a production rule in the grammar
    • Label: left side of a rule
    • Children: right side of the rule
Example

For the expression (a + b) * c, we have the parse tree:

Example

/** A sample grammar. */
val grammar = parseGrammar(
    """
S BOF A EOF
A x
A A x
    """)

/** A parse tree of the string `BOF x EOF` generated by `grammar`. */
val oneX: Tree =
    new Tree(Token("S"), Seq(
        new Tree(Token("BOF")),
        new Tree(Token("A"), Seq(
            new Tree(Token("x")),
        )),
        new Tree(Token("EOF")),
    ))

/** A parse tree of the string `BOF x x EOF` generated by `grammar`. */
lazy val twoXes: Tree =
    new Tree(Token("S"), Seq(
        new Tree(Token("BOF")),
        new Tree(Token("A"), Seq(
            new Tree(Token("A"), Seq(
            new Tree(Token("x")),
        )),
        new Tree(Token("x")),
    )),
    new Tree(Token("EOF")),
))

/** A parse tree of the string `BOF x x x EOF` generated by `grammar`. */
lazy val threeXes: Tree =
    new Tree(Token("S"), Seq(
        new Tree(Token("BOF")),
        new Tree(Token("A"), Seq(
            new Tree(Token("A"), Seq(
                new Tree(Token("A"), Seq(
                    new Tree(Token("x")),
                )),
                new Tree(Token("x")),
            )),
            new Tree(Token("x")),
        )),
        new Tree(Token("EOF")),
    ))

Ambiguity

An expression for a grammar might not have a unique tree.

Definition

A CFG is ambiguous if it generates multiple parse trees for some input string

Example

We want to design our grammar without ambiguity (in the example, we would specify that our grammar follows BEDMAS)

Example

These production rules are ambiguous:

where as these production rules that generate the same language are unambiguous:

Context Free Grammars Formally

Definition

A context free grammar is a 4-tuple

  • is a finite set of non-terminal symbols
  • is a finite set of terminal symbols
  • is a finite set of production rules
  • is the start non-terminal root of the parse tree
Example

For the grammar

we would have:

  • is the set of:
    • expr -> ID
    • expr -> expr op expr
    • expr -> (expr)
    • op -> +
    • op -> -
    • op -> *
    • op -> /

and we would write:

The language is called context-free because it is not context-sensitive. In a context-sensitive language, for the production rules, you can have sequences on the left side of the production rules rather than single symbols.

Conventions

that is:

Direct Derivation

Definition

Whenever is a production rule in the grammar, we say directly derives and we write .

Example

NB: do not mix this up with an implication. To disambiguate, we will use a shorter arrow () as opposed to a longer one ().

Derivation

Definition

derives iff we can use zero or more direct derivations to derive : , and we write

Example

Intuition

This is just an extreme formalization running production rules on expr to get something else

Generation or Specification

Definition

The language generated or specified by is:

(see set builder notation)

Intuition

The language generated by the grammar is a set of words such that we can derive said word from the starting non-terminal root.

means a sequence of symbols from our alphabet .

Context-Free Language

Definition

A language is context-free is there exists a CFG that specified/generates it

Undecidability

Info

It is undecidable (there does not exist an algorithm to decide), given two CFGs and , whether they generate the same language.

For a regular language, it is decidable whether two DFAs, NFAs, or regular expressions specify the same language.

Info

It is undecidable whether a given context-free grammar is ambiguous. It is possible to prove that a specific grammar is unambiguous, but there is no algorithm that can determine this for any arbitrary grammar.