Pumping Lemma for Regular Languages: If A is a regular language, then there is a number p such that if s is any string in A of length at least p, then s may be divided into three pieces, s = xyz, with the properties:

1. x y^i z ∈ A for each i >= 0,
2. |y| > 0, and
3. |xy| <= p.
The integer p associated with the pumping lemma can be taken to be the number of states of a DFA that recognizes the regular language in question.
           a         b         c
---> [q0] ---> [q1] ---> [q2] ---> [[q3]]
                ^                    |
                |          a         |
                +--------------------+

p = 4, so we need a string of length at least 4.

w = abcabc is accepted and has length 6 >= 4.

The first state that is repeated when w is input to this DFA is q1.

x = the string up to the first occurrence of the repeated state q1
y = the string after x up to the second occurrence of q1

So x = a and y = bca, which means z = bc:

w = |a|bca|bc|
     x  y   z

The pumping lemma says these strings are also in the language:

w0 = x y^0 z = xz   = a bc
w1 = x y^1 z = xyz  = a bca bc
w2 = x y^2 z = xyyz = a bca bca bc
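This decomposition can be found mechanically: run the string through the DFA and split at the first repeated state. Here is a minimal Python sketch of that idea; the transition-table encoding and the name pump_split are illustrative choices, not from the notes (missing transitions, i.e. a dead state, are omitted for brevity).

# Find the pumping decomposition w = xyz by running a string through a
# DFA and splitting at the first repeated state. The table below encodes
# the 4-state machine pictured above.

delta = {
    ('q0', 'a'): 'q1',
    ('q1', 'b'): 'q2',
    ('q2', 'c'): 'q3',
    ('q3', 'a'): 'q1',
}

def pump_split(w, start='q0'):
    """Return (x, y, z) where the DFA revisits a state between x and xy."""
    state = start
    seen = {state: 0}           # state -> position in w where first seen
    for i, ch in enumerate(w):
        state = delta[(state, ch)]
        if state in seen:       # first repeated state found
            j = seen[state]
            return w[:j], w[j:i + 1], w[i + 1:]
        seen[state] = i + 1
    return w, '', ''            # no repeat (only possible when |w| < p)

x, y, z = pump_split('abcabc')
print(x, y, z)                  # prints: a bca bc
for i in range(3):              # pumping y stays in the language
    print(x + y * i + z)        # a bc, a bca bc, a bca bca bc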
L = { a^i b a^j | 0 <= i < j }
Proof is by contradiction, using the pumping lemma to get the contradiction.
Assume L is regular and let p be the constant given by the pumping lemma.
The string w = a^p b a^(p+1) is in L and has length > p.
By the pumping lemma w = xyz with |xy| <= p and |y| > 0.
But this means the prefix xy must come before the 'b' and consist only of a's.
So this means y consists of 1 or more a's (x might be empty).
By the pumping lemma, w2 = x y^2 z must also be in L. But the number of a's before the b in w2 is at least p + 1, while the number of a's after the b is still p + 1.
This contradicts the condition i < j for membership in L, and so the assumption that L is regular is false.
L = { a^i b a^j | i > j >= 0 }
The proof is by contradiction again, using the pumping lemma to get the contradiction, but it has to work slightly differently.
Adding more y's will not work, because now the condition is i > j: the leading a's are greater in number than the ones after the b, and pumping up only adds more leading a's. Instead, we pump down.
Assume L is regular and let p be the constant given by the pumping lemma.
The string w = a^(p+1) b a^p is in L and has length > p.
By the pumping lemma w = xyz with |xy| <= p and |y| > 0.
But this again means the prefix xy must come before the 'b' and consist only of a's.
So this means y consists of 1 or more a's (x might be empty).
By the pumping lemma, w0 = x y^0 z = xz must also be in L. But the number of a's before the b in w0 is no more than p, since we have removed at least one a from w to get w0, while the number of a's after the b is still p.
This contradicts the condition i > j for membership in L, and so the assumption that L is regular is false.
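To see the shape of these arguments concretely, here is a small Python sketch (the helper names and the fixed small p are illustrative only; checking one value of p is a demonstration, not a proof). For the first language it enumerates every decomposition w = xyz allowed by the lemma and verifies that pumping up always leaves the language:

# Mechanically check the pumping argument for L = { a^i b a^j | 0 <= i < j }.

def in_L(s):
    """Membership test for { a^i b a^j | 0 <= i < j }."""
    if s.count('b') != 1:
        return False
    front, back = s.split('b')
    return (set(front) <= {'a'} and set(back) <= {'a'}
            and len(front) < len(back))

p = 4
w = 'a' * p + 'b' + 'a' * (p + 1)   # the witness string from the proof

# Try every split w = xyz with |xy| <= p and |y| > 0.
for xy_len in range(1, p + 1):
    for x_len in range(xy_len):
        x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
        # The proof pumps up: x y^2 z must leave the language.
        assert not in_L(x + y * 2 + z), (x, y, z)

print("every allowed split fails when pumped up, as the proof predicts")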
A Context-Free Grammar is a 4-tuple (V, ∑, R, S):

V   Finite set of variables (also called nonterminals)
∑   Finite set of terminal symbols, disjoint from V
R   A finite set of grammar rules (also called productions), formally a subset of V x (V ∪ ∑)*
S   A designated element of V, called the start symbol

However, (X, w) in R is usually written with one of the following notations:

X -> w   or   X ::= w

and if there are multiple rules with the same X on the left, the | notation is used to abbreviate listing X multiple times. E.g.,

X -> w1 | w2

instead of

X -> w1
X -> w2

Note that the right side of a rule can be ε.
1. S -> a S b | ε

2. S -> S + M | M
   M -> M * T | T
   T -> n

   V = {S, M, T}
   ∑ = { n, +, * }   (n represents any integer)
   Start symbol is S
A derivation of an element w of ∑* is a sequence of strings w0 w1 ... wn (n >= 1 derivation steps) where w0 = S, wn = w, and each wi+1 is obtained from wi by replacing some nonterminal X in wi with the right side of a grammar rule X -> u.
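For example, with grammar 2 above, the string n + n * n has the left-most derivation

S => S + M => M + M => T + M => n + M => n + M * T => n + T * T => n + n * T => n + n * n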
A derivation can be represented graphically as a parse tree.
The terminals of the derived string w are leaves of the tree.
The Start symbol is the root of the tree.
The nonterminals are interior nodes of the tree.
Each derivation step using a rule X -> u1 u2 ... uk makes the ui's the child nodes of X in the associated parse tree.
A replacement of a nonterminal can be applied to a portion of a parse tree or, in general, to a string of the form uXv, where u and v are in (V ∪ ∑)* and X is in V, to give uwv, where X -> w is a grammar rule. If 0 or more repeated replacements of this type on uXv result in a string z, we say that uXv yields z (instead of derives z).
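For example, with the rules of grammar 1 above, the string a S b yields a a S b b (one replacement using S -> a S b), and it also yields a a b b (a further replacement using S -> ε).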
The language of a context free grammar is the set of all terminal strings that can be derived with the grammar.
A language is defined to be a context-free language (CFL) if it is the language of some context-free grammar.
Recall that regular languages are defined slightly differently, but we have equivalent characterizations.
A language is regular if it is the language recognized by some DFA (equivalently, some NFA), or described by some regular expression, or generated by some regular grammar.
Regular languages are closed under these operations: union, concatenation, Kleene star, intersection, and complement.
Noting the regular grammar characterization of regular languages, one might expect various similarities, including closure properties.
Context-free languages, it turns out, are closed under these operations: union, concatenation, and Kleene star.
It's actually pretty easy, using grammars, to see these closure properties. (How do you show concatenation, for example?)
Context-free languages are not closed under intersection.
But they are also not closed under complement, which may be a bit more surprising.
Let L1 and L2 be CFLs. To show that L1 ∪ L2 is a CFL, we must construct a context-free grammar whose language is L1 ∪ L2.
Since L1 and L2 are CFLs, there are grammars G1 = (V1, ∑1, R1, S1) and G2 = (V2, ∑2, R2, S2) with L(G1) = L1 and L(G2) = L2.
The idea is to introduce a new start symbol, S, and a rule:
S -> S1 | S2
More formally, let G = (V1 ∪ V2 ∪ {S}, ∑1 ∪ ∑2, R1 ∪ R2 ∪ {S -> S1 | S2}, S).
It is assumed that V1 and V2 are disjoint. If not, then renaming the non-terminals so they are disjoint doesn't change the languages L1 and L2.
The language of G is easily seen to be L1 ∪ L2.
To see that CFLs are closed under concatenation, use a similar construction to the one just used for union. The only difference is that the rule for the new start symbol S is
S -> S1 S2
To construct a grammar whose language is L1*, the idea is to introduce a new start symbol S and the rules:
S -> S1 S | ε
Formally, let G = (V1 ∪ {S}, ∑1, R1 ∪ {S -> S1 S | ε}, S).
The recursive rule for the new start symbol generates one or more concatenated strings from L1. Together with the ε rule, the language of G is easily seen to be L1*.
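All three constructions are simple enough to carry out mechanically. Here is a minimal Python sketch; the dictionary encoding of a grammar is an illustrative choice, not from the notes, and it assumes the two nonterminal sets are disjoint and that 'S' is a fresh symbol.

# Union, concatenation, and star constructions on CFGs. A grammar is
# encoded as (rules, start), where rules maps a nonterminal to a list of
# right-hand sides (each a list of symbols; [] is ε).

def union(g1, g2):
    (r1, s1), (r2, s2) = g1, g2
    return {**r1, **r2, 'S': [[s1], [s2]]}, 'S'    # S -> S1 | S2

def concat(g1, g2):
    (r1, s1), (r2, s2) = g1, g2
    return {**r1, **r2, 'S': [[s1, s2]]}, 'S'      # S -> S1 S2

def star(g1):
    r1, s1 = g1
    return {**r1, 'S': [[s1, 'S'], []]}, 'S'       # S -> S1 S | ε

# Example: G1 with L(G1) = { a^n b^n } and G2 with L(G2) = { c^n }.
g1 = ({'A': [['a', 'A', 'b'], []]}, 'A')
g2 = ({'C': [['c', 'C'], []]}, 'C')
rules, start = union(g1, g2)
print(start, rules)   # S -> A | C plus the rules of both grammars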
A grammar is in Chomsky Normal Form if the rules are all of one of the forms:

X -> Y Z   where Y and Z are nonterminals
X -> a     where a is a terminal
The right side of the rule for the start symbol is also allowed to be ε.
S -> X B
X -> A S
A -> a
B -> b

This grammar is in Chomsky normal form.
S -> X B | ε
X -> A S
A -> a
B -> b

This is also in Chomsky normal form, but only the start symbol can have a rule with ε right side.
Intuitively, and in abbreviated form, the conversion of any context-free grammar to Chomsky Normal Form is accomplished by these four steps:
S -> a B b
B -> a B b | ε

// Introduce a new start symbol:
1. P -> S
   S -> a B b
   B -> a B b | ε

// Replace ε rules:
2. P -> S
   S -> a B b | a b
   B -> a B b | a b

// Replace unit rules:
3. P -> a B b | a b
   B -> a B b | a b

// Convert to Chomsky Normal Form by adding nonterminals:
4. P -> X Y | X Z
   X -> a
   Y -> B Z
   Z -> b
   B -> X Y | X Z

See example on pages 108 - 109.
The parsing problem for a grammar is: Given a string, determine if it is in the language of the grammar.
This means we have to determine if there is a derivation for the string or not.
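One standard way to settle this question is dynamic programming over a grammar in Chomsky Normal Form, the CYK algorithm (not developed in these notes; shown here only as an illustration). A minimal Python sketch, hard-coding the CNF grammar from step 4 above:

# CYK membership test for: P -> XY | XZ, B -> XY | XZ, X -> a, Z -> b, Y -> BZ.

unit = {'a': {'X'}, 'b': {'Z'}}   # inverted X -> a rules: terminal -> nonterminals
binary = {('X', 'Y'): {'P', 'B'}, ('X', 'Z'): {'P', 'B'}, ('B', 'Z'): {'Y'}}

def cyk(w, start='P'):
    n = len(w)
    if n == 0:
        return False          # handle a start rule S -> ε separately if present
    # table[i][j] = set of nonterminals deriving the substring w[i : i+j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][0] = set(unit.get(ch, ()))
    for length in range(2, n + 1):              # substring length
        for i in range(n - length + 1):         # start position
            for k in range(1, length):          # split point
                for A in table[i][k - 1]:
                    for B in table[i + k][length - k - 1]:
                        table[i][length - 1] |= binary.get((A, B), set())
    return start in table[0][n - 1]

print(cyk('aabb'))   # True:  a a b b is a^2 b^2
print(cyk('abab'))   # False: not of the form a^n b^n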
There are many parsing algorithms, including ones with cryptic notational names: LL(1), LL(k), LR(0), LR(1), LALR(1), and so on.
Roughly, the LL versions try to imitate a derivation; they begin with the start symbol and are called top-down parsers.
The LR parsers try to imitate a derivation in reverse; they work their way backwards to the start symbol. The parse tree is built bottom-up.
These algorithms are deterministic.
S -> x A | y B
A -> x A z | w
B -> y B z | w

This is an LL(1) grammar. There is always only one choice for the next rule to use in a left-most derivation.

Predict(S -> x A) = { x } and Predict(S -> y B) = { y }
These "Predict" sets tell when to use the grammar rule when giving a left-most derivation with S as the left-most nonterminal.
If the next input is 'x', then use rule S -> x A; if 'y', use S -> y B; else the input is rejected.
A grammar is LL(1) if, for each nonterminal X, the Predict sets for the grammar rules for X are pairwise disjoint.
An LL(1) grammar can parse its input deterministically with no backup.
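The Predict sets translate directly into a deterministic parser with no backtracking. Here is a minimal recursive-descent sketch in Python for the grammar above; the function names are illustrative.

# Predictive (LL(1)) parser for: S -> x A | y B, A -> x A z | w, B -> y B z | w.
# The next input symbol alone picks the rule to apply.

def parse(s):
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def S():
        if peek() == 'x':          # Predict(S -> x A) = { x }
            eat('x'); A()
        elif peek() == 'y':        # Predict(S -> y B) = { y }
            eat('y'); B()
        else:
            raise SyntaxError("expected 'x' or 'y'")

    def A():
        if peek() == 'x':          # Predict(A -> x A z) = { x }
            eat('x'); A(); eat('z')
        else:                      # Predict(A -> w) = { w }
            eat('w')

    def B():
        if peek() == 'y':          # Predict(B -> y B z) = { y }
            eat('y'); B(); eat('z')
        else:                      # Predict(B -> w) = { w }
            eat('w')

    S()
    return pos == len(s)           # all input must be consumed

print(parse('xxwz'))   # True:  S => x A => x x A z => x x w z
print(parse('yywzz'))  # True:  S => y B => y y B z => y y w z z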
However, not every context-free grammar is LL(1).
Problem with left-recursion.
S -> S '+' M | M
M -> M '*' T | T
T -> n
The left-recursion means that Predict(S -> S + M) = Predict(S -> M)
Predict(S -> M) is the set of terminal symbols that begin strings derived using this rule.
Predict(S -> M) is not { M } since M is a nonterminal. Instead we must examine what strings M can derive. So Predict(S -> M) = Predict(M -> M * T) ∪ Predict(M -> T) = Predict(T -> n) = { n }.
But now the left recursion means that to see what terminal symbols can begin strings derived using S -> S + M, we have to eventually replace the leading nonterminal S using S -> M.
Consequently, left-recursion means Predict(S -> S + M) and Predict(S -> M), far from being disjoint, are equal.
So left-recursion in a grammar guarantees the grammar is not LL(1).
We can often convert a context-free grammar to an equivalent LL(1) grammar, especially context-free grammars used to describe programming languages.
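For example, the standard left-recursion-removal transformation (the usual construction, not developed in these notes) rewrites the expression grammar above as

S  -> M S'
S' -> '+' M S' | ε
M  -> T M'
M' -> '*' T M' | ε
T  -> n

which generates the same language and is LL(1).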
However, ...
A language is LL(1) if it is the language of some LL(1) grammar.
Unfortunately, LL(1) languages are a proper subset of CFLs.
That is, we can't hope to convert an arbitrary context-free grammar to LL(1).
Here is an example:

{ w w^R | w in {a,b}* and w^R is the reverse of w }

E.g., for w = aaabab, w w^R = aaababbabaaa.
Here is a grammar for this language:
S -> a S a | b S b | ε
This grammar is not LL(1). The S -> ε rule should be applied when the first half has been seen (the w part of w w^R). But with only one symbol of lookahead, this can't be determined.
This language is an example of a non-deterministic CFL.
But every LL(1) language is deterministic.
So there can be no transformation of the grammar to an equivalent LL(1) grammar.
As we will see, a language is context-free if it is the language recognized by a particular kind of automaton, the push-down automaton (PDA).
For regular languages there is no difference between the languages defined for the deterministic and non-deterministic finite state automata.
The situation changes, however, for context-free languages.
Nondeterministic and deterministic versions of push down automata don't recognize the same class of languages.
Next time.