Pumping lemma for context-free languages
In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages. It generalizes the pumping lemma for regular languages.
Formal statement
If a language L is context-free, then there exists some integer p ≥ 1 (called a "pumping length"^{[1]}) such that every string s in L that has a length of p or more symbols (i.e. with |s| ≥ p) can be written as
- s = uvwxy
with substrings u, v, w, x and y, such that
- 1. |vwx| ≤ p,
- 2. |vx| ≥ 1, and
- 3. uv^{n}wx^{n}y is in L for all n ≥ 0.
Below is a formal expression of the Pumping Lemma.
∀L⊆Σ^{*}. (context_free(L) ⇒ ∃p≥1. ∀s∈L. (|s|≥p ⇒ ∃u,v,w,x,y∈Σ^{*}. (s=uvwxy ∧ |vwx|≤p ∧ |vx|≥1 ∧ ∀n≥0. uv^{n}wx^{n}y∈L)))
Informal statement and explanation
The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.
The property is a property of all strings in the language that are of length at least p, where p is a constant—called the pumping length—that varies between context-free languages.
Say s is a string of length at least p that is in the language.
The pumping lemma states that s can be split into five substrings, s = uvwxy, where vx is non-empty and the length of vwx is at most p, such that repeating v and x any (and the same) number of times in s produces a string that is still in the language (it is possible and often useful to repeat zero times, which removes v and x from the string). This process of "pumping up" additional copies of v and x is what gives the pumping lemma its name.
Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having p equal to the maximum string length in L plus one. As there are no strings of this length the pumping lemma is not violated.
Usage of the lemma
The pumping lemma is often used to prove that a given language L is non-context-free, by showing that arbitrarily long strings s are in L that cannot be "pumped" without producing strings outside L.
For example, the language L = { a^{n}b^{n}c^{n} | n > 0 } can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context free. By the pumping lemma, there exists an integer p which is the pumping length of language L. Consider the string s = a^{p}b^{p}c^{p} in L. The pumping lemma tells us that s can be written in the form s = uvwxy, where u, v, w, x, and y are substrings, such that |vwx| ≤ p, |vx| ≥ 1, and uv^{i}wx^{i}y is in L for every integer i ≥ 0. By the choice of s and the fact that |vwx| ≤ p, it is easily seen that the substring vwx can contain no more than two distinct symbols. That is, we have one of five possibilities for vwx:
- vwx = a^{j} for some j ≤ p.
- vwx = a^{j}b^{k} for some j and k with j+k ≤ p.
- vwx = b^{j} for some j ≤ p.
- vwx = b^{j}c^{k} for some j and k with j+k ≤ p.
- vwx = c^{j} for some j ≤ p.
For each case, it is easily verified that uv^{i}wx^{i}y does not contain equal numbers of each letter for any i ≠ 1. Thus, uv^{2}wx^{2}y does not have the form a^{i}b^{i}c^{i}. This contradicts the definition of L. Therefore, our initial assumption that L is context free must be false.
While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free.
On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example L = { b^{j}c^{k}d^{l} | j, k, l ∈ ℕ } ∪ { a^{i}b^{j}c^{j}d^{j} | i, j ∈ ℕ, i≥1 }: for s=b^{j}c^{k}d^{l} with e.g. j≥1 choose vwx to consist only of b’s, for s=a^{i}b^{j}c^{j}d^{j} choose vwx to consist only of a’s; in both cases all pumped strings are still in L.^{[2]} There are more powerful proof techniques available, such as Ogden's lemma, but also these techniques do not give a complete characterization of the context-free languages.
References
- ↑ Berstel, Jean; Lauve, Aaron; Reutenauer, Christophe; Saliola, Franco V. (2009). Combinatorics on words. Christoffel words and repetitions in words. CRM Monograph Series. 27. Providence, RI: American Mathematical Society. p. 90. ISBN 978-0-8218-4480-9. Zbl 1161.68043.
- ↑ John E. Hopcroft, Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X. Here: sect.6.1, p.129
- Bar-Hillel, Y.; Micha Perles, M. and Eli Shamir (1961). "On formal properties of simple phrase-structure grammars". Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunikationsforschung. 14 (2): 143–172. Cite uses deprecated parameter
|coauthors=
(help) — Reprinted in: Y. Bar-Hillel (1964). Language and Information: Selected Essays on their Theory and Application. Addison-Wesley series in logic. Addison-Wesley. pp. 116–150. ISBN 0201003732. OCLC 783543642. - Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. Section 1.4: Nonregular Languages, pp. 77–83. Section 2.3: Non-context-free Languages, pp. 115–119.