1. Formal Language Theory
Formal language theory starts from the basic notations of alphabet, string operations, and all operations on sets of strings. By such operations, we can obtain complex languages starting from simple ones.
1.1. Alphabet and Language
An alphabet is a finite set of elements called terminal
symbols or characters. Let S = {a1, a2 ,…, ak} be an alphabet
with k elements, i.e., its cardinality is |S| = k.
A string
(also called a word) is a sequence of terminal symbols.
Example 1: Let S = {a, b} be the alphabet. Some strings are: aaba, aab, abaa, b.
A language is a set of strings on a specified alphabet.
Example 2: For the same
alphabet S = {a, b}; three examples
of languages are as follow:
L1
= {aa, aaa};
L2
= {aba, aab};
L3
= {ab, ba, aabb, abab, …, aaabbb ,…} = set of strings having equal number of
a’s and b’s.
Given a
languages, a string belonging to it is called a sentence or phrase. Thus,
bbaa Î L3 is a sentence of
L3, whereas abbÏ L3 is an incorrect string, i.e. not a sentence of
L3.
The cardinality or size of a language is the number
of sentences it contains.
For example, |L2| = | {aba, aab} | = 2.
If the cardinality is finite, the language is called finite too. Otherwise, there is no finite bound on the number of sentences in any language, and then the language is termed as infinite.
For example, L1 and L2 are finite, but L3 is infinite.
For example, |L2| = | {aba, aab} | = 2.
If the cardinality is finite, the language is called finite too. Otherwise, there is no finite bound on the number of sentences in any language, and then the language is termed as infinite.
For example, L1 and L2 are finite, but L3 is infinite.
A special finite
language is the empty set or f language, which contains no sentence.
| f | = 0
|
Let us introduce
the notation |x|b for the number
of symbols b present in a string x. For examples:
|aab|a =
2 ,
|aba|a = 2 ,
|baa|c = 0.
The length |x| of a string x is the number of symbols it contains, for example, |ab| =
2;
|abaa| = 4.
Two strings:
x = a1
a2…ah and
y = b1 b2 …bk
are equal
if h = k and ai = bi , for every i = 1 ,…, h. In other words, examining the strings from left to right, if
their respective symbols coincide, they are equal. Thus, we obtain: aba ¹ baa, baa ¹ ba.
1.1A. String operations
In order to
manipulate strings, there are many operations. For strings,
x = a1
a2…ah and y = b1 b2…bk
concatenation (or product)
is defined as:
x•y = a1 a2…ah
b1 b2 …bk .
The dot may be dropped, that is, we can write
xy in place of x•y.
Example 3: For strings
x = manish y = kumar z = jha
We obtain
xy = manishkumar
yz = kumarjha ¹ xy
(xy)z = manishkumar•jha
= x(yz) = manish•kumarjha = manishkumarjha
Concatenation is
clearly non-cumulative, that is, the
identity xy ¹ yx does not hold in general. The associative property holds:
(xy)z = x(yz)
|
This permits to
write without parentheses the concatenation of three or more strings. The length
of the result is the sum of the lengths of concatenated strings:
|xy|
= |x| + |y|
|
.... equality
(1).
1.1B. Empty string
It is useful to introduce the
concept of empty (or null) string, denoted by epsilon ε, as it is the only string
which satisfies the following identify:
xε = εx = x
|
For every string
x, from equality (1), it follows that
empty string has length zero:
|ε| = 0
|
The empty string
should not be confused with the empty set. In fact, the empty set is denoted by f (also known as f language) which
contains no string, whereas the set {ε}, which also
denotes a language, contains one string,
which is the empty string, denoted by
ε.
1.1C. Sub strings
Let x =
uyv be the concatenation of some, possibly empty, strings u, y and v. Then y is a substring of x, u is a prefix of x, and v is a suffix of x. A sub string (prefix, suffix) is called proper
if it does not identical with string x.
Let x be a string of length at least k, i.e., |x| ³ k ³ 1. The notation initialk(x) denotes the
prefix u of x having length k, to be
termed the initial of length k.
Example 4: The string x = aabacba contains the
following components:
Prefixes: a, aa,
aab, aaba, aabac, aabacb, aabacba, for prefix a, k = 1, for prefix aa, k=2
and so on.
Suffixes: a, ba,
cba, acba, bacba, abacba, aabacba.
Sub strings: all prefixes and suffixes and the internal strings such as a, ab, ba, bacb, …
Sub strings: all prefixes and suffixes and the internal strings such as a, ab, ba, bacb, …
Notice that bc is not a sub string of x, although both b and c occurs in x. The initial of length two is initial2(aabacba) = aa.
1.1D. Mirror reflection / image
The symbols of a
string are usually read from left to right, but it is sometimes requested to
reverse the order. The reflection of
a string x =a1a2…ah is the string xR = ahah-1 …a1.
For
example, if we have:
x = manish
then xR = hsinam.
The following identities are close:
(xR)
R = x
(xy)R
= yR xR
εR = ε
Theorem 1.1: Let x be a string and let xrev be “the same’’ string but backwards. Prove that (xy)rev = yrevxrev for arbitrary strings x, y over an alphabet S.
Proof: We proceed by induction on the length of string.
Definitions: Let us consider following basic definitions with an example.
· εrev = ε for the empty string ε.
·
arev = a for a string which contains only one symbol.
·
A
non-empty string x can be partitioned
into two substrings x = ay where ‘a’ contains one or zero symbols.
Thus, we can then define (ay)rev =yreva , where |a| £ 1.
Example 5: Let x = abc.
Example 5: Let x = abc.
=>
xrev ={(abc)rev}
= {(ay)rev, where y = bc} = {(y)reva} = {(bc)reva}
= {(c)revba} = cba.
Base case 1: Now, we will
show that (xy)rev = yrevxrev, Let us first
assume that x is the empty string. If so, the claim is true, because (εy)rev = yrev = yrevε.
Base
case 2:
Now, let us assume that x is a string
a containing only one symbol. If so, the claim is true as well, because (ay)rev =
yreva, by the above definition.
Induction theory: Now, let us assume that the claim (uv)rev =
vrevurev
holds
for two strings u and v. This is our induction theory.
Induction step: Now, we want to show that (auv)rev =
vrevureva.
(auv)rev = (uv)reva, by above
definition. Further, (uv)rev = vrevurev , by induction theory, which gives (uv)reva
= vrevureva, which is what we wanted to show.
Since we have
shown that the claim holds for a (two) base cases (i.e., base case 1 and base case 2),
and that it holds for any string which may be constructed by concatenating a
symbol to the beginning of the first string, the claim holds for all strings.
Thus, prove.
Theorem 1.2: Show that for any string w in S*, (wR)R = w.
Proof: We proceed by
induction on the length of string w.
·
Base
case: when |w| =
0,
then w = ε , and (εR)R = εR = ε.
·
Induction
theory:
if |w| £ n, then (wR)R =
w.
·
Induction
step:
let |w| = n+1. Then w = ua for some u Î S* and a Î S such that |u| =
n.
In the following, the second and third equations come from the definition
of the reversal and the fourth equation comes from identity of reversal.
(wR)R = ((ua)R)R
= (aR uR)R (from section 1.1D)
= (uR)R (aR)R
(from section 1.1D)
= ua (from section 1.1D)
= w
=> (wR)R
= w.
Thus, prove.
1.1E. Repetitions / iteration
When a string
contains repetitions, we must have an operator denoting them. The m-th power (m ³ 1, integer) of a string x is the concatenation of x
with m-1 itself m-1 times:
xm = xx……x
m times
|
By condition,
the zero power of any string is
defined to be the empty string. The
complete definition is:
xm
= xm-1
x, if m >
0
x0 = ε
|
Example 6:
x
= ab
x0 =ε
x1
= x = ab
x2 =
(ab)2
= abab
y
= a2 = aa
y3 = a2a2a2
= a6
ε0= ε
ε2 =ε
Theorem 1.3: Prove that εk = ε for all k ³ 0.
Proof: First
remember the definitions.
xk = ε if k = 0, xk = xx….x (k times) if k > 0.
xε = ε x = x (by the definition of concatenation, given in 1.1B).
We prove the claim by induction on k.
Base case: when k = 0, then εk = ε0 = ε, by the definition above.
Induction theory: let us assume that the claim is true for k = i, i.e., ε i = ε
Induction step: then we prove that the claim is true for k = i+1, i.e., ε i+1 = ε
We have:
ε i+1= εε….ε (i+1) times
xk = ε if k = 0, xk = xx….x (k times) if k > 0.
xε = ε x = x (by the definition of concatenation, given in 1.1B).
We prove the claim by induction on k.
Base case: when k = 0, then εk = ε0 = ε, by the definition above.
Induction theory: let us assume that the claim is true for k = i, i.e., ε i = ε
Induction step: then we prove that the claim is true for k = i+1, i.e., ε i+1 = ε
We have:
ε i+1= εε….ε (i+1) times
= ε…..ε • ε (i time) • ε , (i.e., we can write (i + 1) times ε as (i times ε) • ε)
= εε (by the induction theory, we have assumed above.)
= ε (by the definition of concatenation, and also by the induction
theory,
we assumed above.)
Thus, the claim is true for k = i + 1.
Thus, prove.
It is my 1st blog written on very introductory concepts and theorems related to Alphabet, Strings, and Languages. All these concepts are widely used in Automata Theory (or Theory of Computations).
ReplyDeleteIn my coming 2nd blog, I will continue the same topic with further related matters.
Thank You.
Nice approach Sir.. This is very helpful..
ReplyDeleteThank You.
DeleteI am writing my 2nd blog to continue the topics explained in my 1st blog. hope it will be useful for students.