Theory of Automata / Computation : Introduction of Alphabet, String, and Languages with Theorems

1. Formal Language Theory


Formal language theory starts from the basic notations of alphabet, string operations, and all operations on sets of strings. By such operations, we can obtain complex languages starting from simple ones.

1.1. Alphabet and Language


An alphabet is a finite set of elements called terminal symbols or characters. Let S = {a1, a2 ,…, ak} be an alphabet with k elements, i.e., its cardinality is |S| = k

A string (also called a word) is a sequence of terminal symbols.

Example 1: Let S = {a, b} be the alphabet. Some strings are: aaba, aab, abaa, b.   

A language is a set of strings on a specified alphabet.

Example 2: For the same alphabet S = {a, b}; three examples of languages are as follow:

L1 = {aa, aaa};

L2 = {aba, aab};

L3 = {ab, ba, aabb, abab, …, aaabbb ,…} = set of strings having equal number of a’s and b’s.

Given a languages, a string belonging to it is called a sentence or phrase. Thus, bbaa Î L3 is a sentence of L3, whereas abbÏ L3 is an incorrect string, i.e. not a sentence of L3.

The cardinality or size of a language is the number of sentences it contains. 
For example, |L2| = | {aba, aab} | = 2

If the cardinality is finite, the language is called finite too. Otherwise, there is no finite bound on the number of sentences in any language, and then the language is termed as infinite
For example, L1 and L2 are finite, but L3 is infinite.

A special finite language is the empty set or f language, which contains no sentence.

| f | = 0

Let us introduce the notation |x|b for the number of symbols b present in a string x. For examples:

|aab|a = 2 , 
|aba|a = 2 ,          
|baa|c = 0.

The length |x| of a string x is the number of symbols it contains, for example, |ab| = 2; |abaa| = 4.

Two strings:

x = a1 a2…ah         and            y = b1 b2 …bk

are equal if h = k and ai = bi , for every i = 1 ,…, h. In other words, examining the strings from left to right, if their respective symbols coincide, they are equal. Thus, we obtain: aba ¹ baa, baa ¹  ba.

1.1A. String operations


In order to manipulate strings, there are many operations. For strings,

x = a1 a2…ah     and    y = b1 b2…bk

concatenation (or product) is defined as:

x•y = a1 a2…ah b1 b2 …bk .

The dot may be dropped, that is, we can write xy in place of x•y.

Example 3: For strings

x = manish         y = kumar           z = jha


We obtain

xy = manishkumar

yz = kumarjha ¹ xy

(xy)z = manishkumar•jha = x(yz) = manish•kumarjha = manishkumarjha

Concatenation is clearly non-cumulative, that is, the identity xy ¹ yx does not hold in general. The associative property holds:
(xy)z = x(yz)

This permits to write without parentheses the concatenation of three or more strings. The length of the result is the sum of the lengths of concatenated strings:
                                   
    |xy| = |x| + |y|
.... equality (1).  

1.1B. Empty string

            
It is useful to introduce the concept of empty (or null) string, denoted by epsilon ε, as it is the only string
which satisfies the following identify:
xε  = εx = x

For every string x, from equality (1), it follows that empty string has length zero:

|ε| = 0

The empty string should not be confused with the empty set. In fact, the empty set is denoted by f (also known as f language) which contains no string, whereas the set {ε}, which also denotes a language, contains one string, which is the empty string, denoted by ε.

1.1C. Sub strings


Let x = uyv be the concatenation of some, possibly empty, strings u, y and v. Then y is a substring of x, u is a prefix of x, and v is a suffix of x. A sub string (prefix, suffix) is called proper if it does not identical with string x.

Let x be a string of length at least k, i.e., |x| ³ k ³ 1. The notation initialk(x) denotes the prefix u of x having length k, to be termed the initial of length k.

Example 4: The string x = aabacba contains the following components:

Prefixes: a, aa, aab, aaba, aabac, aabacb, aabacba, for prefix a, k = 1, for prefix aa, k=2 and so on.

Suffixes: a, ba, cba, acba, bacba, abacba, aabacba.

Sub strings: all prefixes and suffixes and the internal strings such as a, ab, ba, bacb, …

Notice that bc is not a sub string of x, although both b and c occurs in x. The initial of length two is initial2(aabacba) = aa.  

1.1D. Mirror reflection / image


The symbols of a string are usually read from left to right, but it is sometimes requested to reverse the order. The reflection of a string x =a1a2…ah is the string xR = ahah-1 …a1. 

For example, if we have:
x = manish     then       xR = hsinam.

The following identities are close:

(xR) R = x

(xy)R = yR xR  

εRε

Theorem 1.1: Let x be a string and let xrev be “the same’’ string but backwards. Prove that (xy)rev = yrevxrev  for arbitrary strings x, y over an alphabet S.


Proof: We proceed by induction on the length of string.

Definitions: Let us consider following basic definitions with an example.


·       εrevε                       for the empty string ε.

·         arev = a                      for a string which contains only one symbol.

·         A non-empty string x can be partitioned into two substrings x = ay where ‘a’ contains one or zero symbols. Thus, we can then define (ay)rev =yreva , where |a| £ 1.

Example 5: Let x = abc.
=>    xrev ={(abc)rev} = {(ay)rev, where y = bc} = {(y)reva} = {(bc)reva} = {(c)revba} = cba.

Base case 1: Now, we will show that (xy)rev = yrevxrev, Let us first assume that x is the empty string. If so, the claim is true, because (εy)rev = yrev = yrevε.

Base case 2: Now, let us assume that x is a string a containing only one symbol. If so, the claim is true as well, because (ay)rev = yreva, by the above definition.

Induction theory:  Now, let us assume that the claim (uv)rev = vrevurev holds for two strings u and v. This is our induction theory.

Induction step:  Now, we want to show that (auv)rev = vrevureva.
(auv)rev = (uv)reva, by above definition. Further, (uv)rev = vrevurev  , by induction theory, which gives (uv)reva = vrevureva, which is what we wanted to show.

Since we have shown that the claim holds for a (two) base cases (i.e., base case 1 and base case 2), and that it holds for any string which may be constructed by concatenating a symbol to the beginning of the first string, the claim holds for all strings.

Thus, prove.

Theorem 1.2: Show that for any string w in S*, (wR)R  = w.


Proof: We proceed by induction on the length of string w.

·         Base case: when |w| = 0, then w = ε , and (εR)RεRε.

·         Induction theory: if |w| £  n, then (wR)R = w.

·         Induction step: let |w| = n+1. Then w = ua for some ΠS* and ΠS  such that |u| = n. In the following, the second and third equations come from the definition of the reversal and the fourth equation comes from identity of reversal.

(wR)R = ((ua)R)R
        
              = (aR uR)R                           (from section 1.1D)
         
              = (uR)R (aR)R                     (from section 1.1D)
         
              = ua                                        (from section 1.1D)
        
              = w

=>      (wR)R = w.

Thus, prove.

1.1E. Repetitions / iteration


When a string contains repetitions, we must have an operator denoting them. The m-th power (m ³  1, integer) of a string x is the concatenation of x with m-1 itself m-1 times:
           
     xm  xx……x
                m times

By condition, the zero power of any string is defined to be the empty string. The complete definition is:

xm = xm-1 x, if  m > 0
x0 ε

Example 6:

x = ab       
                     
x0 =ε
              
x1 = x = ab       
      
x2 = (ab)2 = abab

y = a2 = aa       

y3 = a2a2a2 = a6

ε0ε        

ε2 =ε

Theorem 1.3: Prove that εkε  for all ³  0.


Proof: First remember the definitions.

 xkε   if  k = 0,  xk = xx….x (k times) if  k > 0.

 xε  = ε x = x (by the definition of concatenation, given in 1.1B).

We prove the claim by induction on k.

Base case: when k = 0, then εk ε0ε, by the definition above.

Induction theory: let us assume that the claim is true for k = i, i.e., ε iε

 Induction step: then we prove that the claim is true for k = i+1, i.e., ε i+1ε

We have:

ε i+1εε….ε                    (i+1) times

                = ε…..ε • ε               (i time) • ε , (i.e., we can write (i + 1) times ε as (i times ε) • ε
      = εε                         (by the induction theory, we have assumed above.)

       = ε                            (by the definition of concatenation, and also by the induction   
                                        theory, we assumed above.)           

Thus, the claim is true for k = i + 1.


Thus, prove.

3 comments:

  1. It is my 1st blog written on very introductory concepts and theorems related to Alphabet, Strings, and Languages. All these concepts are widely used in Automata Theory (or Theory of Computations).
    In my coming 2nd blog, I will continue the same topic with further related matters.
    Thank You.

    ReplyDelete
  2. Nice approach Sir.. This is very helpful..

    ReplyDelete
    Replies
    1. Thank You.
      I am writing my 2nd blog to continue the topics explained in my 1st blog. hope it will be useful for students.

      Delete