Introduction of Alphabet, String, and Languages: Theory of Automata / Computation : Introduction of Alphabet, String, and Languages with Theorems

1. Formal Language Theory

Formal language theory starts from the basic notations of alphabet, string operations, and all operations on sets of strings. By such operations, we can obtain complex languages starting from simple ones.

1.1. Alphabet and Language

An alphabet is a finite set of elements called terminal symbols or characters. Let S = {a₁, a₂,…, a_k} be an alphabet with k elements, i.e., its cardinality is |S| = k.

A string (also called a word) is a sequence of terminal symbols.

Example 1: Let S = {a, b} be the alphabet. Some strings are: aaba, aab, abaa, b.

A language is a set of strings on a specified alphabet.

Example 2: For the same alphabet S = {a, b}; three examples of languages are as follow:

L₁ = {aa, aaa};

L₂ = {aba, aab};

L₃ = {ab, ba, aabb, abab, …, aaabbb ,…} = set of strings having equal number of a’s and b’s.

Given a languages, a string belonging to it is called a sentence or phrase. Thus, bbaa Î L₃is a sentence of L₃, whereas abbÏ L₃is an incorrect string, i.e. not a sentence of L₃.

The cardinality or size of a language is the number of sentences it contains.
For example, |L₂| = | {aba, aab} | = 2.

If the cardinality is finite, the language is called finite too. Otherwise, there is no finite bound on the number of sentences in any language, and then the language is termed as infinite.
For example, L₁ and L₂ are finite, but L₃ is infinite.

A special finite language is the empty set or f language, which contains no sentence.

| f | = 0

Let us introduce the notation |x|_b for the number of symbols b present in a string x. For examples:

|aab|_a = 2 ,

|aba|_a = 2 ,

|baa|_c = 0.

The length |x| of a string x is the number of symbols it contains, for example, |ab| = 2; |abaa| = 4.

Two strings:

x = a₁ a₂…a_h and y = b₁ b₂ …b_k

are equal if h = k and a_i = b_i, for every i = 1 ,…, h. In other words, examining the strings from left to right, if their respective symbols coincide, they are equal. Thus, we obtain: aba ¹ baa, baa ¹ ba.

1.1A. String operations

In order to manipulate strings, there are many operations. For strings,

x = a₁ a₂…a_h and y = b₁ b₂…b_k

concatenation (or product) is defined as:

x•y = a₁a₂…a_h b₁b₂ …b_{k .}

The dot may be dropped, that is, we can write xy in place of x•y.

Example 3: For strings

x = manish y = kumar z = jha

We obtain

xy = manishkumar

yz = kumarjha ¹ xy

(xy)z = manishkumar•jha = x(yz) = manish•kumarjha = manishkumarjha

Concatenation is clearly non-cumulative, that is, the identity xy ¹ yx does not hold in general. The associative property holds:

(xy)z = x(yz)

This permits to write without parentheses the concatenation of three or more strings. The length of the result is the sum of the lengths of concatenated strings:

|xy| = |x| + |y|

.... equality (1).

1.1B. Empty string

It is useful to introduce the concept of empty (or null) string, denoted by epsilon ε, as it is the only string

which satisfies the following identify:

xε = εx = x

For every string x, from equality (1), it follows that empty string has length zero:

|ε| = 0

The empty string should not be confused with the empty set. In fact, the empty set is denoted by f (also known as f language) which contains no string, whereas the set {ε}, which also denotes a language, contains one string, which is the empty string, denoted by ε.

1.1C. Sub strings

Let x = uyv be the concatenation of some, possibly empty, strings u, y and v. Then y is a substring of x, u is a prefix of x, and v is a suffix of x. A sub string (prefix, suffix) is called proper if it does not identical with string x.

Let x be a string of length at least k, i.e., |x| ³ k ³ 1. The notation initial_k(x) denotes the prefix u of x having length k, to be termed the initial of length k.

Example 4: The string x = aabacba contains the following components:

Prefixes: a, aa, aab, aaba, aabac, aabacb, aabacba, for prefix a, k = 1, for prefix aa, k=2 and so on.

Suffixes: a, ba, cba, acba, bacba, abacba, aabacba.

Sub strings: all prefixes and suffixes and the internal strings such as a, ab, ba, bacb, …

Notice that bc is not a sub string of x, although both b and c occurs in x. The initial of length two is initial₂(aabacba) = aa.

1.1D. Mirror reflection / image

The symbols of a string are usually read from left to right, but it is sometimes requested to reverse the order. The reflection of a string x =a₁a₂…a_h is the string x^R = a_ha_h-1 …a₁_.

For example, if we have:

x = manish then x^R= hsinam.

The following identities are close:

(x^R) ^R= x

(xy)^R = y^Rx^R

ε^R = ε

***Theorem 1.1: Let x be a string and let x^rev*** be “the same’’ string but backwards. Prove that *(xy)**^rev* *= y^re***^vx^revfor arbitrary strings x, y over an alphabet S.

Proof: We proceed by induction on the length of string.

Definitions: Let us consider following basic definitions with an example.

· ε^rev = ε for the empty string ε.

· a^rev = a for a string which contains only one symbol.

· A non-empty string x can be partitioned into two substrings x = ay where ‘a’ contains one or zero symbols. Thus, we can then define (ay)^rev =y^reva , where |a| £ 1.

Example 5: Let x = abc.

=> x^rev ={(abc)^rev} = {(ay)^rev, where y = bc} = {(y)^reva} = {(bc)^reva} = {(c)^revba} = cba.

Base case 1: Now, we will show that (xy)^rev = y^revx^rev, Let us first assume that x is the empty string. If so, the claim is true, because (εy)^rev= y^rev = y^revε.

Base case 2: Now, let us assume that x is a string a containing only one symbol. If so, the claim is true as well, because (ay)^rev = y^reva, by the above definition.

Induction theory: Now, let us assume that the claim (uv)^rev = v^revu^rev holds for two strings u and v. This is our induction theory.

Induction step: Now, we want to show that (auv)^rev = v^revu^reva.

(auv)^rev = (uv)^reva, by above definition. Further, (uv)^rev = v^revu^rev, by induction theory, which gives (uv)^reva = v^revu^reva, which is what we wanted to show.

Since we have shown that the claim holds for a (two) base cases (i.e., base case 1 and base case 2), and that it holds for any string which may be constructed by concatenating a symbol to the beginning of the first string, the claim holds for all strings.

Thus, prove.

***Theorem 1.2: Show that for any string w in S, (w^R)^R = w.

Proof: We proceed by induction on the length of string w.

· Base case: when |w| = 0, then w = ε , and (ε^R)^R = ε^R = ε.

· Induction theory: if |w| £ n, then (w^R)^R= w.

· Induction step: let |w| = n+1. Then w = ua for some u Î S* and a Î S such that |u| = n. In the following, the second and third equations come from the definition of the reversal and the fourth equation comes from identity of reversal.

(w^R)^R= ((ua)^R)^R

= (a^Ru^R)^R (from section 1.1D)

= (u^R)^R(a^R)^R (from section 1.1D)

= ua (from section 1.1D)

= w

=> (w^R)^R= w.

Thus, prove.

1.1E. Repetitions / iteration

When a string contains repetitions, we must have an operator denoting them. The m-th power (m ³ 1, integer) of a string x is the concatenation of x with m-1 itself m-1 times:

x^m= xx……x

m times

By condition, the zero power of any string is defined to be the empty string. The complete definition is:

x^m= x^m-1x, if m > 0

x⁰= ε

Example 6:

x = ab

x⁰ =ε

x¹= x = ab

x² = (ab)²= abab

y = a² = aa

y³= a²a²a²= a⁶

ε⁰= ε

ε² =ε

***Theorem 1.3*: Prove that ε^k = ε for all k** ³ 0.

Proof: First remember the definitions.

x^k = ε   if k = 0, x^k = xx….x (k times) if k > 0.

xε  = ε x = x (by the definition of concatenation, given in 1.1B).

We prove the claim by induction on k.

Base case: when k = 0, then ε^k = ε⁰ = ε, by the definition above.

Induction theory: let us assume that the claim is true for k = i, i.e., ε ⁱ = ε

Induction step: then we prove that the claim is true for k = i+1, i.e., ε ⁱ⁺¹ = ε

We have:

ε ⁱ⁺¹= εε….ε                    (i+1) times

= ε…..ε • ε (i time) • ε , (i.e., we can write (i + 1) times ε as (i times ε) • ε)

= εε (by the induction theory, we have assumed above.)

= ε (by the definition of concatenation, and also by the induction

theory, we assumed above.)

Thus, the claim is true for k = i + 1.

Thus, prove.

3 comments:

Unknown22 December 2015 at 20:29
It is my 1st blog written on very introductory concepts and theorems related to Alphabet, Strings, and Languages. All these concepts are widely used in Automata Theory (or Theory of Computations).
In my coming 2nd blog, I will continue the same topic with further related matters.
Thank You.
Unknown23 December 2015 at 05:51
Nice approach Sir.. This is very helpful..

Theory of Automata / Computation : Introduction of Alphabet, String, and Languages with Theorems

1. Formal Language Theory

1.1. Alphabet and Language

1.1A. String operations

1.1B. Empty string

1.1C. Sub strings

1.1D. Mirror reflection / image

Theorem 1.1: Let x be a string and let xrev be “the same’’ string but backwards. Prove that (xy)rev = yrevxrev for arbitrary strings x, y over an alphabet S.

Proof: We proceed by induction on the length of string.

Definitions: Let us consider following basic definitions with an example.

Theorem 1.2: Show that for any string w in S*, (wR)R = w.

1.1E. Repetitions / iteration

Theorem 1.3: Prove that εk = ε for all k ³ 0.

3 comments:

***Theorem 1.1: Let x be a string and let x^rev*** be “the same’’ string but backwards. Prove that *(xy)**^rev* *= y^re***^vx^revfor arbitrary strings x, y over an alphabet S.

***Theorem 1.2: Show that for any string w in S, (w^R)^R = w.

***Theorem 1.3*: Prove that ε^k = ε for all k** ³ 0.