Basic

Kleene Star/Clousre (\sum^*)

  • Unary Operator on the symbol set \sum
  • Infinite set of all possible strings of all possible lengths over the set \sum, including the
    empty string λ\lambda
  • eg. For ={a,b},={λ,a,b,aa,bb,ab,ba,}\sum= \bigl\{ a,b \bigr\} ,\sum^*=\bigl\{ \lambda,a,b,aa,bb,ab,ba,\dots\bigr\}

Kleene plus(+\sum^+)

Excluding the λ(Empty String)\lambda(\text{Empty String})

Languages

Languages are subsets of the Kleene star set \sum^* over some alphabet set \sum, which could be finite or infinite.

  • Syntax

  • Semantics

  • Natural Languages

  • Formal Languages

Formal Language

Formal language is specified by a well-defined set of rules of syntax;

Vocabulary

A vocabulary (alphabet) VV is a finite, nonempty set of elements called symbols.

A word (sentence) over V is a string of finite length of elements of VV

The empty string or null string, denoted by λ\lambda, is the string containing no symbols

The set of all words over VV is denoted by VV^*

A language over VV is a subset of VV^*

Phrase-Structure

G=(V,T,S,P)G = (V, T, S, P) consists of:
  • a vocabulary VV

  • a subset TT of VV consisting of terminal symbols

  • a start symbol SS from VV

  • a finite set of productions PP.

The set VTV - T is denoted by NN.

nonterminal symbols : Elements of NN are called

Every production in PP must contain at least one nonterminal on its left side.

A Language

Let G=(V,T,S,P)G = (V, T, S, P) be a phrase-structure grammar.

The language generated by GG denoted by L(G)L(G) is the set of all strings of terminals that are derivable from the starting state SS.

L(G)=wTS*wL(G)={w\in T^* | S \stackrel{\text{*}}{\Rightarrow}w}

Four Types of Grammar

  • Turning Machines Unrestricted or Phrase-Structure grammar
    • uv,uVNVu \rightarrow v, u\in V^*N V^*
  • Context sensitive grammar
    • uv,uv,length increasing grammaru \rightarrow v,|u| \leq|v|, \text{length increasing grammar}
    • α\alpha NN β\beta \rightarrow α\alpha χ\chi β\beta
    • α,χ,βV\alpha,\chi,\beta \in V^{*}
  • Push Down Automata - Context-Free grammar
    • Aα,where AT and αVA \rightarrow \alpha, \text{where }A \in T \text{ and } \alpha \in V^*
  • Finite State Automata
    • AaB,A \rightarrow aB, A,BTA,B \in T
    • Ab,A \rightarrow b, bT or λb \in T \text{ or }\lambda

Decidable Languages

  • If exist a TM that accepts and halts on every input string SS, then the language is called decidable or recursive.

  • Every decidable language is Turing-Acceptable.

  • If all yes instances of problem $P$ have decidable language LL, then the decision problem PP is decidable.

Undecidable Languages

  • There does not exist a TM that accepts the language and makes a decision on every input string SS for undecidable languages(Tho a TM might make a decision on some input strings).

  • If all yes instances of problem PP have languages LL that are undecidable, then PP is termed undecidable

  • Inexpressible languages not recursive languages, but sometimes recursively enumerable languages.

Derivation(Parsing)

Application of a series of grammar productions or rules

Ultimately result in a string composed entirely of terminal symbols.

  • leftmost derivation: At each step, only rewrite the leftmost non-terminal symbol
  • rightmost derivation(Canonical derivation): ⬆️RightMost

Unambiguous Grammar:Only one sentence corresponding to each parse tree

Ambiguous Grammar: Allow multiple parse trees for serveral sentences

Mildly context-sensitive language:

  • Allow for limited forms of context sensitivity beyond what is expressible by
    context-free grammars
  • Include tree-adjoining grammars, linear context-free rewriting systems, and range concatenation grammars.

Sub-Regular Hierarchy:

  • Include classes such as finite-state languages, star-free languages, visibly pushdown
    languages, and others.
  • Each level represents a different degree of complexity in terms of the computational resources
    required to recognize or generate languages within that level.

Regular Languages

Regular Grammar (Type 3 Grammar):

A type of formal grammar characterized by production rules of the form ABxA \rightarrow Bx or AxA \rightarrow x
where A,BNA, B \in N (non-terminal symbols) and xΣx \in \Sigma (terminal symbols).

Also referred to as a finite-state grammar (FSG) or a left-linear grammar.
If the production rule is of the form AxBA \rightarrow xB , then it’s called a right-linear grammar.
Languages generated by these grammars are recognized by finite automata.

BB is a non-terminal symbol.
  • In a right-linear grammar: AxBA \rightarrow xB or AxA \rightarrow x ,

  • In a left-linear grammar: ABxA \rightarrow Bx or AxA \rightarrow x ,

  • They are the most restrictive, allowing only a single non-terminal symbol on each side.

Arden’s Theorem:

Arden’s theorem is primarily used to find regular expressions for finite automata.

Pumping Lemma for Regular Languages:

To prove that specific languages are not within a given language class.
Cannot determine if a language is in a given class, as satisfying the lemma is a necessary but not sufficient condition for membership in a class.

Context-Free Language (CFL)
Context-Free Grammar(Type 2 Grammar):

All rules have the form AαA \rightarrow \alpha , where ANA \in N and α(NΣ) \alpha \in (N \cup \Sigma)^* .

These languages are recognized by pushdown automata.

Ambiguity in Context-Free Languages:

If a context-free grammar GG has more than one parse tree for a string wL(G)w \in L(G), it is said to be
ambiguous. For some strings generated by this grammar, there are multiple rightmost or leftmost derivations.

Pumping Lemma for Context-Free Languages:

Similar to the pumping lemma for regular languages, it’s used to check if a grammar is context-free. It’s based on the
pigeonhole principle.

Closure Properties of Context-Free Languages:

Under operations like Union, Concatenation, and Kleene Star, context-free languages remain closed.

Context-Sensitive Language (CSL):
Context-Sensitive Grammar (Type 1 Grammar):

The difference between the left and right sides of the derivation rules is reflected in
the middle of the strings, representing context sensitivity in everyday life.
If the production rules of a grammar GG satisfy the form αAβαγβ\alpha A \beta \rightarrow \alpha \gamma \beta
where ANA \in N , α,β,γ(NΣ)\alpha, \beta, \gamma \in (N \cup \Sigma)^* , and γ\gamma contains at least one character, then GG is called a context-sensitive
grammar.

Recognized by linear-bounded automata.

Recursively Enumerable Languages:
Unrestricted Grammar (Type 0 Grammar):

The production rules satisfy αβ\alpha \rightarrow \beta ,
where α\alpha and β\beta are strings, α\alpha contains at least one non-terminal symbol, and it’s not an
empty string. Languages generated by these grammars are recognized by Turing machines. Due to their generality and lack
of restrictions, they cannot describe the syntax of programming or natural languages.

Backus-Naur Form

Context Free

eg. Identifier

1
2
3
4
5
<identifier> ::== <letter> | <identifier><letter> | <identifier><digit>

<letter> ::== a|b| \cdots |y|z

<digit> ::== 0|1|2|3|4|5|6|7|8|9

Automata

Finite-state machine

A finite-state machine M=(S,I,O,f,g,s0)M = (S,I,O,f,g,s_0) consists of a finite set SS
of states,

II: a finite input alphabet
OO: a finite output alphabet
ff: a transition funciton that assigns to each state and input pair a new state
gg: an output function that assigns to each state and input pair an output
s0s_0: an initial state

Deterministic Finite Automata(DFA)

  • In a DFA, for each input character, the machine transitions to exactly one state.
  • The uniqueness of computation is referred to as determinism.
  • DFAs do not accept epsilon transitions.

Non-deterministic Finite Automata (NFA):

NFAs are capable of transitioning to any number of states for a particular input. They can accept epsilon transitions.

With Output Function FSM:

Moore Machine: Output function depends only on the state

Mealy Machine:Output function depends on the state and input symbol
Without Output function FSM (Semiautomatic or Transition System): DFA, NFA,ϵ\epsilon-NFA

Classification

  • Acceptors: Indicate whether or not the received input is accepted
  • Classifiers: Generalization of acceptors that produce n-ary output where n is strictly greater than two
  • Transducers: Produce output based on given input.
  • Sequencers(generators): Produce only a sequence as an output sequence of acceptor or transducer outputs

Pushdown Automata(PDA)

  • A finite automata with extra memory called stack
  • Stack help PDA to recognize Context Free Languages

New elements are pushed onto the stack.
These machines can do more than finite state machines but less than Turing machines.
There are two forms of PDA: “deterministic” and “non-deterministic,”

Extending PDA to allow a FSM to access two stacks to obtain a more powerful automaton equivalent to a TM

PDA components:

  1. Input tape
  2. Control unit
  3. Stack with unlimited size

A PDA can be formally described as a 7-tuple M=(Q,Σ,S,δ,q0,I,F)M = (Q, \Sigma, S, \delta, q_0, I, F) , where:

  • QQ : the finite set of states.
  • Σ\Sigma : the input alphabet.
  • SS : the stack alphabet.
  • δ\delta : the transition function: Q×(Σ{ε})×SQ×SQ \times (\Sigma \cup \{\varepsilon\}) \times S \rightarrow Q \times S^* .
  • q0q_0 : the initial state (q0Qq_0 \in Q ).
  • II : the initial stack symbol (ISI \in S ).
  • FF : a set of accepting states (FQF \subset Q ).

Linear-bounded Automata(LBA)

  • A restricted type of TM.

  • Its computation is limited to a finite bounded region.

  • Can be used as acceptors for CSL. Similar to TM

  • Characteristics:

    • Non-deterministic logic
    • Multiple tracks
    • Tape with finite length

An LBA can be formally described as an 8-tuple M=(Q,T,E,q0,ML,MR,S,F)M = (Q, T, E, q_0, ML, MR, S, F) , where:

  • QQ : the finite set of transition states.
  • TT : the tape alphabet.
  • EE : the input alphabet.
  • q0q_0 : the initial state.
  • MLML : the left boundary of the tape.
  • MRMR : the right boundary of the tape.
  • SS : the transition function.
  • FF : a finite set of final states.

Turning Machine

f(s1,a)=(s2,A,R)f(s_1,a)=(s_2,A,R) sis_i : a state AA: the replacement symbol R(right)R(right): thr direction and control unit also can move L(Left)L(Left)
  1. The machine starts
  2. Moves left or right replacing symbols as defined
  3. Change the state as defined
  4. Reach a halting state
  5. The tape has changed from input to output

TM changes symbols on the tape and simulate the execution and storage of a computer.
Hence it could model all computations that can be computed by modern computers today.

The language consisting of strings accepted by a TM is called recursively enumerable language.

If the TM halts for every string, then the language is called recursive language.

Automata Theory (Chomsky hierarchy)

Grammar Langugages Automaton Production Rules(Constraints)
Type-0 Recursively enumerable Turing Machine αβ\alpha \rightarrow \beta
Type-1 Context-Sensitive Linear-bounded non-deterministic Turning Machine αAβαγβ\alpha A\beta \rightarrow \alpha \gamma \beta
Type-2 Context-Free Non-deterministic pushdown automaton AγA \rightarrow \gamma
Type-3 Regular Finite State Automaton AαA \rightarrow \alpha and AαBA \rightarrow \alpha B