Formal Languages & Automata 1
Basic
Kleene Star/Clousre ()
- Unary Operator on the symbol set
- Infinite set of all possible strings of all possible lengths over the set , including the
empty string - eg. For
Kleene plus()
Excluding the
Languages
Languages are subsets of the Kleene star set over some alphabet set , which could be finite or infinite.
Syntax
Semantics
Natural Languages
Formal Languages
Formal Language
Formal language is specified by a well-defined set of rules of syntax;
Vocabulary
A vocabulary (alphabet) is a finite, nonempty set of elements called symbols.
A word (sentence) over V is a string of finite length of elements of
The empty string or null string, denoted by , is the string containing no symbols
The set of all words over is denoted by
A language over is a subset of
Phrase-Structure
consists of:a vocabulary
a subset of consisting of terminal symbols
a start symbol from
a finite set of productions .
The set is denoted by .
nonterminal symbols : Elements of are called
Every production in must contain at least one nonterminal on its left side.
A Language
Let be a phrase-structure grammar.
The language generated by denoted by is the set of all strings of terminals that are derivable from the starting state .
Four Types of Grammar
- Turning Machines Unrestricted or Phrase-Structure grammar
- Context sensitive grammar
- Push Down Automata - Context-Free grammar
- Finite State Automata
Decidable Languages
If exist a TM that accepts and halts on every input string , then the language is called decidable or recursive.
Every decidable language is Turing-Acceptable.
If all yes instances of problem $P$ have decidable language , then the decision problem is decidable.
Undecidable Languages
There does not exist a TM that accepts the language and makes a decision on every input string for undecidable languages(Tho a TM might make a decision on some input strings).
If all yes instances of problem have languages that are undecidable, then is termed undecidable
Inexpressible languages not recursive languages, but sometimes recursively enumerable languages.
Derivation(Parsing)
Application of a series of grammar productions or rules
Ultimately result in a string composed entirely of terminal symbols.
- leftmost derivation: At each step, only rewrite the leftmost non-terminal symbol
- rightmost derivation(Canonical derivation): ⬆️RightMost
Unambiguous Grammar:Only one sentence corresponding to each parse tree
Ambiguous Grammar: Allow multiple parse trees for serveral sentences
Mildly context-sensitive language:
- Allow for limited forms of context sensitivity beyond what is expressible by
context-free grammars - Include tree-adjoining grammars, linear context-free rewriting systems, and range concatenation grammars.
Sub-Regular Hierarchy:
- Include classes such as finite-state languages, star-free languages, visibly pushdown
languages, and others. - Each level represents a different degree of complexity in terms of the computational resources
required to recognize or generate languages within that level.
Regular Languages
Regular Grammar (Type 3 Grammar):
A type of formal grammar characterized by production rules of the form or
where (non-terminal symbols) and (terminal symbols).
Also referred to as a finite-state grammar (FSG) or a left-linear grammar.
If the production rule is of the form , then it’s called a right-linear grammar.
Languages generated by these grammars are recognized by finite automata.
In a right-linear grammar: or ,
In a left-linear grammar: or ,
They are the most restrictive, allowing only a single non-terminal symbol on each side.
Arden’s Theorem:
Arden’s theorem is primarily used to find regular expressions for finite automata.
Pumping Lemma for Regular Languages:
To prove that specific languages are not within a given language class.
Cannot determine if a language is in a given class, as satisfying the lemma is a necessary but not sufficient condition for membership in a class.
Context-Free Language (CFL)
Context-Free Grammar(Type 2 Grammar):
All rules have the form , where and .
These languages are recognized by pushdown automata.
Ambiguity in Context-Free Languages:
If a context-free grammar has more than one parse tree for a string , it is said to be
ambiguous. For some strings generated by this grammar, there are multiple rightmost or leftmost derivations.
Pumping Lemma for Context-Free Languages:
Similar to the pumping lemma for regular languages, it’s used to check if a grammar is context-free. It’s based on the
pigeonhole principle.
Closure Properties of Context-Free Languages:
Under operations like Union, Concatenation, and Kleene Star, context-free languages remain closed.
Context-Sensitive Language (CSL):
Context-Sensitive Grammar (Type 1 Grammar):
The difference between the left and right sides of the derivation rules is reflected in
the middle of the strings, representing context sensitivity in everyday life.
If the production rules of a grammar satisfy the form
where , , and contains at least one character, then is called a context-sensitive
grammar.
Recognized by linear-bounded automata.
Recursively Enumerable Languages:
Unrestricted Grammar (Type 0 Grammar):
The production rules satisfy ,
where and are strings, contains at least one non-terminal symbol, and it’s not an
empty string. Languages generated by these grammars are recognized by Turing machines. Due to their generality and lack
of restrictions, they cannot describe the syntax of programming or natural languages.
Backus-Naur Form
Context Free
eg. Identifier
1 | <identifier> ::== <letter> | <identifier><letter> | <identifier><digit> |
Automata
Finite-state machine
A finite-state machine consists of a finite set
of states,
: a finite output alphabet
: a transition funciton that assigns to each state and input pair a new state
: an output function that assigns to each state and input pair an output
: an initial state
Deterministic Finite Automata(DFA)
- In a DFA, for each input character, the machine transitions to exactly one state.
- The uniqueness of computation is referred to as determinism.
- DFAs do not accept epsilon transitions.
Non-deterministic Finite Automata (NFA):
NFAs are capable of transitioning to any number of states for a particular input. They can accept epsilon transitions.
With Output Function FSM:
Moore Machine: Output function depends only on the state
Mealy Machine:Output function depends on the state and input symbol
Without Output function FSM (Semiautomatic or Transition System): DFA, NFA,-NFA
Classification
- Acceptors: Indicate whether or not the received input is accepted
- Classifiers: Generalization of acceptors that produce n-ary output where n is strictly greater than two
- Transducers: Produce output based on given input.
- Sequencers(generators): Produce only a sequence as an output sequence of acceptor or transducer outputs
Pushdown Automata(PDA)
- A finite automata with extra memory called stack
- Stack help PDA to recognize Context Free Languages
New elements are pushed onto the stack.
These machines can do more than finite state machines but less than Turing machines.
There are two forms of PDA: “deterministic” and “non-deterministic,”
Extending PDA to allow a FSM to access two stacks to obtain a more powerful automaton equivalent to a TM
PDA components:
- Input tape
- Control unit
- Stack with unlimited size
A PDA can be formally described as a 7-tuple , where:
- : the finite set of states.
- : the input alphabet.
- : the stack alphabet.
- : the transition function: .
- : the initial state ().
- : the initial stack symbol ().
- : a set of accepting states ().
Linear-bounded Automata(LBA)
A restricted type of TM.
Its computation is limited to a finite bounded region.
Can be used as acceptors for CSL. Similar to TM
Characteristics:
- Non-deterministic logic
- Multiple tracks
- Tape with finite length
An LBA can be formally described as an 8-tuple , where:
- : the finite set of transition states.
- : the tape alphabet.
- : the input alphabet.
- : the initial state.
- : the left boundary of the tape.
- : the right boundary of the tape.
- : the transition function.
- : a finite set of final states.
Turning Machine
: a state : the replacement symbol : thr direction and control unit also can move- The machine starts
- Moves left or right replacing symbols as defined
- Change the state as defined
- Reach a halting state
- The tape has changed from input to output
TM changes symbols on the tape and simulate the execution and storage of a computer.
Hence it could model all computations that can be computed by modern computers today.
The language consisting of strings accepted by a TM is called recursively enumerable language.
If the TM halts for every string, then the language is called recursive language.
Automata Theory (Chomsky hierarchy)
Grammar | Langugages | Automaton | Production Rules(Constraints) |
---|---|---|---|
Type-0 | Recursively enumerable | Turing Machine | |
Type-1 | Context-Sensitive | Linear-bounded non-deterministic Turning Machine | |
Type-2 | Context-Free | Non-deterministic pushdown automaton | |
Type-3 | Regular | Finite State Automaton | and |