Monday 26 December 2016

Regular Expression - Part 2

How regular expression works?

There are two type of regular expression engines. They are

  1. Text-directed engine (or) Deterministic Finite Automatons (DFA)
  2. Regex-directed engine (or) Nondeterministic Finite Automaton (NFA)
NFA is the commonly used interpreter as it is much more powerful for implementing advanced regular expressions. NFA is used in languages like Perl, Python, sed whereas DFA is commonly found in egrep and some implementation of awk. And they both differ in performance significantly, DFA works faster and more consistent than NFA. 

In NFA the regular expression pattern will be compiled into byte-code and the engine executes the code one at a time, moving from one state to another. If the result in a state is failure then it backtracks to try another string to see if it matches the input. Lets go through an example to understand it better. So the job is to find a string "that" in a sentence.

Input sentence:
The one who are crazy enough to think that they can change the world are the ones who do.

The regular expression would be:  /that/



Give the task to match the string "that", the javascript engine moves to the first character to see if it matches the pattern, yes it is. Then it moves forward to the next character and see if it matches, yes it is again. Then it moves to the next character "e", oops! failed to match. So it backtrack to position 2 to see if the character "h" forms the start of a pattern match, Nope, its not... Similarly, it keeps moving back and forth until it find the exact match "that" at position 39-42. The regular expression engine is a eager algorithm which return a match as soon as it find one and terminate the program, unless /g is specified in the regex to keep moving forward. 

Appendix

What is finite state automata?
Its a mathematical model for computation, represented using states. At any point in time, a program can be in only one state. They are extremely powerful for parsers which decide if a given string matches a specific pattern or not.


No comments:

Post a Comment