Alex Angelopoulos (aka at mvps dot org)
This is a summary of my posts to the scripting newsgroups on the subject of self-lexing scripts.
Stripping VBScript prior to tokenizing
Date: 2002-11-14 18:30:04 PST
An initial attempt at processing script via script, using regular expressions on the entire script to pre-process, then going through and "normalizing" lines of code.
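A minimal sketch of the whole-script regex idea, written in Python rather than VBScript for illustration. It is not the code from the post: the function name `strip_vbscript` and both patterns are assumptions, and `Rem` comments are deliberately ignored.

```python
import re

# Match a string literal first, so an apostrophe inside quotes is not
# mistaken for the start of a comment. (Hypothetical patterns.)
_STRING_OR_COMMENT = re.compile(r'("(?:[^"\r\n]|"")*")|\'[^\r\n]*')
# A trailing " _" joins a physical line to the next one.
_CONTINUATION = re.compile(r'[ \t]+_[ \t]*\r?\n[ \t]*')

def strip_vbscript(source):
    """Return the script as a list of comment-free logical lines."""
    # Keep string literals (group 1); replace apostrophe comments with nothing.
    text = _STRING_OR_COMMENT.sub(lambda m: m.group(1) or "", source)
    # Fold continued lines into one logical line.
    text = _CONTINUATION.sub(" ", text)
    # "Normalize": trim each line and drop the empty ones.
    lines = [line.strip() for line in text.splitlines()]
    return [line for line in lines if line]
```

Matching strings and comments in a single alternation is one way to keep quoted apostrophes intact; a pass that only looked for `'` would cut `"it's"` in half.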
Date: 2002-11-22 06:22:58 PST
Second iteration of the ideas from 11-14, packaged into a component and able to "code mine" to extract functions from script, do crude pretty-printing, etc.
Script Parsing Part 1: The Way to NOT do it.
Date: 2002-12-13 15:54:02 PST
I'm good at posts like this! ;)
General summation of some of the issues and ideas I had in dealing with VBScript as a text unit. Looking back, I think this post marks when I was just beginning to realize the importance of dealing with script on an atomic level, as a character stream, as opposed to any larger intermediary units.
Script Parsing Part 2: Brief Discussion of State Machines
Date: 2002-12-14 11:23:48 PST
Arm-waving of "this is what a state machine is all about". Simplistic.
Script Parsing Part 3: Simplistic Finite State Machine for VBScript
Date: 2002-12-17 11:46:49 PST
Examples for the prior post. I started with a `degenerate' state machine which had input and output but only a single state. I followed that with a 2-state machine and a lightly commented FSM for parsing script, derived from the `Script Cooker'.
Includes links to some web resources. I've changed my mind about the merit of some, particularly Libero...
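To make the 2-state idea concrete, here is a small sketch in Python (not VBScript, and not the code from the post). The summary does not say which two states the original machine used; the code/string split and the name `lowercase_code` are assumptions chosen for illustration.

```python
CODE, STRING = "code", "string"   # the two states (hypothetical choice)

def lowercase_code(line):
    """Lowercase everything outside string literals, one character at a time."""
    state = CODE
    out = []
    for ch in line:
        if ch == '"':
            # A quote always flips the state. A doubled quote inside a
            # string flips out and straight back in, so it needs no
            # special handling.
            state = STRING if state == CODE else CODE
            out.append(ch)
        elif state == CODE:
            out.append(ch.lower())
        else:
            out.append(ch)          # inside a string: leave untouched
    return "".join(out)
```

The transition logic depends only on the current state and the current character, which is what distinguishes this from the single-state version where every character is treated the same way.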
Finite State Machine based VBScript Parser
Date: 2002-12-22 11:45:00 PST
This is actually a lexer, I believe, NOT a parser, and I did it the hard way, reading character by character and building tokens. Implemented as a class "fsmParser". It reads a character "stream" and emits tokens to a stack; when it hits the end of a statement, it emits the complete statement as a token list (actually an array).
Does not handle some niceties such as date expressions very well. In terms of practical value, it is probably much less useful than `Script Cooker'. As theory, some people may find it useful for FSM concepts, but I was still very unclear on the idea so even that is doubtful.
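The read-a-character, build-a-token, emit-a-statement loop described above might look roughly like the following Python sketch. This is not "fsmParser" itself: the function name, the three states, and the token rules are all assumptions, and line continuations, dates and decimal numbers are not handled.

```python
def lex_statements(source):
    """Split source into statements, each a list of token strings."""
    statements = []    # completed statements
    tokens = []        # token stack for the statement in progress
    current = []       # characters of the token in progress
    state = "code"     # hypothetical states: "code", "string", "comment"

    def end_token():
        if current:
            tokens.append("".join(current))
            current.clear()

    def end_statement():
        end_token()
        if tokens:
            statements.append(list(tokens))
            tokens.clear()

    for ch in source:
        if state == "string":
            current.append(ch)
            if ch == '"':
                state = "code"           # may be reopened by a doubled quote
        elif state == "comment":
            if ch == "\n":
                state = "code"
                end_statement()
        else:
            in_literal = bool(current) and current[0] == '"'
            if in_literal and ch != '"':
                end_token()              # the string literal really ended
                in_literal = False
            if ch == '"':
                if not in_literal:
                    end_token()
                current.append(ch)       # opening quote, or second half of ""
                state = "string"
            elif ch == "'":
                end_token()
                state = "comment"
            elif ch in "\n:":
                end_statement()          # newline or colon ends a statement
            elif ch.isspace():
                end_token()
            elif ch.isalnum() or ch == "_":
                current.append(ch)
            else:
                end_token()
                tokens.append(ch)        # any other character is its own token
    end_statement()
    return statements
```

For example, `lex_statements('WScript.Echo x : y = 2\n')` yields two statements, because the colon is treated as a statement separator in the same way as a newline.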
Code Parsing Redux - stateless regex-based subtractive tokenizing
Date: 2003-02-27 11:52:54 PST
I have no idea what to call this. This is almost pure code - a regex-based direct parser. Instead of switching states and looking at input, it uses a single master state which checks for conformance to a set of patterns.
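One reading of "subtractive" is that each match is cut off the front of the remaining text until nothing is left. The Python sketch below follows that reading; it is an assumption, not the posted code, and the pattern table and the name `tokenize` are hypothetical.

```python
import re

# Ordered pattern table; the first pattern that matches at the front wins.
PATTERNS = [
    ("space",   re.compile(r"[ \t]+")),
    ("comment", re.compile(r"'[^\r\n]*")),
    ("string",  re.compile(r'"(?:[^"\r\n]|"")*"')),
    ("number",  re.compile(r"\d+(?:\.\d+)?")),
    ("word",    re.compile(r"[A-Za-z]\w*")),
    ("newline", re.compile(r"\r?\n")),
    ("symbol",  re.compile(r".")),        # fallback: any single character
]

def tokenize(text):
    """Return (kind, text) pairs, dropping whitespace and comments."""
    tokens = []
    while text:
        for kind, pattern in PATTERNS:
            m = pattern.match(text)       # anchored at the start of the remainder
            if m:
                if kind not in ("space", "comment"):
                    tokens.append((kind, m.group(0)))
                text = text[m.end():]     # subtract the matched piece
                break
        else:
            raise ValueError("no pattern matched: %r" % text[:20])
    return tokens
```

There is no state variable to switch: the loop only asks which pattern the front of the remaining text conforms to, so all of the context sensitivity (for example, doubled quotes inside strings) has to live inside the patterns themselves.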