Chapter 8
Context sensitive lexer

8.1 Introduction

Before version 2 of TPG, lexers were context sensitive: the parser tells the lexer which tokens to try to match, so the same input string can be split into different tokens depending on the grammar rules being applied. These lexers were very flexible but slower than context free lexers, because TPG backtracking caused tokens to be matched several times.

In TPG 2, the lexer is called before the parser and produces a list of tokens from the input string. This list is then given to the parser. In this case, when TPG backtracks, the token list remains unchanged and no token has to be matched again.
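The two approaches can be contrasted with a minimal sketch built directly on the re module. It is not TPG's implementation: the token table and the names tokenize and match_token are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical token definitions, for illustration only (not TPG's API).
TOKENS = [
    ("number", re.compile(r"\d+")),
    ("word", re.compile(r"\w+")),
]

def tokenize(text):
    """Context free lexing: scan the whole input once; the first pattern that matches wins."""
    pos, result = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern in TOKENS:
            m = pattern.match(text, pos)
            if m:
                result.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character at position {pos}")
    return result

def match_token(text, pos, name):
    """Context sensitive lexing: the parser asks for one specific token at pos."""
    m = dict(TOKENS)[name].match(text, pos)
    return (m.group(), m.end()) if m else None

token_list = tokenize("42 abc")
as_number = match_token("42 abc", 0, "number")
as_word = match_token("42 abc", 0, "word")
```

With the context free function, "42" is always a number. With the context sensitive function, the same characters are accepted as a number or as a word, depending on what the caller requests. If a backtracking parser calls match_token again at the same position, the regular expression is run again, which is the cost described above.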

Since TPG 2.1.2, context sensitive lexers have been reintroduced. By default lexers are context free, but the CSL option (see 5.3.1) makes TPG use a context sensitive lexer.

8.2 Grammar structure

CSL grammars have the same structure as non CSL grammars (see 5.1), except for the lexer = CSL option (see 5.3.1).

8.3 CSL lexers

8.3.1 Regular expression syntax

The CSL lexer is based on the re module. The difference from non CSL lexers is that the given regular expression is compiled as is, without any encapsulation.
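One practical effect of compiling an expression as is can be sketched with the re module alone. The wrapping in a named group shown here is only an assumption about what encapsulation might look like, made for the sake of comparison; it is not taken from TPG's source.

```python
import re

# Token expression exactly as a grammar author might write it.
raw = r"(\d+)\.(\d+)"

# Compiled as is: group 1 is the first group the author wrote.
as_is = re.compile(raw)

# Hypothetical encapsulation in a named group (an assumption for this
# illustration, not TPG's documented behaviour): every group number shifts by one.
wrapped = re.compile(f"(?P<number>{raw})")

plain_groups = as_is.match("3.14").groups()
wrapped_groups = wrapped.match("3.14").groups()
```

In the first case the groups are exactly those of the written expression. In the wrapped case an extra outer group comes first, so group numbers and numeric backreferences no longer mean what the author wrote.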

8.3.2 Token matching

In CSL parsers, tokens are matched as in non CSL parsers (see 6.3).

8.4 CSL parsers

There is no difference between CSL and non CSL parsers.