# Lexer Test Specifications

## Overview

The lexer (tokenizer) is responsible for converting a character stream into a token stream. These tests validate that the lexer correctly identifies and categorizes all TON language elements.
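All test sketches in this document are written against the following minimal token model. The names (`Token`, `TokenType`, `tokenize`) are illustrative assumptions for readability, not a confirmed TON lexer API:

```python
from dataclasses import dataclass
from enum import Enum, auto

class TokenType(Enum):
    # Structural tokens
    LEFT_BRACE = auto()
    RIGHT_BRACE = auto()
    LEFT_PAREN = auto()
    RIGHT_PAREN = auto()
    EQUALS = auto()
    COMMA = auto()
    # Value tokens
    IDENTIFIER = auto()
    STRING = auto()
    MULTILINE_STRING = auto()
    NUMBER = auto()
    BOOLEAN = auto()
    NULL = auto()
    UNDEFINED = auto()
    GUID = auto()
    ENUM = auto()
    # Prefix tokens
    AT_SIGN = auto()
    STRING_HINT = auto()
    NUMBER_HINT = auto()
    GUID_HINT = auto()
    HEADER_PREFIX = auto()
    SCHEMA_PREFIX = auto()

@dataclass
class Token:
    type: TokenType
    value: str
    line: int
    column: int

def tokenize(source: str) -> list[Token]:
    """Hypothetical lexer entry point used by the test sketches below."""
    raise NotImplementedError
```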
## Test Cases
### LEXER_001: Basic Structure Tokenization

Purpose: Verify tokenization of basic object structure

Input:

```
{ }
```

Expected Tokens:

- Token[0]: Type=LeftBrace, Value="{"
- Token[1]: Type=RightBrace, Value="}"

Validation: Exactly 2 tokens, correct types and values
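A minimal pytest-style check of this case, using the hypothetical `tokenize` and `TokenType` from the Overview sketch:

```python
def test_basic_structure():
    tokens = tokenize("{ }")
    assert len(tokens) == 2  # whitespace between the braces produces no token
    assert (tokens[0].type, tokens[0].value) == (TokenType.LEFT_BRACE, "{")
    assert (tokens[1].type, tokens[1].value) == (TokenType.RIGHT_BRACE, "}")
```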
### LEXER_002: Object With Class Name

Purpose: Verify tokenization of object with class annotation

Input:

```
{(person)}
```

Expected Tokens:

- Token[0]: Type=LeftBrace, Value="{"
- Token[1]: Type=LeftParen, Value="("
- Token[2]: Type=Identifier, Value="person"
- Token[3]: Type=RightParen, Value=")"
- Token[4]: Type=RightBrace, Value="}"
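The full expected sequence can be checked in one comparison; a sketch under the same assumed API:

```python
def test_class_name():
    tokens = tokenize("{(person)}")
    assert [(t.type, t.value) for t in tokens] == [
        (TokenType.LEFT_BRACE, "{"),
        (TokenType.LEFT_PAREN, "("),
        (TokenType.IDENTIFIER, "person"),
        (TokenType.RIGHT_PAREN, ")"),
        (TokenType.RIGHT_BRACE, "}"),
    ]
```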
### LEXER_003: Property Assignment

Purpose: Verify tokenization of property assignments

Input:

```
name = 'John', age = 30
```

Expected Tokens:

- Token[0]: Type=Identifier, Value="name"
- Token[1]: Type=Equals, Value="="
- Token[2]: Type=String, Value="John"
- Token[3]: Type=Comma, Value=","
- Token[4]: Type=Identifier, Value="age"
- Token[5]: Type=Equals, Value="="
- Token[6]: Type=Number, Value="30"
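Note that the String token carries the unquoted value while the Number token keeps its source spelling; a sketch under the assumed API:

```python
def test_property_assignment():
    tokens = tokenize("name = 'John', age = 30")
    assert [t.type for t in tokens] == [
        TokenType.IDENTIFIER, TokenType.EQUALS, TokenType.STRING,
        TokenType.COMMA,
        TokenType.IDENTIFIER, TokenType.EQUALS, TokenType.NUMBER,
    ]
    assert tokens[2].value == "John"  # quotes are stripped during lexing
    assert tokens[6].value == "30"    # numbers keep their source spelling
```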
### LEXER_004: String Literals

Purpose: Verify tokenization of different string quote styles

Test Cases:

| Input | Expected Token Value |
|---|---|
| `'Hello World'` | Hello World |
| `"Hello World"` | Hello World |
| `` `Hello World` `` | Hello World |

Validation: All produce Type=String with unquoted value
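Since all three quote styles must yield the same token, this case lends itself to a parametrized sketch (assumed API as above):

```python
import pytest

@pytest.mark.parametrize("source", ["'Hello World'", '"Hello World"', "`Hello World`"])
def test_string_quote_styles(source):
    token, = tokenize(source)            # exactly one token expected
    assert token.type == TokenType.STRING
    assert token.value == "Hello World"  # delimiters removed, content identical
```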
### LEXER_005: Escape Sequences

Purpose: Verify proper handling of escape sequences in strings

Input:

```
'Line 1\nLine 2\t\''
```

Expected Token:

- Type=String
- Value="Line 1[newline]Line 2[tab]'"

Note: [newline] and [tab] represent actual control characters
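In a test, a raw string keeps the two-character escape forms in the input while the assertion uses real control characters; a sketch under the assumed API:

```python
def test_escape_sequences():
    token, = tokenize(r"'Line 1\nLine 2\t\''")
    # Escapes are decoded during tokenization, so the token value holds
    # the actual LF, TAB, and quote characters, not the escape forms.
    assert token.value == "Line 1\nLine 2\t'"
```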
### LEXER_006: Numeric Literals

Purpose: Verify tokenization of various number formats

Test Cases:

| Input | Token Type | Token Value |
|---|---|---|
| `123` | Number | 123 |
| `-456` | Number | -456 |
| `3.14` | Number | 3.14 |
| `1.23e10` | Number | 1.23e10 |
| `0xFF` | Number | 0xFF |
| `0b1010` | Number | 0b1010 |
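Because every row expects Type=Number with the value equal to the input text, the whole table reduces to one parametrized check (assumed API as above):

```python
import pytest

@pytest.mark.parametrize("source", ["123", "-456", "3.14", "1.23e10", "0xFF", "0b1010"])
def test_number_formats(source):
    token, = tokenize(source)
    assert token.type == TokenType.NUMBER
    assert token.value == source  # the lexer preserves the source spelling
```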
### LEXER_007: Keywords

Purpose: Verify recognition of reserved keywords

Test Cases:

| Input | Token Type |
|---|---|
| `true` | Boolean |
| `false` | Boolean |
| `null` | Null |
| `undefined` | Undefined |
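A parametrized sketch of the keyword table, using the assumed `TokenType` members from the Overview:

```python
import pytest

@pytest.mark.parametrize("source,expected", [
    ("true", TokenType.BOOLEAN),
    ("false", TokenType.BOOLEAN),
    ("null", TokenType.NULL),
    ("undefined", TokenType.UNDEFINED),
])
def test_keywords(source, expected):
    token, = tokenize(source)
    assert token.type == expected
```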
### LEXER_008: GUID Tokenization

Purpose: Verify GUID recognition with and without braces

Test Cases:

- Input: `550e8400-e29b-41d4-a716-446655440000`
  Expected: Type=Guid, Value=550e8400-e29b-41d4-a716-446655440000
- Input: `{550e8400-e29b-41d4-a716-446655440000}`
  Expected: Type=Guid, Value={550e8400-e29b-41d4-a716-446655440000}
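One way to recognize both forms is a single pattern over the 8-4-4-4-12 hex layout. This is an illustrative sketch, not the actual recognizer; note that the braced form means the lexer must look ahead after `{` to distinguish a LeftBrace token from a Guid token:

```python
import re

# Illustrative recognizer: 8-4-4-4-12 hex digits, braces either both present
# or both absent (the alternation keeps them balanced).
GUID_RE = re.compile(
    r"\{[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}"
    r"|[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)

assert GUID_RE.fullmatch("550e8400-e29b-41d4-a716-446655440000")
assert GUID_RE.fullmatch("{550e8400-e29b-41d4-a716-446655440000}")
```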
### LEXER_009: Enum Tokenization

Purpose: Verify enum and enum set tokenization

Test Cases:

| Input | Token Value | Description |
|---|---|---|
| `\|active\|` | `\|active\|` | Single enum value |
| `\|0\|` | `\|0\|` | Numeric enum |
| `\|read\|write\|` | `\|read\|write\|` | Enum set |
| `\|0\|2\|4\|` | `\|0\|2\|4\|` | Numeric enum set |
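Assuming the lexer emits a single enum token whose value keeps the pipe delimiters (the `ENUM` member in the Overview sketch is an assumption), a parametrized check might look like:

```python
import pytest

@pytest.mark.parametrize("source", ["|active|", "|0|", "|read|write|", "|0|2|4|"])
def test_enum_tokens(source):
    token, = tokenize(source)
    assert token.type == TokenType.ENUM
    assert token.value == source  # the pipes are part of the token value
```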
### LEXER_010: Special Prefixes

Purpose: Verify tokenization of special prefix characters

Test Cases:

| Input | Tokens | Description |
|---|---|---|
| `@name` | AtSign, Identifier("name") | At-prefixed property |
| `$value` | StringHint, Identifier("value") | String type hint |
| `%value` | NumberHint, Identifier("value") | Number type hint |
| `&value` | GuidHint, Identifier("value") | GUID type hint |
| `#@ header` | HeaderPrefix, Identifier("header") | Header section |
| `#! schema` | SchemaPrefix, Identifier("schema") | Schema section |
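Each row expects a two-token sequence, which maps directly onto a parametrized sketch (token names follow the assumed `TokenType` from the Overview):

```python
import pytest

@pytest.mark.parametrize("source,expected", [
    ("@name", [TokenType.AT_SIGN, TokenType.IDENTIFIER]),
    ("$value", [TokenType.STRING_HINT, TokenType.IDENTIFIER]),
    ("%value", [TokenType.NUMBER_HINT, TokenType.IDENTIFIER]),
    ("&value", [TokenType.GUID_HINT, TokenType.IDENTIFIER]),
    ("#@ header", [TokenType.HEADER_PREFIX, TokenType.IDENTIFIER]),
    ("#! schema", [TokenType.SCHEMA_PREFIX, TokenType.IDENTIFIER]),
])
def test_special_prefixes(source, expected):
    assert [t.type for t in tokenize(source)] == expected
```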
### LEXER_011: Comment Handling

Purpose: Verify that comments are properly skipped

Test Case 1: Single-line comment

```
// This is a comment
name = 'value'
```

Expected: First token should be Identifier("name"), no comment tokens

Test Case 2: Multi-line comment

```
/* This is a
multi-line comment */
name = 'value'
```

Expected: First token should be Identifier("name"), no comment tokens
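Both cases share one assertion: lexing resumes at the next real token and no comment token is emitted. A sketch under the assumed API:

```python
def test_comments_are_skipped():
    for source in (
        "// This is a comment\nname = 'value'",
        "/* This is a\nmulti-line comment */\nname = 'value'",
    ):
        tokens = tokenize(source)
        # The comment contributes no token; the stream starts at "name".
        assert (tokens[0].type, tokens[0].value) == (TokenType.IDENTIFIER, "name")
```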
### LEXER_012: Position Tracking

Purpose: Verify line and column number tracking

Input:

```
name = 'value'
age = 30
```

Expected Positions:

- Token "name": Line=1, Column=1
- Token "age": Line=2, Column=1
### LEXER_013: Multi-line String Literals

Purpose: Verify tokenization of triple-quoted strings

Input:

```
"""
Line 1
Line 2
"""
```

Expected Token:

- Type=MultiLineString
- Value="Line 1\nLine 2"
## Implementation Notes
- Whitespace between tokens should be ignored except within strings
- Line endings can be `\n`, `\r\n`, or `\r` (see the position-bookkeeping sketch after this list)
- Token position should reflect the starting position of the token
- Escape sequences must be processed during tokenization
- Comments should be completely skipped, not returned as tokens
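One way to satisfy the line-ending and position notes together is a small cursor that normalizes all three newline conventions. This is an illustrative sketch, not the actual implementation:

```python
class Cursor:
    """Tracks 1-based line/column; treats \\n, \\r\\n, and \\r as one newline."""

    def __init__(self) -> None:
        self.line = 1
        self.column = 1
        self._prev = ""

    def advance(self, ch: str) -> None:
        if ch == "\n" and self._prev == "\r":
            pass  # second half of \r\n; the \r already started the new line
        elif ch in ("\n", "\r"):
            self.line += 1
            self.column = 1
        else:
            self.column += 1
        self._prev = ch
```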
## Error Cases

The following should produce lexer errors (a test sketch follows the list):

- Unterminated strings
- Invalid escape sequences
- Malformed numbers (e.g., `0x` or `0b` without digits)
- Invalid characters outside of strings
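A hedged sketch of the error-path tests; `LexerError` is an assumed exception type, not a confirmed part of the TON API:

```python
import pytest

@pytest.mark.parametrize("source", [
    "'unterminated",      # unterminated string
    r"'bad \q escape'",   # invalid escape sequence
    "0x",                 # hex prefix without digits
    "0b",                 # binary prefix without digits
])
def test_lexer_errors(source):
    with pytest.raises(LexerError):  # hypothetical lexer error type
        tokenize(source)
```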