Lexical Structure

This document defines the lexical structure of the TON file format: its character set, whitespace handling, comments, identifiers, keywords, string and number literals, and special tokens.

Character Set

TON files are encoded in UTF-8. All Unicode characters are allowed within string literals. Outside of strings, only ASCII characters have syntactic meaning.

Whitespace

Whitespace characters (spaces, tabs, and newlines) are ignored, except in the following cases:

Inside string literals, where they are preserved
As separators between tokens
In triple-quoted multi-line strings, where intelligent indentation processing applies

Comments

// Single-line comment
# Alternative single-line comment
/* Multi-line comment
   can span multiple lines */
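The comment forms above, together with the whitespace rules, can be handled by one skipping routine that runs before each token is read. The Python sketch below is a hypothetical helper (`skip_trivia` is not taken from any TON implementation). It assumes that `/* ... */` comments do not nest, so the first `*/` closes the comment, and it also treats carriage returns as whitespace, which this section does not state. A lone `/` is left in place for the tokenizer to read as a ForwardSlash token.

```python
def skip_trivia(src: str, pos: int) -> int:
    """Return the index of the next significant character at or after pos.

    Skips spaces, tabs, newlines, and carriage returns (an assumption),
    plus the three comment forms: // ..., # ..., and /* ... */.
    """
    n = len(src)
    while pos < n:
        ch = src[pos]
        if ch in " \t\r\n":
            pos += 1
        elif ch == "#" or src.startswith("//", pos):
            # Single-line comment: jump to the newline, which the next
            # iteration consumes as ordinary whitespace.
            end = src.find("\n", pos)
            pos = n if end == -1 else end
        elif src.startswith("/*", pos):
            # Multi-line comment: assumed non-nesting, so the first */ ends it.
            end = src.find("*/", pos + 2)
            if end == -1:
                raise SyntaxError(f"unterminated /* comment at index {pos}")
            pos = end + 2
        else:
            break  # significant character (including a lone "/")
    return pos
```

For example, `skip_trivia("  // note\n/* block */ name", 0)` returns the index at which `name` begins.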

Identifiers

Identifiers (property names) can take any of the following forms:

Unquoted, if they match [a-zA-Z_][a-zA-Z0-9_]*
Numeric (unquoted, beginning with a digit): 123, 456property
Quoted, allowing any character sequence: "complex.name"
Prefixed with @: @propertyName
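The four identifier forms can be told apart with regular expressions. In the Python sketch below, the unquoted pattern is the one given above. The numeric pattern (a digit followed by letters, digits, or underscores), the @-prefixed pattern (an @ followed by an unquoted identifier), and the quoted pattern (double quotes with backslash escapes) are assumptions extrapolated from the examples. `classify_identifier` is a hypothetical name, and it only classifies a token already known to be in property-name position; a real lexer must also decide from context whether `123` is a name or a number literal.

```python
import re

# The UNQUOTED pattern comes from the list above; the other three are
# assumptions extrapolated from the examples, not taken from a spec.
UNQUOTED = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
NUMERIC = re.compile(r"[0-9][a-zA-Z0-9_]*")
AT_PREFIXED = re.compile(r"@[a-zA-Z_][a-zA-Z0-9_]*")
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')


def classify_identifier(token: str) -> str:
    """Return which identifier form a complete token matches, or raise ValueError."""
    if UNQUOTED.fullmatch(token):
        return "unquoted"
    if NUMERIC.fullmatch(token):
        return "numeric"
    if AT_PREFIXED.fullmatch(token):
        return "at-prefixed"
    if QUOTED.fullmatch(token):
        return "quoted"
    raise ValueError(f"not a valid identifier: {token!r}")
```

Note that `complex.name` without quotes is rejected, because `.` is not allowed in any unquoted form.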

Keywords

Reserved keywords in TON:

true, false - Boolean literals
null - Null value
undefined - Undefined value
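Because all four keywords also match the unquoted identifier pattern, a lexer that has just scanned a word needs to check it against the reserved set before treating it as a property name. A minimal Python sketch follows. It assumes keywords are case-sensitive, maps `null` to `None`, and uses a hypothetical `UNDEFINED` sentinel so that `undefined` stays distinct from `null`; `keyword_value` is likewise a hypothetical helper.

```python
class _Undefined:
    """Hypothetical sentinel type for TON's undefined value."""

    def __repr__(self) -> str:
        return "undefined"


UNDEFINED = _Undefined()

# Reserved words and the Python values this sketch maps them to.
KEYWORDS = {
    "true": True,
    "false": False,
    "null": None,
    "undefined": UNDEFINED,
}


def keyword_value(word: str):
    """Return (True, value) if word is a reserved keyword, else (False, None)."""
    if word in KEYWORDS:  # exact, case-sensitive match (an assumption)
        return True, KEYWORDS[word]
    return False, None
```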

String Literals

// Single-quoted
'Simple string'

// Double-quoted
"String with \n escapes"

// Triple-quoted multi-line
"""
Multi-line string
with intelligent indentation
"""

Number Literals

// Decimal
123
-456
3.14159

// Hexadecimal
0xFF
0x1A2B

// Binary
0b1010
0B11110000

// Scientific notation
1.23e-4
6.02e23
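The number forms above can be recognized with one pattern per form and then converted with Python's built-ins. This sketch covers only the forms shown: it accepts both `0x`/`0X` and `0b`/`0B` prefixes (the examples mix cases) and allows a leading minus sign only on decimal literals, since no negative hex or binary example appears. Whether `-` belongs to the literal or is a separate operator, and whether forms such as `.5` or `1_000` are legal, are not specified here; the sketch rejects the latter two. `parse_number` is a hypothetical helper.

```python
import re
from typing import Union

# One pattern per literal form shown above. The sign handling is an
# assumption: "-" is accepted only on decimal literals.
_HEX = re.compile(r"0[xX][0-9a-fA-F]+")
_BIN = re.compile(r"0[bB][01]+")
_DEC = re.compile(r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_number(text: str) -> Union[int, float]:
    """Convert a TON number literal to a Python int or float."""
    if _HEX.fullmatch(text):
        return int(text, 16)  # int() accepts a 0x/0X prefix when base is 16
    if _BIN.fullmatch(text):
        return int(text, 2)  # and a 0b/0B prefix when base is 2
    if _DEC.fullmatch(text):
        # A fractional part or an exponent makes the literal floating-point.
        if any(c in text for c in ".eE"):
            return float(text)
        return int(text)
    raise ValueError(f"not a number literal: {text!r}")
```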

Special Tokens

Token	Symbol	Usage
LeftBrace	{	Start of object
RightBrace	}	End of object
LeftParen	(	Start of class type
RightParen	)	End of class type
LeftBracket	[	Start of array
RightBracket	]	End of array
Equals	=	Assignment
Comma	,	Separator
Pipe	|	Enum delimiter
ForwardSlash	/	Path separator in schemas
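Because `//` and `/*` begin comments, a lexer must rule those out before reporting `/` as a ForwardSlash token (a longest-match rule). The Python sketch below maps each symbol in the table to its token name; `punctuation_token` and its return convention (`None` for anything that is not a punctuation token) are hypothetical.

```python
# Token names from the table above, keyed by their symbol.
PUNCTUATION = {
    "{": "LeftBrace",
    "}": "RightBrace",
    "(": "LeftParen",
    ")": "RightParen",
    "[": "LeftBracket",
    "]": "RightBracket",
    "=": "Equals",
    ",": "Comma",
    "|": "Pipe",
    "/": "ForwardSlash",
}


def punctuation_token(src: str, pos: int):
    """Return the token name for the symbol at src[pos], or None.

    "/" counts as ForwardSlash only when it does not start a
    "//" or "/*" comment, so the two-character forms are checked first.
    """
    if src.startswith("//", pos) or src.startswith("/*", pos):
        return None  # start of a comment, not a token
    if pos >= len(src):
        return None
    return PUNCTUATION.get(src[pos])
```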