Lexical Structure
This document defines the lexical structure of the TON file format, including character sets, whitespace handling, comments, identifiers, and all supported literal types.
Character Set
TON files are encoded in UTF-8. All Unicode characters are allowed within string literals. Outside of strings, only ASCII characters have syntactic meaning.
Whitespace
Whitespace characters (spaces, tabs, and newlines) are ignored by the lexer, with the following exceptions:
- Inside string literals where they are preserved
- As separators between tokens
- In multi-line strings where intelligent indentation processing applies
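The separator rule can be illustrated with a small Python sketch. `split_on_whitespace` is a hypothetical helper, not part of any TON implementation. It assumes strings are delimited by a single `'` or `"` with backslash escapes, and it ignores triple-quoted strings and comments for brevity. Outside a string, whitespace only ends the current token. Inside a string, it is kept.

```python
from __future__ import annotations


def split_on_whitespace(source: str) -> list[str]:
    """Split `source` at whitespace, keeping quoted strings (and their spaces) intact.

    Hypothetical helper: assumes single- or double-quoted strings with backslash
    escapes; triple-quoted strings and comments are not handled.
    """
    tokens: list[str] = []
    current: list[str] = []
    quote: str | None = None  # quote character of the string we are inside, if any
    i = 0
    while i < len(source):
        ch = source[i]
        if quote is not None:
            # Inside a string: whitespace is preserved as part of the token.
            current.append(ch)
            if ch == "\\" and i + 1 < len(source):
                current.append(source[i + 1])  # an escaped char cannot close the string
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            current.append(ch)
        elif ch in " \t\r\n":
            # Outside a string: whitespace only ends the current token.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        tokens.append("".join(current))
    return tokens
```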
Comments
// Single-line comment
# Alternative single-line comment
/* Multi-line comment
can span multiple lines */
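A minimal Python sketch of comment removal, covering all three comment forms. `strip_comments` is a hypothetical helper. It assumes that block comments do not nest, that comment markers inside quoted strings (such as a URL containing `//`) are literal text, and that strings use `'` or `"` with backslash escapes. Each block comment is replaced by one space so the tokens on either side stay separated. A production lexer would more likely skip comments while scanning tokens rather than rewrite the source first.

```python
from __future__ import annotations


def strip_comments(source: str) -> str:
    """Remove //, # and /* */ comments that appear outside quoted strings.

    Hypothetical helper. Assumptions: block comments do not nest, strings use
    ' or " with backslash escapes, and a block comment is replaced by one space.
    """
    out: list[str] = []
    i, n = 0, len(source)
    quote: str | None = None  # quote character of the string we are inside, if any
    while i < n:
        ch = source[i]
        if quote is not None:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(source[i + 1])  # copy the escaped character verbatim
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif ch in "\"'":
            quote = ch
            out.append(ch)
            i += 1
        elif source.startswith("//", i) or ch == "#":
            # Single-line comment: skip up to, but not including, the newline.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError(f"unterminated block comment at offset {i}")
            out.append(" ")  # keep neighbouring tokens separated
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```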
Identifiers
Identifiers (property names) can be:
- Unquoted if they match:
[a-zA-Z_][a-zA-Z0-9_]*
- Numeric (unquoted):
123, 456property
- Quoted for any character sequence:
"complex.name"
- Prefixed with @:
@propertyName
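The four identifier forms can be told apart with regular expressions. In this Python sketch only the unquoted pattern comes from the list above; the numeric, `@`-prefixed, and quoted patterns are assumptions inferred from the examples (`123`, `456property`, `"complex.name"`, `@propertyName`), and `classify_identifier` is a hypothetical helper. It does not exclude reserved keywords, which also match the unquoted pattern.

```python
import re

# Pattern given in the specification for unquoted identifiers.
UNQUOTED = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
# Assumption: numeric names are digits optionally followed by identifier characters.
NUMERIC = re.compile(r"[0-9]+[a-zA-Z0-9_]*")
# Assumption: `@` is followed by a name in the unquoted form.
AT_PREFIXED = re.compile(r"@[a-zA-Z_][a-zA-Z0-9_]*")
# Assumption: quoted names are double-quoted with backslash escapes.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')


def classify_identifier(text: str) -> str:
    """Return which identifier form `text` uses, or raise ValueError."""
    for kind, pattern in (
        ("unquoted", UNQUOTED),
        ("numeric", NUMERIC),
        ("at_prefixed", AT_PREFIXED),
        ("quoted", QUOTED),
    ):
        if pattern.fullmatch(text):
            return kind
    raise ValueError(f"not a valid TON identifier: {text!r}")
```

Using `fullmatch` rather than `match` ensures the whole name fits one form, so `complex.name` without quotes is rejected instead of being read as `complex`.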
Keywords
Reserved keywords in TON:
- true, false: Boolean literals
- null: Null value
- undefined: Undefined value
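Because TON distinguishes `null` from `undefined`, mapping the keywords into a host language needs two separate values. In this minimal Python sketch, `Undefined`, `UNDEFINED`, `is_keyword`, and `keyword_value` are hypothetical names. `undefined` is represented by a sentinel object because Python has no built-in equivalent, and keywords are assumed to be case-sensitive.

```python
class Undefined:
    """Hypothetical sentinel for TON's `undefined`, kept distinct from None (`null`)."""

    def __repr__(self) -> str:
        return "undefined"


UNDEFINED = Undefined()

# Assumption: keywords are case-sensitive, so `True` or `NULL` are not keywords.
KEYWORDS = {"true": True, "false": False, "null": None, "undefined": UNDEFINED}


def is_keyword(word: str) -> bool:
    """Report whether `word` is one of the reserved keywords."""
    return word in KEYWORDS


def keyword_value(word: str) -> object:
    """Map a reserved keyword to a Python value."""
    if word not in KEYWORDS:
        raise ValueError(f"not a TON keyword: {word!r}")
    return KEYWORDS[word]
```

Since `true` also matches the unquoted identifier pattern, a lexer would typically check `is_keyword` before treating a word as an identifier.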
String Literals
// Single-quoted
'Simple string'
// Double-quoted
"String with \n escapes"
// Triple-quoted multi-line
"""
Multi-line string
with intelligent indentation
"""
Number Literals
// Decimal
123
-456
3.14159
// Hexadecimal
0xFF
0x1A2B
// Binary
0b1010
0B11110000
// Scientific notation
1.23e-4
6.02e23
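A sketch of number-literal parsing in Python. The regular expressions are assumptions reconstructed from the examples above: an optional sign, `0x`/`0X` hexadecimal, `0b`/`0B` binary, decimal integers, and decimals with an optional fraction and exponent. They may treat edge cases differently from the real TON grammar. `parse_number` is a hypothetical helper. The sign is handled separately, and each base is passed explicitly because Python's `int(text, 0)` rejects decimal literals with leading zeros such as `007`.

```python
from __future__ import annotations

import re

HEX = re.compile(r"0[xX][0-9a-fA-F]+")
BIN = re.compile(r"0[bB][01]+")
DEC_INT = re.compile(r"[0-9]+")
# Assumption: a float needs leading digits; fraction and exponent are optional.
DEC_FLOAT = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_number(text: str) -> int | float:
    """Convert a TON number literal to a Python int or float."""
    sign = -1 if text.startswith("-") else 1
    body = text[1:] if text.startswith(("+", "-")) else text
    if HEX.fullmatch(body):
        return sign * int(body[2:], 16)
    if BIN.fullmatch(body):
        return sign * int(body[2:], 2)
    if DEC_INT.fullmatch(body):
        return sign * int(body, 10)  # integers stay exact
    if DEC_FLOAT.fullmatch(body):
        return sign * float(body)
    raise ValueError(f"not a TON number literal: {text!r}")
```

Checking `DEC_INT` before `DEC_FLOAT` keeps literals like `123` as integers instead of converting them to floats.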
Special Tokens
Token | Symbol | Usage |
---|---|---|
LeftBrace | { | Start of object |
RightBrace | } | End of object |
LeftParen | ( | Class type start |
RightParen | ) | Class type end |
LeftBracket | [ | Array start |
RightBracket | ] | Array end |
Equals | = | Assignment |
Comma | , | Separator |
Pipe | \| | Enum delimiter |
ForwardSlash | / | Path separator in schemas |
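All special tokens in the table are single characters, so a lexer can recognise them with a lookup table once it has stepped over string literals. `SPECIAL_TOKENS` mirrors the table. `scan_special_tokens` is a hypothetical Python helper that assumes comments have already been removed (so `/` can only be a path separator) and that strings use `'` or `"` with backslash escapes. It ignores every other character, which a full lexer would turn into identifier, number, and keyword tokens.

```python
from __future__ import annotations

# Symbol-to-token mapping taken from the Special Tokens table.
SPECIAL_TOKENS = {
    "{": "LeftBrace",
    "}": "RightBrace",
    "(": "LeftParen",
    ")": "RightParen",
    "[": "LeftBracket",
    "]": "RightBracket",
    "=": "Equals",
    ",": "Comma",
    "|": "Pipe",
    "/": "ForwardSlash",
}


def scan_special_tokens(source: str) -> list[str]:
    """Return the special-token names in `source`, in order, skipping quoted strings."""
    names: list[str] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch in "\"'":
            # Skip the whole string literal so symbols inside it are not tokens.
            i += 1
            while i < len(source) and source[i] != ch:
                i += 2 if source[i] == "\\" else 1  # jump over escape pairs
            i += 1  # step past the closing quote
        elif ch in SPECIAL_TOKENS:
            names.append(SPECIAL_TOKENS[ch])
            i += 1
        else:
            i += 1  # identifiers, numbers and whitespace are not handled here
    return names
```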