Lexical Structure

This document defines the lexical structure of the TON file format, including character sets, whitespace handling, comments, identifiers, and all supported literal types.

Character Set

TON files are encoded in UTF-8. All Unicode characters are allowed within string literals. Outside of strings, only ASCII characters have syntactic meaning.

Whitespace

Whitespace characters (spaces, tabs, newlines) are ignored except in the following cases:

  • Inside string literals, where they are preserved
  • Between tokens, where they act as separators
  • In multi-line strings, where intelligent indentation processing applies

Comments

// Single-line comment
# Alternative single-line comment
/* Multi-line comment
   can span multiple lines */
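
The three comment forms above could be removed by a scanner along the following lines. This is a minimal sketch, not a reference implementation: the function name strip_comments is hypothetical, and it assumes that comment markers inside quoted strings are ordinary text, that a backslash escapes the next character in both single- and double-quoted strings, and that block comments do not nest. Triple-quoted strings are not treated specially.

```python
def strip_comments(source: str) -> str:
    """Remove //, # and /* */ comments from TON text (hypothetical helper)."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "'\"":
            # Copy a quoted string verbatim so comment markers inside it survive.
            # Assumption: a backslash escapes the following character.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        elif source.startswith("//", i) or ch == "#":
            # Single-line comment: skip up to (not including) the newline.
            while i < n and source[i] != "\n":
                i += 1
        elif source.startswith("/*", i):
            # Multi-line comment: skip past the closing */.
            # Assumption: an unterminated comment runs to the end of input.
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A lone / (the path separator used in schemas) is passed through unchanged, because only the two-character sequences // and /* start a comment.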

Identifiers

Identifiers (property names) can be:

  • Unquoted if they match: [a-zA-Z_][a-zA-Z0-9_]*
  • Numeric or digit-leading (unquoted): 123, 456property
  • Quoted for any character sequence: "complex.name"
  • Prefixed with @: @propertyName
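
The four identifier forms can be told apart with a couple of regular expressions. The sketch below is illustrative only: classify_identifier is a hypothetical helper, the pattern for numeric identifiers is an assumption inferred from the examples 123 and 456property, and it assumes the name after @ follows the unquoted pattern.

```python
import re

# Pattern for unquoted identifiers, taken from the list above.
UNQUOTED = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
# Assumption: a numeric identifier starts with a digit and continues with
# letters, digits, or underscores (covers both 123 and 456property).
NUMERIC = re.compile(r"[0-9][a-zA-Z0-9_]*")


def classify_identifier(text: str) -> str:
    """Return the identifier form, or 'invalid' (hypothetical helper)."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "quoted"
    # Assumption: the name after @ follows the unquoted pattern.
    if text.startswith("@") and UNQUOTED.fullmatch(text[1:]):
        return "at-prefixed"
    if UNQUOTED.fullmatch(text):
        return "unquoted"
    if NUMERIC.fullmatch(text):
        return "numeric"
    return "invalid"
```

Under these assumptions, a name such as complex.name is rejected unless it is quoted, which matches the quoted example in the list.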

Keywords

Reserved keywords in TON:

  • true, false - Boolean literals
  • null - Null value
  • undefined - Undefined value

String Literals

// Single-quoted
'Simple string'

// Double-quoted
"String with \n escapes"

// Triple-quoted multi-line
"""
Multi-line string
with intelligent indentation
"""

Number Literals

// Decimal
123
-456
3.14159

// Hexadecimal
0xFF
0x1A2B

// Binary
0b1010
0B11110000

// Scientific notation
1.23e-4
6.02e23
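
As a rough illustration, the literal forms above map onto Python numbers as follows. The function parse_number is hypothetical, and two points are assumptions rather than statements of the format: that the hexadecimal prefix is case-insensitive like the binary one (the examples show only 0B in upper case), and that a leading minus sign is accepted on hexadecimal and binary literals as well as decimal ones.

```python
import re

# Assumption: prefixes are case-insensitive and every form may carry a minus sign.
_HEX = re.compile(r"-?0[xX][0-9a-fA-F]+")
_BIN = re.compile(r"-?0[bB][01]+")
_INT = re.compile(r"-?[0-9]+")
_FLOAT = re.compile(r"-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def parse_number(text: str):
    """Convert a TON number literal to int or float (hypothetical helper)."""
    if _HEX.fullmatch(text):
        return int(text, 16)  # int() accepts the 0x prefix when base is 16
    if _BIN.fullmatch(text):
        return int(text, 2)   # int() accepts the 0b prefix when base is 2
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)    # covers decimals and scientific notation
    raise ValueError(f"not a number literal: {text!r}")
```

For example, parse_number("0xFF") gives 255 and parse_number("0B11110000") gives 240, while a string such as "12abc" raises ValueError.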

Special Tokens

Token          Symbol   Usage
LeftBrace      {        Start of object
RightBrace     }        End of object
LeftParen      (        Class type start
RightParen     )        Class type end
LeftBracket    [        Array start
RightBracket   ]        Array end
Equals         =        Assignment
Comma          ,        Separator
Pipe           |        Enum delimiter
ForwardSlash   /        Path separator in schemas
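
A lexer might hold the special tokens in a lookup table keyed by symbol. The sketch below is one possible arrangement: SPECIAL_TOKENS and punctuation_tokens are hypothetical names, and the function deliberately ignores strings, comments, and every other token kind, so it demonstrates only the symbol-to-name mapping.

```python
# Symbol-to-name mapping, copied from the special tokens table.
SPECIAL_TOKENS = {
    "{": "LeftBrace",
    "}": "RightBrace",
    "(": "LeftParen",
    ")": "RightParen",
    "[": "LeftBracket",
    "]": "RightBracket",
    "=": "Equals",
    ",": "Comma",
    "|": "Pipe",
    "/": "ForwardSlash",
}


def punctuation_tokens(text: str) -> list[tuple[str, str]]:
    """List (name, symbol) pairs for each special token found in text.

    Simplification: characters inside strings or comments are not excluded.
    """
    return [(SPECIAL_TOKENS[ch], ch) for ch in text if ch in SPECIAL_TOKENS]
```

A complete lexer would consult this table only after strings, comments, identifiers, and number literals have been ruled out at the current position.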