Semantic Parser
The @lokascript/semantic package parses multilingual
hyperscript directly into executable structures — without
requiring an English intermediate step.
Semantic vs i18n
The two packages serve different purposes:
| | Semantic Parser | i18n Grammar |
|---|---|---|
| Purpose | Parse for execution | Transform for display |
| Direction | Any language → AST | Any language ↔ any language |
| Output | Structured semantic nodes | Translated text string |
| Confidence | Scored (0–1) | Deterministic |
| Use case | Runtime, adapter plugin | Documentation, teaching |
Use the semantic parser when you need to execute
multilingual code. Use i18n when you need to translate code
for display or documentation.
How It Works
The parser uses a three-phase pipeline:
1. Tokenization
Input is split into tokens using language-aware rules:
English: "toggle .active"
→ [toggle (keyword), .active (selector)]
Japanese: ".active を 切り替え"
→ [.active (selector), を (particle), 切り替え (keyword)]
Each language has its own tokenization strategy. Japanese uses
particles (を, に, で) as word boundaries. English uses
spaces. Arabic handles right-to-left text.
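The tokenization step can be sketched as follows. This is a minimal illustration, not the package's implementation: the `Token` type, the `KEYWORDS` and `PARTICLES` tables, and the `tokenize` helper are all hypothetical names, and the sketch only handles space-separated input like the examples above.

```typescript
// Illustrative only: these names are assumptions, not @lokascript/semantic internals.
type TokenKind = 'keyword' | 'selector' | 'particle' | 'identifier';

interface Token {
  value: string;
  kind: TokenKind;
}

// Hypothetical vocabularies, limited to the words used in this section.
const KEYWORDS: Record<string, string[]> = {
  en: ['toggle'],
  ja: ['切り替え'],
};
const PARTICLES: Record<string, string[]> = {
  en: [],
  ja: ['を', 'に', 'で'],
};

// Classify one whitespace-delimited chunk using the tables for the given language.
function classify(value: string, language: string): TokenKind {
  const first = value.charAt(0);
  if (first === '.' || first === '#') return 'selector';
  if ((PARTICLES[language] || []).indexOf(value) !== -1) return 'particle';
  if ((KEYWORDS[language] || []).indexOf(value) !== -1) return 'keyword';
  return 'identifier';
}

// Split on whitespace, then label each chunk.
function tokenize(code: string, language: string): Token[] {
  return code
    .trim()
    .split(/\s+/)
    .filter((value) => value.length > 0)
    .map((value): Token => ({ value, kind: classify(value, language) }));
}

console.log(tokenize('toggle .active', 'en').map((t) => t.kind));
// [ 'keyword', 'selector' ]
console.log(tokenize('.active を 切り替え', 'ja').map((t) => t.kind));
// [ 'selector', 'particle', 'keyword' ]
```

A real tokenizer for unspaced Japanese would need to locate particles inside a continuous string rather than rely on whitespace.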
2. Pattern Matching
Tokens are matched against language-specific command patterns:
English pattern (SVO):
[COMMAND_KEYWORD] [SELECTOR]
→ action=toggle, patient=.active
Japanese pattern (SOV):
[SELECTOR] [PARTICLE:を] [COMMAND_KEYWORD]
→ patient=.active, action=toggle
The same semantic roles are extracted regardless of language —
the patterns just account for different word orders.
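To make the role extraction concrete, here is a small sketch under stated assumptions. The `Slot` shape, the `PATTERNS` table, the `CANONICAL` keyword map, and `matchPattern` are hypothetical and cover only the two patterns shown above; the real pattern format may differ.

```typescript
// Illustrative only: none of these names come from @lokascript/semantic.
type TokenKind = 'keyword' | 'selector' | 'particle';

interface Token {
  value: string;
  kind: TokenKind;
}

interface Roles {
  action: string;
  patient: string;
}

// One slot per expected token: its kind, an optional exact value, an optional role.
interface Slot {
  kind: TokenKind;
  value?: string;
  role?: keyof Roles;
}

const PATTERNS: Record<string, Slot[]> = {
  // SVO: [COMMAND_KEYWORD] [SELECTOR]
  en: [
    { kind: 'keyword', role: 'action' },
    { kind: 'selector', role: 'patient' },
  ],
  // SOV: [SELECTOR] [PARTICLE:を] [COMMAND_KEYWORD]
  ja: [
    { kind: 'selector', role: 'patient' },
    { kind: 'particle', value: 'を' },
    { kind: 'keyword', role: 'action' },
  ],
};

// Hypothetical mapping from surface keywords to a canonical action name.
const CANONICAL: Record<string, string> = { toggle: 'toggle', '切り替え': 'toggle' };

function matchPattern(tokens: Token[], language: string): Roles | null {
  const pattern = PATTERNS[language];
  if (!pattern || pattern.length !== tokens.length) return null;
  let action: string | undefined;
  let patient: string | undefined;
  for (let i = 0; i < pattern.length; i++) {
    const slot = pattern[i];
    const token = tokens[i];
    if (!slot || !token) return null;
    if (token.kind !== slot.kind) return null;
    if (slot.value !== undefined && slot.value !== token.value) return null;
    if (slot.role === 'action') action = CANONICAL[token.value] || token.value;
    if (slot.role === 'patient') patient = token.value;
  }
  if (action === undefined || patient === undefined) return null;
  return { action, patient };
}

const english: Token[] = [
  { value: 'toggle', kind: 'keyword' },
  { value: '.active', kind: 'selector' },
];
const japanese: Token[] = [
  { value: '.active', kind: 'selector' },
  { value: 'を', kind: 'particle' },
  { value: '切り替え', kind: 'keyword' },
];

console.log(matchPattern(english, 'en'));  // { action: 'toggle', patient: '.active' }
console.log(matchPattern(japanese, 'ja')); // { action: 'toggle', patient: '.active' }
```

Both inputs produce the same roles; only the slot order in the pattern differs.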
3. Confidence Scoring
Each match produces a confidence score (0–1):
- ≥ 0.7: High confidence. Use the semantic result.
- 0.5 to < 0.7: Medium. May fall back to the traditional parser.
- < 0.5: Low. Falls back to the original text.
SOV languages (Japanese, Korean, Turkish) naturally produce
lower scores due to greater structural ambiguity. Use
per-language thresholds to tune this.
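The score bands and per-language tuning could be applied along these lines. This is a sketch only: the `Decision` labels, the `decide` helper, and the 0.6 overrides for SOV languages are invented for illustration and are not values or names taken from the package.

```typescript
// Illustrative only: labels, helper name, and override values are assumptions.
type Decision = 'use-semantic' | 'consider-traditional-parser' | 'keep-original-text';

const DEFAULT_HIGH = 0.7; // high-confidence cutoff from the list above
const LOW = 0.5;          // below this, keep the original text

// Hypothetical lower cutoffs for SOV languages, which tend to score lower.
const HIGH_OVERRIDES: Record<string, number> = { ja: 0.6, ko: 0.6, tr: 0.6 };

function decide(confidence: number, language: string): Decision {
  const override = HIGH_OVERRIDES[language];
  const high = override !== undefined ? override : DEFAULT_HIGH;
  if (confidence >= high) return 'use-semantic';
  if (confidence >= LOW) return 'consider-traditional-parser';
  return 'keep-original-text';
}

console.log(decide(0.65, 'en')); // consider-traditional-parser
console.log(decide(0.65, 'ja')); // use-semantic
console.log(decide(0.4, 'ja'));  // keep-original-text
```

The same score of 0.65 is accepted for Japanese but not for English, which is the effect a per-language threshold is meant to have.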
API
parse(code, options?)
Parse hyperscript with language-aware semantic analysis:
import { parse } from '@lokascript/semantic';
const result = parse('on click toggle .active on me', {
  language: 'en',
  confidenceThreshold: 0.7
});
console.log(result.confidence); // 0.98
console.log(result.language); // 'en'
console.log(result.ast); // Parsed AST
detect(code)
Auto-detect the language of hyperscript code:
import { detect } from '@lokascript/semantic';
const lang = detect('クリック で 私 の .active を 切り替え');
// { language: 'ja', confidence: 0.95 }
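For intuition about how language detection can work at all, one simple heuristic is to count characters by Unicode script. The sketch below is not how detect() is implemented (the source does not say), and `guessLanguageByScript` is a hypothetical helper. It also cannot tell English from Turkish, since both use Latin script, so a real detector would need keyword evidence as well.

```typescript
// Illustrative heuristic only; not the algorithm behind detect().
type ScriptLanguage = 'ja' | 'ko' | 'ar';

const SCRIPT_RANGES: Array<{ language: ScriptLanguage; from: number; to: number }> = [
  { language: 'ja', from: 0x3040, to: 0x30ff }, // Hiragana and Katakana
  { language: 'ko', from: 0xac00, to: 0xd7a3 }, // precomposed Hangul syllables
  { language: 'ar', from: 0x0600, to: 0x06ff }, // Arabic block
];

// Count characters per script and return the language with the most hits.
// Falls back to 'en' when no listed script appears.
function guessLanguageByScript(code: string): ScriptLanguage | 'en' {
  const counts: Record<ScriptLanguage, number> = { ja: 0, ko: 0, ar: 0 };
  for (let i = 0; i < code.length; i++) {
    const unit = code.charCodeAt(i);
    for (const range of SCRIPT_RANGES) {
      if (unit >= range.from && unit <= range.to) counts[range.language] += 1;
    }
  }
  let best: ScriptLanguage | 'en' = 'en';
  let bestCount = 0;
  for (const range of SCRIPT_RANGES) {
    if (counts[range.language] > bestCount) {
      best = range.language;
      bestCount = counts[range.language];
    }
  }
  return best;
}

console.log(guessLanguageByScript('クリック で 私 の .active を 切り替え')); // ja
console.log(guessLanguageByScript('toggle .active'));                     // en
```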
translate(code, fromLang, toLang)
Parse in one language and render in another:
import { translate } from '@lokascript/semantic';
const english = translate('.active を 切り替え', 'ja', 'en');
// → 'toggle .active'
Language-Specific Features
Japanese
- Particles (を, に, から, で) mark semantic roles
- No spaces: word boundaries are inferred from particle positions
- Morphological normalization handles conjugated verb forms
(e.g., 切り替えて → 切り替え)
Korean
- Similar particle system to Japanese (을/를, 에, 에서)
- Particle form depends on the preceding syllable: 을 follows a
final consonant, 를 follows a vowel
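The 을/를 choice can be computed from the Unicode code point of the last syllable. The sketch below relies on the standard layout of precomposed Hangul syllables; `objectParticle` is a hypothetical helper, and whether the parser uses this exact check is an assumption.

```typescript
// Illustrative only: objectParticle is a hypothetical helper.
// Precomposed Hangul syllables occupy U+AC00..U+D7A3. Each run of 28 code points
// shares an initial consonant and vowel; offset 0 within the run means
// "no final consonant".
function objectParticle(word: string): '을' | '를' {
  const last = word.charCodeAt(word.length - 1);
  const isHangulSyllable = last >= 0xac00 && last <= 0xd7a3;
  const hasFinalConsonant = isHangulSyllable && (last - 0xac00) % 28 !== 0;
  return hasFinalConsonant ? '을' : '를';
}

console.log(objectParticle('버튼'));   // 을 (튼 ends in the consonant ㄴ)
console.log(objectParticle('클래스')); // 를 (스 ends in a vowel)
```

A parser would typically accept either form after any noun and use this rule only for generating or validating output.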
Arabic
- Right-to-left text processing
- VSO word order (verb first)
- Diacritics stripped during normalization
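Diacritic stripping can be sketched with a single character-class replacement. This assumes "diacritics" means the common Arabic vowel marks (harakat) in U+064B..U+0652; the package may normalize a wider or narrower set, and `stripArabicDiacritics` is a hypothetical name.

```typescript
// Illustrative only: the exact set of marks removed by the package is an assumption.
// U+064B..U+0652 covers the tanween marks, fatha, damma, kasra, shadda, and sukun.
function stripArabicDiacritics(text: string): string {
  return text.replace(/[\u064B-\u0652]/g, '');
}

// A vocalized word and its bare form compare equal after stripping.
const vocalized = '\u0628\u064E\u062F\u0651\u0650\u0644'; // بَدِّل
const bare = '\u0628\u062F\u0644';                         // بدل
console.log(stripArabicDiacritics(vocalized) === bare);   // true
```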
Turkish
- Agglutinative suffixes instead of separate particles
- Vowel harmony in suffix selection
When the Semantic Parser Is Used
- LokaScript runtime: The core runtime uses the semantic
parser to handle _="..." attributes in non-English languages
- Adapter plugin: The @lokascript/hyperscript-adapter
uses semantic parsing to translate code before the original
_hyperscript runtime processes it
- Programmatic use: Call parse() or translate() for
code analysis tools, linters, or documentation generators
Next Steps
- Writing in Your Language: Practical guide to writing
multilingual hyperscript
- Grammar Transformation: How the i18n grammar system
transforms between languages
- API Reference: Full semantic parser API documentation