1 · Train a tokenizer
Build a Byte Pair Encoding vocabulary for your language. Drop text files,
pick a vocab size, watch merges happen on your GPU. Save the
.json; the Pre-tokenize step uses it to encode your text into the .bin
that feeds the transformer in step 3.
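If you're curious what those merges actually do, here is a minimal CPU sketch of the BPE merge loop (illustrative only, not the app's GPU implementation; the train_bpe name and signature are made up for this example):

```python
from collections import Counter

def train_bpe(texts, vocab_size):
    # Start from raw bytes so every input string is representable.
    vocab = {bytes([b]): b for b in range(256)}
    # Each text becomes a list of single-byte tokens.
    corpus = [[bytes([b]) for b in t.encode("utf-8")] for t in texts]
    merges = []
    while len(vocab) < vocab_size:
        # Count adjacent token pairs across the whole corpus.
        pairs = Counter()
        for toks in corpus:
            pairs.update(zip(toks, toks[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = a + b
        vocab[merged] = len(vocab)
        merges.append((a, b))
        # Replace every occurrence of the winning pair with the new token.
        for toks in corpus:
            i = 0
            while i < len(toks) - 1:
                if toks[i] == a and toks[i + 1] == b:
                    toks[i:i + 2] = [merged]
                else:
                    i += 1
    return vocab, merges
```

Each merge promotes the most frequent adjacent pair into a new token, so the vocabulary grows one entry per step until it reaches the size you picked.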
3 · Train a transformer
Two flows in one: pretrain a fresh foundation model on your
.bin, or load a .llm checkpoint and
fine-tune it on a smaller, task-specific corpus. Forward, backward,
and AdamW all run on your GPU — no server, no cloud.
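Under the hood this is standard next-token prediction. Here is a minimal PyTorch sketch of one forward/backward/AdamW step, as a stand-in for the app's own GPU pipeline (the toy model, batch shapes, and hyperparameters are placeholders):

```python
import torch
import torch.nn.functional as F

# Tiny stand-in "model": embedding straight into a vocab projection.
vocab_size, d_model, batch, seq_len = 4096, 128, 8, 64
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

# One batch of pre-tokenized ids (random here; slices of the .bin in practice).
tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)                                            # forward
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                                   # backward
opt.step()                                                        # AdamW update
opt.zero_grad()
```

Fine-tuning runs the same loop; it just starts from the checkpoint's weights and iterates over the smaller, task-specific corpus.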
Generate from a model
Drop a .llm checkpoint, type a prompt, watch tokens stream.
No .bin needed — vocab + weights ride with the checkpoint.
Useful for sanity-checking a run, comparing two checkpoints, or just
playing with sampling knobs.
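The sampling knobs generally come down to temperature and a top-k cutoff applied to the model's next-token logits. A small sketch of how they combine (illustrative; the function name, parameter names, and defaults are placeholders, not the app's settings):

```python
import torch

def sample_next(logits, temperature=0.8, top_k=40):
    """Pick one token id from a 1-D tensor of next-token logits."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    logits = logits / max(temperature, 1e-6)
    # Top-k: drop everything outside the k most likely tokens.
    if top_k is not None and top_k < logits.numel():
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```

Lower temperature and a smaller top-k make the stream more deterministic; raising either makes it more varied.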