1 · Train a tokenizer

Build a Byte Pair Encoding vocabulary for your language. Drop text files, pick a vocab size, watch merges happen on your GPU. Save the .json — the Pre-tokenize tab feeds it to the transformer in step 3.

Drop files here or browse
Multiple files OK · browse folder for recursive scan
No vocabulary loaded — train one above or load a .json
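For a sense of what "watch merges happen" means, here is a minimal byte-pair-encoding training loop. It is a CPU-side sketch: the name trainBPE and the character-level starting point are illustrative assumptions, and the app's GPU implementation will differ.

```ts
// Minimal BPE sketch (illustrative only; not the app's GPU kernel).
type Pair = string; // two tokens joined with a "\u0000" separator

function trainBPE(corpus: string[], vocabSize: number): string[][] {
  // Start from character-level tokens; real BPE usually starts from bytes.
  let words: string[][] = corpus.map((w) => [...w]);
  const merges: string[][] = [];
  const baseVocab = new Set(words.flat());

  while (baseVocab.size + merges.length < vocabSize) {
    // Count every adjacent token pair across the corpus.
    const counts = new Map<Pair, number>();
    for (const w of words) {
      for (let i = 0; i + 1 < w.length; i++) {
        const key = w[i] + "\u0000" + w[i + 1];
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
    if (counts.size === 0) break;

    // Pick the most frequent pair and record the merge.
    let best: Pair = "";
    let bestCount = 0;
    for (const [k, c] of counts) if (c > bestCount) { best = k; bestCount = c; }
    const [a, b] = best.split("\u0000");
    merges.push([a, b]);

    // Apply the merge everywhere in the corpus.
    words = words.map((w) => {
      const out: string[] = [];
      for (let i = 0; i < w.length; i++) {
        if (i + 1 < w.length && w[i] === a && w[i + 1] === b) { out.push(a + b); i++; }
        else out.push(w[i]);
      }
      return out;
    });
  }
  return merges;
}
```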

3 · Train a transformer

Two flows in one: pretrain a fresh foundation model on your .bin, or load a .llm checkpoint and fine-tune it on a smaller, task-specific corpus. Forward, backward, and AdamW all run on your GPU — no server, no cloud.
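To make the optimizer step concrete, here is a minimal CPU-side AdamW update with decoupled weight decay and global-norm gradient clipping. The names (AdamWState, adamwStep, clipGradNorm) and default betas are illustrative assumptions; the app's GPU kernels will look different.

```ts
// Illustrative AdamW step; parameter and gradient layout are assumptions.
interface AdamWState { m: Float32Array; v: Float32Array; t: number }

function adamwStep(p: Float32Array, g: Float32Array, s: AdamWState,
                   lr: number, wd: number, b1 = 0.9, b2 = 0.95, eps = 1e-8) {
  s.t += 1;
  for (let i = 0; i < p.length; i++) {
    s.m[i] = b1 * s.m[i] + (1 - b1) * g[i];
    s.v[i] = b2 * s.v[i] + (1 - b2) * g[i] * g[i];
    const mHat = s.m[i] / (1 - Math.pow(b1, s.t));
    const vHat = s.v[i] / (1 - Math.pow(b2, s.t));
    p[i] -= lr * wd * p[i];                          // decoupled weight decay
    p[i] -= lr * mHat / (Math.sqrt(vHat) + eps);     // Adam update
  }
}

// grad_clip: rescale the gradient if its global L2 norm exceeds maxNorm.
function clipGradNorm(g: Float32Array, maxNorm: number) {
  let sq = 0;
  for (const x of g) sq += x * x;
  const norm = Math.sqrt(sq);
  if (norm > maxNorm) {
    const scale = maxNorm / norm;
    for (let i = 0; i < g.length; i++) g[i] *= scale;
  }
}
```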

LLM engine not initialized
Drop a .bin file or browse
Produced by the Pre-tokenize tab
Architecture: d_model · n_heads · n_kv_heads · n_layers · d_ff mult · activation · seq_len
Training: lr · steps · grad_accum · optimizer · precision · label_smooth · z_loss · warmup · min_lr · weight_decay · grad_clip
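The lr, min_lr, warmup, and steps fields typically combine into a single schedule. The sketch below assumes linear warmup followed by cosine decay down to min_lr, which may not be the exact curve the trainer uses.

```ts
// Learning-rate schedule sketch: linear warmup to `lr`, then cosine decay to `minLr`.
// Whether the app uses exactly this curve is an assumption.
function lrAt(step: number, lr: number, minLr: number, warmup: number, totalSteps: number): number {
  if (step < warmup) return lr * (step + 1) / warmup;            // linear warmup
  const progress = (step - warmup) / Math.max(1, totalSteps - warmup);
  const cosine = 0.5 * (1 + Math.cos(Math.PI * Math.min(1, progress)));
  return minLr + (lr - minLr) * cosine;                          // decay toward the floor
}
```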

Generate from a model

Drop a .llm checkpoint, type a prompt, watch tokens stream. No .bin needed — vocab + weights ride with the checkpoint. Useful for sanity-checking a run, comparing two checkpoints, or just playing with sampling knobs.

No model loaded — drop a .llm to begin
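The sampling knobs mentioned above usually reduce to a step like the one below: scale the logits by temperature, keep the top-k candidates, and draw one token from the resulting distribution. sampleNext and its defaults are illustrative, not the app's actual API.

```ts
// Temperature + top-k sampling over one step's logits (illustrative only).
function sampleNext(logits: Float32Array, temperature = 1.0, topK = 40): number {
  // Rank token ids by logit and keep the top k.
  const ids = [...logits.keys()].sort((a, b) => logits[b] - logits[a]).slice(0, topK);
  // Softmax over the kept logits, scaled by temperature.
  const scaled = ids.map((i) => logits[i] / temperature);
  const maxL = Math.max(...scaled);
  const exps = scaled.map((x) => Math.exp(x - maxL));
  const sum = exps.reduce((a, b) => a + b, 0);
  // Draw one token id from the distribution.
  let r = Math.random() * sum;
  for (let j = 0; j < ids.length; j++) {
    r -= exps[j];
    if (r <= 0) return ids[j];
  }
  return ids[ids.length - 1];
}
```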