About This Playground
This testing harness uses Transformers.js to run language models directly in your browser using WebAssembly and WebGPU. All computation happens locally - no data is sent to any server.
Generation Parameters
- Max New Tokens: Maximum number of tokens to generate
- Temperature: Controls randomness (higher = more creative, lower = more focused)
- Top-K: Limits sampling to the K most likely next tokens
- Top-P: Nucleus sampling - samples from smallest set of tokens with cumulative probability ≥ P
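The parameters above combine in a standard way: logits are scaled by temperature and turned into probabilities, then Top-K and Top-P prune the candidate set before sampling. A minimal sketch of that pipeline (the function names here are illustrative, not the playground's actual code):

```javascript
// Convert logits to probabilities at a given temperature.
// Lower temperature sharpens the distribution; higher flattens it.
function softmax(logits, temperature = 1.0) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Keep only the K most likely tokens, then the smallest prefix of those
// whose cumulative probability reaches P; renormalize what survives.
// Returns [tokenIndex, probability] pairs to sample from.
function filterTopKTopP(probs, topK, topP) {
  const sorted = probs
    .map((p, i) => [i, p])
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK); // Top-K cutoff
  const kept = [];
  let cumulative = 0;
  for (const [i, p] of sorted) {
    kept.push([i, p]);
    cumulative += p;
    if (cumulative >= topP) break; // Top-P (nucleus) cutoff
  }
  const total = kept.reduce((acc, [, p]) => acc + p, 0);
  return kept.map(([i, p]) => [i, p / total]);
}
```

For example, with Top-K = 1 every filter reduces to greedy decoding: only the single most likely token survives, with probability renormalized to 1.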
Chrome Built-in AI Setup
If Chrome AI shows as "Unavailable", follow these steps:
- Use Chrome 127+ (Canary, Dev, or Beta recommended)
- Enable flag: chrome://flags/#optimization-guide-on-device-model
- Enable flag: chrome://flags/#prompt-api-for-gemini-nano
- Restart Chrome
- Visit chrome://components and click "Check for update" on "Optimization Guide On Device Model"
- Wait for the model to download (~1.7GB), then reload this page
Platform support: Windows 10/11, macOS 13+, Linux, ChromeOS (Chromebook Plus). Not available on mobile yet.
Debugging: Open DevTools Console (F12) to see detailed availability status and setup instructions.
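The availability check the console reports can be reproduced by hand. A sketch, assuming the Chrome Prompt API's `LanguageModel.availability()` entry point and its status strings (the API is still evolving across Chrome versions, so treat these identifiers as assumptions):

```javascript
// Map a Prompt API availability status to a human-readable hint.
// Status strings are assumptions based on the Chrome Prompt API drafts.
function describeAvailability(status) {
  const messages = {
    unavailable: 'Not supported on this device or Chrome version',
    downloadable: 'Supported; the model will download on first use',
    downloading: 'Model download in progress',
    available: 'Ready to use',
  };
  return messages[status] ?? `Unknown status: ${status}`;
}

// In the DevTools console (Chrome 127+ with the flags above enabled):
// const status = await LanguageModel.availability();
// console.log(describeAvailability(status));
```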
Performance Tips
- Chrome Built-in AI is fastest and requires no download
- First load of other models may take a while as they're downloaded and cached
- Subsequent loads will be much faster thanks to browser caching
- For quick testing, start with DistilGPT-2 (smallest download)
- TinyLlama offers the best balance of quality and size for chat applications
Memory Management
- Automatic cleanup: When switching models, the previous model is automatically disposed from memory
- Memory usage: Loaded models stay in RAM (330MB-2.2GB depending on model)
- Refresh to clear: Reload the page to completely free all memory
- Note: Downloaded model files are cached by your browser for faster subsequent loads
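The automatic-cleanup step above follows a simple switch-and-dispose pattern. A sketch, assuming the loaded model object exposes a `dispose()` method (Transformers.js pipeline instances do; the helper names here are illustrative):

```javascript
// Track the currently loaded model; dispose it before loading the next one
// so its WASM/WebGPU buffers are freed instead of accumulating in RAM.
let currentModel = null;

async function switchModel(loadModel) {
  if (currentModel && typeof currentModel.dispose === 'function') {
    await currentModel.dispose(); // release the previous model's memory
  }
  currentModel = await loadModel(); // e.g. a Transformers.js pipeline factory
  return currentModel;
}
```

Note this only frees runtime memory; the downloaded weight files stay in the browser cache, which is why reloading a model is fast even after disposal.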