About This Playground

This testing harness uses Transformers.js to run language models directly in your browser, executing via WebAssembly or WebGPU depending on hardware support. All computation happens locally - no data is sent to any server.
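For context, a minimal sketch of what such in-browser generation looks like. The `pipeline()` entry point and the `@huggingface/transformers` package name come from the Transformers.js documentation, but the model id and option values here are illustrative examples, not this playground's actual code.

```javascript
// Minimal sketch of local text generation with Transformers.js.
// The model weights are fetched once, cached by the browser, and all
// inference then runs locally - no prompt data leaves the machine.

async function generateLocally(prompt) {
  // Dynamic import so the library is only pulled in when needed.
  const { pipeline } = await import('@huggingface/transformers');

  // Illustrative model id; any compatible text-generation model works.
  const generator = await pipeline('text-generation', 'Xenova/distilgpt2');

  const [result] = await generator(prompt, {
    max_new_tokens: 30, // see Generation Parameters below
    temperature: 0.7,
    top_k: 50,
    top_p: 0.9,
  });
  return result.generated_text;
}
```

On first call the model download dominates latency; later calls reuse the browser cache, as noted in Performance Tips below.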

Generation Parameters

  • Max New Tokens: Maximum number of tokens to generate
  • Temperature: Controls randomness (higher = more creative, lower = more focused)
  • Top-K: Limits sampling to the K most likely next tokens
  • Top-P: Nucleus sampling - samples from smallest set of tokens with cumulative probability ≥ P
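The Top-K and Top-P filters above can be sketched in plain JavaScript. This is an illustrative stand-alone implementation, not the library's internal code; `filterTopKTopP` and its inputs are hypothetical names, and `probs` is assumed to be an already-normalized probability distribution over token ids.

```javascript
// Sketch: narrow a token distribution with Top-K, then Top-P (nucleus)
// filtering, and renormalize the survivors before sampling.

function filterTopKTopP(probs, topK, topP) {
  // Rank token ids by descending probability, keep the K most likely.
  const ranked = probs
    .map((p, id) => ({ id, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, topK);

  // Top-P: keep the smallest prefix whose cumulative probability >= P.
  const kept = [];
  let cum = 0;
  for (const t of ranked) {
    kept.push(t);
    cum += t.p;
    if (cum >= topP) break;
  }

  // Renormalize so the surviving probabilities sum to 1.
  const total = kept.reduce((s, t) => s + t.p, 0);
  return kept.map((t) => ({ id: t.id, p: t.p / total }));
}
```

With `probs = [0.5, 0.3, 0.1, 0.1]`, `topK = 3`, `topP = 0.75`, the filter keeps only tokens 0 and 1: Top-K drops token 3, and the nucleus stops once the cumulative mass 0.5 + 0.3 crosses 0.75.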

Chrome Built-in AI Setup

If Chrome AI shows as "Unavailable", follow these steps:

  1. Use Chrome 127+ (Canary, Dev, or Beta recommended)
  2. Enable flag: chrome://flags/#optimization-guide-on-device-model (select "Enabled BypassPerfRequirement")
  3. Enable flag: chrome://flags/#prompt-api-for-gemini-nano
  4. Restart Chrome
  5. Visit chrome://components and click "Check for update" on "Optimization Guide On Device Model"
  6. Wait for the model to download (~1.7GB), then reload this page

Platform support: Windows 10/11, macOS 13+, Linux, ChromeOS (Chromebook Plus). Not available on mobile yet.

Debugging: Open DevTools Console (F12) to see detailed availability status and setup instructions.
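A feature-detection sketch like the following is one way to produce that availability status. The global `LanguageModel` object and its `availability()` method reflect recent drafts of Chrome's Prompt API, which is still changing, so treat these names as assumptions; outside Chrome (or with the flags off) the global is simply absent.

```javascript
// Sketch: detect whether Chrome's built-in AI (Gemini Nano) is usable.
// Returns a status string suitable for logging to the DevTools console.

async function chromeAiStatus() {
  // In non-Chrome environments the Prompt API global does not exist.
  if (typeof LanguageModel === 'undefined') {
    return 'unavailable'; // wrong browser, old version, or flags not set
  }
  // In recent Chrome builds availability() resolves to a string such as
  // 'available', 'downloadable', 'downloading', or 'unavailable'.
  return await LanguageModel.availability();
}
```

A status of 'downloadable' or 'downloading' typically means the flags are set but the ~1.7GB model component has not finished installing (step 5 above).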

Performance Tips

  • Chrome Built-in AI is fastest and requires no extra download once Gemini Nano is installed (see setup above)
  • First load of other models may take a while as they're downloaded and cached
  • Subsequent loads will be much faster thanks to browser caching
  • For quick testing, start with DistilGPT-2 (smallest download)
  • TinyLlama offers the best balance of quality and size for chat applications

Memory Management

  • Automatic cleanup: When switching models, the previous model is automatically disposed from memory
  • Memory usage: Loaded models stay in RAM (330 MB to 2.2 GB depending on the model)
  • Refresh to clear: Reload the page to completely free all memory
  • Note: Downloaded model files are cached by your browser for faster subsequent loads
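The switch-and-dispose behavior above can be sketched as a small manager. This is an illustrative pattern, not the playground's actual code: `ModelManager`, `loadModel`, and the `dispose()` method on the loaded model are assumed names (Transformers.js models do expose a dispose-style cleanup, but the exact shape here is a stand-in).

```javascript
// Sketch: hold at most one loaded model; free the previous one from
// memory before loading its replacement.

class ModelManager {
  constructor(loadModel) {
    this.loadModel = loadModel; // async factory: name -> model
    this.current = null;
  }

  async switchTo(name) {
    // Dispose the previous model so its weights leave RAM before the
    // next model is loaded (avoids holding two models at once).
    if (this.current && typeof this.current.dispose === 'function') {
      await this.current.dispose();
    }
    this.current = await this.loadModel(name);
    return this.current;
  }
}
```

Note this only frees in-memory weights; the downloaded files stay in the browser cache, which is why reloading a previously used model is fast.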