PrismML Bonsai 1.7B
Interesting as a tiny local experiment, but not trustworthy enough for operator work without supervision.
- Very small footprint for local inference
- Reasonable extraction on simple tasks
- Usable concise summarization
- Invented the wrong tool path for X/Twitter threads
- Leaked <think> tags, hurting strict JSON reliability
- Generic and partly inverted operational reasoning
Compact ternary local model that can do light extraction and summarization, but weak operator judgment makes it a poor default for real OpenClaw routing work.
How each task scored
Gemma 4 31B has strong raw benchmarks, but Gemma 4 26B MoE is better for local deployment due to its headroom for concurrency, latency, and cost, while keeping useful quality.
Use openclaw’s url tool to fetch the thread data from the URL, which is the best tool path for reading a single X/Twitter thread. The url tool handles parsing and fetching the thread content, making it the most efficient and reliable method for reading and summarizing the benchmarks.
Bonsai 1.7B is usable for lightweight local extraction and summarization, but it is not trustworthy enough for routing, benchmark-sensitive judgment, or default operator work.
- This host had only 4 vCPUs and 15 GiB of RAM with no GPU, so Bonsai 8B was not practical to benchmark interactively here.
- Prism CLI defaults to interactive chat formatting and exposes thinking tags, which breaks strict benchmark compliance unless the output is post-processed; a rough sketch of that post-processing follows below.
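For anyone reproducing these runs, here is a minimal post-processing sketch: strip a leaked <think>…</think> block from the raw output, then require the remainder to be strict JSON. The tag name matches the leakage described above, but the function and the example payload are assumptions for illustration, not a documented Prism CLI interface.

```python
import json
import re

# Assumed shape of the leaked reasoning block: Bonsai emitted <think>...</think>
# ahead of its answer in these runs, which broke strict JSON parsing.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def parse_strict_json(raw_output: str):
    """Strip any leaked thinking block, then require the remainder to be valid JSON."""
    cleaned = THINK_BLOCK.sub("", raw_output).strip()
    return json.loads(cleaned)  # raises json.JSONDecodeError if still non-JSON

# Hypothetical leaked response from one of the extraction tasks.
raw = '<think>user wants two fields...</think>\n{"title": "Example", "score": 3}'
print(parse_strict_json(raw))  # -> {'title': 'Example', 'score': 3}
```

This only rescues outputs where the JSON itself is intact after the thinking block; it does not fix answers that were malformed to begin with, which is why the tag leakage still counts against strict-JSON reliability above.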
Raw machine-readable files for anyone who wants to dig deeper or run their own analysis.
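As a starting point for that analysis, a minimal sketch of loading a per-task results file and tallying scores. The results.json filename and its fields (task, score) are assumptions for illustration, not the actual schema of the published files.

```python
import json
from collections import defaultdict

# Hypothetical filename and schema: a list of {"task": str, "score": float} records.
with open("results.json") as f:
    records = json.load(f)

# Group scores by task name.
totals = defaultdict(list)
for record in records:
    totals[record["task"]].append(record["score"])

# Print the mean score per task.
for task, scores in sorted(totals.items()):
    print(f"{task}: {sum(scores) / len(scores):.2f}")
```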