Next in line to challenge Nvidia: Taalas hardwires Llama into silicon, claims 17,000 tokens per second

February 25, 2026

By Levi Li, DIGITIMES Asia, Taipei
Publication Date: 2026-02-25 01:45:00

Toronto-based AI chip startup Taalas says it can hardwire a large language model directly into silicon to accelerate inference beyond what conventional GPUs can deliver. The company was founded in 2023, and its first product, the HC1 inference chip, generates nearly 17,000 tokens per second for a single user running Meta’s Llama 3.1 8B. In the company's own benchmarks, that is 48 times the performance of Nvidia’s B200 under the same configuration.
