- Cerebras hits 969 tokens/second on Llama 3.1 405B, 75x faster than AWS
- Cerebras claims an industry-low latency of 240 ms, twice as fast as Google Vertex
- Cerebras Inference runs on the CS-3 with the WSE-3 AI processor
Cerebras Systems says it has set a new…
Article Source
https://www.techradar.com/pro/nvidias-closest-rival-once-again-obliterates-cloud-giants-in-ai-performance