Mochiai.blog

Tag: LLM inference optimization

Categories: Artificial intelligence

Accelerating LLM inference with post-training weight and activation quantization using AWQ and GPTQ on Amazon SageMaker AI | Amazon Web Services

  • January 9, 2026

Foundation models (FMs) and large language models (LLMs) have been rapidly scaling, often doubling in parameter count within months, leading to significant improvements in…

Read More
Categories: Artificial intelligence

Optimizing LLM inference on Amazon SageMaker AI with BentoML's LLM-Optimizer | Amazon Web Services

  • December 24, 2025

The rise of powerful large language models (LLMs) that can be consumed via API calls has made it remarkably straightforward to integrate artificial intelligence…

Read More

