By Abdullahi Olaoye
Publication Date: 2025-11-24 19:49:00
As generative AI advances, organizations need AI agents that are accurate, reliable, and informed by data specific to their business. The NVIDIA AI-Q Research Assistant and Enterprise RAG Blueprints use retrieval-augmented generation (RAG) and NVIDIA Nemotron reasoning AI models to automate document comprehension, extract insights, and generate high-value analysis and reports from vast datasets.
Deploying these tools requires secure, scalable AI infrastructure that also maximizes performance and cost efficiency. In this blog post, we walk through deploying both blueprints on Amazon Elastic Kubernetes Service (Amazon EKS), using Amazon OpenSearch Serverless as the vector database, Amazon Simple Storage Service (Amazon S3) for object storage, and Karpenter for dynamic GPU scaling.
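To make the Karpenter piece concrete, the sketch below shows one way to define a GPU node pool that Karpenter can scale on demand. This is a minimal illustration, not the blueprint's actual configuration: the pool name, the referenced EC2NodeClass, and the instance families are assumptions chosen for the example.

```shell
# Hypothetical Karpenter NodePool for GPU workloads (names, instance
# families, and limits are illustrative assumptions).
kubectl apply -f - <<'EOF'
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-nodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu-nodeclass        # assumed EC2NodeClass, defined separately
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "p4d"]    # example GPU instance families
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        - key: nvidia.com/gpu      # keep non-GPU pods off these nodes
          effect: NoSchedule
  limits:
    nvidia.com/gpu: 8              # cap total GPUs this pool may provision
EOF
```

With a pool like this in place, GPU nodes are provisioned only when pods requesting `nvidia.com/gpu` resources are pending, and are removed when they are no longer needed, which is what keeps GPU costs proportional to actual inference load.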
Core components of the blueprints
The NVIDIA AI-Q Research Assistant blueprint builds directly on the NVIDIA Enterprise RAG Blueprint, which serves as the foundational component of the entire system. Both blueprints are built from a collection of NVIDIA NIM microservices: optimized inference containers designed for high-throughput, low-latency serving of AI models on GPUs.
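Each NIM microservice exposes an OpenAI-compatible HTTP API, so once a model container is running in the cluster you can query it with a standard chat-completions request. The sketch below assumes a NIM service reachable at `localhost:8000` and a placeholder model name; both are illustrative assumptions, not values from the blueprint.

```shell
# Minimal sketch: querying a NIM microservice via its OpenAI-compatible
# /v1/chat/completions endpoint. Host, port, and model name are assumptions.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "example/nemotron-model",
        "messages": [
          {"role": "user", "content": "Summarize the key findings in this document."}
        ],
        "max_tokens": 256
      }'
```

Because the interface is OpenAI-compatible, existing client libraries and RAG pipelines can target a NIM endpoint by changing only the base URL and model name.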
The components can be categorized by their role in the solution:
1. Foundational RAG components
These models form the core of the Enterprise RAG blueprint and serve as the essential foundation for the AI-Q assistant:

