LLM Training & Observability
Overview
Monte Carlo uses large language models (LLMs) to power certain AI-driven features. Monte Carlo does not perform model training (building a model from scratch on very large datasets) or fine-tuning (adjusting an already-trained model with specific datasets so that it produces domain-specific outputs) of LLMs.
Instead, Monte Carlo uses pre-trained models via Amazon Bedrock and focuses on ensuring reliable, performant usage through careful prompt design, evaluation, and iterative improvement guided by observability tooling.
LLM Source and Training
As noted above, Monte Carlo uses models provided by Amazon Bedrock. These models are hosted entirely within Monte Carlo’s AWS environment.
Although Amazon Bedrock offers the ability to fine-tune some foundation models, Monte Carlo does not fine-tune or retrain them. We exclusively use the pre-trained versions provided.
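To make this usage pattern concrete, the sketch below shows how a pre-trained Bedrock model can be invoked directly through the AWS SDK, with behavior shaped entirely by the prompt rather than by any fine-tuning. This is a minimal sketch under assumed settings: the model ID, region, prompt, and inference parameters are illustrative placeholders, not a description of Monte Carlo's actual configuration.

```python
# Minimal sketch: calling a pre-trained model hosted on Amazon Bedrock via boto3.
# Model ID, region, prompt, and inference parameters are illustrative assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # a pre-trained model, used as-is
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the anomalies detected in the orders table today."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

# The response text comes back in the assistant message's first content block.
print(response["output"]["message"]["content"][0]["text"])
```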
Observability and Monitoring Tools
Monte Carlo employs a multi-layered observability and monitoring strategy, tailored to the needs of each feature and LLM, to ensure LLM-powered features meet enterprise expectations for accuracy, reliability, and performance. This strategy includes the following layers:
System-level monitoring tracks health, performance metrics, and latency to ensure AI features maintain reliability and availability.
LLM-as-judge and deterministic evaluations assess the quality of agent responses, detecting low-quality or faulty outputs as well as performance issues (sketched below).
LLM observability tooling supports prompt evaluation, versioning, and detailed trace analysis, helping measure LLM behavior and accuracy across different contexts.
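The sketch below illustrates the LLM-as-judge evaluation pattern mentioned above: a separate model call grades an agent response and flags low-quality outputs. The judge model, prompt wording, scoring scale, and threshold are all assumptions for illustration, not Monte Carlo's actual evaluation logic.

```python
# Illustrative sketch of an LLM-as-judge evaluation. Judge model, prompt, and
# threshold are assumptions; it also assumes the judge returns valid JSON.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

JUDGE_PROMPT = (
    "You are grading an AI assistant's answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Return JSON with keys 'score' (1-5) and 'reason'."
)

def judge_response(question: str, answer: str) -> dict:
    """Ask a judge model to score an agent answer and flag low-quality outputs."""
    result = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # hypothetical judge model
        messages=[
            {
                "role": "user",
                "content": [{"text": JUDGE_PROMPT.format(question=question, answer=answer)}],
            }
        ],
        inferenceConfig={"maxTokens": 200, "temperature": 0.0},
    )
    verdict = json.loads(result["output"]["message"]["content"][0]["text"])
    verdict["flagged"] = verdict["score"] < 4  # threshold chosen for illustration
    return verdict
```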
Iteration and Improvements
Monte Carlo follows a data-driven improvement process designed to ensure that AI features evolve in a predictable, measurable, and customer-centric way. Improvements are guided by performance metrics, feature observability, and user feedback loops, so that changes demonstrably enhance accuracy, reliability, and usability.
Prompt Engineering
Observability data guides adjustments to prompts to yield more accurate responses.
Feature Performance Analysis
Data from observability tools informs refinements in design and functionality.
Feedback Integration
Customer and internal feedback loops are incorporated into iterative updates.
Guardrail Evaluation
Monitoring tools help detect and minimize risks such as hallucinations, irrelevant outputs, or excessive response latency.
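As a concrete illustration of the kind of deterministic checks that can back such guardrails, the sketch below flags empty responses, responses that blow a latency budget, and responses that share no vocabulary with the prompt. The thresholds and heuristics are assumptions for illustration only, not Monte Carlo's actual guardrail logic.

```python
# Minimal sketch of deterministic guardrail checks for a single LLM response.
# Thresholds and the relevance heuristic are illustrative assumptions.
import time

MAX_LATENCY_SECONDS = 10.0  # assumed budget for an interactive response

def check_guardrails(prompt: str, output: str, started_at: float) -> list[str]:
    """Return a list of guardrail violations detected for one response."""
    violations = []
    if not output.strip():
        violations.append("empty_response")
    if time.monotonic() - started_at > MAX_LATENCY_SECONDS:
        violations.append("latency_budget_exceeded")
    # Crude relevance heuristic: the response should reference at least one
    # meaningful term from the prompt.
    prompt_terms = {t.lower() for t in prompt.split() if len(t) > 4}
    if prompt_terms and not any(t in output.lower() for t in prompt_terms):
        violations.append("possibly_irrelevant")
    return violations

# Usage: capture time.monotonic() before the model call, then check the output.
started = time.monotonic()
issues = check_guardrails("Summarize anomalies in the orders table", "No anomalies found in orders.", started)
print(issues)  # [] when no guardrail is tripped
```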
Feature-Specific Considerations
For additional information on specific features, click here.