Concept: Mastra Evaluations

Purpose: Quality assurance and scoring for LLM outputs.

Last Updated: 2026-01-09


Core Idea

Evaluations in Mastra use Scorers to assess the quality, accuracy, and safety of LLM-generated content. They provide a quantitative way to measure performance and detect issues like hallucinations or factual errors.
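
As a rough sketch of the contract, a scorer maps LLM output to a small result object. The type below is inferred from the Quick Example later in this note and is not Mastra's published API:

// Hypothetical result shape, inferred from the Quick Example below;
// not an official Mastra type.
type ScorerResult = {
  score: number;      // normalized to the 0-1 range, higher is better
  rationale?: string; // optional human-readable explanation of the score
};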

Key Points

  • Scorers: Specialized functions that take LLM output (and optionally ground truth) and return a score (0-1).
  • Integration: Registered in the Mastra instance and can be triggered automatically during workflow execution.
  • Metrics: Common metrics include hallucination detection, fact validation, and relevance scoring.
  • Audit Trail: Scorer results are stored in the mastra_scorers table for long-term analysis and reporting (see the query sketch after this list).
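
Stored results can be inspected with any SQL client. The sketch below is a minimal example using the pg client; the table name mastra_scorers comes from this note, but the Postgres backend and the column names (scorer_id, score, created_at) are assumptions to verify against your actual schema.

// Minimal query sketch. Only the table name mastra_scorers is documented
// above; the column names and Postgres store are assumptions.
import { Client } from 'pg';

async function recentScores(scorerId: string) {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    const { rows } = await client.query(
      'SELECT score, created_at FROM mastra_scorers WHERE scorer_id = $1 ORDER BY created_at DESC LIMIT 20',
      [scorerId],
    );
    return rows;
  } finally {
    await client.end();
  }
}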

Quick Example

// Imports are assumed for illustration; adjust paths to your Mastra version.
import { Mastra } from '@mastra/core';
import { Scorer } from '@mastra/core/scores';

// Scorer definition: returns a score in [0, 1] plus a human-readable rationale.
export const hallucinationDetector = new Scorer({
  id: 'hallucination-detector',
  description: 'Detects hallucinations in LLM output',
  execute: async ({ output, context }) => {
    // Compare the output against context/ground truth and flag unsupported claims.
    return { score: 0.95, rationale: 'No hallucinations found' };
  },
});

// Registration: exposing the scorer on the Mastra instance makes it
// available for automatic scoring during workflow execution.
export const mastra = new Mastra({
  scorers: { hallucinationDetector },
});
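
For a quick manual check, the scorer can also be invoked directly. The call below assumes the instance exposes the execute handler as defined above; the exact invocation API may differ across Mastra versions:

// Manual invocation sketch; the call shape mirrors the definition above
// and is an assumption, not a documented Mastra API.
const result = await hallucinationDetector.execute({
  output: 'The Eiffel Tower is in Paris.',
  context: { groundTruth: 'The Eiffel Tower is located in Paris, France.' },
});
console.log(result.score, result.rationale); // e.g. 0.95 'No hallucinations found'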

Reference: src/mastra/scorers/, src/mastra/evaluation/

Related:

  • concepts/core.md
  • concepts/workflows.md