
Data Models Reference

This document describes the core data models used in the Reward Kit for representing messages, evaluation results, and metrics.

Message Models

Message

The Message class represents a single message in a conversation.

from reward_kit import Message

message = Message(
    role="assistant",
    content="This is the response content",
    name=None,  # Optional
    tool_call_id=None,  # Optional
    tool_calls=None,  # Optional
    function_call=None  # Optional
)

Attributes

  • role (str): The role of the message sender. Typically one of:
    • "user": Message from the user
    • "assistant": Message from the assistant
    • "system": System message providing context/instructions
  • content (Optional[str]): The text content of the message. May be None when the message carries only tool calls.
  • name (Optional[str]): Optional name of the sender (for named system messages).
  • tool_call_id (Optional[str]): Optional ID for a tool call (used in tool calling).
  • tool_calls (Optional[List[Dict[str, Any]]]): Optional list of tool calls in the message.
  • function_call (Optional[Dict[str, Any]]): Optional function call information (legacy format).

Compatibility

The Message class is compatible with OpenAI’s ChatCompletionMessageParam interface, allowing for easy integration with OpenAI-compatible APIs.
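Because the fields mirror the OpenAI message schema, Message objects can be serialized to plain dicts before being sent to an OpenAI-compatible client. The sketch below assumes Message is a Pydantic model exposing model_dump(); adjust if your version exposes a different serialization method.

from reward_kit import Message

messages = [
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user", content="What is machine learning?"),
]

# Assumption: Message is a Pydantic model, so model_dump(exclude_none=True)
# yields the plain dict format expected by OpenAI-compatible chat APIs.
payload = [m.model_dump(exclude_none=True) for m in messages]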

Evaluation Models

EvaluateResult

The EvaluateResult class represents the complete result of an evaluator with multiple metrics.

from reward_kit import EvaluateResult, MetricResult

result = EvaluateResult(
    score=0.75,
    reason="Overall good response with minor issues",
    metrics={
        "clarity": MetricResult(score=0.8, reason="Clear and concise", success=True),
        "accuracy": MetricResult(score=0.7, reason="Contains a minor factual error", success=True)
    },
    error=None  # Optional error message
)

Attributes

  • score (float): The overall evaluation score, typically between 0.0 and 1.0.
  • reason (Optional[str]): Optional explanation for the overall score.
  • metrics (Dict[str, MetricResult]): Dictionary of component metrics.
  • error (Optional[str]): Optional error message if the evaluation encountered a problem.
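When an evaluation cannot be completed, the error field can carry the failure reason so callers can distinguish a failed run from a low score. A minimal sketch of constructing an error result (the choice of score=0.0 and empty metrics here is illustrative, not mandated by the API):

from reward_kit import EvaluateResult

failed_result = EvaluateResult(
    score=0.0,  # illustrative placeholder score for a failed evaluation
    reason="Evaluation could not be completed",
    metrics={},
    error="Model response was empty"
)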

MetricResult

The MetricResult class represents a single metric in an evaluation.

from reward_kit import MetricResult

metric = MetricResult(
    score=0.8,
    reason="The response provides a clear explanation with appropriate examples",
    success=True
)

Attributes

  • score (float): The score for this specific metric, typically between 0.0 and 1.0.
  • reason (str): Explanation for why this score was assigned.
  • success (bool): Indicates whether the metric condition was met (e.g., pass/fail).
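Component metric scores are often combined into the overall EvaluateResult score. A minimal sketch using an unweighted average (the aggregation scheme is an assumption; choose whatever weighting suits your evaluator):

from reward_kit import EvaluateResult, MetricResult

metrics = {
    "clarity": MetricResult(score=0.8, reason="Clear and concise", success=True),
    "accuracy": MetricResult(score=0.7, reason="Minor factual error", success=True),
}

# Unweighted mean of the component scores (illustrative aggregation only)
overall = sum(m.score for m in metrics.values()) / len(metrics)

result = EvaluateResult(
    score=overall,
    reason="Average of component metric scores",
    metrics=metrics
)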

Usage Examples

Working with Messages

from reward_kit import Message

# Create a user message
user_message = Message(
    role="user",
    content="Can you explain how machine learning works?"
)

# Create an assistant message
assistant_message = Message(
    role="assistant",
    content="Machine learning is a method where computers learn from data without being explicitly programmed."
)

# Create a system message
system_message = Message(
    role="system",
    content="You are a helpful assistant that provides clear and accurate explanations."
)

# Create a message with tool calls
tool_call_message = Message(
    role="assistant",
    content=None,
    tool_calls=[{
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"location": "San Francisco", "unit": "celsius"}'
        }
    }]
)
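
Following the OpenAI tool-calling convention, the tool's output is returned in a follow-up message whose tool_call_id references the originating call. A minimal sketch (the "tool" role value and the response content are assumptions based on that convention, not behavior confirmed by this reference):

# Tool response referencing the call above ("tool" role per OpenAI convention)
tool_response_message = Message(
    role="tool",
    content='{"temperature": 18, "unit": "celsius"}',
    tool_call_id="call_123"
)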

Working with EvaluateResult

from reward_kit import EvaluateResult, MetricResult

# Create an EvaluateResult
eval_result = EvaluateResult(
    score=0.75,
    reason="Overall good response with some minor issues",
    metrics={
        "clarity": MetricResult(score=0.8, reason="Clear and concise explanation", success=True),
        "accuracy": MetricResult(score=0.7, reason="Contains one minor factual error", success=True),
        "relevance": MetricResult(score=0.75, reason="Mostly relevant to the query", success=True)
    }
)

# Access metrics
clarity_score = eval_result.metrics["clarity"].score
print(f"Clarity score: {clarity_score}")  # Clarity score: 0.8

# Check for errors
if eval_result.error:
    print(f"Evaluation error: {eval_result.error}")
else:
    print(f"Evaluation successful with score: {eval_result.score}")

Type Compatibility

While the classes provide strong typing for development, the Reward Kit also accepts dictionary representations for flexibility:

# Using dictionaries instead of Message objects
messages = [
    {"role": "user", "content": "What is machine learning?"},
    {"role": "assistant", "content": "Machine learning is a method..."}
]

# These are automatically converted to the appropriate types internally

This flexibility makes it easier to integrate with different APIs and data formats.
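If you need typed objects rather than dictionaries, the dict form can also be converted explicitly. The sketch below assumes Message accepts keyword arguments matching its fields, as Pydantic models do:

from reward_kit import Message

# Convert plain dicts to Message objects (assumes keyword-argument construction)
typed_messages = [Message(**m) for m in messages]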