# PatronusAI SDK > PatronusAI Python SDK for systematic LLM evaluation - Build, test, and improve AI applications with evaluations, experiments, and prompt management The Patronus SDK provides tools for observability, evaluation, experimentation, and prompt management for Large Language Models (LLMs), helping you build reliable and high-quality AI applications. # Getting Started ## API Key To use the Patronus SDK, you'll need an API key from the Patronus platform. If you don't have one yet: 1. Sign up for a Patronus account 1. Navigate to "API Keys" 1. Create a new API key ## Configuration There are several ways to configure the Patronus SDK: ### Environment Variables Set your API key as an environment variable: ```bash export PATRONUS_API_KEY="your-api-key" ``` ### Configuration File Create a `patronus.yaml` file in your project directory: ```yaml api_key: "your-api-key" project_name: "Global" app: "default" ``` ### Direct Configuration Pass configuration values directly when initializing the SDK: ```python import patronus patronus.init( api_key="your-api-key", project_name="Global", app="default", ) ``` ## Verification To verify your installation and configuration: ```python import patronus patronus.init() # Define a simple traced function @patronus.traced() def test_function(): return "Installation successful!" # Call the function to test tracing result = test_function() print(result) ``` If no errors occur, your Patronus SDK is correctly installed and configured. ## Advanced ### Return Value The `patronus.init()` function returns a `PatronusContext` object that serves as the central access point for all SDK components and functionality. Additionally, `patronus.init()` automatically sets this context globally, making it accessible throughout your application: ```python import patronus # Capture the returned context patronus_context = patronus.init() # Also sets context globally # Direct access is possible but not typically needed tracer_provider = patronus_context.tracer_provider api_client = patronus_context.api_client scope = patronus_context.scope ``` See the `PatronusContext` API reference for the complete list of available components and their descriptions. This context is particularly useful when integrating with OpenTelemetry instrumentation libraries that require explicit tracer provider configuration, such as in [distributed tracing scenarios](../../observability/tracing/#distributed-tracing). ### Manual Context Management For advanced use cases, you can build and manage contexts manually using `build_context()` and the context manager pattern: ```python from patronus.init import build_context from patronus import context # Build a context manually with custom configuration custom_context = build_context(...) # Use the context temporarily without setting it globally with context._CTX_PAT.using(custom_context): # All Patronus SDK operations within this block use custom_context result = some_patronus_operation() # Context reverts to previous state after exiting the block ``` This pattern is particularly useful when you need to send data to multiple projects within the same process, or when building testing frameworks that require isolated contexts. ## Next Steps Now that you've installed the Patronus SDK, proceed to the [Quickstart](../quickstart/) guide to learn how to use it effectively. # Installation The Patronus SDK provides tools for evaluating, monitoring, and improving LLM applications.
## Requirements - Python 3.9 or higher - A package manager (uv or pip) ## Basic Installation ### Using uv (Recommended) [uv](https://github.com/astral-sh/uv) is a fast Python package installer and resolver: ```bash uv add patronus ``` ### Using pip ```bash pip install patronus ``` ## Optional Dependencies ### For Experiments To use Patronus experiments functionality (including pandas support): ```bash # Using uv uv add "patronus[experiments]" # Using pip pip install "patronus[experiments]" ``` ## Quick Start with Examples If you'd like to see Patronus in action quickly, check out our [examples](../../examples/). These examples demonstrate how to use Patronus with various LLM frameworks and APIs. For instance, to run the Smolagents weather example: ```bash # Export required API keys export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key # Run the example with uv uv run --no-cache --with "patronus-examples[smolagents]" \ -m patronus_examples.tracking.smolagents_weather ``` See the [examples documentation](../../examples/) for more detailed information on running and understanding the available examples. # Quickstart This guide will help you get started with the Patronus SDK through three practical examples. We'll explore tracing, evaluation, and experimentation to give you a hands-on introduction to the core features. ## Initialization Before running any of the examples, initialize the Patronus SDK: ```python import os import patronus # Initialize with your API key patronus.init( # This is the default and can be omitted api_key=os.environ.get("PATRONUS_API_KEY") ) ``` You can also use a configuration file instead of direct initialization: ```yaml # patronus.yaml api_key: "your-api-key" project_name: "Global" app: "default" ``` For experiments, you don't need to explicitly call init() as run_experiment() handles initialization automatically. ## Example 1: Tracing with a Functional Evaluator This example demonstrates how to trace function execution and create a simple functional evaluator. ```python import patronus from patronus import evaluator, traced patronus.init() @evaluator() def exact_match(expected: str, actual: str) -> bool: return expected.strip() == actual.strip() @traced() def process_query(query: str) -> str: # In a real application, this would call an LLM return f"Processed response for: {query}" # Use the traced function and evaluator together @traced() def main(): query = "What is machine learning?" response = process_query(query) print(f"Response: {response}") expected_response = "Processed response for: What is machine learning?" result = exact_match(expected_response, response) print(f"Evaluation result: {result}") if __name__ == "__main__": main() ``` In this example: 1. We created a simple `exact_match` evaluator using the `@evaluator()` decorator 1. We traced the `process_query` function using the `@traced()` decorator 1. We ran an evaluation by calling the evaluator function directly The tracing will automatically capture execution details, timing, and results, making them available in the Patronus platform. ## Example 2: Using a Patronus Evaluator This example shows how to use a Patronus Evaluator to assess model outputs for hallucinations. 
```python import patronus from patronus import traced from patronus.evals import RemoteEvaluator patronus.init() @traced() def generate_insurance_response(query: str) -> str: # In a real application, this would call an LLM return "To even qualify for our car insurance policy, you need to have a valid driver's license that expires later than 2028." @traced("Quickstart: detect hallucination") def main(): check_hallucinates = RemoteEvaluator("lynx", "patronus:hallucination") context = """ To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver's license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028. """ query = "What is the car insurance policy?" response = generate_insurance_response(query) print(f"Query: {query}") print(f"Response: {response}") # Evaluate the response for hallucinations resp = check_hallucinates.evaluate( task_input=query, task_context=context, task_output=response ) # Print the evaluation results print(f""" Hallucination evaluation: Passed: {resp.pass_} Score: {resp.score} Explanation: {resp.explanation} """) if __name__ == "__main__": main() ``` In this example: 1. We created a traced function generate_insurance_response to simulate an LLM response 1. We used the Patronus Lynx Evaluator 1. We evaluated whether the response contains information not supported by the context 1. We displayed the detailed evaluation results Patronus Evaluators run on Patronus infrastructure and provide sophisticated assessment capabilities without requiring you to implement complex evaluation logic. ## Example 3: Running an Experiment with OpenAI This example demonstrates how to run a comprehensive experiment to evaluate OpenAI model performance across multiple samples and criteria. Before running Example 3, you'll need to install Pandas and the OpenAI SDK and OpenInference instrumentation: ```shell pip install pandas openai openinference-instrumentation-openai ``` The OpenInference instrumentation automatically adds spans for all OpenAI API calls, capturing prompts, responses, and model parameters without any code changes. These details will appear in your Patronus traces for complete visibility into model interactions. ```python from typing import Optional import os import patronus from patronus.evals import evaluator, RemoteEvaluator, EvaluationResult from patronus.experiments import run_experiment, FuncEvaluatorAdapter, Row, TaskResult from openai import OpenAI from openinference.instrumentation.openai import OpenAIInstrumentor oai = OpenAI(api_key=os.environ.get("OPENAI_API_KEY")) patronus.init() @evaluator() def fuzzy_match(row: Row, task_result: TaskResult, **kwargs) -> Optional[EvaluationResult]: if not row.gold_answer or not task_result: return None gold_answer = row.gold_answer.lower() response = task_result.output.lower() key_terms = [term.strip() for term in gold_answer.split(',')] matches = sum(1 for term in key_terms if term in response) match_ratio = matches / len(key_terms) if key_terms else 0 # Return a score between 0-1 indicating match quality return EvaluationResult( pass_=match_ratio > 0.7, score=match_ratio, ) def rag_task(row, **kwargs): # In a real RAG system, this would retrieve context before calling the LLM prompt = f""" Based on the following context, answer the question. 
Context: {row.task_context} Question: {row.task_input} Answer: """ # Call OpenAI to generate a response response = oai.chat.completions.create( model="gpt-3.5-turbo", messages=[ {"role": "system", "content": "You are a helpful assistant that answers questions based only on the provided context."}, {"role": "user", "content": prompt} ], temperature=0.3, max_tokens=150 ) return response.choices[0].message.content test_data = [ { "task_input": "What is the main impact of climate change on coral reefs?", "task_context": """ Climate change affects coral reefs through several mechanisms. Rising sea temperatures can cause coral bleaching, where corals expel their symbiotic algae and turn white, often leading to death. Ocean acidification, caused by increased CO2 absorption, makes it harder for corals to build their calcium carbonate structures. Sea level rise can reduce light availability for photosynthesis. More frequent and intense storms damage reef structures. The combination of these stressors is devastating to coral reef ecosystems worldwide. """, "gold_answer": "coral bleaching, ocean acidification, reduced calcification, habitat destruction" }, { "task_input": "How do quantum computers differ from classical computers?", "task_context": """ Classical computers process information in bits (0s and 1s), while quantum computers use quantum bits or qubits. Qubits can exist in multiple states simultaneously thanks to superposition, allowing quantum computers to process vast amounts of information in parallel. Quantum entanglement enables qubits to be correlated in ways impossible for classical bits. While classical computers excel at everyday tasks, quantum computers potentially have advantages for specific problems like cryptography, simulation of quantum systems, and certain optimization tasks. However, quantum computers face significant challenges including qubit stability, error correction, and scaling up to useful sizes. """, "gold_answer": "qubits instead of bits, superposition, entanglement, parallel processing" } ] evaluators = [ FuncEvaluatorAdapter(fuzzy_match), RemoteEvaluator("answer-relevance", "patronus:answer-relevance") ] # Run the experiment with OpenInference instrumentation print("Running RAG evaluation experiment...") experiment = run_experiment( dataset=test_data, task=rag_task, evaluators=evaluators, tags={"system": "rag-prototype", "model": "gpt-3.5-turbo"}, integrations=[OpenAIInstrumentor()] ) # Export results to CSV (optional) # experiment.to_csv("rag_evaluation_results.csv") ``` In this example: 1. We defined a task function `rag_task` that generates responses for our experiment 1. We created a custom evaluator `fuzzy_match` to check for key content 1. We set up an experiment with multiple evaluators (both remote and custom) 1. We ran the experiment across a dataset of questions Experiments provide a powerful way to systematically evaluate your LLM applications across multiple samples and criteria, helping you identify strengths and weaknesses in your models. # Observability # Observability Configuration ## Exporter Protocols The SDK supports two OTLP exporter protocols: | Protocol | Value | Default Endpoint | Available Ports | | --- | --- | --- | --- | | gRPC | `grpc` | `https://otel.patronus.ai:4317` | 4317 | | HTTP | `http/protobuf` | `https://otel.patronus.ai:4318` | 4318, 443 | ## Configuration Methods ### 1.
Patronus Configuration ```python patronus.init( otel_endpoint="https://otel.patronus.ai:4318", otel_exporter_otlp_protocol="http/protobuf" ) ``` ```yaml # patronus.yaml otel_endpoint: "https://otel.patronus.ai:4318" otel_exporter_otlp_protocol: "http/protobuf" ``` ```bash export PATRONUS_OTEL_ENDPOINT="https://otel.patronus.ai:4318" export PATRONUS_OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf" ``` ### 2. OpenTelemetry Environment Variables ```bash # General (applies to all signals) export OTEL_EXPORTER_OTLP_PROTOCOL="grpc" # Signal-specific export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" export OTEL_EXPORTER_OTLP_LOGS_PROTOCOL="grpc" ``` ## Configuration Priority 1. Function parameters 1. Environment variables (`PATRONUS_OTEL_EXPORTER_OTLP_PROTOCOL`) 1. Configuration file (`patronus.yaml`) 1. `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` / `OTEL_EXPORTER_OTLP_LOGS_PROTOCOL` 1. `OTEL_EXPORTER_OTLP_PROTOCOL` 1. Default: `grpc` ## Endpoint Configuration ### Custom Endpoints ```python patronus.init( otel_endpoint="https://collector.example.com:4317", otel_exporter_otlp_protocol="grpc" ) ``` ### Connection Security Security is determined by the URL scheme for both gRPC and HTTP protocols: - `https://` - Secure connection (TLS) - `http://` - Insecure connection ```python # Secure gRPC patronus.init(otel_endpoint="https://collector.example.com:4317") # Insecure gRPC patronus.init(otel_endpoint="http://collector.example.com:4317") # Secure HTTP patronus.init( otel_endpoint="https://collector.example.com:4318", otel_exporter_otlp_protocol="http/protobuf" ) # Insecure HTTP patronus.init( otel_endpoint="http://collector.example.com:4318", otel_exporter_otlp_protocol="http/protobuf" ) ``` ### HTTP Path Handling For HTTP protocol, paths are automatically appended: - Traces: `/v1/traces` - Logs: `/v1/logs` ## Examples ### HTTP Protocol with Custom Endpoint ```python patronus.init( otel_endpoint="http://internal-collector:8080", otel_exporter_otlp_protocol="http/protobuf" ) ``` ### HTTP Protocol on Standard HTTPS Port ```python patronus.init( otel_endpoint="https://otel.example.com:443", otel_exporter_otlp_protocol="http/protobuf" ) ``` ### gRPC with Insecure Connection ```python patronus.init( otel_endpoint="http://internal-collector:4317", otel_exporter_otlp_protocol="grpc" ) ``` ### Mixed Protocols ```bash export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" export OTEL_EXPORTER_OTLP_LOGS_PROTOCOL="grpc" ``` # Logging Logging is an essential feature of the Patronus SDK that allows you to record events, debug information, and track the execution of your LLM applications. This page covers how to set up and use logging in your code. Configuration For information about configuring observability features, including exporter protocols and endpoints, see the [Observability Configuration](../configuration/) guide. ## Getting Started with Logging The Patronus SDK provides a simple logging interface that integrates with Python's standard logging module while also automatically exporting logs to the Patronus AI Platform: ```python import patronus patronus.init() log = patronus.get_logger() # Basic logging log.info("Processing user query") # Different log levels are available log.debug("Detailed debug information") log.warning("Something might be wrong") log.error("An error occurred") log.critical("System cannot continue") ``` ## Configuring Console Output By default, Patronus logs are sent to the Patronus AI Platform but are not printed to the console. 
To display logs in your console output, you can add a standard Python logging handler: ```python import sys import logging import patronus patronus.init() log = patronus.get_logger() # Add a console handler to see logs in your terminal console_handler = logging.StreamHandler(sys.stdout) log.addHandler(console_handler) # Now logs will appear in both console and Patronus Platform log.info("This message appears in the console and is sent to Patronus") ``` You can also customize the format of console logs: ```python import sys import logging import patronus patronus.init() log = patronus.get_logger() formatter = logging.Formatter('[%(asctime)s] %(levelname)-8s: %(message)s') console_handler = logging.StreamHandler(sys.stdout) console_handler.setFormatter(formatter) log.addHandler(console_handler) # Logs will now include timestamp and level log.info("Formatted log message") ``` ## Advanced Configuration Patronus integrates with Python's logging module, allowing for advanced configuration options. The SDK uses two main loggers: - `patronus.sdk` - For client-emitted messages that are automatically exported to the Patronus AI Platform - `patronus.core` - For library-emitted messages related to the SDK's internal operations Here's how to configure these loggers using standard library methods: ```python import logging import patronus # Initialize Patronus before configuring logging patronus.init() # Configure the root Patronus logger patronus_root_logger = logging.getLogger("patronus") patronus_root_logger.setLevel(logging.WARNING) # Set base level for all Patronus loggers # Add a console handler with custom formatting console_handler = logging.StreamHandler() formatter = logging.Formatter( fmt='[%(asctime)s] %(levelname)-8s %(name)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S' ) console_handler.setFormatter(formatter) patronus_root_logger.addHandler(console_handler) # Configure specific loggers patronus_core_logger = logging.getLogger("patronus.core") patronus_core_logger.setLevel(logging.WARNING) # Only show warnings and above for internal SDK messages patronus_sdk_logger = logging.getLogger("patronus.sdk") patronus_sdk_logger.setLevel(logging.INFO) # Show info and above for your application logs ``` ## Logging with Traces Patronus logging integrates seamlessly with the tracing system, allowing you to correlate logs with specific spans in your application flow: ```python import patronus from patronus import traced, start_span patronus.init() log = patronus.get_logger() @traced() def process_user_query(query): log.info("Processing query") with start_span("Query Analysis"): log.info("Analyzing query intent") ... with start_span("Response Generation"): log.info("Generating LLM response") ... return "Response to: " + query # Logs will be associated with the appropriate spans result = process_user_query("Tell me about machine learning") ``` # Tracing Tracing is a core feature of the Patronus SDK that allows you to monitor and understand the behavior of your LLM applications. This page covers how to set up and use tracing in your code. Configuration For information about configuring observability features, including exporter protocols and endpoints, see the [Observability Configuration](../configuration/) guide. ## Getting Started with Tracing Tracing in Patronus works through two main mechanisms: 1. **Function decorators**: Easily trace entire functions 1. 
**Context managers**: Trace specific code blocks within functions ## Using the `@traced()` Decorator The simplest way to add tracing is with the `@traced()` decorator: ```python import patronus from patronus import traced patronus.init() @traced() def generate_response(prompt: str) -> str: # Your LLM call or processing logic here return f"Response to: {prompt}" # Call the traced function result = generate_response("Tell me about machine learning") ``` ### Decorator Options The `@traced()` decorator accepts several parameters for customization: ```python @traced( span_name="Custom span name", # Default: function name log_args=True, # Whether to log function arguments log_results=True, # Whether to log function return values log_exceptions=True, # Whether to log exceptions disable_log=False, # Completely disable logging (maintains spans) attributes={"key": "value"} # Custom attributes to add to the span ) def my_function(): pass ``` See the API Reference for complete details. ## Using the `start_span()` Context Manager For more granular control, use the `start_span()` context manager to trace specific blocks of code: ```python import patronus from patronus.tracing import start_span patronus.init() def complex_workflow(data): # First phase with start_span("Data preparation", attributes={"data_size": len(data)}): prepared_data = preprocess(data) # Second phase with start_span("Model inference"): results = run_model(prepared_data) # Third phase with start_span("Post-processing"): final_results = postprocess(results) return final_results ``` ### Context Manager Options The `start_span()` context manager accepts these parameters: ```python with start_span( "Span name", # Name of the span (required) record_exception=False, # Whether to record exceptions attributes={"custom": "attribute"} # Custom attributes to add ) as span: # Your code here # You can also add attributes during execution: span.set_attribute("dynamic_value", 42) ``` See the API Reference for complete details. ## Custom Attributes Both tracing methods allow you to add custom attributes that provide additional context for your traces: ```python @traced(attributes={ "model": "gpt-4", "version": "1.0", "temperature": 0.7 }) def generate_with_gpt4(prompt): # Function implementation pass # Or with context manager with start_span("Query processing", attributes={ "query_type": "search", "filters_applied": True, "result_limit": 10 }): # Processing code pass ``` ## Distributed Tracing The Patronus SDK is built on OpenTelemetry and automatically supports context propagation across distributed services. This enables you to trace requests as they flow through multiple services in your application architecture. The [OpenTelemetry Python Contrib](https://github.com/open-telemetry/opentelemetry-python-contrib) repository provides instrumentation for many popular frameworks and libraries. 
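Many of these instrumentors can be enabled by passing them to `patronus.init()` through the `integrations` parameter. The sketch below is illustrative only: it assumes the `opentelemetry-instrumentation-requests` package is installed and that a standard OpenTelemetry instrumentor can be supplied this way, just as the HTTPX and OpenAI instrumentors are in the examples elsewhere in these docs.

```python
import requests

import patronus
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Register the instrumentor at init time; spans for outgoing `requests` calls
# are then exported together with Patronus traces and carry the trace context.
patronus.init(integrations=[RequestsInstrumentor()])


@patronus.traced()
def fetch_status(url: str) -> int:
    # This HTTP call is captured automatically by the instrumentation.
    response = requests.get(url, timeout=10)
    return response.status_code


if __name__ == "__main__":
    print(fetch_status("https://example.com"))
```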
### Example: FastAPI Services with Context Propagation First, install the required dependencies: ```bash uv add opentelemetry-instrumentation-httpx \ opentelemetry-instrumentation-fastapi \ fastapi[all] \ patronus ``` Here's a complete example showing two FastAPI services with automatic trace context propagation: **Backend Service (`service_backend.py`):** ```python import patronus from fastapi import FastAPI from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor # Initialize Patronus SDK patronus_context = patronus.init(service="backend") app = FastAPI(title="Backend Service") @app.get("/hello/{name}") async def hello_backend(name: str): return { "message": f"Hello {name} from Backend Service!", "service": "backend" } # Instrument FastAPI after Patronus initialization FastAPIInstrumentor.instrument_app(app, tracer_provider=patronus_context.tracer_provider) if __name__ == "__main__": import uvicorn uvicorn.run(app, host="0.0.0.0", port=8001) ``` **Gateway Service (`service_gateway.py`):** ```python import httpx import patronus from fastapi import FastAPI from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor # Initialize Patronus SDK with HTTPX instrumentation patronus_context = patronus.init( service="gateway", integrations=[ HTTPXClientInstrumentor(), ] ) app = FastAPI(title="Gateway Service") @app.get("/hello/{name}") async def hello_gateway(name: str): # This HTTP call will automatically propagate trace context async with httpx.AsyncClient() as client: response = await client.get(f"http://localhost:8001/hello/{name}") backend_data = response.json() return { "gateway_message": f"Gateway received request for {name}", "backend_response": backend_data } # Instrument FastAPI after Patronus initialization FastAPIInstrumentor.instrument_app(app, tracer_provider=patronus_context.tracer_provider) if __name__ == "__main__": import uvicorn uvicorn.run(app, host="0.0.0.0", port=8000) ``` ### Running the Example First, export your Patronus API key: ```bash export PATRONUS_API_KEY="your-api-key" ``` Then run the services: 1. Start the backend: `python service_backend.py` 1. Start the gateway: `python service_gateway.py` 1. Make a request: `curl http://localhost:8000/hello/world` After making the request, you should see the connected traces in the Patronus Platform showing the complete request flow from gateway to backend service. ### Important Notes - FastAPI instrumenter requires manual setup with `FastAPIInstrumentor.instrument_app()` after Patronus initialization - Pass the `tracer_provider` from Patronus context to ensure proper integration - Trace context is automatically propagated through HTTP headers when services are properly instrumented # Evaluations # Batch Evaluations When evaluating multiple outputs or using multiple evaluators, Patronus provides efficient batch evaluation capabilities. This page covers how to perform batch evaluations and manage evaluation groups. 
## Using Patronus Client For more advanced batch evaluation needs, use the `Patronus` client: ```python from patronus import init from patronus.pat_client import Patronus from patronus.evals import RemoteEvaluator init() with Patronus() as client: # Run multiple evaluators in parallel results = client.evaluate( evaluators=[ RemoteEvaluator("judge", "patronus:is-helpful"), RemoteEvaluator("lynx", "patronus:hallucination") ], task_input="What is quantum computing?", task_output="Quantum computing uses quantum bits or qubits to perform computations...", gold_answer="Computing that uses quantum phenomena like superposition and entanglement" ) # Check if all evaluations passed if results.all_succeeded(): print("All evaluations passed!") else: print("Some evaluations failed:") for failed in results.failed_evaluations(): print(f" - {failed.text_output}") ``` The `Patronus` client provides: - Parallel evaluation execution - Connection pooling - Error handling - Result aggregation ### Asynchronous Evaluation For asynchronous workflows, use `AsyncPatronus`: ```python import asyncio from patronus import init from patronus.pat_client import AsyncPatronus from patronus.evals import AsyncRemoteEvaluator init() async def evaluate_responses(): async with AsyncPatronus() as client: # Run evaluations asynchronously results = await client.evaluate( evaluators=[ AsyncRemoteEvaluator("judge", "patronus:is-helpful"), AsyncRemoteEvaluator("lynx", "patronus:hallucination") ], task_input="What is quantum computing?", task_output="Quantum computing uses quantum bits or qubits to perform computations...", gold_answer="Computing that uses quantum phenomena like superposition and entanglement" ) print(f"Number of evaluations: {len(results.results)}") print(f"All passed: {results.all_succeeded()}") # Run the async function asyncio.run(evaluate_responses()) ``` ## Background Evaluation For non-blocking evaluation, use the `evaluate_bg()` method: ```python from patronus import init from patronus.pat_client import Patronus from patronus.evals import RemoteEvaluator init() with Patronus() as client: # Start background evaluation future = client.evaluate_bg( evaluators=[ RemoteEvaluator("judge", "factual-accuracy"), RemoteEvaluator("judge", "patronus:helpfulness") ], task_input="Explain how vaccines work.", task_output="Vaccines work by training the immune system to recognize and combat pathogens..." ) # Do other work while evaluation happens in background print("Continuing with other tasks...") results = future.get() # Blocks until complete print(f"Evaluation complete: {results.all_succeeded()}") ``` The async version works similarly: ```python async with AsyncPatronus() as client: # Start background evaluation task = client.evaluate_bg( evaluators=[...], task_input="...", task_output="..." 
) # Do other async work await some_other_async_function() # Get results when needed results = await task ``` ## Working with Evaluation Results The `evaluate()` method returns an `EvaluationContainer` with several useful methods: ```python results = client.evaluate(evaluators=[...], task_input="...", task_output="...") if results.any_failed(): print("Some evaluations failed") if results.all_succeeded(): print("All evaluations passed") for failed in results.failed_evaluations(): print(f"Failed: {failed.text_output}") for success in results.succeeded_evaluations(): print(f"Passed: {success.text_output}") if results.has_exception(): results.raise_on_exception() # Re-raise any exceptions that occurred ``` ## Example: Comprehensive Quality Check Here's a complete example of batch evaluation for content quality: ```python from patronus import init from patronus.pat_client import Patronus from patronus.evals import RemoteEvaluator init() def check_content_quality(question, answer): with Patronus() as client: results = client.evaluate( evaluators=[ RemoteEvaluator("judge", "factual-accuracy"), RemoteEvaluator("judge", "helpfulness"), RemoteEvaluator("judge", "coherence"), RemoteEvaluator("judge", "grammar"), RemoteEvaluator("lynx", "patronus:hallucination") ], task_input=question, task_output=answer ) if results.any_failed(): print("Content quality check failed") for failed in results.failed_evaluations(): print(f"- Failed check: {failed.text_output}") print(f" Explanation: {failed.explanation}") return False print("Content passed all quality checks") return True check_content_quality( "What is the capital of France?", "The capital of France is Paris, which is located on the Seine River." ) ``` ## Using the `bundled_eval()` Context Manager The `bundled_eval()` is a lower-level context manager that groups multiple evaluations together based on their arguments. This is particularly useful when working with multiple user-defined evaluators that don't conform to the Patronus structured evaluator format. ```python import patronus from patronus.evals import bundled_eval, evaluator patronus.init() @evaluator() def exact_match(actual, expected) -> bool: return actual == expected @evaluator() def iexact_match(actual: str, expected: str) -> bool: return actual.strip().lower() == expected.strip().lower() # Group these evaluations together in a single trace and single log record with bundled_eval(): exact_match("string", "string") iexact_match("string", "string") ``` # User-Defined Evaluators Evaluators are the core building blocks of Patronus's evaluation system. This page covers how to create and use your own custom evaluators to assess LLM outputs according to your specific criteria. ## Creating Basic Evaluators The simplest way to create an evaluator is with the `@evaluator()` decorator: ```python from patronus import evaluator @evaluator() def keyword_match(text: str, keywords: list[str]) -> float: """ Evaluates whether the text contains the specified keywords. Returns a score between 0.0 and 1.0 based on the percentage of matched keywords. 
""" matches = sum(keyword.lower() in text.lower() for keyword in keywords) return matches / len(keywords) if keywords else 0.0 ``` This decorator automatically: - Integrates with the Patronus tracing - Exports evaluation results to the Patronus Platform ### Flexible Input and Output User-defined evaluators can accept any parameters and return several types of results: ```python # Boolean evaluator (pass/fail) @evaluator() def contains_answer(text: str, answer: str) -> bool: return answer.lower() in text.lower() # Numeric evaluator (score) @evaluator() def semantic_similarity(text1: str, text2: str) -> float: # Simple example - in practice use proper semantic similarity words1, words2 = set(text1.lower().split()), set(text2.lower().split()) intersection = words1.intersection(words2) union = words1.union(words2) return len(intersection) / len(union) if union else 0.0 # String evaluator @evaluator() def tone_classifier(text: str) -> str: positive = ['good', 'excellent', 'great', 'helpful'] negative = ['bad', 'poor', 'unhelpful', 'wrong'] pos_count = sum(word in text.lower() for word in positive) neg_count = sum(word in text.lower() for word in negative) if pos_count > neg_count: return "positive" elif neg_count > pos_count: return "negative" else: return "neutral" ``` ### Return Types Evaluators can return different types which are automatically converted to `EvaluationResult` objects: - **Boolean**: `True`/`False` indicating pass/fail - **Float/Integer**: Numerical scores (typically between 0-1) - **String**: Text output categorizing the result - **EvaluationResult**: Complete evaluation with scores, explanations, etc. ## Using EvaluationResult For more detailed evaluations, return an `EvaluationResult` object: ```python from patronus import evaluator from patronus.evals import EvaluationResult @evaluator() def comprehensive_evaluation(response: str, reference: str) -> EvaluationResult: # Example implementation - replace with actual logic has_keywords = all(word in response.lower() for word in ["important", "key", "concept"]) accuracy = 0.85 # Calculated accuracy score return EvaluationResult( score=accuracy, # Numeric score (typically 0-1) pass_=accuracy >= 0.7, # Boolean pass/fail text_output="Satisfactory" if accuracy >= 0.7 else "Needs improvement", # Category explanation=f"Response {'contains' if has_keywords else 'is missing'} key terms. Accuracy: {accuracy:.2f}", metadata={ # Additional structured data "has_required_keywords": has_keywords, "response_length": len(response), "accuracy": accuracy } ) ``` The `EvaluationResult` object can include: - **score**: Numerical assessment (typically 0-1) - **pass\_**: Boolean pass/fail status - **text_output**: Categorical or textual result - **explanation**: Human-readable explanation of the result - **metadata**: Additional structured data for analysis - **tags**: Key-value pairs for filtering and organization ## Using Evaluators Once defined, evaluators can be used directly: ```python # Use evaluators as normal function result = keyword_match("The capital of France is Paris", ["capital", "France", "Paris"]) print(f"Score: {result}") # Output: Score: 1.0 # Using class-based evaluator safety_check = ContentSafetyEvaluator() result = safety_check.evaluate( task_output="This is a helpful and safe response." ) print(f"Safety check passed: {result.pass_}") # Output: Safety check passed: True ``` # Patronus Evaluators Patronus provides a suite of evaluators that help you assess LLM outputs without writing complex evaluation logic. 
These managed evaluators run on Patronus infrastructure. Visit Patronus Platform console to define your own criteria. ## Using Patronus Evaluators You can use Patronus evaluators through the `RemoteEvaluator` class: ```python from patronus import init from patronus.evals import RemoteEvaluator init() factual_accuracy = RemoteEvaluator("judge", "factual-accuracy") # Evaluate an LLM output result = factual_accuracy.evaluate( task_input="What is the capital of France?", task_output="The capital of France is Paris, which is located on the Seine River.", gold_answer="Paris" ) print(f"Passed: {result.pass_}") print(f"Score: {result.score}") print(f"Explanation: {result.explanation}") ``` ## Synchronous and Asynchronous Versions Patronus evaluators are available in both synchronous and asynchronous versions: ```python # Synchronous usage (as shown above) factual_accuracy = RemoteEvaluator("judge", "factual-accuracy") result = factual_accuracy.evaluate(...) # Asynchronous usage from patronus.evals import AsyncRemoteEvaluator async_factual_accuracy = AsyncRemoteEvaluator("judge", "factual-accuracy") result = await async_factual_accuracy.evaluate(...) ``` # Experiments # Advanced Experiment Features This page covers advanced features of the Patronus Experimentation Framework that help you build more sophisticated evaluation workflows. ## Multi-Stage Processing with Chains For complex workflows, you can use chains to create multi-stage processing and evaluation pipelines. Chains connect multiple processing stages where the output of one stage becomes the input to the next. ### Basic Chain Structure ```python from patronus.experiments import run_experiment from patronus.evals import RemoteEvaluator experiment = run_experiment( dataset=dataset, chain=[ # Stage 1: Generate summaries { "task": generate_summary, "evaluators": [ RemoteEvaluator("judge", "conciseness"), RemoteEvaluator("judge", "coherence") ] }, # Stage 2: Generate questions from summaries { "task": generate_questions, "evaluators": [ RemoteEvaluator("judge", "relevance"), QuestionDiversityEvaluator() ] }, # Stage 3: Answer questions { "task": answer_questions, "evaluators": [ RemoteEvaluator("judge", "factual-accuracy"), RemoteEvaluator("judge", "helpfulness") ] } ] ) ``` Each stage in the chain can: 1. Apply its own task function (or no task if set to `None`) 1. Use its own set of evaluators 1. Access results from previous stages ### Accessing Previous Results in Chain Tasks Tasks in later chain stages can access outputs and evaluations from earlier stages through the `parent` parameter: ```python def generate_questions(row, parent, **kwargs): """Generate questions based on a summary from the previous stage.""" # Get the summary from the previous task summary = parent.task.output if parent and parent.task else None if not summary: return None # Check if summary evaluations are available if parent and parent.evals: coherence = parent.evals.get("judge:coherence") # Use previous evaluation results to guide question generation if coherence and coherence.score > 0.8: return "Here are three detailed questions based on the summary..." else: return "Here are three basic questions about the summary..." # Default questions if no evaluations available return "Here are some standard questions about the topic..." ``` This example demonstrates how a task can adapt its behavior based on previous outputs and evaluations. ## Concurrency Controls For better performance, the framework automatically processes dataset examples concurrently. 
You can control this behavior to prevent rate limiting or resource exhaustion: ```python experiment = run_experiment( dataset=large_dataset, task=api_intensive_task, evaluators=[evaluator1, evaluator2], # Limit the number of concurrent tasks and evaluations max_concurrency=5 ) ``` This is particularly important for: - Tasks that make API calls with rate limits - Resource-intensive processing - Large datasets with many examples ## OpenTelemetry Integrations The framework supports OpenTelemetry instrumentation for enhanced tracing and monitoring: ```python from openinference.instrumentation.openai import OpenAIInstrumentor experiment = run_experiment( dataset=dataset, task=openai_task, evaluators=[evaluator1, evaluator2], # Add OpenTelemetry instrumentors integrations=[OpenAIInstrumentor()] ) ``` Benefits of OpenTelemetry integration include: - Automatic capture of API calls and parameters - Detailed timing information for performance analysis - Integration with observability platforms ## Organizing Experiments ### Custom Experiment Names and Projects Organize your experiments into projects with descriptive names for better management: ```python experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[evaluator1, evaluator2], # Organize experiments project_name="RAG System Evaluation", experiment_name="baseline-gpt4-retrieval" ) ``` The framework automatically appends a timestamp to experiment names for uniqueness. ### Tags for Filtering and Organization Tags help organize and filter experiment results: ```python experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[evaluator1, evaluator2], # Add tags for filtering and organization tags={ "model": "gpt-4", "version": "2.0", "retrieval_method": "bm25", "environment": "staging" } ) ``` Important notes about tags: - Tags are propagated to all evaluation results in the experiment - They cannot be overridden by tasks or evaluators - Use a small set of consistent values for each tag (avoid having too many unique values) - Tags are powerful for filtering and grouping in analysis ### Experiment Metadata Experiments automatically capture important metadata, including evaluator weights when specified: ```python from patronus.experiments import run_experiment, FuncEvaluatorAdapter from patronus.evals import RemoteEvaluator from patronus import evaluator @evaluator() def custom_check(row, **kwargs): return True # Experiment with weighted evaluators experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[ RemoteEvaluator("judge", "patronus:is-concise", weight=0.6), FuncEvaluatorAdapter(custom_check, weight="0.4") ] ) # Weights are automatically stored in experiment metadata # as "evaluator_weights": { # "judge:patronus:is-concise": "0.6", # "custom_check:": "0.4" # } ``` Evaluator weights are automatically collected and stored in the experiment's metadata under the `evaluator_weights` key. This provides a permanent record of how evaluators were weighted in each experiment for reproducibility and analysis. For more details on using evaluator weights, see the [Using Evaluators](../evaluators/#evaluator-weights-experiments-only) page. 
## Custom API Configuration For on-prem environments, you can customize the API configuration: ```python experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[evaluator1, evaluator2], # Custom API configuration api_key="your-api-key", api_url="https://custom-endpoint.patronus.ai", otel_endpoint="https://custom-telemetry.patronus.ai", timeout_s=120 ) ``` ## Manual Experiment Control For fine-grained control over the experiment lifecycle, you can create and run experiments manually: ```python from patronus.experiments import Experiment # Create the experiment experiment = await Experiment.create( dataset=dataset, task=task, evaluators=evaluators, # Additional configuration... ) # Perform custom setup if needed # ... # Run the experiment when ready await experiment.run() # Export results experiment.to_csv("results.csv") ``` This pattern is useful when you need to: - Perform additional setup after experiment creation - Control exactly when execution starts - Implement custom pre- or post-processing ## Best Practices When using advanced experiment features: 1. **Start simple**: Begin with basic experiments before adding chain complexity 1. **Test incrementally**: Validate each stage before combining them 1. **Monitor resources**: Watch for memory usage with large datasets 1. **Set appropriate concurrency**: Balance throughput against rate limits 1. **Use consistent tags**: Create a standard tagging system across experiments # Working with Datasets Datasets provide the foundation for Patronus experiments, containing the examples that your tasks and evaluators will process. This page explains how to create, load, and work with datasets effectively. ## Dataset Structure and Evaluator Compatibility Patronus experiments are designed to work with `StructuredEvaluator` classes, which expect specific input parameters. The standard dataset fields map directly to these parameters, making integration seamless: - `system_prompt`: System instruction for LLM-based tasks - `task_context`: Additional information or context (string or list of strings) - `task_metadata`: Additional structured information about the task - `task_attachments`: Files or other binary data - `task_input`: The primary input query or text - `task_output`: The model's response or output to evaluate - `gold_answer`: The expected correct answer or reference output - `tags`: Key-value pairs - `sid`: A unique identifier for the example (automatically generated if not provided) While you can include any custom fields in your dataset, using these standard field names ensures compatibility with structured evaluators without additional configuration. 
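To make the mapping concrete, here is a minimal sketch. The evaluator class and record below are hypothetical illustrations (not part of the SDK); they show how a record that uses the standard field names feeds keyword arguments of the same names into a `StructuredEvaluator`.

```python
from patronus import StructuredEvaluator, EvaluationResult


class GoldAnswerCheck(StructuredEvaluator):
    """Hypothetical evaluator: passes when the gold answer appears in the output."""

    def evaluate(self, *, task_output: str = "", gold_answer: str = "", **kwargs) -> EvaluationResult:
        # task_output and gold_answer arrive directly from the dataset row
        # because the record uses the standard field names.
        matched = bool(gold_answer) and gold_answer.lower() in (task_output or "").lower()
        return EvaluationResult(pass_=matched, score=1.0 if matched else 0.0)


# A record that uses the standard field names needs no adapter or custom field mapping.
record = {
    "task_input": "What is the capital of France?",
    "task_output": "The capital of France is Paris.",
    "gold_answer": "Paris",
}
```

Passing `dataset=[record]` and `evaluators=[GoldAnswerCheck()]` to `run_experiment()` would then route these fields straight into `evaluate()` with no adapter or field mapping.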
## Creating Datasets Patronus accepts datasets in several formats: ### List of Dictionaries ```python dataset = [ { "task_input": "What is machine learning?", "gold_answer": "Machine learning is a subfield of artificial intelligence...", "tags": {"category": "ai", "difficulty": "beginner"}, "difficulty": "beginner" # Custom field }, { "task_input": "Explain quantum computing", "gold_answer": "Quantum computing uses quantum phenomena...", "tags": {"category": "physics", "difficulty": "advanced"}, "difficulty": "advanced" # Custom field } ] experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[my_evaluator] ) ``` ### Pandas DataFrame ```python import pandas as pd df = pd.DataFrame({ "task_input": ["What is Python?", "What is JavaScript?"], "gold_answer": ["Python is a programming language...", "JavaScript is a programming language..."], "tags": [{"type": "backend"}, {"type": "frontend"}], "language_type": ["backend", "frontend"] # Custom field }) experiment = run_experiment(dataset=df, ...) ``` ### CSV or JSONL Files ```python from patronus.datasets import read_csv, read_jsonl # Load with default field mappings dataset = read_csv("questions.csv") # Load with custom field mappings dataset = read_jsonl( "custom.jsonl", task_input_field="question", # Map "question" field to "task_input" gold_answer_field="answer", # Map "answer" field to "gold_answer" system_prompt_field="instruction", # Map "instruction" field to "system_prompt" tags_field="metadata" # Map "metadata" field to "tags" ) ``` ### Remote Datasets Patronus allows you to work with datasets stored remotely on the Patronus platform. This is useful for sharing standard datasets across your organization or utilizing pre-built evaluation datasets. ```python from patronus.datasets import RemoteDatasetLoader # Load a dataset from the Patronus platform using its name remote_dataset = RemoteDatasetLoader("financebench") # Load a dataset from the Patronus platform using its ID remote_dataset = RemoteDatasetLoader(by_id="d-eo6a5zy3nwach69b") experiment = run_experiment( dataset=remote_dataset, task=my_task, evaluators=[my_evaluator], ) ``` The `RemoteDatasetLoader` asynchronously fetches the dataset from the Patronus API when the experiment runs. It handles the data mapping automatically, transforming the API response into the standard dataset structure with all the expected fields (`system_prompt`, `task_input`, `gold_answer`, etc.). Remote datasets follow the same structure and field conventions as local datasets, making them interchangeable in your experiment code. ## Accessing Dataset Fields During experiment execution, dataset examples are provided as `Row` objects: ```python def my_task(row, **kwargs): # Access standard fields question = row.task_input reference = row.gold_answer context = row.task_context # Access tags if row.tags: category = row.tags.get("category") # Access custom fields directly difficulty = row.difficulty # Access custom field by name # Access row ID sample_id = row.sid return f"Answering {difficulty} question (ID: {sample_id}): {question}" ``` The `Row` object automatically provides attributes for all fields in your dataset, making access straightforward for both standard and custom fields. ## Using Custom Dataset Schemas If your dataset uses a different schema than the standard field names, you have two options: 1. 
**Map fields during loading**: Use field mapping parameters when loading data ```python from patronus.datasets import read_csv dataset = read_csv("data.csv", task_input_field="question", gold_answer_field="answer", tags_field="metadata") ``` 1. **Use evaluator adapters**: Create adapters that transform your data structure to match what evaluators expect ```python from patronus import evaluator from patronus.experiments import run_experiment, FuncEvaluatorAdapter @evaluator() def my_evaluator_function(*, expected, actual, context): ... class CustomAdapter(FuncEvaluatorAdapter): def transform(self, row, task_result, parent, **kwargs): # Transform dataset fields to evaluator parameters. # The first value is list of positional arguments (*args) passed to the evaluator function. # The second value is named arguments (**kwargs) passed to the evaluator function. return [], { "expected": row.reference_answer, # Map custom field to expected parameter "actual": task_result.output if task_result else None, "context": row.additional_info # Map custom field to context parameter } experiment = run_experiment( dataset=custom_dataset, evaluators=[CustomAdapter(my_evaluator_function)] ) ``` This adapter approach is particularly important for function-based evaluators, which need to be explicitly adapted for use in experiments. ## Dataset IDs and Sample IDs Each dataset and row can have identifiers that are used for organization and tracing: ```python from patronus.datasets import Dataset # Dataset with explicit ID dataset = Dataset.from_records( records=[...], dataset_id="qa-dataset-v1" ) # Dataset with explicit sample IDs dataset = Dataset.from_records([ {"sid": "q1", "task_input": "Question 1", "gold_answer": "Answer 1"}, {"sid": "q2", "task_input": "Question 2", "gold_answer": "Answer 2"} ]) ``` If not provided, sample IDs (`sid`) are automatically generated. ## Best Practices 1. **Use standard field names when possible**: This minimizes the need for custom adapters 1. **Include gold answers**: This enables more comprehensive evaluation 1. **Use tags for organization**: Tags provide a flexible way to categorize examples 1. **Keep task inputs focused**: Clear, concise inputs lead to better evaluations 1. **Add relevant metadata**: Additional context helps with result analysis 1. **Normalize data before experiments**: Pre-process data to ensure consistent format 1. **Consider remote datasets for team collaboration**: Use the Patronus platform to share standardized datasets In the next section, we'll explore how to create tasks that process your dataset examples. # Using Evaluators in Experiments Evaluators are the core assessment tools in Patronus experiments, measuring the quality of task outputs against defined criteria. This page covers how to use various types of evaluators in the Patronus Experimentation Framework. ## Evaluator Types The framework supports several types of evaluators: - **Remote Evaluators**: Use Patronus's managed evaluation services - **Custom Evaluators**: Your own evaluation logic. - **Function-based**: Simple functions decorated with @evaluator() that need to be wrapped with FuncEvaluatorAdapter when used in experiments. - **Class-based**: More powerful evaluators created by extending `StructuredEvaluator` (synchronous) or `AsyncStructuredEvaluator` (asynchronous) base classes with predefined interfaces. Each type has different capabilities and use cases. 
## Remote Evaluators Remote evaluators run on Patronus infrastructure and provide standardized, high-quality assessments: ```python from patronus.evals import RemoteEvaluator from patronus.experiments import run_experiment experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[ RemoteEvaluator("judge", "patronus:is-concise"), RemoteEvaluator("lynx", "patronus:hallucination"), RemoteEvaluator("judge", "patronus:is-helpful") ] ) ``` ## Class-Based Evaluators You can create custom evaluator classes by inheriting from the Patronus base classes: > **Note**: The following example uses the `transformers` library from Hugging Face. Install it with `pip install transformers` before running this code. ```python import numpy as np from transformers import BertTokenizer, BertModel from patronus import StructuredEvaluator, EvaluationResult from patronus.experiments import run_experiment class BERTScore(StructuredEvaluator): def __init__(self, pass_threshold: float): self.pass_threshold = pass_threshold self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased") self.model = BertModel.from_pretrained("bert-base-uncased") def evaluate(self, *, task_output: str, gold_answer: str, **kwargs) -> EvaluationResult: output_toks = self.tokenizer(task_output, return_tensors="pt", padding=True, truncation=True) gold_answer_toks = self.tokenizer(gold_answer, return_tensors="pt", padding=True, truncation=True) output_embeds = self.model(**output_toks).last_hidden_state.mean(dim=1).detach().numpy() gold_answer_embeds = self.model(**gold_answer_toks).last_hidden_state.mean(dim=1).detach().numpy() score = np.dot(output_embeds, gold_answer_embeds.T) / ( np.linalg.norm(output_embeds) * np.linalg.norm(gold_answer_embeds) ) return EvaluationResult( score=score, pass_=score >= self.pass_threshold, tags={"pass_threshold": str(self.pass_threshold)}, ) experiment = run_experiment( dataset=[ { "task_output": "Translate 'Goodbye' to Spanish.", "gold_answer": "Adiós", } ], evaluators=[BERTScore(pass_threshold=0.8)], ) ``` Class-based evaluators that inherit from `StructuredEvaluator` or `AsyncStructuredEvaluator` are automatically adapted for use in experiments. ## Function Evaluators For simpler evaluation logic, you can use function-based evaluators. When using function evaluators in experiments, you must wrap them with `FuncEvaluatorAdapter`. ### Standard Function Adapter By default, `FuncEvaluatorAdapter` expects functions that follow this interface: ```python from typing import Optional from patronus import evaluator from patronus.datasets import Row from patronus.experiments.types import TaskResult, EvalParent from patronus.evals import EvaluationResult from patronus.experiments import run_experiment, FuncEvaluatorAdapter @evaluator() def standard_evaluator( row: Row, task_result: TaskResult, parent: EvalParent, **kwargs ) -> Optional[EvaluationResult]: """ Standard interface for function evaluators used with FuncEvaluatorAdapter. 
""" if not task_result or not task_result.output: # Skip the evaluation return None if row.gold_answer and row.gold_answer.lower() in task_result.output.lower(): return EvaluationResult(score=1.0, pass_=True, text_output="Contains answer") else: return EvaluationResult(score=0.0, pass_=False, text_output="Missing answer") # Use with standard adapter experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[ FuncEvaluatorAdapter(standard_evaluator) ] ) ``` ### Custom Function Adapters If your evaluator function doesn't match the standard interface, you can create a custom adapter: ```python from patronus import evaluator from patronus.datasets import Row from patronus.experiments.types import TaskResult, EvalParent from patronus.experiments.adapters import FuncEvaluatorAdapter # An evaluator function with a different interface @evaluator() def exact_match(expected: str, actual: str, case_sensitive: bool = False) -> bool: """ Checks if actual text exactly matches expected text. """ if not case_sensitive: return expected.lower() == actual.lower() return expected == actual # Custom adapter to transform experiment arguments to evaluator arguments class ExactMatchAdapter(FuncEvaluatorAdapter): def __init__(self, case_sensitive=False): super().__init__(exact_match) self.case_sensitive = case_sensitive def transform( self, row: Row, task_result: TaskResult, parent: EvalParent, **kwargs ) -> tuple[list, dict]: # Create arguments list and dict for the evaluator function args = [] # No positional arguments in this case # Create keyword arguments matching the evaluator's parameters evaluator_kwargs = { "expected": row.gold_answer, "actual": task_result.output if task_result else "", "case_sensitive": self.case_sensitive } return args, evaluator_kwargs # Use custom adapter in an experiment experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[ ExactMatchAdapter(case_sensitive=False) ] ) ``` The `transform()` method is the key to adapting any function to the experiment framework. It takes the standard arguments provided by the framework and transforms them into the format your evaluator function expects. 
## Combining Evaluator Types

You can use multiple types of evaluators in a single experiment:

```python
experiment = run_experiment(
    dataset=dataset,
    task=my_task,
    evaluators=[
        # Remote evaluator
        RemoteEvaluator("judge", "factual-accuracy", weight=0.4),

        # Class-based evaluator
        BERTScore(pass_threshold=0.7, weight=0.3),

        # Function evaluator with standard adapter
        FuncEvaluatorAdapter(standard_evaluator, weight=0.2),

        # Function evaluator with custom adapter
        ExactMatchAdapter(case_sensitive=False, weight=0.1)
    ]
)
```

The `weight` arguments shown here assume each evaluator accepts one: `RemoteEvaluator` and `FuncEvaluatorAdapter` do so directly, while custom classes such as `BERTScore` and `ExactMatchAdapter` need their constructors extended to accept a `weight` and forward it to the parent class, as described in the Evaluator Weights section below.

## Evaluator Chains

In multi-stage evaluation chains, evaluators from one stage can see the results of previous stages:

```python
# A function evaluator that aggregates results from the previous stage
@evaluator()
def final_aggregate_evaluator(row, task_result, parent, **kwargs):
    # Check if we have previous evaluation results
    if not parent or not parent.evals:
        return None

    # Access evaluations from the previous stage
    conciseness = parent.evals.get("judge:conciseness")
    coherence = parent.evals.get("judge:coherence")
    if not conciseness or not coherence:
        return None

    # Use the previous results
    avg_score = ((conciseness.score or 0) + (coherence.score or 0)) / 2
    return EvaluationResult(score=avg_score, pass_=avg_score > 0.7)


experiment = run_experiment(
    dataset=dataset,
    chain=[
        # First stage
        {
            "task": generate_summary,
            "evaluators": [
                RemoteEvaluator("judge", "conciseness"),
                RemoteEvaluator("judge", "coherence")
            ]
        },
        # Second stage - evaluating based on first stage results
        {
            "task": None,  # No additional processing
            "evaluators": [
                # This evaluator can see previous evaluations
                FuncEvaluatorAdapter(final_aggregate_evaluator)
            ]
        }
    ]
)
```

## Evaluator Weights (Experiments Only)

> **Note**: Evaluator weights are only supported when using evaluators within the experiment framework. This feature is not available for standalone evaluator usage.

You can assign weights to evaluators to indicate their relative importance in your evaluation strategy. Weights can be provided as either strings or floats representing valid decimal numbers and are automatically stored as experiment metadata. Weights work consistently across all evaluator types but are configured differently depending on whether you're using remote evaluators, function-based evaluators, or class-based evaluators.
### Weight Support by Evaluator Type Each evaluator type handles weight configuration differently: #### Remote Evaluators For remote evaluators, pass the `weight` parameter directly to the `RemoteEvaluator` constructor: ```python from patronus.evals import RemoteEvaluator from patronus.experiments import run_experiment # Remote evaluator with weight (string or float) pii_evaluator = RemoteEvaluator("pii", "patronus:pii:1", weight="0.6") conciseness_evaluator = RemoteEvaluator("judge", "patronus:is-concise", weight=0.4) experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[pii_evaluator, conciseness_evaluator] ) ``` #### Function-Based Evaluators For function-based evaluators, pass the `weight` parameter to the `FuncEvaluatorAdapter` that wraps your evaluator function: ```python from patronus import evaluator from patronus.experiments import FuncEvaluatorAdapter, run_experiment from patronus.datasets import Row @evaluator() def exact_match(row: Row, **kwargs) -> bool: return row.task_output.lower().strip() == row.gold_answer.lower().strip() # Function evaluator with weight (string or float) exact_match_weighted = FuncEvaluatorAdapter(exact_match, weight=0.7) experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[exact_match_weighted] ) ``` #### Class-Based Evaluators For class-based evaluators, pass the `weight` parameter to your evaluator's constructor and ensure it's passed to the parent class: ```python from typing import Union from patronus import StructuredEvaluator, EvaluationResult from patronus.experiments import run_experiment class CustomEvaluator(StructuredEvaluator): def __init__(self, threshold: float, weight: Union[str, float] = None): super().__init__(weight=weight) # Pass to parent class self.threshold = threshold def evaluate(self, *, task_output: str, **kwargs) -> EvaluationResult: score = len(task_output) / 100 # Simple length-based scoring return EvaluationResult( score=score, pass_=score >= self.threshold ) # Class-based evaluator with weight (string or float) custom_evaluator = CustomEvaluator(threshold=0.5, weight=0.3) experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[custom_evaluator] ) ``` ### Complete Example Here's a comprehensive example demonstrating weighted evaluators of all three types, based on the patterns shown in the experiment framework: ```python from patronus.experiments import FuncEvaluatorAdapter, run_experiment from patronus import RemoteEvaluator, EvaluationResult, StructuredEvaluator, evaluator from patronus.datasets import Row class DummyEvaluator(StructuredEvaluator): def evaluate(self, task_output: str, gold_answer: str, **kwargs) -> EvaluationResult: return EvaluationResult(score_raw=1, pass_=True) @evaluator def exact_match(row: Row, **kwargs) -> bool: return row.task_output.lower().strip() == row.gold_answer.lower().strip() experiment = run_experiment( project_name="Weighted Evaluation Example", dataset=[ { "task_input": "Please provide your contact details.", "task_output": "My email is john.doe@example.com and my phone number is 123-456-7890.", "gold_answer": "My email is john.doe@example.com and my phone number is 123-456-7890.", }, { "task_input": "Share your personal information.", "task_output": "My name is Jane Doe and I live at 123 Elm Street.", "gold_answer": "My name is Jane Doe and I live at 123 Elm Street.", }, ], evaluators=[ RemoteEvaluator("pii", "patronus:pii:1", weight="0.3"), # Remote evaluator with string weight FuncEvaluatorAdapter(exact_match, weight="0.3"), # 
Function evaluator with string weight DummyEvaluator(weight="0.4"), # Class evaluator with string weight ], experiment_name="Weighted Evaluators Demo" ) ``` ### Weight Validation and Rules 1. **Experiments Only**: Weights are exclusively available within the experiment framework - they cannot be used with standalone evaluator calls 1. **Valid Format**: Weights must be valid decimal numbers provided as either strings or floats (e.g., "0.3", 1.0, 0.7) 1. **Consistency**: The same evaluator (identified by its canonical name) cannot have different weights within the same experiment 1. **Automatic Storage**: Weights are automatically collected and stored in the experiment's metadata under the "evaluator_weights" key 1. **Optional**: Weights are optional - evaluators without weights will simply not have weight metadata stored 1. **Best Practice**: Consider making weights sum to 1.0 for clearer interpretation of relative importance ### Error Examples ```python # Invalid weight format - will raise TypeError RemoteEvaluator("judge", "patronus:is-concise", weight="invalid") RemoteEvaluator("judge", "patronus:is-concise", weight=[1, 2, 3]) # Lists not supported # Inconsistent weights for same evaluator - will raise TypeError during experiment run_experiment( dataset=dataset, task=my_task, evaluators=[ RemoteEvaluator("judge", "patronus:is-concise", weight=0.7), RemoteEvaluator("judge", "patronus:is-concise", weight="0.3"), # Different weight! ] ) ``` ## Best Practices When using evaluators in experiments: 1. **Use the right evaluator type for the job**: Remote evaluators for standardized assessments, custom evaluators for specialized logic 1. **Focus each evaluator on one aspect**: Create multiple focused evaluators rather than one complex evaluator 1. **Provide detailed explanations**: Include explanations to help understand evaluation results 1. **Create custom adapters when needed**: Don't force your evaluator functions to match the standard interface if there's a more natural way to express them 1. **Handle edge cases gracefully**: Consider what happens with empty inputs, very long texts, etc. 1. **Reuse evaluators across experiments**: Create a library of evaluators for consistent assessment 1. **Weight consistency across evaluator types**: When using evaluator weights, maintain consistency across experiments regardless of whether you're using remote, function-based, or class-based evaluators 1. **Consider weight distribution**: When using weights, consider making them sum to 1.0 for clearer interpretation of relative importance (e.g., "0.4", "0.3", "0.3" rather than "0.1", "0.1", "0.1") 1. **Document weight rationale**: Consider documenting why specific weights were chosen for your evaluation strategy, especially when mixing different evaluator types Next, we'll explore advanced features of the Patronus Experimentation Framework. # Introduction to Experiments The Patronus Experimentation Framework provides a systematic way to evaluate, compare, and improve Large Language Model (LLM) applications. By standardizing the evaluation process, the framework enables consistent testing across model versions, prompting strategies, and data inputs. ## What are Experiments? In Patronus, an experiment is a structured evaluation that: 1. Processes a **dataset** of examples 1. Runs each example through a **task** function (optional) 1. Evaluates the output using one or more **evaluators** 1. 
Records and analyzes the results This approach provides a comprehensive view of how your LLM application performs across different inputs, making it easier to identify strengths, weaknesses, and areas for improvement. ## Key Concepts ### Dataset A dataset in Patronus consists of examples that your models or systems will process. Each example, represented as a `Row` object, can contain: - Input data - Context information - Expected outputs (gold answers) - Metadata - And more... Datasets can be loaded from various sources including JSON files, CSV files, Pandas DataFrames, or defined directly in your code. ### Task A task is a function that processes each dataset example. Tasks typically: - Receive a `Row` object from the dataset - Perform some processing (like calling an LLM) - Return a `TaskResult` containing the output Tasks are optional - you can evaluate pre-existing outputs by including them directly in your dataset. ### Evaluators Evaluators assess the quality of task outputs based on specific criteria. Patronus supports various types of evaluators: - **Remote Evaluators**: Use Patronus's managed evaluation services - **Custom Evaluators**: Your own evaluation logic. - **Function-based**: Simple functions decorated with @evaluator() that need to be wrapped with FuncEvaluatorAdapter when used in experiments. - **Class-based**: More powerful evaluators created by extending `StructuredEvaluator` (synchronous) or `AsyncStructuredEvaluator` (asynchronous) base classes with predefined interfaces. Each evaluator produces an `EvaluationResult` containing scores, pass/fail status, explanations, and other metadata. **Evaluator Weights**: You can assign weights to evaluators to indicate their relative importance in your evaluation strategy. Weights are stored as experiment metadata and can be provided as either strings or floats representing valid decimal numbers. See the [Using Evaluators](../evaluators/#evaluator-weights-experiments-only) page for detailed information. ### Chains For more complex workflows, Patronus supports multi-stage evaluation chains where the output of one evaluation stage becomes the input for the next. This allows for pipeline-based approaches to LLM evaluation. ## Why Use the Experimentation Framework? 
The Patronus Experimentation Framework offers several advantages over ad-hoc evaluation approaches:

- **Consistency**: Standardized evaluation across models and time
- **Reproducibility**: Experiments can be re-run with the same configuration
- **Scalability**: Process large datasets efficiently with concurrent execution
- **Comprehensive Analysis**: Collect detailed metrics and explanations
- **Integration**: Built-in tracing and logging with the broader Patronus ecosystem

## Example: Basic Experiment

Here's a simple example of a Patronus experiment:

```python
# experiment.py
from patronus.evals import RemoteEvaluator
from patronus.experiments import run_experiment


# Define a simple task function
def my_task(row, **kwargs):
    return f"The answer is: {row.task_input}"


# Run the experiment
experiment = run_experiment(
    dataset=[
        {"task_input": "What is 2+2?", "gold_answer": "4"},
        {"task_input": "Who wrote Hamlet?", "gold_answer": "Shakespeare"}
    ],
    task=my_task,
    evaluators=[
        RemoteEvaluator("judge", "patronus:fuzzy-match")
    ]
)

experiment.to_csv("./experiment-result.csv")
```

You can run the experiment by simply executing the Python file:

```shell
python ./experiment.py
```

The output of the script should look similar to this:

```text
==================================
Experiment Global/root-1742834029: 100%|ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ| 2/2 [00:04<00:00, 2.44s/sample]

patronus:fuzzy-match (judge) [link_idx=0]
-----------------------------------------
Count     : 2
Pass rate : 0
Mean      : 0.0
Min       : 0.0
25%       : 0.0
50%       : 0.0
75%       : 0.0
Max       : 0.0

Score distribution
Score Range          Count      Histogram
0.00 - 0.20          2          ####################
0.20 - 0.40          0
0.40 - 0.60          0
0.60 - 0.80          0
0.80 - 1.00          0
```

In the following sections, we'll explore how to set up, run, and analyze experiments in detail.

# Running Experiments

This page covers how to set up and run experiments using the Patronus Experimentation Framework.

## Basic Experiment Structure

A Patronus experiment requires at minimum:

- A dataset to process
- One or more evaluators to assess outputs

Additionally, most experiments will include:

- A task function that processes each dataset example
- Configuration options for tracing, logging, and concurrency

## Setting Up an Experiment

### The `run_experiment` Function

The main entry point for the framework is the `run_experiment()` function:

```python
from patronus.experiments import run_experiment

experiment = run_experiment(
    dataset=my_dataset,                 # Required: What to evaluate
    task=my_task_function,              # Optional: How to process inputs
    evaluators=[my_evaluator],          # Required: How to assess outputs
    tags={"dataset-version": "v1.0"},   # Optional: Tags for the experiment
    max_concurrency=10,                 # Optional: Control parallel execution
    project_name="My Project",          # Optional: Override the global project name
    experiment_name="Test Run"          # Optional: Name this experiment run
)
```

## Creating a Simple Experiment

Let's walk through a complete example:

```python
from patronus import evaluator, RemoteEvaluator
from patronus.experiments import run_experiment, FuncEvaluatorAdapter

dataset = [
    {
        "task_input": "What is the capital of France?",
        "gold_answer": "Paris"
    },
    {
        "task_input": "Who wrote Romeo and Juliet?",
        "gold_answer": "William Shakespeare"
    }
]


# Define a task (in a real scenario, this would call an LLM)
def answer_question(row, **kwargs):
    if "France" in row.task_input:
        return "The capital of France is Paris."
    elif "Romeo and Juliet" in row.task_input:
        return "Romeo and Juliet was written by William Shakespeare."
return "I don't know the answer to that question." @evaluator() def contains_answer(task_result, row, **kwargs) -> bool: if not task_result or not row.gold_answer: return False return row.gold_answer.lower() in task_result.output.lower() run_experiment( dataset=dataset, task=answer_question, evaluators=[ # Use a Patronus-managed evaluator RemoteEvaluator("judge", "patronus:fuzzy-match"), # Use our custom evaluator FuncEvaluatorAdapter(contains_answer) ], tags={"model": "simulated", "version": "v1"} ) ``` ## Experiment Execution Flow When you call `run_experiment()`, the framework follows these steps: 1. **Preparation**: Initializes the experiment context and prepares the dataset 1. **Processing**: For each dataset row: 1. Runs the task function if provided 1. Passes the task output to the evaluators 1. Collects evaluation results 1. **Reporting**: Generates a summary of evaluation results 1. **Return**: Returns an `Experiment` object with the complete results ## Synchronous vs. Asynchronous Execution The `run_experiment()` function detects whether it's being called from an async context: - In a synchronous context, it will block until the experiment completes - In an async context, it returns an awaitable that can be awaited ```python # Synchronous usage: experiment = run_experiment(dataset, task, evaluators) # Asynchronous usage: experiment = await run_experiment(dataset, task, evaluators) ``` ## Manual Experiment Control For more control over the experiment lifecycle, you can create and run an experiment manually: ```python from patronus.experiments import Experiment # Create the experiment experiment = await Experiment.create( dataset=dataset, task=task, evaluators=evaluators, # Additional configuration options... ) # Run the experiment when ready experiment = await experiment.run() ``` This approach is useful when you need to perform additional setup between experiment creation and execution. ## Experiment Results After an experiment completes, you can access the results in several ways: ```python # Get a Pandas DataFrame df = experiment.to_dataframe() # Save to CSV experiment.to_csv("results.csv") # Access the built-in summary # (This is automatically printed at the end of the experiment) ``` The experiment results include: - Inputs from the dataset - Task outputs - Evaluation scores and pass/fail statuses - Explanations and metadata - Performance timing information In the next sections, we'll explore datasets, tasks, and evaluators in more detail. # Creating Tasks Tasks in Patronus experiments are functions that process each dataset example and produce outputs that will be evaluated. This page covers how to create and use tasks effectively. ## Task Function Basics A task function receives a dataset row and produces an output. The simplest task functions look like this: ```python def simple_task(row, **kwargs): # Process the input from the row input_text = row.task_input # Generate an output (typically a score between 0 and 1) quality_score = 0.85 # Return the output as a float return quality_score ``` The framework automatically converts numeric outputs to `TaskResult` objects. 
## Task Function Parameters Task functions always receive these parameters: - `row`: Row - The dataset example to process - `parent`: EvalParent - Information from previous chain stages (if any) - `tags`: Tags - Tags associated with the experiment and dataset - `**kwargs`: Additional keyword arguments Here's a more complete task function: ```python from patronus.datasets import Row from patronus.experiments.types import EvalParent def complete_task( row: Row, parent: EvalParent = None, tags: dict[str, str] = None, **kwargs ): # Access dataset fields input_text = row.task_input context = row.task_context system_prompt = row.system_prompt gold_answer = row.gold_answer # Access parent information (from previous chain steps) previous_output = None if parent and parent.task: previous_output = parent.task.output # Access tags model_name = tags.get("model_name", "default") # Generate output (in real usage, this would call an LLM) output = f"Model {model_name} processed: {input_text}" # Return the output return output ``` ## Return Types Task functions can return several types: ### String Output Here's an improved example for the string return type section that demonstrates a classification task: ```python def classify_sentiment(row: Row, **kwargs) -> str: # Extract the text to classify text = row.task_input # Simple rule-based sentiment classifier positive_words = ["good", "great", "excellent", "happy", "positive"] negative_words = ["bad", "terrible", "awful", "sad", "negative"] text_lower = text.lower() positive_count = sum(word in text_lower for word in positive_words) negative_count = sum(word in text_lower for word in negative_words) # Classify based on word counts if positive_count > negative_count: return "positive" elif negative_count > positive_count: return "negative" else: return "neutral" ``` The string output represents a specific classification category, which is a common pattern in text classification tasks. ### Numeric Output (Float/Int) For score-based outputs: ```python def score_task(row: Row, **kwargs) -> float: # Calculate a relevance score between 0 and 1 return 0.92 ``` ### TaskResult Object For more control, return a TaskResult object: ```python from patronus.experiments.types import TaskResult def task_result(row: Row, **kwargs) -> TaskResult: # Generate output output = f"Processed: {row.task_input}" # Include metadata about the processing metadata = { "processing_time_ms": 42, "confidence": 0.95, "tokens_used": 150 } # Add tags for filtering and organization tags = { "model": "gpt-4", "temperature": "0.7" } # Return a complete TaskResult return TaskResult( output=output, metadata=metadata, tags=tags ) ``` ### None / Skipping Examples Return `None` to skip processing this example: ```python def selective_task(row: Row, **kwargs) -> None: # Skip examples without the required fields if not row.task_input or not row.gold_answer: return None # Process valid examples return f"Processed: {row.task_input}" ``` ## Calling LLMs A common use of tasks is to generate outputs using Large Language Models: ```python from openai import OpenAI from patronus.datasets import Row from patronus.experiments.types import TaskResult oai = OpenAI() def openai_task(row: Row, **kwargs) -> TaskResult: # Prepare the input for the model system_message = row.system_prompt or "You are a helpful assistant." 
messages = [ {"role": "system", "content": system_message}, {"role": "user", "content": row.task_input} ] # Call the OpenAI API response = oai.chat.completions.create( model="gpt-4", messages=messages, temperature=0.7, max_tokens=150 ) # Extract the output output = response.choices[0].message.content # Include metadata about the call metadata = { "model": response.model, "tokens": { "prompt": response.usage.prompt_tokens, "completion": response.usage.completion_tokens, "total": response.usage.total_tokens } } return TaskResult( output=output, metadata=metadata ) ``` ## Async Tasks For better performance, especially with API calls, you can use async tasks: ```python import asyncio from openai import AsyncOpenAI from patronus.datasets import Row from patronus.experiments.types import TaskResult oai = AsyncOpenAI() async def async_openai_task( row: Row, parent: EvalParent = None, tags: dict[str, str] = None, **kwargs ) -> TaskResult: # Create async client # Prepare the input system_message = row.system_prompt or "You are a helpful assistant." messages = [ {"role": "system", "content": system_message}, {"role": "user", "content": row.task_input} ] # Call the OpenAI API asynchronously response = await oai.chat.completions.create( model="gpt-4", messages=messages, temperature=0.7, max_tokens=150 ) # Extract and return the output output = response.choices[0].message.content return TaskResult( output=output, metadata={"model": response.model} ) ``` The Patronus framework automatically handles both synchronous and asynchronous tasks. ## Using Parent Information In multi-stage chains, tasks can access the results of previous stages: ```python from patronus.datasets import Row from patronus.experiments.types import EvalParent def second_stage_task( row: Row, parent: EvalParent, tags: dict[str, str] = None, **kwargs ) -> str: # Access previous task output if parent and parent.task: previous_output = parent.task.output return f"Building on previous output: {previous_output}" # Fallback if no previous output return f"Starting fresh: {row.task_input}" ``` ## Error Handling Task functions should handle exceptions appropriately: ```python from patronus import get_logger from patronus.datasets import Row def robust_task(row: Row, **kwargs): try: # Attempt to process if row.task_input: return f"Processed: {row.task_input}" else: # Skip if input is missing return None except Exception as e: # Log the error get_logger().exception(f"Error processing row {row.sid}: {e}") # Skip this example return None ``` If an unhandled exception occurs, the experiment will log the error and skip that example. ## Task Tracing Tasks are automatically traced with the Patronus tracing system. You can add additional tracing: ```python from patronus.tracing import start_span from patronus.datasets import Row def traced_task(row: Row, **kwargs): # Outer span is created automatically by the framework # Create spans for subtasks with start_span("Preprocessing"): # Preprocessing logic... preprocessed = preprocess(row.task_input) with start_span("Model Call"): # Model call logic... output = call_model(preprocessed) with start_span("Postprocessing"): # Postprocessing logic... final_output = postprocess(output) return final_output ``` This helps with debugging and performance analysis. ## Best Practices When creating task functions: 1. **Handle missing data gracefully**: Check for required fields and handle missing data 1. **Include useful metadata**: Add information about processing steps, model parameters, etc. 1. 
**Use async for API calls**: Async tasks significantly improve performance for API-dependent workflows 1. **Add explanatory tags**: Tags help with filtering and analyzing results 1. **Add tracing spans**: For complex processing, add spans to help with debugging and optimization 1. **Keep functions focused**: Tasks should have a clear purpose; use chains for multi-step processes Next, we'll explore how to use evaluators in experiments to assess task outputs. # Integrations # Agent Integrations The Patronus SDK provides integrations with various agent frameworks to enable observability, evaluation, and experimentation with agent-based LLM applications. ## Pydantic AI [Pydantic AI](https://ai.pydantic.dev/) is a framework for building AI agents with type-safe tools and structured outputs. The Patronus SDK provides a dedicated integration that automatically instruments all Pydantic AI agents for observability. ### Installation Make sure you have both the Patronus SDK and Pydantic AI installed: ```bash pip install patronus pydantic-ai ``` ### Usage To enable Pydantic AI integration with Patronus: ```python from patronus import init from patronus.integrations.pydantic_ai import PydanticAIIntegrator # Initialize Patronus with the Pydantic AI integration patronus_ctx = init( integrations=[PydanticAIIntegrator()] ) # Now all Pydantic AI agents will automatically send telemetry to Patronus ``` ### Configuration Options The `PydanticAIIntegrator` accepts the following parameters: - `event_mode`: Controls how agent events are captured - `"logs"` (default): Captures events as logs, which works best with the Patronus Platform - `"attributes"`: Captures events as span attributes Example with custom configuration: ```python from patronus import init from patronus.integrations.pydantic_ai import PydanticAIIntegrator patronus_ctx = init( integrations=[PydanticAIIntegrator(event_mode="logs")] ) ``` # LLM Integrations The Patronus SDK provides integrations with various LLM providers to enable observability, evaluation, and experimentation with LLM applications. ## OpenTelemetry LLM Instrumentors Patronus supports any OpenTelemetry-based LLM instrumentation. This allows you to easily capture telemetry data from your LLM interactions and send it to the Patronus platform for analysis. A popular option for LLM instrumentation is [OpenInference](https://github.com/Arize-ai/openinference), which provides instrumentors for multiple LLM providers. 
### Anthropic Claude Integration To instrument Anthropic's Claude API calls: ```shell # Install the required package pip install openinference-instrumentation-anthropic ``` ```python from patronus import init from openinference.instrumentation.anthropic import AnthropicInstrumentor # Initialize Patronus with Anthropic instrumentation patronus_ctx = init( integrations=[AnthropicInstrumentor()] ) # Now all Claude API calls will be automatically instrumented # and the telemetry will be sent to Patronus ``` ### OpenAI Integration To instrument OpenAI API calls: ```shell # Install the required package pip install openinference-instrumentation-openai ``` ```python from patronus import init from openinference.instrumentation.openai import OpenAIInstrumentor # Initialize Patronus with OpenAI instrumentation patronus_ctx = init( integrations=[OpenAIInstrumentor()] ) # Now all OpenAI API calls will be automatically instrumented # and the telemetry will be sent to Patronus ``` ### Using Multiple LLM Instrumentors You can combine multiple instrumentors to capture telemetry from different LLM providers: ```python from patronus import init from openinference.instrumentation.anthropic import AnthropicInstrumentor from openinference.instrumentation.openai import OpenAIInstrumentor # Initialize Patronus with multiple LLM instrumentors patronus_ctx = init( project_name="my-multi-llm-project", app="llm-application", integrations=[ AnthropicInstrumentor(), OpenAIInstrumentor() ] ) # Now both Anthropic and OpenAI API calls will be automatically instrumented ``` # Prompts # Prompt Management The Patronus SDK provides tools to version, retrieve, and render prompts in your LLM applications. ## Quick Start ### Creating a Prompt ```python import patronus import textwrap from patronus.prompts import Prompt, push_prompt patronus.init() # Create a new prompt prompt = Prompt( name="support/troubleshooting/login-issues", body=textwrap.dedent(""" You are a support specialist for {product_name}. ISSUE: {issue_description} TIER: {subscription_tier} Provide a solution for this {issue_type} problem. Be concise. Include steps and end with an offer for further help. 
"""), description="Support prompt for login issues", metadata={"temperature": 0.7, "tone": "helpful"} ) # Push the prompt to Patronus loaded_prompt = push_prompt(prompt) # Render the prompt rendered = prompt.render( issue_description="Cannot log in with correct credentials", product_name="CloudWorks", subscription_tier="Business", issue_type="authentication" ) print(rendered) ``` ### Loading a Prompt ```python import patronus from patronus.prompts import load_prompt patronus.init() # Get the latest version of the prompt we just created prompt = load_prompt(name="support/troubleshooting/login-issues") # Access metadata print(prompt.metadata) # Render the prompt with different parameters rendered = prompt.render( issue_description="Password reset link not working", product_name="CloudWorks", subscription_tier="Enterprise", issue_type="password reset" ) print(rendered) ``` ## Loading Prompts Use `load_prompt` to retrieve prompts from the Patronus platform: ```python import patronus from patronus.prompts import load_prompt patronus.init() # Load an instruction prompt that doesn't need any parameters prompt = load_prompt(name="content/writing/blog-instructions") rendered = prompt.render() print(rendered) ``` For async applications: ```python from patronus.prompts import aload_prompt prompt = await aload_prompt(name="content/writing/blog-instructions") ``` ### Loading Specific Versions Retrieve prompts by revision number or label: ```python # Load a specific revision prompt = load_prompt(name="content/blog/technical-explainer", revision=3) # Load by label (production environment) prompt = load_prompt(name="legal/contracts/privacy-policy", label="production") ``` ## Creating and Updating Prompts Create new prompts using `push_prompt`: ````python from patronus.prompts import Prompt, push_prompt new_prompt = Prompt( name="dev/bug-fix/python-error", body="Fix this Python code error: {error_message}. Code: ```python\n{code_snippet}\n```", description="Template for Python debugging assistance", metadata={ "creator": "dev-team", "temperature": 0.7, "max_tokens": 250 } ) loaded_prompt = push_prompt(new_prompt) ```` For async applications: ```python from patronus.prompts import apush_prompt loaded_prompt = await apush_prompt(new_prompt) ``` The `push_prompt` function automatically handles duplicate detection - if a prompt with identical content already exists, it returns the existing revision instead of creating a new one. 
## Rendering Prompts Render prompts with variables: ```python rendered = prompt.render(user_query="How do I optimize database performance?", expertise_level="intermediate") ``` ### Template Engines Patronus supports multiple template engines: ```python # F-string templating (default) rendered = prompt.with_engine("f-string").render(**kwargs) # Mustache templating rendered = prompt.with_engine("mustache").render(**kwargs) # Jinja2 templating rendered = prompt.with_engine("jinja2").render(**kwargs) ``` ## Working with Labels Labels provide stable references to specific revisions: ```python from patronus import context client = context.get_api_client().prompts # Add audience-specific labels client.add_label( prompt_id="prompt_123", revision=3, label="technical-audience" ) # Update label to point to a new revision client.add_label( prompt_id="prompt_123", revision=5, label="technical-audience" ) # Add environment label client.add_label( prompt_id="prompt_456", revision=2, label="production" ) ``` ## Metadata Usage Prompt revisions support arbitrary metadata: ```python from patronus.prompts import Prompt, push_prompt, load_prompt # Create with metadata prompt_with_meta = Prompt( name="research/data-analysis/summarize-findings", body="Analyze the {data_type} data and summarize the key {metric_type} trends in {time_period}.", metadata={ "models": ["gpt-4", "claude-3"], "created_by": "data-team", "tags": ["data", "analysis"] } ) loaded_prompt = push_prompt(prompt_with_meta) # Access metadata prompt = load_prompt(name="research/data-analysis/summarize-findings") supported_models = prompt.metadata.get("models", []) creator = prompt.metadata.get("created_by", "unknown") print(f"Prompt supports models: {', '.join(supported_models)}") print(f"Created by: {creator}") ``` ## Using Multiple Prompts Together Complex applications often use multiple prompts together: ```python import patronus from patronus.prompts import load_prompt import openai patronus.init() # Load different prompt components system_prompt = load_prompt(name="support/chat/system") user_query_template = load_prompt(name="support/chat/user-message") response_formatter = load_prompt(name="support/chat/response-format") # Create OpenAI client client = openai.OpenAI() # Combine the prompts in a chat completion response = client.chat.completions.create( model="gpt-4", messages=[ {"role": "system", "content": system_prompt.render( product_name="CloudWorks Pro", available_features=["file sharing", "collaboration", "automation"], knowledge_cutoff="2024-05-01" )}, {"role": "user", "content": user_query_template.render( user_name="Alex", user_tier="premium", user_query="How do I share files with external users?" )} ], temperature=0.7, max_tokens=500 ) # Post-process the response using another prompt formatted_response = response_formatter.render( raw_response=response.choices[0].message.content, user_name="Alex", add_examples=True ) ``` ## Naming Conventions Use a descriptive, hierarchical naming structure similar to file paths. 
This makes prompts easier to organize, find, and manage: ```text [domain]/[use-case]/[component]/[prompt-type] ``` Where `[prompt-type]` indicates the intended role of the prompt in an LLM conversation (optional but recommended): - `system` - Sets the overall behavior, persona, or context for the model - `instruction` - Provides specific instructions for a task - `user` - Represents a user message template - `assistant` - Template for assistant responses - `few-shot` - Contains examples of input/output pairs Examples: - `support/troubleshooting/diagnostic-questions/system` - `marketing/email-campaigns/follow-up-template/instruction` - `dev/code-generation/python-function/instruction` - `finance/report/quarterly-analysis` - `content/blog-post/technical-tutorial/few-shot` - `legal/contracts/terms-of-service-v2/system` Including the prompt type in the name helps team members quickly understand the intended usage context in multi-prompt conversations. ### Consistent Prefixes Use consistent prefixes for prompts that work together in the same feature: ```text # Onboarding chat prompts share the prefix onboarding/chat/ onboarding/chat/welcome/system onboarding/chat/questions/user onboarding/chat/intro/assistant # Support classifier prompts support/classifier/system support/classifier/categories/instruction ``` This approach simplifies filtering and management of related prompts, making it easier to maintain and evolve complete prompt flows as your library grows. ## Configuration The default template engine can be configured during initialization: ```python import patronus patronus.init( # Default template engine for all prompts prompt_templating_engine="mustache" ) ``` For additional configuration options, see the [Configuration](../configuration/) page. ## Using with LLMs Prompts can be used with any LLM provider: ```python import patronus from patronus.prompts import load_prompt import anthropic patronus.init() system_prompt = load_prompt(name="support/knowledge-base/technical-assistance") client = anthropic.Anthropic() response = client.messages.create( model="claude-3-opus-20240229", system=system_prompt.render( product_name="CloudWorks Pro", user_tier="enterprise", available_features=["advanced monitoring", "auto-scaling", "SSO integration"] ), messages=[ {"role": "user", "content": "How do I configure the load balancer for high availability?"} ] ) ``` ## Additional Resources While the SDK provides high-level, convenient access to Patronus functionality, you can also use the lower-level APIs for more direct control: - [REST API documentation](https://docs.patronus.ai/docs/api_ref) - For direct HTTP access to the Patronus platform - [Patronus API Python library](https://github.com/patronus-ai/patronus-api-python) - A typed Python client for the REST API with both synchronous and asynchronous support # Configuration # Configuration The Patronus Experimentation Framework offers several configuration options that can be set in the following ways: 1. Through function parameters (in code) 1. Environment variables 1. YAML configuration file Configuration options are prioritized in the order listed above, meaning that if a configuration value is provided through function parameters, it will override values from environment variables or the YAML file. ## Configuration Options | Config name | Environment Variable | Default Value | | --- | --- | --- | | service | PATRONUS_SERVICE | Defaults to value retrieved from `OTEL_SERVICE_NAME` env var or `platform.node()`. 
| | project_name | PATRONUS_PROJECT_NAME | `Global` | | app | PATRONUS_APP | `default` | | api_key | PATRONUS_API_KEY | | | api_url | PATRONUS_API_URL | `https://api.patronus.ai` | | ui_url | PATRONUS_UI_URL | `https://app.patronus.ai` | | otel_endpoint | PATRONUS_OTEL_ENDPOINT | `https://otel.patronus.ai:4317` | | otel_exporter_otlp_protocol | PATRONUS_OTEL_EXPORTER_OTLP_PROTOCOL | Falls back to OTEL env vars, defaults to `grpc` | | timeout_s | PATRONUS_TIMEOUT_S | `300` | | prompt_templating_engine | PATRONUS_PROMPT_TEMPLATING_ENGINE | `f-string` | | prompt_providers | PATRONUS_PROMPT_PROVIDERS | `["local", "api"]` | | resource_dir | PATRONUS_RESOURCE_DIR | `./patronus` | ## Configuration Methods ### 1. Function Parameters You can provide configuration options directly through function parameters when calling key Patronus functions. #### Using init() Use the `init()` function when you need to set up the Patronus SDK for evaluations, logging, and tracing outside of experiments. This initializes the global context used by the SDK. ```python import patronus # Initialize with specific configuration patronus.init( project_name="my-project", app="recommendation-service", api_key="your-api-key", api_url="https://api.patronus.ai", service="my-service", prompt_templating_engine="mustache" ) ``` #### Using run_experiment() or Experiment.create() Use these functions when running experiments. They handle their own initialization, so you don't need to call `init()` separately. Experiments create their own context scoped to the experiment. ```python from patronus import run_experiment # Run experiment with specific configuration experiment = run_experiment( dataset=my_dataset, task=my_task, evaluators=[my_evaluator], project_name="my-project", api_key="your-api-key", service="my-service" ) ``` ### 2. Environment Variables You can set configuration options using environment variables with the prefix `PATRONUS_`: ```bash export PATRONUS_API_KEY="your-api-key" export PATRONUS_PROJECT_NAME="my-project" export PATRONUS_SERVICE="my-service" ``` ### 3. YAML Configuration File (`patronus.yaml`) You can also provide configuration options using a `patronus.yaml` file. This file must be present in the working directory when executing your script. ```yaml service: "my-service" project_name: "my-project" app: "my-agent" api_key: "YOUR_API_KEY" api_url: "https://api.patronus.ai" ui_url: "https://app.patronus.ai" otel_endpoint: "https://otel.patronus.ai:4317" otel_exporter_otlp_protocol: "grpc" # or "http/protobuf" timeout_s: 300 # Prompt management configuration prompt_templating_engine: "mustache" prompt_providers: [ "local", "api" ] resource_dir: "./my-resources" ``` ## Configuration Precedence When determining the value for a configuration option, Patronus follows this order of precedence: 1. Function parameter values (highest priority) 1. Environment variables 1. YAML configuration file 1. Default values (lowest priority) For example, if you provide `project_name` as a function parameter and also have it defined in your environment variables and YAML file, the function parameter value will be used. 
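As a quick illustration of the precedence rules (the values below are placeholders, assuming all three sources define `project_name`):

```python
import patronus

# patronus.yaml contains:          project_name: "from-yaml"
# environment variable exported:   PATRONUS_PROJECT_NAME="from-env"
patronus.init(project_name="from-code")

# The effective project name is "from-code": the function parameter overrides
# the environment variable, which in turn overrides the YAML file.
```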
## Programmatic Configuration Access For more advanced use cases, you can directly access the configuration system through the Config class and the config() function: ```python from patronus.config import config # Access the configuration singleton cfg = config() # Read configuration values api_key = cfg.api_key project_name = cfg.project_name # Check for specific conditions if cfg.api_url != "https://api.patronus.ai": print("Using custom API endpoint") ``` This approach is particularly useful when you need to inspect or log the current configuration state. ## Observability Configuration For detailed information about configuring observability features like tracing and logging, including exporter protocol selection and endpoint configuration, see the [Observability Configuration](../observability/configuration/) guide. # Examples # Examples Examples of how to use Patronus and what it can do. ## Usage These examples demonstrate common use cases and integration patterns for Patronus. ### Setting required environment variables Most examples require you to set up authentication with Patronus and other services. In most cases, you'll need to set the following environment variables: ```bash export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` Some examples may require additional API keys (like `ANTHROPIC_API_KEY`). ### Running Examples There are three ways to run the examples: #### 1. Running with `uv` You can run examples with `uv`, which automatically installs the required dependencies: ```bash # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[smolagents]" \ -m patronus_examples.tracking.smolagents_weather ``` This installs the `patronus-examples` package with the necessary optional dependencies. #### 2. Pulling the repository and executing the scripts directly You can clone the repository and run the scripts directly: ```bash # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/smolagents_weather.py ``` See the script files for more information. They use uv script annotations to handle dependencies. #### 3. Copy and paste example You can copy the example code into your own project and install the dependencies with any package manager of your choice. Each example file includes a list of required dependencies at the top of the document. ## Available Examples Patronus provides examples for various LLM frameworks and direct API integrations: ### Direct LLM API Integrations - [OpenAI Weather Example](openai-weather/) - Simple example of tracing OpenAI API calls - [Anthropic Weather Example](anthropic-weather/) - Simple example of tracing Anthropic API calls ### Agent Frameworks - [Smolagents Weather](smolagents-weather/) - Using Patronus with Smolagents - [PydanticAI Weather](pydanticai-weather/) - Using Patronus with PydanticAI - [OpenAI Agents Weather](openai-agents-weather/) - Using Patronus with OpenAI Agents - [LangChain Weather](langchain-weather/) - Using Patronus with LangChain and LangGraph - [CrewAI Weather](crewai-weather/) - Using Patronus with CrewAI Each example demonstrates: - How to set up Patronus integrations with the specific framework - How to trace LLM calls and tool usage - How to analyze the execution flow of your application All examples follow a similar pattern using a weather application to make it easy to compare the different frameworks. 
### Advanced Examples

- [Manual OpenTelemetry with OpenAI](otel-openai-weather/) - An example showing how to use OpenTelemetry directly without Patronus SDK

## Running the example

To run this example, you need to add API keys to your environment:

```shell
export PATRONUS_API_KEY=your-api-key
export ANTHROPIC_API_KEY=your-api-key
```

### Running with `uv`

You can run the example as a one-liner with zero setup:

```shell
# Remember to export environment variables before running the example.
uv run --no-cache --with "patronus-examples[anthropic]" \
    -m patronus_examples.tracking.anthropic_weather
```

### Running the script directly

If you've cloned the repository, you can run the script directly:

```shell
# Clone the repository
git clone https://github.com/patronus-ai/patronus-py.git
cd patronus-py

# Run the example script (requires uv)
./examples/patronus_examples/tracking/anthropic_weather.py
```

### Manual installation

If you prefer to copy the example code to your own project, you'll need to install these dependencies:

```shell
pip install patronus
pip install anthropic
pip install openinference-instrumentation-anthropic
```

## Example overview

This example demonstrates how to use Patronus to trace Anthropic API calls when implementing a simple weather application. The application:

1. Uses the Anthropic Claude API to parse a user question about weather
1. Extracts location coordinates from the LLM's output through Claude's tool calling
1. Calls a weather API to get actual temperature data
1. Returns the result to the user

The example shows how Patronus can help you monitor and debug Anthropic API interactions, track tool usage, and visualize the entire application flow.

## Example code

```python
# examples/patronus_examples/tracking/anthropic_weather.py
import requests
import anthropic
from openinference.instrumentation.anthropic import AnthropicInstrumentor

import patronus

# Initialize patronus with Anthropic Instrumentor
patronus.init(integrations=[AnthropicInstrumentor()])


def get_weather(latitude, longitude):
    response = requests.get(
        f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m"
    )
    data = response.json()
    return data["current"]["temperature_2m"]


def get_client():
    client = anthropic.Anthropic()
    return client


@patronus.traced()
def call_llm(client, user_prompt):
    tools = [
        {
            "name": "get_weather",
            "description": "Get current temperature for provided coordinates in celsius.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "latitude": {"type": "number"},
                    "longitude": {"type": "number"},
                },
                "required": ["latitude", "longitude"],
            },
        }
    ]
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        tools=tools,
        messages=[{"role": "user", "content": user_prompt}],
    )
    return response


@patronus.traced("anthropic-weather")
def main():
    user_prompt = "What's the weather like in Paris today?"
client = get_client() response = call_llm(client, user_prompt) print("LLM Response") print(response.model_dump_json()) weather_response = None if response.content: for content_block in response.content: if content_block.type == "tool_use" and content_block.name == "get_weather": kwargs = content_block.input print("Weather API Response") weather_response = get_weather(**kwargs) print(weather_response) if weather_response: print(user_prompt) print(f"Answer: {weather_response}") if __name__ == "__main__": main() ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[crewai]" \ -m patronus_examples.tracking.crewai_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/crewai_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install crewai pip install openinference.instrumentation.crewai pip install opentelemetry-instrumentation-threading pip install opentelemetry-instrumentation-asyncio ``` ## Example overview This example demonstrates how to use Patronus to trace and monitor CrewAI agents in a weather application. The example: 1. Sets up a specialized Weather Information Specialist agent with a custom weather tool 1. Creates a manager agent that coordinates information requests 1. Defines tasks for each agent to perform 1. Configures a hierarchical workflow using the CrewAI Crew construct 1. Traces the entire execution flow with Patronus The example shows how Patronus integrates with CrewAI to provide visibility into agent interactions, tool usage, and the hierarchical task execution process. ## Example code ```python # examples/patronus_examples/tracking/crewai_weather.py from crewai import Agent, Task, Crew, Process from crewai.tools import BaseTool from opentelemetry.instrumentation.threading import ThreadingInstrumentor from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor from openinference.instrumentation.crewai import CrewAIInstrumentor import patronus patronus.init( integrations=[CrewAIInstrumentor(), ThreadingInstrumentor(), AsyncioInstrumentor()] ) # Create a custom tool for weather information class WeatherTool(BaseTool): name: str = "get_weather_api" description: str = "Returns the weather report for a specific location" def _run(self, location: str) -> str: """ Returns the weather report. Args: location: the name of the place that you want the weather for. Should be a place name, followed by possibly a city name, then a country, like "Anchor Point, Taghazout, Morocco". Returns: The weather report. """ temperature_celsius, risk_of_rain, wave_height = 10, 0.5, 4 # mock outputs return f"Weather report for {location}: Temperature will be {temperature_celsius}°C, risk of rain is {risk_of_rain * 100:.0f}%, wave height is {wave_height}m." 
# Initialize weather tool weather_tool = WeatherTool() # Define agents weather_agent = Agent( role="Weather Information Specialist", goal="Provide accurate weather information for specific locations and times", backstory="""You are a weather information specialist that must call the available tool to get the most recent reports""", verbose=False, allow_delegation=False, tools=[weather_tool], max_iter=5, ) manager_agent = Agent( role="Information Manager", goal="Coordinate information requests and delegate to specialized agents", backstory="""You manage and coordinate information requests, delegating specialized queries to the appropriate experts. You ensure users get the most accurate and relevant information.""", verbose=False, allow_delegation=True, max_iter=10, ) # Create tasks weather_task = Task( description="""Find out the current weather at a specific location.""", expected_output="Complete weather report with temperature, rain and wave height information", agent=weather_agent, ) manager_task = Task( description="""Process the user query about weather in Paris, France. Ensure the weather information is complete (with temperature, rain and wave height) and properly formatted. You must coordinate with the weather agent for this task.""", expected_output="Weather report for Paris", agent=manager_agent, ) # Instantiate crew with a sequential process crew = Crew( agents=[weather_agent], tasks=[manager_task, weather_task], verbose=False, manager_agent=manager_agent, process=Process.hierarchical, ) @patronus.traced("weather-crew-ai") def main(): result = crew.kickoff() print(result) if __name__ == "__main__": main() ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[langchain]" \ -m patronus_examples.tracking.langchain_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/langchain_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install pydantic pip install langchain_openai pip install langgraph pip install langchain_core pip install openinference-instrumentation-langchain pip install opentelemetry-instrumentation-threading pip install opentelemetry-instrumentation-asyncio ``` ## Example overview This example demonstrates how to use Patronus to trace a LangChain and LangGraph workflow for a weather application. The example: 1. Sets up a StateGraph with manager and weather agent nodes 1. Implements a router to control workflow transitions 1. Uses a tool to provide mock weather data 1. Traces the entire LangChain and LangGraph execution with Patronus The example shows how Patronus can provide visibility into complex, multi-node LangGraph workflows, including tool usage and agent transitions. 
## Example code ```python # examples/patronus_examples/tracking/langchain_weather.py from typing import Literal, Dict, List, Any from langchain_core.messages import ( HumanMessage, AIMessage, BaseMessage, ) from langchain_openai import ChatOpenAI from langgraph.checkpoint.memory import MemorySaver from langgraph.graph import END, StateGraph from langchain_core.tools import tool from pydantic import BaseModel, Field from openinference.instrumentation.langchain import LangChainInstrumentor from opentelemetry.instrumentation.threading import ThreadingInstrumentor from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor import patronus patronus.init( integrations=[ LangChainInstrumentor(), ThreadingInstrumentor(), AsyncioInstrumentor(), ] ) @tool def get_weather(city: str) -> str: """Get the current weather in a given city. Args: city: The name of the city to get weather for. Returns: A string describing the current weather in the city. """ return f"The weather in {city} is sunny" class MessagesState(BaseModel): """State for the manager-weather agent workflow.""" messages: List[BaseMessage] = Field(default_factory=list) current_agent: str = Field(default="manager") manager_model = ChatOpenAI(temperature=0, model="gpt-4o") weather_model = ChatOpenAI(temperature=0, model="gpt-4o") tools = [get_weather] tools_dict = {tool.name: tool for tool in tools} weather_model_with_tools = weather_model.bind_tools(tools) def manager_agent(state: MessagesState) -> Dict[str, Any]: messages = state.messages # Access as attribute # Get response from the manager model response = manager_model.invoke(messages) # Check if the manager wants to use the weather agent manager_text = response.content.lower() if "weather" in manager_text and "in" in manager_text: # Delegate to weather agent return { "messages": messages + [ AIMessage( content="I'll check the weather for you. Delegating to weather agent." ) ], "current_agent": "weather", } return {"messages": messages + [response], "current_agent": "manager"} # Define the weather agent node using a simpler approach def weather_agent(state: MessagesState) -> Dict[str, Any]: messages = state.messages # Access as attribute human_queries = [msg for msg in messages if isinstance(msg, HumanMessage)] if not human_queries: return { "messages": messages + [AIMessage(content="I need a query about weather.")], "current_agent": "manager", } query = human_queries[-1].content try: # weather_prompt = ( # f"Extract the city name from this query and provide the weather: '{query}'" # ) city_match = None # Common cities that might be mentioned common_cities = [ "Paris", "London", "New York", "Tokyo", "Berlin", "Rome", "Madrid", ] for city in common_cities: if city.lower() in query.lower(): city_match = city break if city_match: weather_result = get_weather.invoke(city_match) weather_response = ( f"I checked the weather for {city_match}. {weather_result}" ) else: if "weather in " in query.lower(): parts = query.lower().split("weather in ") if len(parts) > 1: city_match = parts[1].strip().split()[0].capitalize() weather_result = get_weather.invoke(city_match) weather_response = ( f"I checked the weather for {city_match}. {weather_result}" ) else: weather_response = ( "I couldn't identify a specific city in your query." ) else: weather_response = "I couldn't identify a specific city in your query." 
return { "messages": messages + [AIMessage(content=f"Weather Agent: {weather_response}")], "current_agent": "manager", } except Exception as e: error_message = f"I encountered an error while checking the weather: {str(e)}" return { "messages": messages + [AIMessage(content=f"Weather Agent: {error_message}")], "current_agent": "manager", } def router(state: MessagesState) -> Literal["manager", "weather", END]: if len(state.messages) > 10: # Prevent infinite loops return END # Route based on current_agent if state.current_agent == "weather": return "weather" elif state.current_agent == "manager": # Check if the last message is from the manager and indicates completion if len(state.messages) > 0 and isinstance(state.messages[-1], AIMessage): if "delegating to weather agent" not in state.messages[-1].content.lower(): return END return "manager" workflow = StateGraph(MessagesState) workflow.add_node("manager", manager_agent) workflow.add_node("weather", weather_agent) workflow.set_entry_point("manager") workflow.add_conditional_edges("manager", router) workflow.add_conditional_edges("weather", router) checkpointer = MemorySaver() app = workflow.compile(checkpointer=checkpointer) def run_workflow(query: str): initial_state = MessagesState( messages=[HumanMessage(content=query)], current_agent="manager" ) config = {"configurable": {"thread_id": "weather_demo_thread"}} final_state = app.invoke(initial_state, config=config) for message in final_state["messages"]: if isinstance(message, HumanMessage): print(f"Human: {message.content}") elif isinstance(message, AIMessage): print(f"AI: {message.content}") else: print(f"Other: {message.content}") return final_state @patronus.traced("weather-langchain") def main(): final_state = run_workflow("What is the weather in Paris?") return final_state if __name__ == "__main__": main() ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[openai-agents]" \ -m patronus_examples.tracking.openai_agents_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/openai_agents_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install openai-agents pip install openinference-instrumentation-openai-agents pip install opentelemetry-instrumentation-threading pip install opentelemetry-instrumentation-asyncio ``` ## Example overview This example demonstrates how to use Patronus to trace and monitor OpenAI Agents in an asynchronous weather application. The example: 1. Sets up a weather agent with a function tool to retrieve weather information 1. Creates a manager agent that can delegate to the weather agent 1. Handles the workflow using the OpenAI Agents Runner 1. Traces the entire agent execution flow with Patronus The example shows how Patronus integrates with OpenAI Agents to provide visibility into agent hierarchies, tool usage, and asynchronous workflows. 
## Example code ```python # examples/patronus_examples/tracking/openai_agents_weather.py from agents import Agent, Runner, function_tool from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor from opentelemetry.instrumentation.threading import ThreadingInstrumentor from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor import asyncio import patronus patronus.init( integrations=[ OpenAIAgentsInstrumentor(), ThreadingInstrumentor(), AsyncioInstrumentor(), ] ) @function_tool def get_weather(city: str) -> str: return f"The weather in {city} is sunny" def get_agents(tools=[]): weather_agent = Agent( name="weather_agent", instructions="You are a helpful assistant that can call tools and return weather related information", model="o3-mini", tools=tools, ) manager_agent = Agent( name="manager_agent", instructions="You are a helpful assistant that can call other agents to accomplish different tasks", model="o3-mini", handoffs=[weather_agent], ) return manager_agent @patronus.traced("weather-openai-agent") async def main(): manager_agent = get_agents([get_weather]) result = await Runner.run(manager_agent, "How is the weather in Paris, France?") return result.final_output if __name__ == "__main__": print("Starting agent...") result = asyncio.run(main()) print(result) ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[openai]" \ -m patronus_examples.tracking.openai_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/openai_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install openai pip install openinference-instrumentation-openai ``` ## Example overview This example demonstrates how to use Patronus to trace OpenAI API calls when implementing a simple weather application. The application: 1. Uses the OpenAI API to parse a user question about weather 1. Extracts location coordinates from the LLM's output 1. Calls a weather API to get actual temperature data 1. Returns the result to the user The example shows how Patronus can help you monitor and debug OpenAI API interactions, track tool usage, and visualize the entire application flow. 
## Example code ```python # examples/patronus_examples/tracking/openai_weather.py import json import requests from openai import OpenAI from openinference.instrumentation.openai import OpenAIInstrumentor import patronus # Initialize patronus with OpenAI Instrumentor patronus.init(integrations=[OpenAIInstrumentor()]) def get_weather(latitude, longitude): response = requests.get( f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m" ) data = response.json() return data["current"]["temperature_2m"] def get_client(): client = OpenAI() return client @patronus.traced() def call_llm(client, user_prompt): tools = [ { "type": "function", "name": "get_weather", "description": "Get current temperature for provided coordinates in celsius.", "parameters": { "type": "object", "properties": { "latitude": {"type": "number"}, "longitude": {"type": "number"}, }, "required": ["latitude", "longitude"], "additionalProperties": False, }, "strict": True, } ] input_messages = [{"role": "user", "content": user_prompt}] response = client.responses.create( model="gpt-4.1", input=input_messages, tools=tools, ) return response @patronus.traced("openai-weather") def main(): user_prompt = "What's the weather like in Paris today?" client = get_client() response = call_llm(client, user_prompt) print("LLM Response") print(response.model_dump_json()) weather_response = None if response.output: output = response.output[0] if output.type == "function_call" and output.name == "get_weather": kwargs = json.loads(output.arguments) print("Weather API Response") weather_response = get_weather(**kwargs) print(weather_response) if weather_response: print(user_prompt) print(f"Answer: {weather_response}") if __name__ == "__main__": main() ``` ## Manual OpenTelemetry Tracing Example This example demonstrates how to use OpenTelemetry (OTel) directly with OpenInference instrumenters to trace a simple OpenAI weather application **without** using the Patronus SDK. This shows how to implement manual instrumentation combined with automatic instrumenters. ## Running the example To run this example, you need to add your OpenAI API key to your environment: ```shell export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example uv run --no-cache --with "patronus-examples opentelemetry-api>=1.31.0 opentelemetry-sdk>=1.31.0 opentelemetry-exporter-otlp>=1.31.0 openinference-instrumentation-openai>=0.1.28 openai httpx>=0.27.0" \ -m patronus_examples.tracking.otel_openai_weather ``` ### Running with Patronus OTel collector To export traces to the Patronus OTel collector, set these additional environment variables: ```shell export PATRONUS_API_KEY=your-api-key export OTEL_EXPORTER_OTLP_ENDPOINT="https://otel.patronus.ai:4317" export OTEL_EXPORTER_OTLP_HEADERS="x-api-key=$PATRONUS_API_KEY" ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install openai pip install opentelemetry-api pip install opentelemetry-sdk pip install opentelemetry-exporter-otlp pip install openinference-instrumentation-openai pip install httpx ``` ## Example overview This example demonstrates how to combine manual OpenTelemetry instrumentation with OpenInference auto-instrumentation for an OpenAI-based weather application. The application: 1.
Sets up a complete OpenTelemetry tracing pipeline 1. Initializes OpenInference instrumenter for OpenAI 1. Calls the OpenAI API which is automatically traced by OpenInference 1. Adds additional manual spans for non-OpenAI components 1. Makes an instrumented HTTP request using httpx to a weather API 1. Records all relevant attributes and events in spans The example shows how to: - Configure an OpenTelemetry TracerProvider - Set up either console or OTLP exporters - Initialize OpenInference instrumenters with OpenTelemetry - Create nested manual spans for tracking operations - Use httpx for HTTP requests with proper tracing - Add attributes to spans for better observability - Handle errors and exceptions in spans ## Example code ```python # examples/patronus_examples/tracking/otel_openai_weather.py import json import os import httpx from openai import OpenAI # OpenTelemetry imports from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter from opentelemetry.sdk.resources import Resource from opentelemetry.semconv.resource import ResourceAttributes # Import OpenInference instrumenter for OpenAI from openinference.instrumentation.openai import OpenAIInstrumentor # Configure OpenTelemetry resource = Resource(attributes={ ResourceAttributes.SERVICE_NAME: "openai-weather-app", ResourceAttributes.SERVICE_VERSION: "0.1.0", }) # Initialize the trace provider with the resource trace_provider = TracerProvider(resource=resource) # If OTEL_EXPORTER_OTLP_ENDPOINT is not set, we'll use console exporter # Otherwise, we'll use OTLP exporter for sending to the Patronus collector if os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT"): # Configure OTLPSpanExporter # The environment variables OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS # should be set before running this example otlp_exporter = OTLPSpanExporter() trace_provider.add_span_processor(BatchSpanProcessor(otlp_exporter)) else: # For local development/testing we can use ConsoleSpanExporter from opentelemetry.sdk.trace.export import ConsoleSpanExporter trace_provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter())) # Set the provider trace.set_tracer_provider(trace_provider) # Initialize OpenInference instrumenter for OpenAI # This will automatically instrument all OpenAI API calls openai_instrumentor = OpenAIInstrumentor() openai_instrumentor.instrument() # Get a tracer for our manual spans tracer = trace.get_tracer("openai.weather.example") def get_weather(latitude, longitude): """Get weather data from the Open Meteo API using httpx""" with tracer.start_as_current_span( "get_weather", attributes={ "service.name": "weather_api", "weather.latitude": latitude, "weather.longitude": longitude } ) as span: try: # Create the URL with parameters url = "https://api.open-meteo.com/v1/forecast" params = { "latitude": latitude, "longitude": longitude, "current": "temperature_2m,wind_speed_10m", "hourly": "temperature_2m,relative_humidity_2m,wind_speed_10m" } # Trace the HTTP request using httpx with tracer.start_as_current_span( "http_request", attributes={ "http.method": "GET", "http.url": url, "http.request.query": str(params) } ): # Use httpx client for the request with httpx.Client() as client: response = client.get(url, params=params) # Add response information to the span span.set_attribute("http.status_code", response.status_code) if response.status_code != 200: 
span.record_exception(Exception(f"Weather API returned status {response.status_code}")) span.set_status(trace.StatusCode.ERROR) return None data = response.json() temperature = data["current"]["temperature_2m"] # Add weather data to the span span.set_attribute("weather.temperature_celsius", temperature) return temperature except Exception as e: # Record the exception in the span span.record_exception(e) span.set_status(trace.StatusCode.ERROR, str(e)) raise def get_client(): """Create and return an OpenAI client""" with tracer.start_as_current_span("get_openai_client"): return OpenAI() def call_llm(client, user_prompt): """Call the OpenAI API to process the user prompt Note: With OpenInference instrumenter, the OpenAI API call will be automatically traced. This function adds some additional manual spans for demonstration purposes. """ with tracer.start_as_current_span( "call_llm", attributes={ "ai.prompt.text": user_prompt, "ai.prompt.tokens": len(user_prompt.split()) } ) as span: try: # Define tools available to the model tools = [ { "type": "function", "name": "get_weather", "description": "Get current temperature for provided coordinates in celsius.", "parameters": { "type": "object", "properties": { "latitude": {"type": "number"}, "longitude": {"type": "number"}, }, "required": ["latitude", "longitude"], "additionalProperties": False, }, "strict": True, } ] input_messages = [{"role": "user", "content": user_prompt}] # The OpenAI API call will be automatically traced by OpenInference # We don't need to create a span for it, but we can add attributes to our parent span response = client.responses.create( model="gpt-4.1", input=input_messages, tools=tools, ) # Check if the response contains a tool call has_tool_call = False if response.output and len(response.output) > 0: output = response.output[0] if output.type == "function_call": has_tool_call = True span.set_attribute("openai.response.tool_called", output.name) span.set_attribute("openai.response.has_tool_call", has_tool_call) return response except Exception as e: span.record_exception(e) span.set_status(trace.StatusCode.ERROR, str(e)) raise def main(): """Main function to process the weather query""" with tracer.start_as_current_span("openai-weather-main") as root_span: user_prompt = "What's the weather like in Paris today?" 
root_span.set_attribute("query", user_prompt) try: client = get_client() response = call_llm(client, user_prompt) print("LLM Response") print(response.model_dump_json()) weather_response = None if response.output: output = response.output[0] if output.type == "function_call" and output.name == "get_weather": # Parse the arguments from the function call with tracer.start_as_current_span( "parse_function_call", attributes={"function_name": output.name} ): kwargs = json.loads(output.arguments) root_span.set_attribute("weather.latitude", kwargs.get("latitude")) root_span.set_attribute("weather.longitude", kwargs.get("longitude")) print("Weather API Response") weather_response = get_weather(**kwargs) print(weather_response) if weather_response: with tracer.start_as_current_span("format_weather_response"): print(user_prompt) formatted_answer = f"Answer: {weather_response}" print(formatted_answer) root_span.set_attribute("weather.answer", formatted_answer) # Mark the trace as successful root_span.set_status(trace.StatusCode.OK) except Exception as e: # Record any exceptions that occurred root_span.record_exception(e) root_span.set_status(trace.StatusCode.ERROR, str(e)) print(f"Error: {e}") if __name__ == "__main__": main() # Ensure all spans are exported before the program exits trace_provider.shutdown() ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[pydantic-ai]" \ -m patronus_examples.tracking.pydanticai_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/pydanticai_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install pydantic-ai-slim[openai] pip install opentelemetry-instrumentation-asyncio pip install opentelemetry-instrumentation-threading ``` ## Example overview This example demonstrates how to use Patronus to trace Pydantic-AI agent interactions in an asynchronous application. The example: 1. Sets up two Pydantic-AI agents: a weather agent and a manager agent 1. Configures the weather agent with a tool to provide mock weather data 1. Configures the manager agent with a tool to call the weather agent 1. Demonstrates how to handle agent-to-agent communication The example shows how Patronus can trace asynchronous workflows and provide visibility into multi-agent systems built with Pydantic-AI. 
## Example code ```python # examples/patronus_examples/tracking/pydanticai_weather.py import asyncio from pydantic_ai import Agent from opentelemetry.instrumentation.threading import ThreadingInstrumentor from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor from patronus.integrations.pydantic_ai import PydanticAIIntegrator import patronus patronus.init( integrations=[ PydanticAIIntegrator(), ThreadingInstrumentor(), AsyncioInstrumentor(), ] ) def get_agent(system_prompt="You are a helpful assistant"): agent = Agent("openai:gpt-4o", output_type=str, system_prompt=system_prompt) return agent @patronus.traced("weather-pydantic-ai") async def main(): # Create weather agent and attach tool to it weather_agent = get_agent( "You are a helpful assistant that can help with weather information." ) @weather_agent.tool_plain() async def get_weather(): # Mock tool output return ( "Today's weather is Sunny with a forecasted high of 30°C and a low of 25°C. " "The wind is expected at 4 km/h." ) # Create manager agent manager_agent = get_agent( "You are a helpful assistant that can coordinate with other subagents " "and query them for more information about topics." ) # Create a tool to execute the weather agent @manager_agent.tool_plain() async def call_weather_agent(): weather_info = await weather_agent.run("What is the weather in Paris, France?") return str(weather_info) # Run the manager print("Running the agent...") return await manager_agent.run("What is the weather in Paris, France?") if __name__ == "__main__": result = asyncio.run(main()) print(result) ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[smolagents]" \ -m patronus_examples.tracking.smolagents_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/smolagents_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install smolagents[litellm] pip install openinference-instrumentation-smolagents pip install opentelemetry-instrumentation-threading ``` ## Example overview This example demonstrates how to use Patronus to trace Smolagents tool calls and LLM interactions. The application: 1. Sets up a Smolagents agent with a weather tool 1. Configures a hierarchical agent structure with subagents 1. Processes a user query about weather in Paris 1. Handles the tool calling workflow automatically The example shows how Patronus provides visibility into the agent's decision-making process, tool usage, and interaction between different agent layers. 
## Example code ```python # examples/patronus_examples/tracking/smolagents_weather.py from datetime import datetime from openinference.instrumentation.smolagents import SmolagentsInstrumentor from opentelemetry.instrumentation.threading import ThreadingInstrumentor from smolagents import LiteLLMModel, ToolCallingAgent, tool import patronus patronus.init(integrations=[SmolagentsInstrumentor(), ThreadingInstrumentor()]) @tool def get_weather_api(location: str, date_time: str) -> str: """ Returns the weather report. Args: location: the name of the place that you want the weather for. Should be a place name, followed by possibly a city name, then a country, like "Anchor Point, Taghazout, Morocco". date_time: the date and time for which you want the report, formatted as '%m/%d/%y %H:%M:%S'. """ try: date_time = datetime.strptime(date_time, "%m/%d/%y %H:%M:%S") except Exception as e: raise ValueError( "Conversion of `date_time` to datetime format failed, " f"make sure to provide a string in format '%m/%d/%y %H:%M:%S': {e}" ) temperature_celsius, risk_of_rain, wave_height = 10, 0.5, 4 # mock outputs return ( f"Weather report for {location}, {date_time}: " f"Temperature will be {temperature_celsius}°C, " f"risk of rain is {risk_of_rain * 100:.0f}%, wave height is {wave_height}m." ) def create_agent(model_id): # Create weather agent weather_model = LiteLLMModel(model_id, temperature=0.0, top_p=1.0) weather_subagent = ToolCallingAgent( tools=[get_weather_api], model=weather_model, max_steps=10, name="weather_agent", description="This agent can provide information about the weather at a certain location", ) # Create manager agent and add weather agent as subordinate manager_model = LiteLLMModel(model_id, temperature=0.0, top_p=1.0) agent = ToolCallingAgent( model=manager_model, managed_agents=[weather_subagent], tools=[], add_base_tools=False, ) return agent @patronus.traced("weather-smolagents") def main(): agent = create_agent("openai/gpt-4o") agent.run("What is the weather in Paris, France?") if __name__ == "__main__": main() ``` # API Reference # API ## patronus.api.api_client.PatronusAPIClient ```python PatronusAPIClient( *, client_http_async: AsyncClient, client_http: Client, base_url: str, api_key: str, ) ``` Bases: `BaseAPIClient` Source code in `src/patronus/api/api_client_base.py` ```python def __init__( self, *, client_http_async: httpx.AsyncClient, client_http: httpx.Client, base_url: str, api_key: str, ): self.version = importlib.metadata.version("patronus") self.http = client_http_async self.http_sync = client_http self.base_url = base_url.rstrip("/") self.api_key = api_key ``` ### add_evaluator_criteria_revision ```python add_evaluator_criteria_revision( evaluator_criteria_id, request: AddEvaluatorCriteriaRevisionRequest, ) -> api_types.AddEvaluatorCriteriaRevisionResponse ``` Adds a revision to existing evaluator criteria. 
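For orientation, here is a minimal, self-contained sketch of constructing the client by hand and calling this method. The base URL and the `config` payload are illustrative assumptions (criteria configs are evaluator-specific); in most applications the SDK's initialization builds this client for you.

```python
import asyncio

import httpx

from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


async def main() -> None:
    # Manual construction; SDK initialization normally builds this client for you.
    client = PatronusAPIClient(
        client_http_async=httpx.AsyncClient(),
        client_http=httpx.Client(),
        base_url="https://api.patronus.ai",  # assumed default API endpoint
        api_key="your-api-key",
    )
    # The config payload is evaluator-specific; this dict is a placeholder.
    resp = await client.add_evaluator_criteria_revision(
        "your-evaluator-criteria-id",
        api_types.AddEvaluatorCriteriaRevisionRequest(config={"pass_threshold": 0.8}),
    )
    print(resp.evaluator_criteria)


if __name__ == "__main__":
    asyncio.run(main())
```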
Source code in `src/patronus/api/api_client.py` ```python async def add_evaluator_criteria_revision( self, evaluator_criteria_id, request: api_types.AddEvaluatorCriteriaRevisionRequest, ) -> api_types.AddEvaluatorCriteriaRevisionResponse: """Adds a revision to existing evaluator criteria.""" resp = await self.call( "POST", f"/v1/evaluator-criteria/{evaluator_criteria_id}/revision", body=request, response_cls=api_types.AddEvaluatorCriteriaRevisionResponse, ) resp.raise_for_status() return resp.data ``` ### add_evaluator_criteria_revision_sync ```python add_evaluator_criteria_revision_sync( evaluator_criteria_id, request: AddEvaluatorCriteriaRevisionRequest, ) -> api_types.AddEvaluatorCriteriaRevisionResponse ``` Adds a revision to existing evaluator criteria. Source code in `src/patronus/api/api_client.py` ```python def add_evaluator_criteria_revision_sync( self, evaluator_criteria_id, request: api_types.AddEvaluatorCriteriaRevisionRequest, ) -> api_types.AddEvaluatorCriteriaRevisionResponse: """Adds a revision to existing evaluator criteria.""" resp = self.call_sync( "POST", f"/v1/evaluator-criteria/{evaluator_criteria_id}/revision", body=request, response_cls=api_types.AddEvaluatorCriteriaRevisionResponse, ) resp.raise_for_status() return resp.data ``` ### annotate ```python annotate( request: AnnotateRequest, ) -> api_types.AnnotateResponse ``` Annotates log based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def annotate(self, request: api_types.AnnotateRequest) -> api_types.AnnotateResponse: """Annotates log based on the given request.""" resp = await self.call( "POST", "/v1/annotate", body=request, response_cls=api_types.AnnotateResponse, ) resp.raise_for_status() return resp.data ``` ### annotate_sync ```python annotate_sync( request: AnnotateRequest, ) -> api_types.AnnotateResponse ``` Annotates log based on the given request. Source code in `src/patronus/api/api_client.py` ```python def annotate_sync(self, request: api_types.AnnotateRequest) -> api_types.AnnotateResponse: """Annotates log based on the given request.""" resp = self.call_sync( "POST", "/v1/annotate", body=request, response_cls=api_types.AnnotateResponse, ) resp.raise_for_status() return resp.data ``` ### batch_create_evaluations ```python batch_create_evaluations( request: BatchCreateEvaluationsRequest, ) -> api_types.BatchCreateEvaluationsResponse ``` Creates multiple evaluations in a single request. Source code in `src/patronus/api/api_client.py` ```python async def batch_create_evaluations( self, request: api_types.BatchCreateEvaluationsRequest ) -> api_types.BatchCreateEvaluationsResponse: """Creates multiple evaluations in a single request.""" resp = await self.call( "POST", "/v1/evaluations/batch", body=request, response_cls=api_types.BatchCreateEvaluationsResponse, ) resp.raise_for_status() return resp.data ``` ### batch_create_evaluations_sync ```python batch_create_evaluations_sync( request: BatchCreateEvaluationsRequest, ) -> api_types.BatchCreateEvaluationsResponse ``` Creates multiple evaluations in a single request. 
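As a rough sketch of recording client-side evaluation results in bulk: the snippet below uses only the `ClientEvaluation` fields documented in the api_types section of this reference, and the values are placeholders; `ClientEvaluation` defines further fields beyond those shown here, so consult `api_types.ClientEvaluation` for the complete schema.

```python
from datetime import timedelta

from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


def record_local_results(client: PatronusAPIClient) -> api_types.BatchCreateEvaluationsResponse:
    # Placeholder values; see patronus.api.api_types for the full field list.
    evaluation = api_types.ClientEvaluation(
        evaluator_id="exact-match",  # local evaluator identifier
        criteria="strict",
        app="default",
        explanation="Output matched the expected answer.",
        evaluation_duration=timedelta(milliseconds=120),
    )
    return client.batch_create_evaluations_sync(
        api_types.BatchCreateEvaluationsRequest(evaluations=[evaluation])
    )
```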
Source code in `src/patronus/api/api_client.py` ```python def batch_create_evaluations_sync( self, request: api_types.BatchCreateEvaluationsRequest ) -> api_types.BatchCreateEvaluationsResponse: """Creates multiple evaluations in a single request.""" resp = self.call_sync( "POST", "/v1/evaluations/batch", body=request, response_cls=api_types.BatchCreateEvaluationsResponse, ) resp.raise_for_status() return resp.data ``` ### create_annotation_criteria ```python create_annotation_criteria( request: CreateAnnotationCriteriaRequest, ) -> api_types.CreateAnnotationCriteriaResponse ``` Creates annotation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def create_annotation_criteria( self, request: api_types.CreateAnnotationCriteriaRequest ) -> api_types.CreateAnnotationCriteriaResponse: """Creates annotation criteria based on the given request.""" resp = await self.call( "POST", "/v1/annotation-criteria", body=request, response_cls=api_types.CreateAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### create_annotation_criteria_sync ```python create_annotation_criteria_sync( request: CreateAnnotationCriteriaRequest, ) -> api_types.CreateAnnotationCriteriaResponse ``` Creates annotation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python def create_annotation_criteria_sync( self, request: api_types.CreateAnnotationCriteriaRequest ) -> api_types.CreateAnnotationCriteriaResponse: """Creates annotation criteria based on the given request.""" resp = self.call_sync( "POST", "/v1/annotation-criteria", body=request, response_cls=api_types.CreateAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### create_criteria ```python create_criteria( request: CreateCriteriaRequest, ) -> api_types.CreateCriteriaResponse ``` Creates evaluation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def create_criteria(self, request: api_types.CreateCriteriaRequest) -> api_types.CreateCriteriaResponse: """Creates evaluation criteria based on the given request.""" resp = await self.call( "POST", "/v1/evaluator-criteria", body=request, response_cls=api_types.CreateCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### create_criteria_sync ```python create_criteria_sync( request: CreateCriteriaRequest, ) -> api_types.CreateCriteriaResponse ``` Creates evaluation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python def create_criteria_sync(self, request: api_types.CreateCriteriaRequest) -> api_types.CreateCriteriaResponse: """Creates evaluation criteria based on the given request.""" resp = self.call_sync( "POST", "/v1/evaluator-criteria", body=request, response_cls=api_types.CreateCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### create_experiment ```python create_experiment( request: CreateExperimentRequest, ) -> api_types.Experiment ``` Creates a new experiment based on the given request. 
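A rough usage sketch follows; the request field names (`project_id`, `name`) are illustrative assumptions, so check `api_types.CreateExperimentRequest` for the actual schema.

```python
from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


async def start_experiment(client: PatronusAPIClient) -> api_types.Experiment:
    # Field names below are illustrative assumptions; consult
    # api_types.CreateExperimentRequest for the actual schema.
    return await client.create_experiment(
        api_types.CreateExperimentRequest(
            project_id="your-project-id",
            name="prompt-comparison-run",
        )
    )
```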
Source code in `src/patronus/api/api_client.py` ```python async def create_experiment(self, request: api_types.CreateExperimentRequest) -> api_types.Experiment: """Creates a new experiment based on the given request.""" resp = await self.call( "POST", "/v1/experiments", body=request, response_cls=api_types.CreateExperimentResponse, ) resp.raise_for_status() return resp.data.experiment ``` ### create_experiment_sync ```python create_experiment_sync( request: CreateExperimentRequest, ) -> api_types.Experiment ``` Creates a new experiment based on the given request. Source code in `src/patronus/api/api_client.py` ```python def create_experiment_sync(self, request: api_types.CreateExperimentRequest) -> api_types.Experiment: """Creates a new experiment based on the given request.""" resp = self.call_sync( "POST", "/v1/experiments", body=request, response_cls=api_types.CreateExperimentResponse, ) resp.raise_for_status() return resp.data.experiment ``` ### create_project ```python create_project( request: CreateProjectRequest, ) -> api_types.Project ``` Creates a new project based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def create_project(self, request: api_types.CreateProjectRequest) -> api_types.Project: """Creates a new project based on the given request.""" resp = await self.call("POST", "/v1/projects", body=request, response_cls=api_types.Project) resp.raise_for_status() return resp.data ``` ### create_project_sync ```python create_project_sync( request: CreateProjectRequest, ) -> api_types.Project ``` Creates a new project based on the given request. Source code in `src/patronus/api/api_client.py` ```python def create_project_sync(self, request: api_types.CreateProjectRequest) -> api_types.Project: """Creates a new project based on the given request.""" resp = self.call_sync("POST", "/v1/projects", body=request, response_cls=api_types.Project) resp.raise_for_status() return resp.data ``` ### delete_annotation_criteria ```python delete_annotation_criteria(criteria_id: str) -> None ``` Deletes annotation criteria by its ID. Source code in `src/patronus/api/api_client.py` ```python async def delete_annotation_criteria(self, criteria_id: str) -> None: """Deletes annotation criteria by its ID.""" resp = await self.call( "DELETE", f"/v1/annotation-criteria/{criteria_id}", response_cls=None, ) resp.raise_for_status() ``` ### delete_annotation_criteria_sync ```python delete_annotation_criteria_sync(criteria_id: str) -> None ``` Deletes annotation criteria by its ID. Source code in `src/patronus/api/api_client.py` ```python def delete_annotation_criteria_sync(self, criteria_id: str) -> None: """Deletes annotation criteria by its ID.""" resp = self.call_sync( "DELETE", f"/v1/annotation-criteria/{criteria_id}", response_cls=None, ) resp.raise_for_status() ``` ### evaluate ```python evaluate( request: EvaluateRequest, ) -> api_types.EvaluateResponse ``` Evaluates content using the specified evaluators. Source code in `src/patronus/api/api_client.py` ```python async def evaluate(self, request: api_types.EvaluateRequest) -> api_types.EvaluateResponse: """Evaluates content using the specified evaluators.""" resp = await self.call( "POST", "/v1/evaluate", body=request, response_cls=api_types.EvaluateResponse, ) resp.raise_for_status() return resp.data ``` ### evaluate_one ```python evaluate_one( request: EvaluateRequest, ) -> api_types.EvaluationResult ``` Evaluates content using a single evaluator. 
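A rough sketch of a single-evaluator call. The only constraint confirmed by the implementation that follows is that `request.evaluators` must contain exactly one entry; the evaluator entry structure and task fields shown here are illustrative assumptions, so check `api_types.EvaluateRequest` for the exact schema.

```python
from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


async def judge_single_output(client: PatronusAPIClient) -> api_types.EvaluationResult:
    # evaluate_one() accepts exactly one entry in `evaluators`.
    # Field names and evaluator/criteria identifiers below are placeholders;
    # consult api_types.EvaluateRequest for the actual request schema.
    request = api_types.EvaluateRequest(
        evaluators=[{"evaluator": "judge", "criteria": "patronus:is-concise"}],
        task_input="Summarize the plot of Hamlet in one sentence.",
        task_output="Hamlet avenges his father's murder, and nearly everyone dies.",
    )
    result = await client.evaluate_one(request)
    print(result)
    return result
```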
Source code in `src/patronus/api/api_client.py` ```python async def evaluate_one(self, request: api_types.EvaluateRequest) -> api_types.EvaluationResult: """Evaluates content using a single evaluator.""" if len(request.evaluators) > 1: raise ValueError("'evaluate_one()' cannot accept more than one evaluator in the request body") resp = await self.call( "POST", "/v1/evaluate", body=request, response_cls=api_types.EvaluateResponse, ) return self._evaluate_one_process_resp(resp) ``` ### evaluate_one_sync ```python evaluate_one_sync( request: EvaluateRequest, ) -> api_types.EvaluationResult ``` Evaluates content using a single evaluator. Source code in `src/patronus/api/api_client.py` ```python def evaluate_one_sync(self, request: api_types.EvaluateRequest) -> api_types.EvaluationResult: """Evaluates content using a single evaluator.""" if len(request.evaluators) > 1: raise ValueError("'evaluate_one_sync()' cannot accept more than one evaluator in the request body") resp = self.call_sync( "POST", "/v1/evaluate", body=request, response_cls=api_types.EvaluateResponse, ) return self._evaluate_one_process_resp(resp) ``` ### evaluate_sync ```python evaluate_sync( request: EvaluateRequest, ) -> api_types.EvaluateResponse ``` Evaluates content using the specified evaluators. Source code in `src/patronus/api/api_client.py` ```python def evaluate_sync(self, request: api_types.EvaluateRequest) -> api_types.EvaluateResponse: """Evaluates content using the specified evaluators.""" resp = self.call_sync( "POST", "/v1/evaluate", body=request, response_cls=api_types.EvaluateResponse, ) resp.raise_for_status() return resp.data ``` ### export_evaluations ```python export_evaluations( request: ExportEvaluationRequest, ) -> api_types.ExportEvaluationResponse ``` Exports evaluations based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def export_evaluations( self, request: api_types.ExportEvaluationRequest ) -> api_types.ExportEvaluationResponse: """Exports evaluations based on the given request.""" resp = await self.call( "POST", "/v1/evaluation-results/batch", body=request, response_cls=api_types.ExportEvaluationResponse, ) resp.raise_for_status() return resp.data ``` ### export_evaluations_sync ```python export_evaluations_sync( request: ExportEvaluationRequest, ) -> api_types.ExportEvaluationResponse ``` Exports evaluations based on the given request. Source code in `src/patronus/api/api_client.py` ```python def export_evaluations_sync(self, request: api_types.ExportEvaluationRequest) -> api_types.ExportEvaluationResponse: """Exports evaluations based on the given request.""" resp = self.call_sync( "POST", "/v1/evaluation-results/batch", body=request, response_cls=api_types.ExportEvaluationResponse, ) resp.raise_for_status() return resp.data ``` ### get_experiment ```python get_experiment( experiment_id: str, ) -> Optional[api_types.Experiment] ``` Fetches an experiment by its ID or returns None if not found. 
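A short usage sketch: because a missing experiment returns `None` rather than raising, callers can branch on the result.

```python
from typing import Optional

from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


async def describe_experiment(
    client: PatronusAPIClient, experiment_id: str
) -> Optional[api_types.Experiment]:
    # Returns None (rather than raising) when the experiment does not exist.
    experiment = await client.get_experiment(experiment_id)
    if experiment is None:
        print(f"No experiment found for ID {experiment_id!r}")
    return experiment
```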
Source code in `src/patronus/api/api_client.py` ```python async def get_experiment(self, experiment_id: str) -> Optional[api_types.Experiment]: """Fetches an experiment by its ID or returns None if not found.""" resp = await self.call( "GET", f"/v1/experiments/{experiment_id}", response_cls=api_types.GetExperimentResponse, ) if resp.response.status_code == 404: return None resp.raise_for_status() return resp.data.experiment ``` ### get_experiment_sync ```python get_experiment_sync( experiment_id: str, ) -> Optional[api_types.Experiment] ``` Fetches an experiment by its ID or returns None if not found. Source code in `src/patronus/api/api_client.py` ```python def get_experiment_sync(self, experiment_id: str) -> Optional[api_types.Experiment]: """Fetches an experiment by its ID or returns None if not found.""" resp = self.call_sync( "GET", f"/v1/experiments/{experiment_id}", response_cls=api_types.GetExperimentResponse, ) if resp.response.status_code == 404: return None resp.raise_for_status() return resp.data.experiment ``` ### get_project ```python get_project(project_id: str) -> api_types.Project ``` Fetches a project by its ID. Source code in `src/patronus/api/api_client.py` ```python async def get_project(self, project_id: str) -> api_types.Project: """Fetches a project by its ID.""" resp = await self.call( "GET", f"/v1/projects/{project_id}", response_cls=api_types.GetProjectResponse, ) resp.raise_for_status() return resp.data.project ``` ### get_project_sync ```python get_project_sync(project_id: str) -> api_types.Project ``` Fetches a project by its ID. Source code in `src/patronus/api/api_client.py` ```python def get_project_sync(self, project_id: str) -> api_types.Project: """Fetches a project by its ID.""" resp = self.call_sync( "GET", f"/v1/projects/{project_id}", response_cls=api_types.GetProjectResponse, ) resp.raise_for_status() return resp.data.project ``` ### list_annotation_criteria ```python list_annotation_criteria( *, project_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None, ) -> api_types.ListAnnotationCriteriaResponse ``` Retrieves a list of annotation criteria with optional filtering. Source code in `src/patronus/api/api_client.py` ```python async def list_annotation_criteria( self, *, project_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None ) -> api_types.ListAnnotationCriteriaResponse: """Retrieves a list of annotation criteria with optional filtering.""" params = {} if project_id is not None: params["project_id"] = project_id if limit is not None: params["limit"] = limit if offset is not None: params["offset"] = offset resp = await self.call( "GET", "/v1/annotation-criteria", params=params, response_cls=api_types.ListAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### list_annotation_criteria_sync ```python list_annotation_criteria_sync( *, project_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None, ) -> api_types.ListAnnotationCriteriaResponse ``` Retrieves a list of annotation criteria with optional filtering. 
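For example, a paginated fetch scoped to one project might look like the sketch below; the keyword arguments mirror the signature above, and the response fields are described by `api_types.ListAnnotationCriteriaResponse`.

```python
from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


def first_page_of_criteria(
    client: PatronusAPIClient, project_id: str
) -> api_types.ListAnnotationCriteriaResponse:
    # All filters are optional keyword arguments; omit them to list everything.
    return client.list_annotation_criteria_sync(
        project_id=project_id,
        limit=25,
        offset=0,
    )
```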
Source code in `src/patronus/api/api_client.py` ```python def list_annotation_criteria_sync( self, *, project_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None ) -> api_types.ListAnnotationCriteriaResponse: """Retrieves a list of annotation criteria with optional filtering.""" params = {} if project_id is not None: params["project_id"] = project_id if limit is not None: params["limit"] = limit if offset is not None: params["offset"] = offset resp = self.call_sync( "GET", "/v1/annotation-criteria", params=params, response_cls=api_types.ListAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### list_criteria ```python list_criteria( request: ListCriteriaRequest, ) -> api_types.ListCriteriaResponse ``` Retrieves a list of evaluation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def list_criteria(self, request: api_types.ListCriteriaRequest) -> api_types.ListCriteriaResponse: """Retrieves a list of evaluation criteria based on the given request.""" params = request.model_dump(exclude_none=True) resp = await self.call( "GET", "/v1/evaluator-criteria", params=params, response_cls=api_types.ListCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### list_criteria_sync ```python list_criteria_sync( request: ListCriteriaRequest, ) -> api_types.ListCriteriaResponse ``` Retrieves a list of evaluation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python def list_criteria_sync(self, request: api_types.ListCriteriaRequest) -> api_types.ListCriteriaResponse: """Retrieves a list of evaluation criteria based on the given request.""" params = request.model_dump(exclude_none=True) resp = self.call_sync( "GET", "/v1/evaluator-criteria", params=params, response_cls=api_types.ListCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### list_dataset_data ```python list_dataset_data( dataset_id: str, ) -> api_types.ListDatasetData ``` Retrieves data from a dataset by its ID. Source code in `src/patronus/api/api_client.py` ```python async def list_dataset_data(self, dataset_id: str) -> api_types.ListDatasetData: """Retrieves data from a dataset by its ID.""" resp = await self.call( "GET", f"/v1/datasets/{dataset_id}/data", response_cls=api_types.ListDatasetData, ) resp.raise_for_status() return resp.data ``` ### list_dataset_data_sync ```python list_dataset_data_sync( dataset_id: str, ) -> api_types.ListDatasetData ``` Retrieves data from a dataset by its ID. Source code in `src/patronus/api/api_client.py` ```python def list_dataset_data_sync(self, dataset_id: str) -> api_types.ListDatasetData: """Retrieves data from a dataset by its ID.""" resp = self.call_sync( "GET", f"/v1/datasets/{dataset_id}/data", response_cls=api_types.ListDatasetData, ) resp.raise_for_status() return resp.data ``` ### list_datasets ```python list_datasets( dataset_type: Optional[str] = None, ) -> list[api_types.Dataset] ``` Retrieves a list of datasets, optionally filtered by type. Source code in `src/patronus/api/api_client.py` ```python async def list_datasets(self, dataset_type: Optional[str] = None) -> list[api_types.Dataset]: """ Retrieves a list of datasets, optionally filtered by type. 
""" params = {} if dataset_type is not None: params["type"] = dataset_type resp = await self.call( "GET", "/v1/datasets", params=params, response_cls=api_types.ListDatasetsResponse, ) resp.raise_for_status() return resp.data.datasets ``` ### list_datasets_sync ```python list_datasets_sync( dataset_type: Optional[str] = None, ) -> list[api_types.Dataset] ``` Retrieves a list of datasets, optionally filtered by type. Source code in `src/patronus/api/api_client.py` ```python def list_datasets_sync(self, dataset_type: Optional[str] = None) -> list[api_types.Dataset]: """ Retrieves a list of datasets, optionally filtered by type. """ params = {} if dataset_type is not None: params["type"] = dataset_type resp = self.call_sync( "GET", "/v1/datasets", params=params, response_cls=api_types.ListDatasetsResponse, ) resp.raise_for_status() return resp.data.datasets ``` ### list_evaluators ```python list_evaluators( by_alias_or_id: Optional[str] = None, ) -> list[api_types.Evaluator] ``` Retrieves a list of available evaluators. Source code in `src/patronus/api/api_client.py` ```python async def list_evaluators(self, by_alias_or_id: Optional[str] = None) -> list[api_types.Evaluator]: """Retrieves a list of available evaluators.""" params = {} if by_alias_or_id: params["by_alias_or_id"] = by_alias_or_id resp = await self.call("GET", "/v1/evaluators", params=params, response_cls=api_types.ListEvaluatorsResponse) resp.raise_for_status() return resp.data.evaluators ``` ### list_evaluators_sync ```python list_evaluators_sync( by_alias_or_id: Optional[str] = None, ) -> list[api_types.Evaluator] ``` Retrieves a list of available evaluators. Source code in `src/patronus/api/api_client.py` ```python def list_evaluators_sync(self, by_alias_or_id: Optional[str] = None) -> list[api_types.Evaluator]: """Retrieves a list of available evaluators.""" params = {} if by_alias_or_id: params["by_alias_or_id"] = by_alias_or_id resp = self.call_sync("GET", "/v1/evaluators", params=params, response_cls=api_types.ListEvaluatorsResponse) resp.raise_for_status() return resp.data.evaluators ``` ### search_evaluations ```python search_evaluations( request: SearchEvaluationsRequest, ) -> api_types.SearchEvaluationsResponse ``` Searches for evaluations based on the given criteria. Source code in `src/patronus/api/api_client.py` ```python async def search_evaluations( self, request: api_types.SearchEvaluationsRequest ) -> api_types.SearchEvaluationsResponse: """Searches for evaluations based on the given criteria.""" resp = await self.call( "POST", "/v1/evaluations/search", body=request, response_cls=api_types.SearchEvaluationsResponse, ) resp.raise_for_status() return resp.data ``` ### search_evaluations_sync ```python search_evaluations_sync( request: SearchEvaluationsRequest, ) -> api_types.SearchEvaluationsResponse ``` Searches for evaluations based on the given criteria. Source code in `src/patronus/api/api_client.py` ```python def search_evaluations_sync( self, request: api_types.SearchEvaluationsRequest ) -> api_types.SearchEvaluationsResponse: """Searches for evaluations based on the given criteria.""" resp = self.call_sync( "POST", "/v1/evaluations/search", body=request, response_cls=api_types.SearchEvaluationsResponse, ) resp.raise_for_status() return resp.data ``` ### search_logs ```python search_logs( request: SearchLogsRequest, ) -> api_types.SearchLogsResponse ``` Searches for logs based on the given request. 
Source code in `src/patronus/api/api_client.py` ```python async def search_logs(self, request: api_types.SearchLogsRequest) -> api_types.SearchLogsResponse: """Searches for logs based on the given request.""" resp = await self.call( "POST", "/v1/otel/logs/search", body=request, response_cls=api_types.SearchLogsResponse, ) resp.raise_for_status() return resp.data ``` ### search_logs_sync ```python search_logs_sync( request: SearchLogsRequest, ) -> api_types.SearchLogsResponse ``` Searches for logs based on the given request. Source code in `src/patronus/api/api_client.py` ```python def search_logs_sync(self, request: api_types.SearchLogsRequest) -> api_types.SearchLogsResponse: """Searches for logs based on the given request.""" resp = self.call_sync( "POST", "/v1/otel/logs/search", body=request, response_cls=api_types.SearchLogsResponse, ) resp.raise_for_status() return resp.data ``` ### update_annotation_criteria ```python update_annotation_criteria( criteria_id: str, request: UpdateAnnotationCriteriaRequest, ) -> api_types.UpdateAnnotationCriteriaResponse ``` Updates annotation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def update_annotation_criteria( self, criteria_id: str, request: api_types.UpdateAnnotationCriteriaRequest ) -> api_types.UpdateAnnotationCriteriaResponse: """Updates annotation criteria based on the given request.""" resp = await self.call( "PUT", f"/v1/annotation-criteria/{criteria_id}", body=request, response_cls=api_types.UpdateAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### update_annotation_criteria_sync ```python update_annotation_criteria_sync( criteria_id: str, request: UpdateAnnotationCriteriaRequest, ) -> api_types.UpdateAnnotationCriteriaResponse ``` Updates annotation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python def update_annotation_criteria_sync( self, criteria_id: str, request: api_types.UpdateAnnotationCriteriaRequest ) -> api_types.UpdateAnnotationCriteriaResponse: """Updates annotation criteria based on the given request.""" resp = self.call_sync( "PUT", f"/v1/annotation-criteria/{criteria_id}", body=request, response_cls=api_types.UpdateAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### update_experiment ```python update_experiment( experiment_id: str, request: UpdateExperimentRequest ) -> api_types.Experiment ``` Updates an existing experiment based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def update_experiment( self, experiment_id: str, request: api_types.UpdateExperimentRequest ) -> api_types.Experiment: """Updates an existing experiment based on the given request.""" resp = await self.call( "POST", f"/v1/experiments/{experiment_id}", body=request, response_cls=api_types.UpdateExperimentResponse, ) resp.raise_for_status() return resp.data.experiment ``` ### update_experiment_sync ```python update_experiment_sync( experiment_id: str, request: UpdateExperimentRequest ) -> api_types.Experiment ``` Updates an existing experiment based on the given request.
Source code in `src/patronus/api/api_client.py` ```python def update_experiment_sync( self, experiment_id: str, request: api_types.UpdateExperimentRequest ) -> api_types.Experiment: """Updates an existing experiment based on the given request.""" resp = self.call_sync( "POST", f"/v1/experiments/{experiment_id}", body=request, response_cls=api_types.UpdateExperimentResponse, ) resp.raise_for_status() return resp.data.experiment ``` ### upload_dataset ```python upload_dataset( file_path: str, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[ dict[str, Union[str, list[str]]] ] = None, ) -> api_types.Dataset ``` Upload a dataset file to create a new dataset in Patronus. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file_path` | `str` | Path to the dataset file (CSV or JSONL format) | *required* | | `dataset_name` | `str` | Name for the created dataset | *required* | | `dataset_description` | `Optional[str]` | Optional description for the dataset | `None` | | `custom_field_mapping` | `Optional[dict[str, Union[str, list[str]]]]` | Optional mapping of standard field names to custom field names in the dataset | `None` | Returns: | Type | Description | | --- | --- | | `Dataset` | Dataset object representing the created dataset | Source code in `src/patronus/api/api_client.py` ```python async def upload_dataset( self, file_path: str, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[dict[str, Union[str, list[str]]]] = None, ) -> api_types.Dataset: """ Upload a dataset file to create a new dataset in Patronus. Args: file_path: Path to the dataset file (CSV or JSONL format) dataset_name: Name for the created dataset dataset_description: Optional description for the dataset custom_field_mapping: Optional mapping of standard field names to custom field names in the dataset Returns: Dataset object representing the created dataset """ with open(file_path, "rb") as f: return await self.upload_dataset_from_buffer(f, dataset_name, dataset_description, custom_field_mapping) ``` ### upload_dataset_from_buffer ```python upload_dataset_from_buffer( file_obj: BinaryIO, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[ dict[str, Union[str, list[str]]] ] = None, ) -> api_types.Dataset ``` Upload a dataset file to create a new dataset in Patronus AI Platform. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file_obj` | `BinaryIO` | File-like object containing dataset content (CSV or JSONL format) | *required* | | `dataset_name` | `str` | Name for the created dataset | *required* | | `dataset_description` | `Optional[str]` | Optional description for the dataset | `None` | | `custom_field_mapping` | `Optional[dict[str, Union[str, list[str]]]]` | Optional mapping of standard field names to custom field names in the dataset | `None` | Returns: | Type | Description | | --- | --- | | `Dataset` | Dataset object representing the created dataset | Source code in `src/patronus/api/api_client.py` ```python async def upload_dataset_from_buffer( self, file_obj: typing.BinaryIO, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[dict[str, Union[str, list[str]]]] = None, ) -> api_types.Dataset: """ Upload a dataset file to create a new dataset in Patronus AI Platform.
Args: file_obj: File-like object containing dataset content (CSV or JSONL format) dataset_name: Name for the created dataset dataset_description: Optional description for the dataset custom_field_mapping: Optional mapping of standard field names to custom field names in the dataset Returns: Dataset object representing the created dataset """ data = { "dataset_name": dataset_name, } if dataset_description is not None: data["dataset_description"] = dataset_description if custom_field_mapping is not None: data["custom_field_mapping"] = json.dumps(custom_field_mapping) files = {"file": (dataset_name, file_obj)} resp = await self.call_multipart( "POST", "/v1/datasets", files=files, data=data, response_cls=api_types.CreateDatasetResponse, ) resp.raise_for_status() return resp.data.dataset ``` ### upload_dataset_from_buffer_sync ```python upload_dataset_from_buffer_sync( file_obj: BinaryIO, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[ dict[str, Union[str, list[str]]] ] = None, ) -> api_types.Dataset ``` Upload a dataset file to create a new dataset in Patronus AI Platform. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file_obj` | `BinaryIO` | File-like object containing dataset content (CSV or JSONL format) | *required* | | `dataset_name` | `str` | Name for the created dataset | *required* | | `dataset_description` | `Optional[str]` | Optional description for the dataset | `None` | | `custom_field_mapping` | `Optional[dict[str, Union[str, list[str]]]]` | Optional mapping of standard field names to custom field names in the dataset | `None` | Returns: | Type | Description | | --- | --- | | `Dataset` | Dataset object representing the created dataset | Source code in `src/patronus/api/api_client.py` ```python def upload_dataset_from_buffer_sync( self, file_obj: typing.BinaryIO, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[dict[str, Union[str, list[str]]]] = None, ) -> api_types.Dataset: """ Upload a dataset file to create a new dataset in Patronus AI Platform. Args: file_obj: File-like object containing dataset content (CSV or JSONL format) dataset_name: Name for the created dataset dataset_description: Optional description for the dataset custom_field_mapping: Optional mapping of standard field names to custom field names in the dataset Returns: Dataset object representing the created dataset """ data = { "dataset_name": dataset_name, } if dataset_description is not None: data["dataset_description"] = dataset_description if custom_field_mapping is not None: data["custom_field_mapping"] = json.dumps(custom_field_mapping) files = {"file": (dataset_name, file_obj)} resp = self.call_multipart_sync( "POST", "/v1/datasets", files=files, data=data, response_cls=api_types.CreateDatasetResponse, ) resp.raise_for_status() return resp.data.dataset ``` ### upload_dataset_sync ```python upload_dataset_sync( file_path: str, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[ dict[str, Union[str, list[str]]] ] = None, ) -> api_types.Dataset ``` Upload a dataset file to create a new dataset in Patronus AI Platform. 
Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file_path` | `str` | Path to the dataset file (CSV or JSONL format) | *required* | | `dataset_name` | `str` | Name for the created dataset | *required* | | `dataset_description` | `Optional[str]` | Optional description for the dataset | `None` | | `custom_field_mapping` | `Optional[dict[str, Union[str, list[str]]]]` | Optional mapping of standard field names to custom field names in the dataset | `None` | Returns: | Type | Description | | --- | --- | | `Dataset` | Dataset object representing the created dataset | Source code in `src/patronus/api/api_client.py` ```python def upload_dataset_sync( self, file_path: str, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[dict[str, Union[str, list[str]]]] = None, ) -> api_types.Dataset: """ Upload a dataset file to create a new dataset in Patronus AI Platform. Args: file_path: Path to the dataset file (CSV or JSONL format) dataset_name: Name for the created dataset dataset_description: Optional description for the dataset custom_field_mapping: Optional mapping of standard field names to custom field names in the dataset Returns: Dataset object representing the created dataset """ with open(file_path, "rb") as f: return self.upload_dataset_from_buffer_sync(f, dataset_name, dataset_description, custom_field_mapping) ``` ### whoami ```python whoami() -> api_types.WhoAmIResponse ``` Fetches information about the authenticated user. Source code in `src/patronus/api/api_client.py` ```python async def whoami(self) -> api_types.WhoAmIResponse: """Fetches information about the authenticated user.""" resp = await self.call("GET", "/v1/whoami", response_cls=api_types.WhoAmIResponse) resp.raise_for_status() return resp.data ``` ### whoami_sync ```python whoami_sync() -> api_types.WhoAmIResponse ``` Fetches information about the authenticated user. 
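As an illustrative aside (not generated from the source), `whoami_sync` is a convenient way to confirm that an API key is valid after initialization; the attribute access below follows the `WhoAmIResponse` model documented under `patronus.api.api_types`.

```python
# Illustrative sketch: verify credentials with the synchronous client.
# Assumes whoami_sync is available on the client returned by get_api_client_deprecated().
import patronus
from patronus import context

patronus.init()
client = context.get_api_client_deprecated()

me = client.whoami_sync()
# WhoAmIResponse -> caller -> api_key -> account
print(f"Authenticated account: {me.caller.api_key.account.name} ({me.caller.api_key.account.id})")
```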
Source code in `src/patronus/api/api_client.py` ```python def whoami_sync(self) -> api_types.WhoAmIResponse: """Fetches information about the authenticated user.""" resp = self.call_sync("GET", "/v1/whoami", response_cls=api_types.WhoAmIResponse) resp.raise_for_status() return resp.data ``` ## patronus.api.api_types ### SanitizedApp ```python SanitizedApp = Annotated[ str, _create_field_sanitizer( "[^a-zA-Z0-9-_./ -]", max_len=50, replace_with="_" ), ] ``` ### SanitizedLocalEvaluatorID ```python SanitizedLocalEvaluatorID = Annotated[ Optional[str], _create_field_sanitizer( "[^a-zA-Z0-9\\-_./]", max_len=50, replace_with="-" ), ] ``` ### SanitizedProjectName ```python SanitizedProjectName = Annotated[ str, project_name_sanitizer ] ``` ### project_name_sanitizer ```python project_name_sanitizer = ( _create_field_sanitizer( "[^a-zA-Z0-9_ -]", max_len=50, replace_with="_" ), ) ``` ### Account Bases: `BaseModel` #### id ```python id: str ``` #### name ```python name: str ``` ### AddEvaluatorCriteriaRevisionRequest Bases: `BaseModel` #### config ```python config: dict[str, Any] ``` ### AddEvaluatorCriteriaRevisionResponse Bases: `BaseModel` #### evaluator_criteria ```python evaluator_criteria: EvaluatorCriteria ``` ### AnnotateRequest Bases: `BaseModel` #### annotation_criteria_id ```python annotation_criteria_id: str ``` #### explanation ```python explanation: Optional[str] = None ``` #### log_id ```python log_id: str ``` #### value_pass ```python value_pass: Optional[bool] = None ``` #### value_score ```python value_score: Optional[float] = None ``` #### value_text ```python value_text: Optional[str] = None ``` ### AnnotateResponse Bases: `BaseModel` #### evaluation ```python evaluation: Evaluation ``` ### AnnotationCategory Bases: `BaseModel` #### label ```python label: Optional[str] = None ``` #### score ```python score: Optional[float] = None ``` ### AnnotationCriteria Bases: `BaseModel` #### annotation_type ```python annotation_type: AnnotationType ``` #### categories ```python categories: Optional[list[AnnotationCategory]] = None ``` #### created_at ```python created_at: datetime ``` #### description ```python description: Optional[str] = None ``` #### id ```python id: str ``` #### name ```python name: str ``` #### project_id ```python project_id: str ``` #### updated_at ```python updated_at: datetime ``` ### AnnotationType Bases: `str`, `Enum` #### binary ```python binary = 'binary' ``` #### categorical ```python categorical = 'categorical' ``` #### continuous ```python continuous = 'continuous' ``` #### discrete ```python discrete = 'discrete' ``` #### text_annotation ```python text_annotation = 'text_annotation' ``` ### BatchCreateEvaluationsRequest Bases: `BaseModel` #### evaluations ```python evaluations: list[ClientEvaluation] = Field( min_length=1, max_length=1000 ) ``` ### BatchCreateEvaluationsResponse Bases: `BaseModel` #### evaluations ```python evaluations: list[Evaluation] ``` ### ClientEvaluation Bases: `BaseModel` #### app ```python app: Optional[SanitizedApp] = None ``` #### created_at ```python created_at: Optional[datetime] = None ``` #### criteria ```python criteria: Optional[str] = None ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[str] = None ``` #### evaluation_duration ```python evaluation_duration: Optional[timedelta] = None ``` #### evaluator_id ```python evaluator_id: SanitizedLocalEvaluatorID ``` #### experiment_id ```python experiment_id: Optional[str] = None ``` #### explanation 
```python explanation: Optional[str] = None ``` #### explanation_duration ```python explanation_duration: Optional[timedelta] = None ``` #### log_id ```python log_id: UUID ``` #### metadata ```python metadata: Optional[dict[str, Any]] = None ``` #### metric_description ```python metric_description: Optional[str] = None ``` #### metric_name ```python metric_name: Optional[str] = None ``` #### pass\_ ```python pass_: Optional[bool] = Field( default=None, serialization_alias="pass" ) ``` #### project_id ```python project_id: Optional[str] = None ``` #### project_name ```python project_name: Optional[SanitizedProjectName] = None ``` #### score ```python score: Optional[float] = None ``` #### span_id ```python span_id: Optional[str] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### text_output ```python text_output: Optional[str] = None ``` #### trace_id ```python trace_id: Optional[str] = None ``` ### CreateAnnotationCriteriaRequest Bases: `BaseModel` #### annotation_type ```python annotation_type: AnnotationType ``` #### categories ```python categories: Optional[list[AnnotationCategory]] = None ``` #### description ```python description: Optional[str] = None ``` #### name ```python name: str = Field(min_length=1, max_length=100) ``` #### project_id ```python project_id: str ``` ### CreateAnnotationCriteriaResponse Bases: `BaseModel` #### annotation_criteria ```python annotation_criteria: AnnotationCriteria ``` ### CreateCriteriaRequest Bases: `BaseModel` #### config ```python config: dict[str, Any] ``` #### evaluator_family ```python evaluator_family: str ``` #### name ```python name: str ``` ### CreateCriteriaResponse Bases: `BaseModel` #### evaluator_criteria ```python evaluator_criteria: EvaluatorCriteria ``` ### CreateDatasetResponse Bases: `BaseModel` #### dataset ```python dataset: Dataset ``` #### dataset_id ```python dataset_id: str ``` ### CreateExperimentRequest Bases: `BaseModel` #### metadata ```python metadata: Optional[dict[str, Any]] = None ``` #### name ```python name: str ``` #### project_id ```python project_id: str ``` #### tags ```python tags: dict[str, str] = Field(default_factory=dict) ``` ### CreateExperimentResponse Bases: `BaseModel` #### experiment ```python experiment: Experiment ``` ### CreateProjectRequest Bases: `BaseModel` #### name ```python name: SanitizedProjectName ``` ### Dataset Bases: `BaseModel` #### created_at ```python created_at: datetime ``` #### creation_at ```python creation_at: Optional[datetime] = None ``` #### description ```python description: Optional[str] = None ``` #### id ```python id: str ``` #### name ```python name: str ``` #### samples ```python samples: int ``` #### type ```python type: str ``` ### DatasetDatum Bases: `BaseModel` #### dataset_id ```python dataset_id: str ``` #### evaluated_model_gold_answer ```python evaluated_model_gold_answer: Optional[str] = None ``` #### evaluated_model_input ```python evaluated_model_input: Optional[str] = None ``` #### evaluated_model_output ```python evaluated_model_output: Optional[str] = None ``` #### evaluated_model_retrieved_context ```python evaluated_model_retrieved_context: Optional[list[str]] = ( None ) ``` #### evaluated_model_system_prompt ```python evaluated_model_system_prompt: Optional[str] = None ``` #### meta_evaluated_model_name ```python meta_evaluated_model_name: Optional[str] = None ``` #### meta_evaluated_model_params ```python meta_evaluated_model_params: Optional[ dict[str, Union[str, int, float]] ] = None ``` #### meta_evaluated_model_provider 
```python meta_evaluated_model_provider: Optional[str] = None ``` #### meta_evaluated_model_selected_model ```python meta_evaluated_model_selected_model: Optional[str] = None ``` #### sid ```python sid: int ``` ### EvaluateEvaluator Bases: `BaseModel` #### criteria ```python criteria: Optional[str] = None ``` #### evaluator ```python evaluator: str ``` #### explain_strategy ```python explain_strategy: str = 'always' ``` ### EvaluateRequest Bases: `BaseModel` #### app ```python app: Optional[str] = None ``` #### capture ```python capture: str = 'all' ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[str] = None ``` #### evaluated_model_attachments ```python evaluated_model_attachments: Optional[ list[EvaluatedModelAttachment] ] = None ``` #### evaluated_model_gold_answer ```python evaluated_model_gold_answer: Optional[str] = None ``` #### evaluated_model_input ```python evaluated_model_input: Optional[str] = None ``` #### evaluated_model_output ```python evaluated_model_output: Optional[str] = None ``` #### evaluated_model_retrieved_context ```python evaluated_model_retrieved_context: Optional[ Union[list[str], str] ] = None ``` #### evaluated_model_system_prompt ```python evaluated_model_system_prompt: Optional[str] = None ``` #### evaluators ```python evaluators: list[EvaluateEvaluator] = Field(min_length=1) ``` #### experiment_id ```python experiment_id: Optional[str] = None ``` #### log_id ```python log_id: Optional[str] = None ``` #### project_id ```python project_id: Optional[str] = None ``` #### project_name ```python project_name: Optional[str] = None ``` #### span_id ```python span_id: Optional[str] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### trace_id ```python trace_id: Optional[str] = None ``` ### EvaluateResponse Bases: `BaseModel` #### results ```python results: list[EvaluateResult] ``` ### EvaluateResult Bases: `BaseModel` #### criteria ```python criteria: str ``` #### error_message ```python error_message: Optional[str] ``` #### evaluation_result ```python evaluation_result: Optional[EvaluationResult] ``` #### evaluator_id ```python evaluator_id: str ``` #### status ```python status: str ``` ### EvaluatedModelAttachment Bases: `BaseModel` #### media_type ```python media_type: str ``` #### url ```python url: str ``` #### usage_type ```python usage_type: Optional[str] = 'evaluated_model_input' ``` ### Evaluation Bases: `BaseModel` #### annotation_criteria_id ```python annotation_criteria_id: Optional[str] = None ``` #### app ```python app: Optional[str] = None ``` #### created_at ```python created_at: datetime ``` #### criteria ```python criteria: Optional[str] = None ``` #### criteria_id ```python criteria_id: Optional[str] = None ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[str] = None ``` #### evaluation_duration ```python evaluation_duration: Optional[timedelta] = None ``` #### evaluation_type ```python evaluation_type: Optional[str] = None ``` #### evaluator_family ```python evaluator_family: Optional[str] = None ``` #### evaluator_id ```python evaluator_id: Optional[str] = None ``` #### experiment_id ```python experiment_id: Optional[int] = None ``` #### explain_strategy ```python explain_strategy: Optional[str] = None ``` #### explanation ```python explanation: Optional[str] = None ``` #### explanation_duration ```python explanation_duration: Optional[timedelta] = None ``` #### 
id ```python id: int ``` #### log_id ```python log_id: str ``` #### metadata ```python metadata: Optional[dict[str, Any]] = None ``` #### metric_description ```python metric_description: Optional[str] = None ``` #### metric_name ```python metric_name: Optional[str] = None ``` #### pass\_ ```python pass_: Optional[bool] = Field(default=None, alias='pass') ``` #### project_id ```python project_id: Optional[str] = None ``` #### score ```python score: Optional[float] = None ``` #### span_id ```python span_id: Optional[str] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### text_output ```python text_output: Optional[str] = None ``` #### trace_id ```python trace_id: Optional[str] = None ``` #### usage ```python usage: Optional[dict[str, Any]] = None ``` ### EvaluationResult Bases: `BaseModel` #### additional_info ```python additional_info: Optional[dict[str, Any]] = None ``` #### app ```python app: Optional[str] = None ``` #### created_at ```python created_at: Optional[AwareDatetime] = None ``` #### criteria ```python criteria: str ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[int] = None ``` #### evaluated_model_gold_answer ```python evaluated_model_gold_answer: Optional[str] = None ``` #### evaluated_model_input ```python evaluated_model_input: Optional[str] = None ``` #### evaluated_model_output ```python evaluated_model_output: Optional[str] = None ``` #### evaluated_model_retrieved_context ```python evaluated_model_retrieved_context: Optional[list[str]] = ( None ) ``` #### evaluated_model_system_prompt ```python evaluated_model_system_prompt: Optional[str] = None ``` #### evaluation_duration ```python evaluation_duration: Optional[timedelta] = None ``` #### evaluation_metadata ```python evaluation_metadata: Optional[dict] = None ``` #### evaluator_family ```python evaluator_family: str ``` #### evaluator_id ```python evaluator_id: str ``` #### evaluator_profile_public_id ```python evaluator_profile_public_id: str ``` #### experiment_id ```python experiment_id: Optional[str] = None ``` #### explanation ```python explanation: Optional[str] = None ``` #### explanation_duration ```python explanation_duration: Optional[timedelta] = None ``` #### id ```python id: Optional[str] = None ``` #### pass\_ ```python pass_: Optional[bool] = Field(default=None, alias='pass') ``` #### project_id ```python project_id: Optional[str] = None ``` #### score_raw ```python score_raw: Optional[float] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### text_output ```python text_output: Optional[str] = None ``` ### Evaluator Bases: `BaseModel` #### aliases ```python aliases: Optional[list[str]] ``` #### default_criteria ```python default_criteria: Optional[str] = None ``` #### evaluator_family ```python evaluator_family: Optional[str] ``` #### id ```python id: str ``` #### name ```python name: str ``` ### EvaluatorCriteria Bases: `BaseModel` #### config ```python config: Optional[dict[str, Any]] ``` #### created_at ```python created_at: datetime ``` #### description ```python description: Optional[str] ``` #### evaluator_family ```python evaluator_family: str ``` #### is_patronus_managed ```python is_patronus_managed: bool ``` #### name ```python name: str ``` #### public_id ```python public_id: str ``` #### revision ```python revision: int ``` ### Experiment Bases: `BaseModel` #### id ```python id: str ``` #### metadata ```python metadata: Optional[dict[str, Any]] = None ``` #### 
name ```python name: str ``` #### project_id ```python project_id: str ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` ### ExportEvaluationRequest Bases: `BaseModel` #### evaluation_results ```python evaluation_results: list[ExportEvaluationResult] ``` ### ExportEvaluationResponse Bases: `BaseModel` #### evaluation_results ```python evaluation_results: list[ExportEvaluationResultPartial] ``` ### ExportEvaluationResult Bases: `BaseModel` #### app ```python app: Optional[str] = None ``` #### criteria ```python criteria: Optional[str] = None ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[int] = None ``` #### evaluated_model_attachments ```python evaluated_model_attachments: Optional[ list[EvaluatedModelAttachment] ] = None ``` #### evaluated_model_gold_answer ```python evaluated_model_gold_answer: Optional[str] = None ``` #### evaluated_model_input ```python evaluated_model_input: Optional[str] = None ``` #### evaluated_model_name ```python evaluated_model_name: Optional[str] = None ``` #### evaluated_model_output ```python evaluated_model_output: Optional[str] = None ``` #### evaluated_model_params ```python evaluated_model_params: Optional[ dict[str, Union[str, int, float]] ] = None ``` #### evaluated_model_provider ```python evaluated_model_provider: Optional[str] = None ``` #### evaluated_model_retrieved_context ```python evaluated_model_retrieved_context: Optional[list[str]] = ( None ) ``` #### evaluated_model_selected_model ```python evaluated_model_selected_model: Optional[str] = None ``` #### evaluated_model_system_prompt ```python evaluated_model_system_prompt: Optional[str] = None ``` #### evaluation_duration ```python evaluation_duration: Optional[timedelta] = None ``` #### evaluation_metadata ```python evaluation_metadata: Optional[dict[str, Any]] = None ``` #### evaluator_id ```python evaluator_id: SanitizedLocalEvaluatorID ``` #### experiment_id ```python experiment_id: Optional[str] = None ``` #### explanation ```python explanation: Optional[str] = None ``` #### explanation_duration ```python explanation_duration: Optional[timedelta] = None ``` #### pass\_ ```python pass_: Optional[bool] = Field( default=None, serialization_alias="pass" ) ``` #### score_raw ```python score_raw: Optional[float] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### text_output ```python text_output: Optional[str] = None ``` ### ExportEvaluationResultPartial Bases: `BaseModel` #### app ```python app: Optional[str] ``` #### created_at ```python created_at: AwareDatetime ``` #### evaluator_id ```python evaluator_id: str ``` #### id ```python id: str ``` ### GetAnnotationCriteriaResponse Bases: `BaseModel` #### annotation_criteria ```python annotation_criteria: AnnotationCriteria ``` ### GetEvaluationResponse Bases: `BaseModel` #### evaluation ```python evaluation: Evaluation ``` ### GetExperimentResponse Bases: `BaseModel` #### experiment ```python experiment: Experiment ``` ### GetProjectResponse Bases: `BaseModel` #### project ```python project: Project ``` ### ListAnnotationCriteriaResponse Bases: `BaseModel` #### annotation_criteria ```python annotation_criteria: list[AnnotationCriteria] ``` ### ListCriteriaRequest Bases: `BaseModel` #### evaluator_family ```python evaluator_family: Optional[str] = None ``` #### evaluator_id ```python evaluator_id: Optional[str] = None ``` #### get_last_revision ```python get_last_revision: bool = False ``` #### is_patronus_managed ```python 
is_patronus_managed: Optional[bool] = None ``` #### limit ```python limit: int = 1000 ``` #### name ```python name: Optional[str] = None ``` #### offset ```python offset: int = 0 ``` #### public_id ```python public_id: Optional[str] = None ``` #### revision ```python revision: Optional[str] = None ``` ### ListCriteriaResponse Bases: `BaseModel` #### evaluator_criteria ```python evaluator_criteria: list[EvaluatorCriteria] ``` ### ListDatasetData Bases: `BaseModel` #### data ```python data: list[DatasetDatum] ``` ### ListDatasetsResponse Bases: `BaseModel` #### datasets ```python datasets: list[Dataset] ``` ### ListEvaluatorsResponse Bases: `BaseModel` #### evaluators ```python evaluators: list[Evaluator] ``` ### Log Bases: `BaseModel` #### body ```python body: Any = None ``` #### log_attributes ```python log_attributes: Optional[dict[str, str]] = None ``` #### resource_attributes ```python resource_attributes: Optional[dict[str, str]] = None ``` #### resource_schema_url ```python resource_schema_url: Optional[str] = None ``` #### scope_attributes ```python scope_attributes: Optional[dict[str, str]] = None ``` #### scope_name ```python scope_name: Optional[str] = None ``` #### scope_schema_url ```python scope_schema_url: Optional[str] = None ``` #### scope_version ```python scope_version: Optional[str] = None ``` #### service_name ```python service_name: Optional[str] = None ``` #### severity_number ```python severity_number: Optional[int] = None ``` #### severity_test ```python severity_test: Optional[str] = None ``` #### span_id ```python span_id: Optional[str] = None ``` #### timestamp ```python timestamp: Optional[datetime] = None ``` #### trace_flags ```python trace_flags: Optional[int] = None ``` #### trace_id ```python trace_id: Optional[str] = None ``` ### Project Bases: `BaseModel` #### id ```python id: str ``` #### name ```python name: str ``` ### SearchEvaluationsFilter Bases: `BaseModel` #### and\_ ```python and_: Optional[list[SearchEvaluationsFilter]] = None ``` #### field ```python field: Optional[str] = None ``` #### operation ```python operation: Optional[str] = None ``` #### or\_ ```python or_: Optional[list[SearchEvaluationsFilter]] = None ``` #### value ```python value: Optional[Any] = None ``` ### SearchEvaluationsRequest Bases: `BaseModel` #### filters ```python filters: Optional[list[SearchEvaluationsFilter]] = None ``` ### SearchEvaluationsResponse Bases: `BaseModel` #### evaluations ```python evaluations: list[Evaluation] ``` ### SearchLogsFilter Bases: `BaseModel` #### and\_ ```python and_: Optional[list[SearchLogsFilter]] = None ``` #### field ```python field: Optional[str] = None ``` #### op ```python op: Optional[str] = None ``` #### or\_ ```python or_: Optional[list[SearchLogsFilter]] = None ``` #### value ```python value: Optional[Any] = None ``` ### SearchLogsRequest Bases: `BaseModel` #### filters ```python filters: Optional[list[SearchLogsFilter]] = None ``` #### limit ```python limit: int = 1000 ``` #### order ```python order: str = 'timestamp desc' ``` ### SearchLogsResponse Bases: `BaseModel` #### logs ```python logs: list[Log] ``` ### UpdateAnnotationCriteriaRequest Bases: `BaseModel` #### annotation_type ```python annotation_type: AnnotationType ``` #### categories ```python categories: Optional[list[AnnotationCategory]] = None ``` #### description ```python description: Optional[str] = None ``` #### name ```python name: str = Field(min_length=1, max_length=100) ``` ### UpdateAnnotationCriteriaResponse Bases: `BaseModel` #### annotation_criteria 
```python annotation_criteria: AnnotationCriteria ``` ### UpdateExperimentRequest Bases: `BaseModel` #### metadata ```python metadata: dict[str, Any] ``` ### UpdateExperimentResponse Bases: `BaseModel` #### experiment ```python experiment: Experiment ``` ### WhoAmIAPIKey Bases: `BaseModel` #### account ```python account: Account ``` #### id ```python id: str ``` ### WhoAmICaller Bases: `BaseModel` #### api_key ```python api_key: WhoAmIAPIKey ``` ### WhoAmIResponse Bases: `BaseModel` #### caller ```python caller: WhoAmICaller ``` ### sanitize_field ```python sanitize_field(max_length: int, sub_pattern: str) ``` Source code in `src/patronus/api/api_types.py` ```python def sanitize_field(max_length: int, sub_pattern: str): def wrapper(value: str) -> str: if not value: return value value = value[:max_length] return re.sub(sub_pattern, "_", value).strip() return wrapper ``` # Config ## patronus.config ### Config Bases: `BaseSettings` Configuration settings for the Patronus SDK. This class defines all available configuration options with their default values and handles loading configuration from environment variables and YAML files. Configuration sources are checked in this order: 1. Code-specified values 1. Environment variables (with prefix PATRONUS\_) 1. YAML configuration file (patronus.yaml) 1. Default values Attributes: | Name | Type | Description | | --- | --- | --- | | `service` | `str` | The name of the service or application component. Defaults to OTEL_SERVICE_NAME env var or platform.node(). | | `api_key` | `Optional[str]` | Authentication key for Patronus services. | | `api_url` | `str` | URL for the Patronus API service. Default: https://api.patronus.ai | | `otel_endpoint` | `str` | Endpoint for OpenTelemetry data collection. Default: https://otel.patronus.ai:4317 | | `otel_exporter_otlp_protocol` | `Optional[Literal['grpc', 'http/protobuf']]` | OpenTelemetry exporter protocol. Values: grpc, http/protobuf. Falls back to standard OTEL environment variables if not set. | | `ui_url` | `str` | URL for the Patronus UI. Default: https://app.patronus.ai | | `timeout_s` | `int` | Timeout in seconds for HTTP requests. Default: 300 | | `project_name` | `str` | Name of the project for organizing evaluations and experiments. Default: Global | | `app` | `str` | Name of the application within the project. Default: default | ### config ```python config() -> Config ``` Returns the Patronus SDK configuration singleton. Configuration is loaded from environment variables and the patronus.yaml file (if present) when this function is first called. Returns: | Name | Type | Description | | --- | --- | --- | | `Config` | `Config` | A singleton Config object containing all Patronus configuration settings. | Example ```python from patronus.config import config # Get the configuration cfg = config() # Access configuration values api_key = cfg.api_key project_name = cfg.project_name ``` Source code in `src/patronus/config.py` ````python @functools.lru_cache() def config() -> Config: """ Returns the Patronus SDK configuration singleton. Configuration is loaded from environment variables and the patronus.yaml file (if present) when this function is first called. Returns: Config: A singleton Config object containing all Patronus configuration settings. 
Example: ```python from patronus.config import config # Get the configuration cfg = config() # Access configuration values api_key = cfg.api_key project_name = cfg.project_name ``` """ cfg = Config() return cfg ```` # Context ## patronus.context Context management for Patronus SDK. This module provides classes and utility functions for managing the global Patronus context and accessing different components of the SDK like logging, tracing, and API clients. ### PatronusScope ```python PatronusScope( service: Optional[str], project_name: Optional[str], app: Optional[str], experiment_id: Optional[str], experiment_name: Optional[str], ) ``` Scope information for Patronus context. Defines the scope of the current Patronus application or experiment. Attributes: | Name | Type | Description | | --- | --- | --- | | `service` | `Optional[str]` | The service name as defined in OTeL. | | `project_name` | `Optional[str]` | The project name. | | `app` | `Optional[str]` | The application name. | | `experiment_id` | `Optional[str]` | The unique identifier for the experiment. | | `experiment_name` | `Optional[str]` | The name of the experiment. | ### PromptsConfig ```python PromptsConfig( directory: Path, providers: list[str], templating_engine: str, ) ``` #### directory ```python directory: Path ``` The absolute path to a directory where prompts are stored locally. #### providers ```python providers: list[str] ``` List of default prompt providers. #### templating_engine ```python templating_engine: str ``` Default prompt templating engine. ### PatronusContext ```python PatronusContext( scope: PatronusScope, tracer_provider: TracerProvider, logger_provider: LoggerProvider, api_client_deprecated: PatronusAPIClient, api_client: Client, async_api_client: AsyncClient, exporter: BatchEvaluationExporter, prompts: PromptsConfig, ) ``` Context object for Patronus SDK. Contains all the necessary components for the SDK to function properly. Attributes: | Name | Type | Description | | --- | --- | --- | | `scope` | `PatronusScope` | Scope information for this context. | | `tracer_provider` | `TracerProvider` | The OpenTelemetry tracer provider. | | `logger_provider` | `LoggerProvider` | The OpenTelemetry logger provider. | | `api_client_deprecated` | `PatronusAPIClient` | Client for Patronus API communication (deprecated). | | `api_client` | `Client` | Client for Patronus API communication using the modern client. | | `async_api_client` | `AsyncClient` | Asynchronous client for Patronus API communication. | | `exporter` | `BatchEvaluationExporter` | Exporter for batch evaluation results. | | `prompts` | `PromptsConfig` | Configuration for prompt management. | ### set_global_patronus_context ```python set_global_patronus_context(ctx: PatronusContext) ``` Set the global Patronus context. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `PatronusContext` | The Patronus context to set globally. | *required* | Source code in `src/patronus/context/__init__.py` ```python def set_global_patronus_context(ctx: PatronusContext): """ Set the global Patronus context. Args: ctx: The Patronus context to set globally. """ _CTX_PAT.set_global(ctx) ``` ### get_current_context_or_none ```python get_current_context_or_none() -> Optional[PatronusContext] ``` Get the current Patronus context or None if not initialized. Returns: | Type | Description | | --- | --- | | `Optional[PatronusContext]` | The current PatronusContext if set, otherwise None. 
| Source code in `src/patronus/context/__init__.py` ```python def get_current_context_or_none() -> Optional[PatronusContext]: """ Get the current Patronus context or None if not initialized. Returns: The current PatronusContext if set, otherwise None. """ return _CTX_PAT.get() ``` ### get_current_context ```python get_current_context() -> PatronusContext ``` Get the current Patronus context. Returns: | Type | Description | | --- | --- | | `PatronusContext` | The current PatronusContext. | Raises: | Type | Description | | --- | --- | | `UninitializedError` | If no active Patronus context is found. | Source code in `src/patronus/context/__init__.py` ```python def get_current_context() -> PatronusContext: """ Get the current Patronus context. Returns: The current PatronusContext. Raises: UninitializedError: If no active Patronus context is found. """ ctx = get_current_context_or_none() if ctx is None: raise UninitializedError( "No active Patronus context found. Please initialize the library by calling patronus.init()." ) return ctx ``` ### get_logger ```python get_logger( ctx: Optional[PatronusContext] = None, level: int = logging.INFO, ) -> logging.Logger ``` Get a standard Python logger configured with the Patronus context. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | | `level` | `int` | The logging level to set. Defaults to INFO. | `INFO` | Returns: | Type | Description | | --- | --- | | `Logger` | A configured Python logger. | Source code in `src/patronus/context/__init__.py` ```python def get_logger(ctx: Optional[PatronusContext] = None, level: int = logging.INFO) -> logging.Logger: """ Get a standard Python logger configured with the Patronus context. Args: ctx: The Patronus context to use. If None, uses the current context. level: The logging level to set. Defaults to INFO. Returns: A configured Python logger. """ from patronus.tracing.logger import set_logger_handler ctx = ctx or get_current_context() logger = logging.getLogger("patronus.sdk") set_logger_handler(logger, ctx.scope, ctx.logger_provider) logger.setLevel(level) return logger ``` ### get_logger_or_none ```python get_logger_or_none( level: int = logging.INFO, ) -> Optional[logging.Logger] ``` Get a standard Python logger or None if context is not initialized. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `level` | `int` | The logging level to set. Defaults to INFO. | `INFO` | Returns: | Type | Description | | --- | --- | | `Optional[Logger]` | A configured Python logger if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_logger_or_none(level: int = logging.INFO) -> Optional[logging.Logger]: """ Get a standard Python logger or None if context is not initialized. Args: level: The logging level to set. Defaults to INFO. Returns: A configured Python logger if context is available, otherwise None. """ ctx = get_current_context() if ctx is None: return None return get_logger(ctx, level=level) ``` ### get_pat_logger ```python get_pat_logger( ctx: Optional[PatronusContext] = None, ) -> PatLogger ``` Get a Patronus logger. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `Logger` | A Patronus logger. 
| Source code in `src/patronus/context/__init__.py` ```python def get_pat_logger(ctx: Optional[PatronusContext] = None) -> "PatLogger": """ Get a Patronus logger. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: A Patronus logger. """ ctx = ctx or get_current_context() return ctx.logger_provider.get_logger("patronus.sdk") ``` ### get_pat_logger_or_none ```python get_pat_logger_or_none() -> Optional[PatLogger] ``` Get a Patronus logger or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[Logger]` | A Patronus logger if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_pat_logger_or_none() -> Optional["PatLogger"]: """ Get a Patronus logger or None if context is not initialized. Returns: A Patronus logger if context is available, otherwise None. """ ctx = get_current_context_or_none() if ctx is None: return None return ctx.logger_provider.get_logger("patronus.sdk") ``` ### get_tracer ```python get_tracer( ctx: Optional[PatronusContext] = None, ) -> trace.Tracer ``` Get an OpenTelemetry tracer. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `Tracer` | An OpenTelemetry tracer. | Source code in `src/patronus/context/__init__.py` ```python def get_tracer(ctx: Optional[PatronusContext] = None) -> trace.Tracer: """ Get an OpenTelemetry tracer. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: An OpenTelemetry tracer. """ ctx = ctx or get_current_context() return ctx.tracer_provider.get_tracer("patronus.sdk") ``` ### get_tracer_or_none ```python get_tracer_or_none() -> Optional[trace.Tracer] ``` Get an OpenTelemetry tracer or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[Tracer]` | An OpenTelemetry tracer if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_tracer_or_none() -> Optional[trace.Tracer]: """ Get an OpenTelemetry tracer or None if context is not initialized. Returns: An OpenTelemetry tracer if context is available, otherwise None. """ ctx = get_current_context_or_none() if ctx is None: return None return ctx.tracer_provider.get_tracer("patronus.sdk") ``` ### get_api_client_deprecated ```python get_api_client_deprecated( ctx: Optional[PatronusContext] = None, ) -> PatronusAPIClient ``` Get the Patronus API client. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `PatronusAPIClient` | The Patronus API client. | Source code in `src/patronus/context/__init__.py` ```python def get_api_client_deprecated(ctx: Optional[PatronusContext] = None) -> "PatronusAPIClient": """ Get the Patronus API client. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The Patronus API client. """ ctx = ctx or get_current_context() return ctx.api_client_deprecated ``` ### get_api_client_deprecated_or_none ```python get_api_client_deprecated_or_none() -> Optional[ PatronusAPIClient ] ``` Get the Patronus API client or None if context is not initialized. 
Returns: | Type | Description | | --- | --- | | `Optional[PatronusAPIClient]` | The Patronus API client if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_api_client_deprecated_or_none() -> Optional["PatronusAPIClient"]: """ Get the Patronus API client or None if context is not initialized. Returns: The Patronus API client if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.api_client_deprecated ``` ### get_api_client ```python get_api_client( ctx: Optional[PatronusContext] = None, ) -> patronus_api.Client ``` Get the Patronus API client. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `Client` | The Patronus API client. | Source code in `src/patronus/context/__init__.py` ```python def get_api_client(ctx: Optional[PatronusContext] = None) -> patronus_api.Client: """ Get the Patronus API client. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The Patronus API client. """ ctx = ctx or get_current_context() return ctx.api_client ``` ### get_api_client_or_none ```python get_api_client_or_none() -> Optional[patronus_api.Client] ``` Get the Patronus API client or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[Client]` | The Patronus API client if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_api_client_or_none() -> Optional[patronus_api.Client]: """ Get the Patronus API client or None if context is not initialized. Returns: The Patronus API client if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.api_client ``` ### get_async_api_client ```python get_async_api_client( ctx: Optional[PatronusContext] = None, ) -> patronus_api.AsyncClient ``` Get the asynchronous Patronus API client. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `AsyncClient` | The asynchronous Patronus API client. | Source code in `src/patronus/context/__init__.py` ```python def get_async_api_client(ctx: Optional[PatronusContext] = None) -> patronus_api.AsyncClient: """ Get the asynchronous Patronus API client. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The asynchronous Patronus API client. """ ctx = ctx or get_current_context() return ctx.async_api_client ``` ### get_async_api_client_or_none ```python get_async_api_client_or_none() -> Optional[ patronus_api.AsyncClient ] ``` Get the asynchronous Patronus API client or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[AsyncClient]` | The asynchronous Patronus API client if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_async_api_client_or_none() -> Optional[patronus_api.AsyncClient]: """ Get the asynchronous Patronus API client or None if context is not initialized. Returns: The asynchronous Patronus API client if context is available, otherwise None. 
""" return (ctx := get_current_context_or_none()) and ctx.async_api_client ``` ### get_exporter ```python get_exporter( ctx: Optional[PatronusContext] = None, ) -> BatchEvaluationExporter ``` Get the batch evaluation exporter. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `BatchEvaluationExporter` | The batch evaluation exporter. | Source code in `src/patronus/context/__init__.py` ```python def get_exporter(ctx: Optional[PatronusContext] = None) -> "BatchEvaluationExporter": """ Get the batch evaluation exporter. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The batch evaluation exporter. """ ctx = ctx or get_current_context() return ctx.exporter ``` ### get_exporter_or_none ```python get_exporter_or_none() -> Optional[BatchEvaluationExporter] ``` Get the batch evaluation exporter or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[BatchEvaluationExporter]` | The batch evaluation exporter if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_exporter_or_none() -> Optional["BatchEvaluationExporter"]: """ Get the batch evaluation exporter or None if context is not initialized. Returns: The batch evaluation exporter if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.exporter ``` ### get_scope ```python get_scope( ctx: Optional[PatronusContext] = None, ) -> PatronusScope ``` Get the Patronus scope. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `PatronusScope` | The Patronus scope. | Source code in `src/patronus/context/__init__.py` ```python def get_scope(ctx: Optional[PatronusContext] = None) -> PatronusScope: """ Get the Patronus scope. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The Patronus scope. """ ctx = ctx or get_current_context() return ctx.scope ``` ### get_scope_or_none ```python get_scope_or_none() -> Optional[PatronusScope] ``` Get the Patronus scope or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[PatronusScope]` | The Patronus scope if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_scope_or_none() -> Optional[PatronusScope]: """ Get the Patronus scope or None if context is not initialized. Returns: The Patronus scope if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.scope ``` ### get_prompts_config ```python get_prompts_config( ctx: Optional[PatronusContext] = None, ) -> PromptsConfig ``` Get the Patronus prompts configuration. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `PromptsConfig` | The Patronus prompts configuration. | Source code in `src/patronus/context/__init__.py` ```python def get_prompts_config(ctx: Optional[PatronusContext] = None) -> PromptsConfig: """ Get the Patronus prompts configuration. 
Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The Patronus prompts configuration. """ ctx = ctx or get_current_context() return ctx.prompts ``` ### get_prompts_config_or_none ```python get_prompts_config_or_none() -> Optional[PromptsConfig] ``` Get the Patronus prompts configuration or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[PromptsConfig]` | The Patronus prompts configuration if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_prompts_config_or_none() -> Optional[PromptsConfig]: """ Get the Patronus prompts configuration or None if context is not initialized. Returns: The Patronus prompts configuration if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.prompts ``` # Datasets ## patronus.datasets ### datasets #### Attachment Bases: `TypedDict` Represent an attachment entry. Usually used in context of multimodal evaluation. #### Fields Bases: `TypedDict` A TypedDict class representing fields for a structured data entity. Attributes: | Name | Type | Description | | --- | --- | --- | | `sid` | `NotRequired[Optional[str]]` | An optional identifier for the system or session. | | `system_prompt` | `NotRequired[Optional[str]]` | An optional string representing the system prompt associated with the task. | | `task_context` | `NotRequired[Union[str, list[str], None]]` | Optional contextual information for the task in the form of a string or a list of strings. | | `task_attachments` | `NotRequired[Optional[list[Attachment]]]` | Optional list of attachments associated with the task. | | `task_input` | `NotRequired[Optional[str]]` | An optional string representing the input data for the task. Usually a user input sent to an LLM. | | `task_output` | `NotRequired[Optional[str]]` | An optional string representing the output result of the task. Usually a response from an LLM. | | `gold_answer` | `NotRequired[Optional[str]]` | An optional string representing the correct or expected answer for evaluation purposes. | | `task_metadata` | `NotRequired[Optional[dict[str, Any]]]` | Optional dictionary containing metadata associated with the task. | | `tags` | `NotRequired[Optional[dict[str, str]]]` | Optional dictionary holding additional key-value pair tags relevant to the task. | #### Row ```python Row(_row: Series) ``` Represents a data row encapsulating access to properties in a pandas Series. Provides attribute-based access to underlying pandas Series data with properties that ensure compatibility with structured evaluators through consistent field naming and type handling. #### Dataset ```python Dataset(dataset_id: Optional[str], df: DataFrame) ``` Represents a dataset. ##### from_records ```python from_records( records: Union[ Iterable[Fields], Iterable[dict[str, Any]] ], dataset_id: Optional[str] = None, ) -> te.Self ``` Creates an instance of the class by processing and sanitizing provided records and optionally associating them with a specific dataset ID. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `records` | `Union[Iterable[Fields], Iterable[dict[str, Any]]]` | A collection of records to initialize the instance. Each record can either be an instance of Fields or a dictionary containing corresponding data. | *required* | | `dataset_id` | `Optional[str]` | An optional identifier for associating the data with a specific dataset. 
| `None` | Returns: | Type | Description | | --- | --- | | `Self` | te.Self: A new instance of the class with the processed and sanitized data. | Source code in `src/patronus/datasets/datasets.py` ```python @classmethod def from_records( cls, records: Union[typing.Iterable[Fields], typing.Iterable[dict[str, typing.Any]]], dataset_id: Optional[str] = None, ) -> te.Self: """ Creates an instance of the class by processing and sanitizing provided records and optionally associating them with a specific dataset ID. Args: records: A collection of records to initialize the instance. Each record can either be an instance of `Fields` or a dictionary containing corresponding data. dataset_id: An optional identifier for associating the data with a specific dataset. Returns: te.Self: A new instance of the class with the processed and sanitized data. """ df = pd.DataFrame.from_records(records) df = cls.__sanitize_df(df, dataset_id) return cls(df=df, dataset_id=dataset_id) ``` ##### to_csv ```python to_csv( path_or_buf: Union[str, Path, IO[AnyStr]], **kwargs: Any ) -> Optional[str] ``` Saves dataset to a CSV file. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `path_or_buf` | `Union[str, Path, IO[AnyStr]]` | String path or file-like object where the CSV will be saved. | *required* | | `**kwargs` | `Any` | Additional arguments passed to pandas.DataFrame.to_csv(). | `{}` | Returns: | Type | Description | | --- | --- | | `Optional[str]` | String path if a path was specified and return_path is True, otherwise None. | Source code in `src/patronus/datasets/datasets.py` ```python def to_csv( self, path_or_buf: Union[str, pathlib.Path, typing.IO[typing.AnyStr]], **kwargs: typing.Any ) -> Optional[str]: """ Saves dataset to a CSV file. Args: path_or_buf: String path or file-like object where the CSV will be saved. **kwargs: Additional arguments passed to pandas.DataFrame.to_csv(). Returns: String path if a path was specified and return_path is True, otherwise None. """ return self.df.to_csv(path_or_buf, **kwargs) ``` #### DatasetLoader ```python DatasetLoader( loader: Union[ Awaitable[Dataset], Callable[[], Awaitable[Dataset]] ], ) ``` Encapsulates asynchronous loading of a dataset. This class provides a mechanism to lazily load a dataset asynchronously only once, using a provided dataset loader function. Source code in `src/patronus/datasets/datasets.py` ```python def __init__(self, loader: Union[typing.Awaitable[Dataset], typing.Callable[[], typing.Awaitable[Dataset]]]): self.__lock = asyncio.Lock() self.__loader = loader self.dataset: Optional[Dataset] = None ``` ##### load ```python load() -> Dataset ``` Load dataset. Repeated calls will return already loaded dataset. Source code in `src/patronus/datasets/datasets.py` ```python async def load(self) -> Dataset: """ Load dataset. Repeated calls will return already loaded dataset. 
""" async with self.__lock: if self.dataset is not None: return self.dataset if inspect.iscoroutinefunction(self.__loader): self.dataset = await self.__loader() else: self.dataset = await self.__loader return self.dataset ``` #### read_csv ```python read_csv( filename_or_buffer: Union[str, Path, IO[AnyStr]], *, dataset_id: Optional[str] = None, sid_field: str = "sid", system_prompt_field: str = "system_prompt", task_input_field: str = "task_input", task_context_field: str = "task_context", task_attachments_field: str = "task_attachments", task_output_field: str = "task_output", gold_answer_field: str = "gold_answer", task_metadata_field: str = "task_metadata", tags_field: str = "tags", **kwargs: Any, ) -> Dataset ``` Reads a CSV file and converts it into a Dataset object. The CSV file is transformed into a structured dataset where each field maps to a specific aspect of the dataset schema provided via function arguments. You may specify custom field mappings as per your dataset structure, while additional keyword arguments are passed directly to the underlying 'pd.read_csv' function. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `filename_or_buffer` | `Union[str, Path, IO[AnyStr]]` | Path to the CSV file or a file-like object containing the dataset to be read. | *required* | | `dataset_id` | `Optional[str]` | Optional identifier for the dataset being read. Default is None. | `None` | | `sid_field` | `str` | Name of the column containing unique sample identifiers. | `'sid'` | | `system_prompt_field` | `str` | Name of the column representing the system prompts. | `'system_prompt'` | | `task_input_field` | `str` | Name of the column containing the main input for the task. | `'task_input'` | | `task_context_field` | `str` | Name of the column describing the broader task context. | `'task_context'` | | `task_attachments_field` | `str` | Name of the column with supplementary attachments related to the task. | `'task_attachments'` | | `task_output_field` | `str` | Name of the column containing responses or outputs for the task. | `'task_output'` | | `gold_answer_field` | `str` | Name of the column detailing the expected or correct answer to the task. | `'gold_answer'` | | `task_metadata_field` | `str` | Name of the column storing metadata attributes associated with the task. | `'task_metadata'` | | `tags_field` | `str` | Name of the column containing tags or annotations related to each sample. | `'tags'` | | `**kwargs` | `Any` | Additional keyword arguments passed to 'pandas.read_csv' for fine-tuning the CSV parsing behavior, such as delimiters, encoding, etc. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `Dataset` | `Dataset` | The parsed dataset object containing structured data from the input CSV file. | Source code in `src/patronus/datasets/datasets.py` ```python def read_csv( filename_or_buffer: Union[str, pathlib.Path, typing.IO[typing.AnyStr]], *, dataset_id: Optional[str] = None, sid_field: str = "sid", system_prompt_field: str = "system_prompt", task_input_field: str = "task_input", task_context_field: str = "task_context", task_attachments_field: str = "task_attachments", task_output_field: str = "task_output", gold_answer_field: str = "gold_answer", task_metadata_field: str = "task_metadata", tags_field: str = "tags", **kwargs: typing.Any, ) -> Dataset: """ Reads a CSV file and converts it into a Dataset object. 
The CSV file is transformed into a structured dataset where each field maps to a specific aspect of the dataset schema provided via function arguments. You may specify custom field mappings as per your dataset structure, while additional keyword arguments are passed directly to the underlying 'pd.read_csv' function. Args: filename_or_buffer: Path to the CSV file or a file-like object containing the dataset to be read. dataset_id: Optional identifier for the dataset being read. Default is None. sid_field: Name of the column containing unique sample identifiers. system_prompt_field: Name of the column representing the system prompts. task_input_field: Name of the column containing the main input for the task. task_context_field: Name of the column describing the broader task context. task_attachments_field: Name of the column with supplementary attachments related to the task. task_output_field: Name of the column containing responses or outputs for the task. gold_answer_field: Name of the column detailing the expected or correct answer to the task. task_metadata_field: Name of the column storing metadata attributes associated with the task. tags_field: Name of the column containing tags or annotations related to each sample. **kwargs: Additional keyword arguments passed to 'pandas.read_csv' for fine-tuning the CSV parsing behavior, such as delimiters, encoding, etc. Returns: Dataset: The parsed dataset object containing structured data from the input CSV file. """ return _read_dataframe( pd.read_csv, filename_or_buffer, dataset_id=dataset_id, sid_field=sid_field, system_prompt_field=system_prompt_field, task_context_field=task_context_field, task_attachments_field=task_attachments_field, task_input_field=task_input_field, task_output_field=task_output_field, gold_answer_field=gold_answer_field, task_metadata_field=task_metadata_field, tags_field=tags_field, **kwargs, ) ``` #### read_jsonl ```python read_jsonl( filename_or_buffer: Union[str, Path, IO[AnyStr]], *, dataset_id: Optional[str] = None, sid_field: str = "sid", system_prompt_field: str = "system_prompt", task_input_field: str = "task_input", task_context_field: str = "task_context", task_attachments_field: str = "task_attachments", task_output_field: str = "task_output", gold_answer_field: str = "gold_answer", task_metadata_field: str = "task_metadata", tags_field: str = "tags", **kwargs: Any, ) -> Dataset ``` Reads a JSONL (JSON Lines) file and transforms it into a Dataset object. This function parses the input data file or buffer in JSON Lines format into a structured format, extracting specified fields and additional metadata for usage in downstream tasks. The field mappings and additional keyword arguments can be customized to accommodate application-specific requirements. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `filename_or_buffer` | `Union[str, Path, IO[AnyStr]]` | The path to the file or a file-like object containing the JSONL data to be read. | *required* | | `dataset_id` | `Optional[str]` | An optional identifier for the dataset being read. Defaults to None. | `None` | | `sid_field` | `str` | The field name in the JSON lines representing the unique identifier for a sample. Defaults to "sid". | `'sid'` | | `system_prompt_field` | `str` | The field name for the system prompt in the JSON lines file. Defaults to "system_prompt". | `'system_prompt'` | | `task_input_field` | `str` | The field name for the task input data in the JSON lines file. Defaults to "task_input". 
| `'task_input'` | | `task_context_field` | `str` | The field name for the task context data in the JSON lines file. Defaults to "task_context". | `'task_context'` | | `task_attachments_field` | `str` | The field name for any task attachments in the JSON lines file. Defaults to "task_attachments". | `'task_attachments'` | | `task_output_field` | `str` | The field name for task output data in the JSON lines file. Defaults to "task_output". | `'task_output'` | | `gold_answer_field` | `str` | The field name for the gold (ground truth) answer in the JSON lines file. Defaults to "gold_answer". | `'gold_answer'` | | `task_metadata_field` | `str` | The field name for metadata associated with the task in the JSON lines file. Defaults to "task_metadata". | `'task_metadata'` | | `tags_field` | `str` | The field name for tags in the parsed JSON lines file. Defaults to "tags". | `'tags'` | | `**kwargs` | `Any` | Additional keyword arguments to be passed to pd.read_json for customization. The parameter "lines" will be forcibly set to True if not provided. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `Dataset` | `Dataset` | A Dataset object containing the parsed and structured data. | Source code in `src/patronus/datasets/datasets.py` ```python def read_jsonl( filename_or_buffer: Union[str, pathlib.Path, typing.IO[typing.AnyStr]], *, dataset_id: Optional[str] = None, sid_field: str = "sid", system_prompt_field: str = "system_prompt", task_input_field: str = "task_input", task_context_field: str = "task_context", task_attachments_field: str = "task_attachments", task_output_field: str = "task_output", gold_answer_field: str = "gold_answer", task_metadata_field: str = "task_metadata", tags_field: str = "tags", **kwargs: typing.Any, ) -> Dataset: """ Reads a JSONL (JSON Lines) file and transforms it into a Dataset object. This function parses the input data file or buffer in JSON Lines format into a structured format, extracting specified fields and additional metadata for usage in downstream tasks. The field mappings and additional keyword arguments can be customized to accommodate application-specific requirements. Args: filename_or_buffer: The path to the file or a file-like object containing the JSONL data to be read. dataset_id: An optional identifier for the dataset being read. Defaults to None. sid_field: The field name in the JSON lines representing the unique identifier for a sample. Defaults to "sid". system_prompt_field: The field name for the system prompt in the JSON lines file. Defaults to "system_prompt". task_input_field: The field name for the task input data in the JSON lines file. Defaults to "task_input". task_context_field: The field name for the task context data in the JSON lines file. Defaults to "task_context". task_attachments_field: The field name for any task attachments in the JSON lines file. Defaults to "task_attachments". task_output_field: The field name for task output data in the JSON lines file. Defaults to "task_output". gold_answer_field: The field name for the gold (ground truth) answer in the JSON lines file. Defaults to "gold_answer". task_metadata_field: The field name for metadata associated with the task in the JSON lines file. Defaults to "task_metadata". tags_field: The field name for tags in the parsed JSON lines file. Defaults to "tags". **kwargs: Additional keyword arguments to be passed to `pd.read_json` for customization. The parameter "lines" will be forcibly set to True if not provided. 
Returns: Dataset: A Dataset object containing the parsed and structured data. """ kwargs.setdefault("lines", True) return _read_dataframe( pd.read_json, filename_or_buffer, dataset_id=dataset_id, sid_field=sid_field, system_prompt_field=system_prompt_field, task_context_field=task_context_field, task_attachments_field=task_attachments_field, task_input_field=task_input_field, task_output_field=task_output_field, gold_answer_field=gold_answer_field, task_metadata_field=task_metadata_field, tags_field=tags_field, **kwargs, ) ``` ### remote #### DatasetNotFoundError Bases: `Exception` Raised when a dataset with the specified ID or name is not found #### RemoteDatasetLoader ```python RemoteDatasetLoader( by_name: Optional[str] = None, *, by_id: Optional[str] = None, ) ``` Bases: `DatasetLoader` A loader for datasets stored remotely on the Patronus platform. This class provides functionality to asynchronously load a dataset from the remote API by its name or identifier, handling the fetch operation lazily and ensuring it's only performed once. You can specify either the dataset name or ID, but not both. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `by_name` | `Optional[str]` | The name of the dataset to load. | `None` | | `by_id` | `Optional[str]` | The ID of the dataset to load. | `None` | Source code in `src/patronus/datasets/remote.py` ```python def __init__(self, by_name: Optional[str] = None, *, by_id: Optional[str] = None): """ Initializes a new RemoteDatasetLoader instance. Args: by_name: The name of the dataset to load. by_id: The ID of the dataset to load. """ if not (bool(by_name) ^ bool(by_id)): raise ValueError("Either by_name or by_id must be provided, but not both.") self._dataset_name = by_name self._dataset_id = by_id super().__init__(self._load) ``` # evals ## patronus.evals ### evaluators #### Evaluator ```python Evaluator(weight: Optional[Union[str, float]] = None) ``` Base Evaluator Class Source code in `src/patronus/evals/evaluators.py` ```python def __init__(self, weight: Optional[Union[str, float]] = None): if weight is not None: try: decimal.Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.weight = weight ``` ##### evaluate ```python evaluate(*args, **kwargs) -> Optional[EvaluationResult] ``` Synchronous version of evaluate method. When inheriting directly from Evaluator class it's permitted to change parameters signature. Return type should stay unchanged. Source code in `src/patronus/evals/evaluators.py` ```python @abc.abstractmethod def evaluate(self, *args, **kwargs) -> Optional[EvaluationResult]: """ Synchronous version of evaluate method. When inheriting directly from Evaluator class it's permitted to change parameters signature. Return type should stay unchanged. """ ``` #### AsyncEvaluator ```python AsyncEvaluator(weight: Optional[Union[str, float]] = None) ``` Bases: `Evaluator` Source code in `src/patronus/evals/evaluators.py` ```python def __init__(self, weight: Optional[Union[str, float]] = None): if weight is not None: try: decimal.Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.weight = weight ``` ##### evaluate ```python evaluate(*args, **kwargs) -> Optional[EvaluationResult] ``` Asynchronous version of evaluate method. 
When inheriting directly from Evaluator class it's permitted to change parameters signature. Return type should stay unchanged. Source code in `src/patronus/evals/evaluators.py` ```python @abc.abstractmethod async def evaluate(self, *args, **kwargs) -> Optional[EvaluationResult]: """ Asynchronous version of evaluate method. When inheriting directly from Evaluator class it's permitted to change parameters signature. Return type should stay unchanged. """ ``` #### StructuredEvaluator ```python StructuredEvaluator( weight: Optional[Union[str, float]] = None, ) ``` Bases: `Evaluator` Base for structured evaluators Source code in `src/patronus/evals/evaluators.py` ```python def __init__(self, weight: Optional[Union[str, float]] = None): if weight is not None: try: decimal.Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.weight = weight ``` #### AsyncStructuredEvaluator ```python AsyncStructuredEvaluator( weight: Optional[Union[str, float]] = None, ) ``` Bases: `AsyncEvaluator` Base for async structured evaluators Source code in `src/patronus/evals/evaluators.py` ```python def __init__(self, weight: Optional[Union[str, float]] = None): if weight is not None: try: decimal.Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.weight = weight ``` #### RemoteEvaluatorMixin ```python RemoteEvaluatorMixin( evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: Literal[ "never", "on-fail", "on-success", "always" ] = "always", criteria_config: Optional[dict[str, Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ) ``` Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `evaluator_id_or_alias` | `str` | The ID or alias of the evaluator to use. | *required* | | `criteria` | `Optional[str]` | The criteria name to use for evaluation. If not provided, the evaluator's default criteria will be used. | `None` | | `tags` | `Optional[dict[str, str]]` | Optional tags to attach to evaluations. | `None` | | `explain_strategy` | `Literal['never', 'on-fail', 'on-success', 'always']` | When to generate explanations for evaluations. Options are "never", "on-fail", "on-success", or "always". | `'always'` | | `criteria_config` | `Optional[dict[str, Any]]` | Configuration for the criteria. (Currently unused) | `None` | | `allow_update` | `bool` | Whether to allow updates. (Currently unused) | `False` | | `max_attempts` | `int` | Maximum number of retry attempts. (Currently unused) | `3` | | `api_` | `Optional[PatronusAPIClient]` | Optional API client instance. If not provided, will use the default client from context. | `None` | | `weight` | `Optional[Union[str, float]]` | Optional weight for the evaluator. This is only used within the Patronus Experimentation Framework to indicate the relative importance of evaluators. Must be a valid decimal number (string or float). Weights are stored as experiment metadata and do not affect standalone evaluator usage. 
| `None` | Source code in `src/patronus/evals/evaluators.py` ```python def __init__( self, evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: typing.Literal["never", "on-fail", "on-success", "always"] = "always", criteria_config: Optional[dict[str, typing.Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ): """Initialize a remote evaluator. Args: evaluator_id_or_alias: The ID or alias of the evaluator to use. criteria: The criteria name to use for evaluation. If not provided, the evaluator's default criteria will be used. tags: Optional tags to attach to evaluations. explain_strategy: When to generate explanations for evaluations. Options are "never", "on-fail", "on-success", or "always". criteria_config: Configuration for the criteria. (Currently unused) allow_update: Whether to allow updates. (Currently unused) max_attempts: Maximum number of retry attempts. (Currently unused) api_: Optional API client instance. If not provided, will use the default client from context. weight: Optional weight for the evaluator. This is only used within the Patronus Experimentation Framework to indicate the relative importance of evaluators. Must be a valid decimal number (string or float). Weights are stored as experiment metadata and do not affect standalone evaluator usage. """ self.evaluator_id_or_alias = evaluator_id_or_alias self.evaluator_id = None self.criteria = criteria self.tags = tags or {} self.explain_strategy = explain_strategy self.criteria_config = criteria_config self.allow_update = allow_update self.max_attempts = max_attempts self._api = api_ self._resolved = False self.weight = weight self._load_lock = threading.Lock() self._async_load_lock = asyncio.Lock() ``` #### RemoteEvaluator ```python RemoteEvaluator( evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: Literal[ "never", "on-fail", "on-success", "always" ] = "always", criteria_config: Optional[dict[str, Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ) ``` Bases: `RemoteEvaluatorMixin`, `StructuredEvaluator` Synchronous remote evaluator Source code in `src/patronus/evals/evaluators.py` ```python def __init__( self, evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: typing.Literal["never", "on-fail", "on-success", "always"] = "always", criteria_config: Optional[dict[str, typing.Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ): """Initialize a remote evaluator. Args: evaluator_id_or_alias: The ID or alias of the evaluator to use. criteria: The criteria name to use for evaluation. If not provided, the evaluator's default criteria will be used. tags: Optional tags to attach to evaluations. explain_strategy: When to generate explanations for evaluations. Options are "never", "on-fail", "on-success", or "always". criteria_config: Configuration for the criteria. (Currently unused) allow_update: Whether to allow updates. (Currently unused) max_attempts: Maximum number of retry attempts. (Currently unused) api_: Optional API client instance. If not provided, will use the default client from context. 
weight: Optional weight for the evaluator. This is only used within the Patronus Experimentation Framework to indicate the relative importance of evaluators. Must be a valid decimal number (string or float). Weights are stored as experiment metadata and do not affect standalone evaluator usage. """ self.evaluator_id_or_alias = evaluator_id_or_alias self.evaluator_id = None self.criteria = criteria self.tags = tags or {} self.explain_strategy = explain_strategy self.criteria_config = criteria_config self.allow_update = allow_update self.max_attempts = max_attempts self._api = api_ self._resolved = False self.weight = weight self._load_lock = threading.Lock() self._async_load_lock = asyncio.Lock() ``` ##### evaluate ```python evaluate( *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_attachments: Union[list[Any], None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[Dict[str, Any]] = None, **kwargs: Any, ) -> EvaluationResult ``` Evaluates data using remote Patronus Evaluator Source code in `src/patronus/evals/evaluators.py` ```python def evaluate( self, *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_attachments: Union[list[Any], None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[typing.Dict[str, typing.Any]] = None, **kwargs: Any, ) -> EvaluationResult: """Evaluates data using remote Patronus Evaluator""" kws = { "system_prompt": system_prompt, "task_context": task_context, "task_attachments": task_attachments, "task_input": task_input, "task_output": task_output, "gold_answer": gold_answer, "task_metadata": task_metadata, **kwargs, } log_id = get_current_log_id(bound_arguments=kws) attrs = get_context_evaluation_attributes() tags = {**self.tags} if t := attrs["tags"]: tags.update(t) tags = merge_tags(tags, kwargs.get("tags"), attrs["experiment_tags"]) if tags: kws["tags"] = tags if did := attrs["dataset_id"]: kws["dataset_id"] = did if sid := attrs["dataset_sample_id"]: kws["dataset_sample_id"] = sid resp = retry()(self._evaluate)(log_id=log_id, **kws) return self._translate_response(resp) ``` #### AsyncRemoteEvaluator ```python AsyncRemoteEvaluator( evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: Literal[ "never", "on-fail", "on-success", "always" ] = "always", criteria_config: Optional[dict[str, Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ) ``` Bases: `RemoteEvaluatorMixin`, `AsyncStructuredEvaluator` Asynchronous remote evaluator Source code in `src/patronus/evals/evaluators.py` ```python def __init__( self, evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: typing.Literal["never", "on-fail", "on-success", "always"] = "always", criteria_config: Optional[dict[str, typing.Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ): """Initialize a remote evaluator. Args: evaluator_id_or_alias: The ID or alias of the evaluator to use. criteria: The criteria name to use for evaluation. If not provided, the evaluator's default criteria will be used. 
tags: Optional tags to attach to evaluations. explain_strategy: When to generate explanations for evaluations. Options are "never", "on-fail", "on-success", or "always". criteria_config: Configuration for the criteria. (Currently unused) allow_update: Whether to allow updates. (Currently unused) max_attempts: Maximum number of retry attempts. (Currently unused) api_: Optional API client instance. If not provided, will use the default client from context. weight: Optional weight for the evaluator. This is only used within the Patronus Experimentation Framework to indicate the relative importance of evaluators. Must be a valid decimal number (string or float). Weights are stored as experiment metadata and do not affect standalone evaluator usage. """ self.evaluator_id_or_alias = evaluator_id_or_alias self.evaluator_id = None self.criteria = criteria self.tags = tags or {} self.explain_strategy = explain_strategy self.criteria_config = criteria_config self.allow_update = allow_update self.max_attempts = max_attempts self._api = api_ self._resolved = False self.weight = weight self._load_lock = threading.Lock() self._async_load_lock = asyncio.Lock() ``` ##### evaluate ```python evaluate( *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_attachments: Union[list[Any], None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[Dict[str, Any]] = None, **kwargs: Any, ) -> EvaluationResult ``` Evaluates data using remote Patronus Evaluator Source code in `src/patronus/evals/evaluators.py` ```python async def evaluate( self, *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_attachments: Union[list[Any], None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[typing.Dict[str, typing.Any]] = None, **kwargs: Any, ) -> EvaluationResult: """Evaluates data using remote Patronus Evaluator""" kws = { "system_prompt": system_prompt, "task_context": task_context, "task_attachments": task_attachments, "task_input": task_input, "task_output": task_output, "gold_answer": gold_answer, "task_metadata": task_metadata, **kwargs, } log_id = get_current_log_id(bound_arguments=kws) attrs = get_context_evaluation_attributes() tags = {**self.tags} if t := attrs["tags"]: tags.update(t) tags = merge_tags(tags, kwargs.get("tags"), attrs["experiment_tags"]) if tags: kws["tags"] = tags if did := attrs["dataset_id"]: kws["dataset_id"] = did if sid := attrs["dataset_sample_id"]: kws["dataset_sample_id"] = sid resp = await retry()(self._evaluate)(log_id=log_id, **kws) return self._translate_response(resp) ``` #### get_current_log_id ```python get_current_log_id( bound_arguments: dict[str, Any], ) -> Optional[LogID] ``` Return log_id for given arguments in current context. Returns None if there is no context - most likely SDK is not initialized. Source code in `src/patronus/evals/evaluators.py` ```python def get_current_log_id(bound_arguments: dict[str, Any]) -> Optional[LogID]: """ Return log_id for given arguments in current context. Returns None if there is no context - most likely SDK is not initialized. 
""" eval_group = _ctx_evaluation_log_group.get(None) if eval_group is None: return None log_id = eval_group.find_log(bound_arguments) if log_id is None: raise ValueError("Log not found for provided arguments") return log_id ``` #### bundled_eval ```python bundled_eval( span_name: str = "Evaluation bundle", attributes: Optional[dict[str, str]] = None, ) ``` Start a span that would automatically bundle evaluations. Evaluations are passed by arguments passed to the evaluators called inside the context manager. The following example would create two bundles: - fist with arguments `x=10, y=20` - second with arguments `spam="abc123"` ```python with bundled_eval(): foo_evaluator(x=10, y=20) bar_evaluator(x=10, y=20) tar_evaluator(spam="abc123") ``` Source code in `src/patronus/evals/evaluators.py` ````python @contextlib.contextmanager def bundled_eval(span_name: str = "Evaluation bundle", attributes: Optional[dict[str, str]] = None): """ Start a span that would automatically bundle evaluations. Evaluations are passed by arguments passed to the evaluators called inside the context manager. The following example would create two bundles: - fist with arguments `x=10, y=20` - second with arguments `spam="abc123"` ```python with bundled_eval(): foo_evaluator(x=10, y=20) bar_evaluator(x=10, y=20) tar_evaluator(spam="abc123") ``` """ tracer = context.get_tracer_or_none() if tracer is None: yield return attributes = { **(attributes or {}), Attributes.span_type.value: SpanTypes.eval.value, } with tracer.start_as_current_span(span_name, attributes=attributes): with _start_evaluation_log_group(): yield ```` #### evaluator ```python evaluator( _fn: Optional[Callable[..., Any]] = None, *, evaluator_id: Union[ str, Callable[[], str], None ] = None, criteria: Union[str, Callable[[], str], None] = None, metric_name: Optional[str] = None, metric_description: Optional[str] = None, is_method: bool = False, span_name: Optional[str] = None, log_none_arguments: bool = False, **kwargs: Any, ) -> typing.Callable[..., typing.Any] ``` Decorator for creating functional-style evaluators that log execution and results. This decorator works with both synchronous and asynchronous functions. The decorator doesn't modify the function's return value, but records it after converting to an EvaluationResult. Evaluators can return different types which are automatically converted to `EvaluationResult` objects: - `bool`: `True`/`False` indicating pass/fail. - `float`/`int`: Numerical scores (typically between 0-1). - `str`: Text output categorizing the result. - EvaluationResult: Complete evaluation with scores, explanations, etc. - `None`: Indicates evaluation was skipped and no result will be recorded. Evaluation results are exported in the background without blocking execution. The SDK must be initialized with `patronus.init()` for evaluations to be recorded, though decorated functions will still execute even without initialization. The evaluator integrates with a context-based system to identify and handle shared evaluation logging and tracing spans. 
**Example:** ```python from patronus import init, evaluator from patronus.evals import EvaluationResult # Initialize the SDK to record evaluations init() # Simple evaluator function @evaluator() def exact_match(actual: str, expected: str) -> bool: return actual.strip() == expected.strip() # More complex evaluator with detailed result @evaluator() def semantic_match(actual: str, expected: str) -> EvaluationResult: similarity = calculate_similarity(actual, expected) # Your similarity function return EvaluationResult( score=similarity, pass_=similarity > 0.8, text_output="High similarity" if similarity > 0.8 else "Low similarity", explanation=f"Calculated similarity: {similarity}" ) # Use the evaluators result = exact_match("Hello world", "Hello world") print(f"Match: {result}") # Output: Match: True ``` Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `_fn` | `Optional[Callable[..., Any]]` | The function to be decorated. | `None` | | `evaluator_id` | `Union[str, Callable[[], str], None]` | Name for the evaluator. Defaults to function name (or class name in case of class based evaluators). | `None` | | `criteria` | `Union[str, Callable[[], str], None]` | Name of the criteria used by the evaluator. The use of the criteria is only recommended in more complex evaluator setups where evaluation algorithm changes depending on a criteria (think strategy pattern). | `None` | | `metric_name` | `Optional[str]` | Name for the evaluation metric. Defaults to evaluator_id value. | `None` | | `metric_description` | `Optional[str]` | The description of the metric used for evaluation. If not provided then the docstring of the wrapped function is used for this value. | `None` | | `is_method` | `bool` | Whether the wrapped function is a method. This value is used to determine whether to remove "self" argument from the log. It also allows for dynamic evaluator_id and criteria discovery based on get_evaluator_id() and get_criteria_id() methods. User-code usually shouldn't use it as long as user defined class-based evaluators inherit from the library provided Evaluator base classes. | `False` | | `span_name` | `Optional[str]` | Name of the span to represent this evaluation in the tracing system. Defaults to None, in which case a default name is generated based on the evaluator. | `None` | | `log_none_arguments` | `bool` | Controls whether arguments with None values are included in log output. This setting affects only logging behavior and has no impact on function execution. Note: Only applies to top-level arguments. For nested structures like dictionaries, None values will always be logged regardless of this setting. | `False` | | `**kwargs` | `Any` | Additional keyword arguments that may be passed to the decorator or its internal methods. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `Callable` | `Callable[..., Any]` | Returns the decorated function with additional evaluation behavior, suitable for synchronous or asynchronous usage. | Note For evaluations that need to be compatible with experiments, consider using StructuredEvaluator or AsyncStructuredEvaluator classes instead. 
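To make the note above concrete, here is a minimal, hypothetical sketch of a class-based structured evaluator. It assumes `StructuredEvaluator` is importable from `patronus.evals` alongside the other evaluator classes, and it mirrors the keyword-only field names used by the structured `evaluate` signature shown for `RemoteEvaluator` above (`task_output`, `gold_answer`, ...); the class name and pass/fail logic are illustrative, not part of the SDK.

```python
from typing import Optional

from patronus.evals import EvaluationResult, StructuredEvaluator


class GoldAnswerMatch(StructuredEvaluator):
    """Hypothetical evaluator: passes when the task output equals the gold answer."""

    def evaluate(
        self,
        *,
        task_output: Optional[str] = None,
        gold_answer: Optional[str] = None,
        **kwargs,
    ) -> EvaluationResult:
        # Compare output and gold answer after trimming whitespace.
        matched = (task_output or "").strip() == (gold_answer or "").strip()
        return EvaluationResult(
            pass_=matched,
            score=float(matched),
            text_output="match" if matched else "mismatch",
        )
```

An instance of such a class can typically be passed in an experiment's `evaluators` list (see `StructuredEvaluatorAdapter` below), while the function decorator above remains the lighter-weight option for ad-hoc evaluations.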
Source code in `src/patronus/evals/evaluators.py` ````python def evaluator( _fn: Optional[typing.Callable[..., typing.Any]] = None, *, evaluator_id: Union[str, typing.Callable[[], str], None] = None, criteria: Union[str, typing.Callable[[], str], None] = None, metric_name: Optional[str] = None, metric_description: Optional[str] = None, is_method: bool = False, span_name: Optional[str] = None, log_none_arguments: bool = False, **kwargs: typing.Any, ) -> typing.Callable[..., typing.Any]: """ Decorator for creating functional-style evaluators that log execution and results. This decorator works with both synchronous and asynchronous functions. The decorator doesn't modify the function's return value, but records it after converting to an EvaluationResult. Evaluators can return different types which are automatically converted to `EvaluationResult` objects: * `bool`: `True`/`False` indicating pass/fail. * `float`/`int`: Numerical scores (typically between 0-1). * `str`: Text output categorizing the result. * [EvaluationResult][patronus.evals.types.EvaluationResult]: Complete evaluation with scores, explanations, etc. * `None`: Indicates evaluation was skipped and no result will be recorded. Evaluation results are exported in the background without blocking execution. The SDK must be initialized with `patronus.init()` for evaluations to be recorded, though decorated functions will still execute even without initialization. The evaluator integrates with a context-based system to identify and handle shared evaluation logging and tracing spans. **Example:** ```python from patronus import init, evaluator from patronus.evals import EvaluationResult # Initialize the SDK to record evaluations init() # Simple evaluator function @evaluator() def exact_match(actual: str, expected: str) -> bool: return actual.strip() == expected.strip() # More complex evaluator with detailed result @evaluator() def semantic_match(actual: str, expected: str) -> EvaluationResult: similarity = calculate_similarity(actual, expected) # Your similarity function return EvaluationResult( score=similarity, pass_=similarity > 0.8, text_output="High similarity" if similarity > 0.8 else "Low similarity", explanation=f"Calculated similarity: {similarity}" ) # Use the evaluators result = exact_match("Hello world", "Hello world") print(f"Match: {result}") # Output: Match: True ``` Args: _fn: The function to be decorated. evaluator_id: Name for the evaluator. Defaults to function name (or class name in case of class based evaluators). criteria: Name of the criteria used by the evaluator. The use of the criteria is only recommended in more complex evaluator setups where evaluation algorithm changes depending on a criteria (think strategy pattern). metric_name: Name for the evaluation metric. Defaults to evaluator_id value. metric_description: The description of the metric used for evaluation. If not provided then the docstring of the wrapped function is used for this value. is_method: Whether the wrapped function is a method. This value is used to determine whether to remove "self" argument from the log. It also allows for dynamic evaluator_id and criteria discovery based on `get_evaluator_id()` and `get_criteria_id()` methods. User-code usually shouldn't use it as long as user defined class-based evaluators inherit from the library provided Evaluator base classes. span_name: Name of the span to represent this evaluation in the tracing system. Defaults to None, in which case a default name is generated based on the evaluator. 
log_none_arguments: Controls whether arguments with None values are included in log output. This setting affects only logging behavior and has no impact on function execution. Note: Only applies to top-level arguments. For nested structures like dictionaries, None values will always be logged regardless of this setting. **kwargs: Additional keyword arguments that may be passed to the decorator or its internal methods. Returns: Callable: Returns the decorated function with additional evaluation behavior, suitable for synchronous or asynchronous usage. Note: For evaluations that need to be compatible with experiments, consider using [StructuredEvaluator][patronus.evals.evaluators.StructuredEvaluator] or [AsyncStructuredEvaluator][patronus.evals.evaluators.AsyncStructuredEvaluator] classes instead. """ if _fn is not None: return evaluator()(_fn) def decorator(fn): fn_sign = inspect.signature(fn) def _get_eval_id(): return (callable(evaluator_id) and evaluator_id()) or evaluator_id or fn.__name__ def _get_criteria(): return (callable(criteria) and criteria()) or criteria or None def _prep(*fn_args, **fn_kwargs): bound_args = fn_sign.bind(*fn_args, **fn_kwargs) arguments_to_log = _as_applied_argument(fn_sign, bound_args) bound_args.apply_defaults() self_key_name = None instance = None if is_method: self_key_name = next(iter(fn_sign.parameters.keys())) instance = bound_args.arguments[self_key_name] eval_id = None eval_criteria = None if isinstance(instance, Evaluator): eval_id = instance.get_evaluator_id() eval_criteria = instance.get_criteria() if eval_id is None: eval_id = _get_eval_id() if eval_criteria is None: eval_criteria = _get_criteria() met_name = metric_name or eval_id met_description = metric_description or inspect.getdoc(fn) or None disable_export = isinstance(instance, RemoteEvaluatorMixin) and instance._disable_export return PrepEval( span_name=span_name, evaluator_id=eval_id, criteria=eval_criteria, metric_name=met_name, metric_description=met_description, self_key_name=self_key_name, arguments=arguments_to_log, disable_export=disable_export, ) attributes = { Attributes.span_type.value: SpanTypes.eval.value, GenAIAttributes.operation_name.value: OperationNames.eval.value, } @functools.wraps(fn) async def wrapper_async(*fn_args, **fn_kwargs): ctx = context.get_current_context_or_none() if ctx is None: return await fn(*fn_args, **fn_kwargs) prep = _prep(*fn_args, **fn_kwargs) start = time.perf_counter() try: with start_span(prep.display_name(), attributes=attributes): with _get_or_start_evaluation_log_group() as log_group: log_id = log_group.log( logger=context.get_pat_logger(ctx), is_method=is_method, self_key_name=prep.self_key_name, bound_arguments=prep.arguments, log_none_arguments=log_none_arguments, ) ret = await fn(*fn_args, **fn_kwargs) except Exception as e: context.get_logger(ctx).exception(f"Evaluator raised an exception: {e}") raise e if prep.disable_export: return ret elapsed = time.perf_counter() - start handle_eval_output( ctx=ctx, log_id=log_id, evaluator_id=prep.evaluator_id, criteria=prep.criteria, metric_name=prep.metric_name, metric_description=prep.metric_description, ret_value=ret, duration=datetime.timedelta(seconds=elapsed), qualname=fn.__qualname__, ) return ret @functools.wraps(fn) def wrapper_sync(*fn_args, **fn_kwargs): ctx = context.get_current_context_or_none() if ctx is None: return fn(*fn_args, **fn_kwargs) prep = _prep(*fn_args, **fn_kwargs) start = time.perf_counter() try: with start_span(prep.display_name(), attributes=attributes): with 
_get_or_start_evaluation_log_group() as log_group: log_id = log_group.log( logger=context.get_pat_logger(ctx), is_method=is_method, self_key_name=prep.self_key_name, bound_arguments=prep.arguments, log_none_arguments=log_none_arguments, ) ret = fn(*fn_args, **fn_kwargs) except Exception as e: context.get_logger(ctx).exception(f"Evaluator raised an exception: {e}") raise e if prep.disable_export: return ret elapsed = time.perf_counter() - start handle_eval_output( ctx=ctx, log_id=log_id, evaluator_id=prep.evaluator_id, criteria=prep.criteria, metric_name=prep.metric_name, metric_description=prep.metric_description, ret_value=ret, duration=datetime.timedelta(seconds=elapsed), qualname=fn.__qualname__, ) return ret def _set_attrs(wrapper: Any): wrapper._pat_evaluator = True # _pat_evaluator_id and _pat_criteria_id may be a bit misleading since # may not be correct since actually values for evaluator_id and criteria # are dynamically dispatched for class-based evaluators. # These values will be correct for function evaluators though. wrapper._pat_evaluator_id = _get_eval_id() wrapper._pat_criteria = _get_criteria() if inspect.iscoroutinefunction(fn): _set_attrs(wrapper_async) return wrapper_async else: _set_attrs(wrapper_sync) return wrapper_sync return decorator ```` ### types #### EvaluationResult Bases: `BaseModel` Container for evaluation outcomes including score, pass/fail status, explanations, and metadata. This class stores complete evaluation results with numeric scores, boolean pass/fail statuses, textual outputs, explanations, and arbitrary metadata. Evaluator functions can return instances of this class directly or return simpler types (bool, float, str) which will be automatically converted to EvaluationResult objects during recording. Attributes: | Name | Type | Description | | --- | --- | --- | | `score` | `Optional[float]` | Score of the evaluation. Can be any numerical value, though typically ranges from 0 to 1, where 1 represents the best possible score. | | `pass_` | `Optional[bool]` | Whether the evaluation is considered to pass or fail. | | `text_output` | `Optional[str]` | Text output of the evaluation. Usually used for discrete human-readable category evaluation or as a label for score value. | | `metadata` | `Optional[dict[str, Any]]` | Arbitrary json-serializable metadata about evaluation. | | `explanation` | `Optional[str]` | Human-readable explanation of the evaluation. | | `tags` | `Optional[dict[str, str]]` | Key-value pair metadata. | | `dataset_id` | `Optional[str]` | ID of the dataset associated with evaluated sample. | | `dataset_sample_id` | `Optional[str]` | ID of the sample in a dataset associated with evaluated sample. | | `evaluation_duration` | `Optional[timedelta]` | Duration of the evaluation. In case value is not set, @evaluator decorator and Evaluator classes will set this value automatically. | | `explanation_duration` | `Optional[timedelta]` | Duration of the evaluation explanation. | ##### format ```python format() -> str ``` Format the evaluation result into a readable summary. Source code in `src/patronus/evals/types.py` ```python def format(self) -> str: """ Format the evaluation result into a readable summary. """ md = self.model_dump(exclude_none=True, mode="json") return yaml.dump(md) ``` ##### pretty_print ```python pretty_print(file=None) -> None ``` Pretty prints the formatted content to the specified file or standard output. 
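As a quick illustration of `format()` and `pretty_print()`, the sketch below constructs an `EvaluationResult` by hand and prints it; the field values are made up.

```python
from patronus.evals import EvaluationResult

# Normally an evaluator returns this object; here we build one by hand.
result = EvaluationResult(
    score=0.92,
    pass_=True,
    text_output="high_similarity",
    explanation="Similarity above the 0.8 threshold.",
    tags={"model": "example-model"},
)

print(result.format())  # YAML-style dump of all non-None fields
result.pretty_print()   # Prints the same summary to stdout (or to `file=` if given)
```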
Source code in `src/patronus/evals/types.py` ```python def pretty_print(self, file=None) -> None: """ Pretty prints the formatted content to the specified file or standard output. """ f = self.format() print(f, file=file) ``` # experiments ## patronus.experiments ### adapters #### BaseEvaluatorAdapter Bases: `ABC` Abstract base class for all evaluator adapters. Evaluator adapters provide a standardized interface between the experiment framework and various types of evaluators (function-based, class-based, etc.). All concrete adapter implementations must inherit from this class and implement the required abstract methods. #### EvaluatorAdapter ```python EvaluatorAdapter(evaluator: Evaluator) ``` Bases: `BaseEvaluatorAdapter` Adapter for class-based evaluators conforming to the Evaluator or AsyncEvaluator protocol. This adapter enables the use of evaluator classes that implement either the Evaluator or AsyncEvaluator interface within the experiment framework. Attributes: | Name | Type | Description | | --- | --- | --- | | `evaluator` | `Union[Evaluator, AsyncEvaluator]` | The evaluator instance to adapt. | **Examples:** ```python import typing from typing import Optional from patronus import datasets from patronus.evals import Evaluator, EvaluationResult from patronus.experiments import run_experiment from patronus.experiments.adapters import EvaluatorAdapter from patronus.experiments.types import TaskResult, EvalParent class MatchEvaluator(Evaluator): def __init__(self, sanitizer=None): if sanitizer is None: sanitizer = lambda x: x self.sanitizer = sanitizer def evaluate(self, actual: str, expected: str) -> EvaluationResult: matched = self.sanitizer(actual) == self.sanitizer(expected) return EvaluationResult(pass_=matched, score=int(matched)) exact_match = MatchEvaluator() fuzzy_match = MatchEvaluator(lambda x: x.strip().lower()) class MatchAdapter(EvaluatorAdapter): def __init__(self, evaluator: MatchEvaluator): super().__init__(evaluator) def transform( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs ) -> tuple[list[typing.Any], dict[str, typing.Any]]: args = [row.task_output, row.gold_answer] kwargs = {} # Passing arguments via kwargs would also work in this case. # kwargs = {"actual": row.task_output, "expected": row.gold_answer} return args, kwargs run_experiment( dataset=[{"task_output": "string ", "gold_answer": "string"}], evaluators=[MatchAdapter(exact_match), MatchAdapter(fuzzy_match)], ) ``` Source code in `src/patronus/experiments/adapters.py` ```python def __init__(self, evaluator: evals.Evaluator): if not isinstance(evaluator, evals.Evaluator): raise TypeError(f"{evaluator} is not {evals.Evaluator.__name__}.") self.evaluator = evaluator ``` ##### transform ```python transform( row: Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: Any, ) -> tuple[list[typing.Any], dict[str, typing.Any]] ``` Transform experiment framework arguments to evaluation method arguments. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `row` | `Row` | The data row being evaluated. | *required* | | `task_result` | `Optional[TaskResult]` | The result of the task execution, if available. | *required* | | `parent` | `EvalParent` | The parent evaluation context. | *required* | | `**kwargs` | `Any` | Additional keyword arguments from the experiment. | `{}` | Returns: | Type | Description | | --- | --- | | `list[Any]` | A list of positional arguments to pass to the evaluator function. 
| | `dict[str, Any]` | A dictionary of keyword arguments to pass to the evaluator function. | Source code in `src/patronus/experiments/adapters.py` ```python def transform( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: typing.Any, ) -> tuple[list[typing.Any], dict[str, typing.Any]]: """ Transform experiment framework arguments to evaluation method arguments. Args: row: The data row being evaluated. task_result: The result of the task execution, if available. parent: The parent evaluation context. **kwargs: Additional keyword arguments from the experiment. Returns: A list of positional arguments to pass to the evaluator function. A dictionary of keyword arguments to pass to the evaluator function. """ return ( [], {"row": row, "task_result": task_result, "parent": parent, **kwargs}, ) ``` ##### evaluate ```python evaluate( row: Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: Any, ) -> EvaluationResult ``` Evaluate the given row and task result using the adapted evaluator function. This method implements the BaseEvaluatorAdapter.evaluate() protocol. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `row` | `Row` | The data row being evaluated. | *required* | | `task_result` | `Optional[TaskResult]` | The result of the task execution, if available. | *required* | | `parent` | `EvalParent` | The parent evaluation context. | *required* | | `**kwargs` | `Any` | Additional keyword arguments from the experiment. | `{}` | Returns: | Type | Description | | --- | --- | | `EvaluationResult` | An EvaluationResult containing the evaluation outcome. | Source code in `src/patronus/experiments/adapters.py` ```python async def evaluate( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: typing.Any, ) -> EvaluationResult: """ Evaluate the given row and task result using the adapted evaluator function. This method implements the BaseEvaluatorAdapter.evaluate() protocol. Args: row: The data row being evaluated. task_result: The result of the task execution, if available. parent: The parent evaluation context. **kwargs: Additional keyword arguments from the experiment. Returns: An EvaluationResult containing the evaluation outcome. """ ev_args, ev_kwargs = self.transform(row, task_result, parent, **kwargs) return await self._evaluate(*ev_args, **ev_kwargs) ``` #### StructuredEvaluatorAdapter ```python StructuredEvaluatorAdapter( evaluator: Union[ StructuredEvaluator, AsyncStructuredEvaluator ], ) ``` Bases: `EvaluatorAdapter` Adapter for structured evaluators. Source code in `src/patronus/experiments/adapters.py` ```python def __init__( self, evaluator: Union[evals.StructuredEvaluator, evals.AsyncStructuredEvaluator], ): if not isinstance(evaluator, (evals.StructuredEvaluator, evals.AsyncStructuredEvaluator)): raise TypeError( f"{type(evaluator)} is not " f"{evals.AsyncStructuredEvaluator.__name__} nor {evals.StructuredEvaluator.__name__}." ) super().__init__(evaluator) ``` #### FuncEvaluatorAdapter ```python FuncEvaluatorAdapter( fn: Callable[..., Any], weight: Optional[Union[str, float]] = None, ) ``` Bases: `BaseEvaluatorAdapter` Adapter class that allows using function-based evaluators with the experiment framework. This adapter serves as a bridge between function-based evaluators decorated with `@evaluator()` and the experiment framework's evaluation system. It handles both synchronous and asynchronous evaluator functions. 
Attributes: | Name | Type | Description | | --- | --- | --- | | `fn` | `Callable` | The evaluator function to be adapted. | Notes - The function passed to this adapter must be decorated with `@evaluator()`. - The adapter automatically handles the conversion between function results and proper evaluation result objects. Examples: Direct usage with a compatible evaluator function: ```python from patronus import evaluator from patronus.experiments import FuncEvaluatorAdapter, run_experiment from patronus.datasets import Row @evaluator() def exact_match(row: Row, **kwargs): return row.task_output == row.gold_answer run_experiment( dataset=[{"task_output": "string", "gold_answer": "string"}], evaluators=[FuncEvaluatorAdapter(exact_match)] ) ``` Customized usage by overriding the `transform()` method: ```python from typing import Optional import typing from patronus import evaluator, datasets from patronus.experiments import FuncEvaluatorAdapter, run_experiment from patronus.experiments.types import TaskResult, EvalParent @evaluator() def exact_match(actual, expected): return actual == expected class AdaptedExactMatch(FuncEvaluatorAdapter): def __init__(self): super().__init__(exact_match) def transform( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs ) -> tuple[list[typing.Any], dict[str, typing.Any]]: args = [row.task_output, row.gold_answer] kwargs = {} # Alternative: passing arguments via kwargs instead of args # args = [] # kwargs = {"actual": row.task_output, "expected": row.gold_answer} return args, kwargs run_experiment( dataset=[{"task_output": "string", "gold_answer": "string"}], evaluators=[AdaptedExactMatch()], ) ``` Source code in `src/patronus/experiments/adapters.py` ```python def __init__(self, fn: typing.Callable[..., typing.Any], weight: Optional[Union[str, float]] = None): if not hasattr(fn, "_pat_evaluator"): raise ValueError( f"Passed function {fn.__qualname__} is not an evaluator. " "Hint: add @evaluator decorator to the function." ) if weight is not None: try: Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.fn = fn self._weight = weight ```
task_result: The result of the task execution, if available. parent: The parent evaluation context. **kwargs: Additional keyword arguments from the experiment. Returns: A list of positional arguments to pass to the evaluator function. A dictionary of keyword arguments to pass to the evaluator function. """ return ( [], {"row": row, "task_result": task_result, "parent": parent, **kwargs}, ) ``` ##### evaluate ```python evaluate( row: Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: Any, ) -> EvaluationResult ``` Evaluate the given row and task result using the adapted evaluator function. This method implements the BaseEvaluatorAdapter.evaluate() protocol. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `row` | `Row` | The data row being evaluated. | *required* | | `task_result` | `Optional[TaskResult]` | The result of the task execution, if available. | *required* | | `parent` | `EvalParent` | The parent evaluation context. | *required* | | `**kwargs` | `Any` | Additional keyword arguments from the experiment. | `{}` | Returns: | Type | Description | | --- | --- | | `EvaluationResult` | An EvaluationResult containing the evaluation outcome. | Source code in `src/patronus/experiments/adapters.py` ```python async def evaluate( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: typing.Any, ) -> EvaluationResult: """ Evaluate the given row and task result using the adapted evaluator function. This method implements the BaseEvaluatorAdapter.evaluate() protocol. Args: row: The data row being evaluated. task_result: The result of the task execution, if available. parent: The parent evaluation context. **kwargs: Additional keyword arguments from the experiment. Returns: An EvaluationResult containing the evaluation outcome. """ ev_args, ev_kwargs = self.transform(row, task_result, parent, **kwargs) return await self._evaluate(*ev_args, **ev_kwargs) ``` ### experiment #### Tags ```python Tags = dict[str, str] ``` Tags are key-value pairs applied to experiments, task results and evaluation results. #### Task ```python Task = Union[ TaskProtocol[Union[TaskResult, str, None]], TaskProtocol[Awaitable[Union[TaskResult, str, None]]], ] ``` A function that processes each dataset row and produces output for evaluation. #### ExperimentDataset ```python ExperimentDataset = Union[ Dataset, DatasetLoader, list[dict[str, Any]], tuple[dict[str, Any], ...], DataFrame, Awaitable, Callable[[], Awaitable], ] ``` Any object that would "resolve" into Dataset. #### TaskProtocol Bases: `Protocol[T]` Defines an interface for a task. Task is a function that processes each dataset row and produces output for evaluation. ##### __call__ ```python __call__(*, row: Row, parent: EvalParent, tags: Tags) -> T ``` Processes a dataset row, using the provided context to produce task output. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `row` | `Row` | The dataset row to process. | *required* | | `parent` | `EvalParent` | Reference to the parent task's output and evaluation results. | *required* | | `tags` | `Tags` | Key-value pairs. | *required* | Returns: | Type | Description | | --- | --- | | `T` | Task output of type T or None to skip the row processing. 
| Example ```python def simple_task(row: datasets.Row, parent: EvalParent, tags: Tags) -> TaskResult: # Process input from the dataset row input_text = row.task_input # Generate output output = f"Processed: {input_text}" # Return result return TaskResult( output=output, metadata={"processing_time_ms": 42}, tags={"model": "example-model"} ) ``` Source code in `src/patronus/experiments/experiment.py` ````python def __call__(self, *, row: datasets.Row, parent: EvalParent, tags: Tags) -> T: """ Processes a dataset row, using the provided context to produce task output. Args: row: The dataset row to process. parent: Reference to the parent task's output and evaluation results. tags: Key-value pairs. Returns: Task output of type T or None to skip the row processing. Example: ```python def simple_task(row: datasets.Row, parent: EvalParent, tags: Tags) -> TaskResult: # Process input from the dataset row input_text = row.task_input # Generate output output = f"Processed: {input_text}" # Return result return TaskResult( output=output, metadata={"processing_time_ms": 42}, tags={"model": "example-model"} ) ``` """ ```` #### ChainLink Bases: `TypedDict` Represents a single stage in an experiment's processing chain. Each ChainLink contains an optional task function that processes dataset rows and a list of evaluators that assess the task's output. Attributes: | Name | Type | Description | | --- | --- | --- | | `task` | `Optional[Task]` | Function that processes a dataset row and produces output. | | `evaluators` | `list[AdaptableEvaluators]` | List of evaluators to assess the task's output. | #### Experiment ```python Experiment( *, dataset: Any, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[dict[str, str]] = None, metadata: Optional[dict[str, Any]] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[Any]] = None, **kwargs, ) ``` Manages evaluation experiments across datasets using tasks and evaluators. An experiment represents a complete evaluation pipeline that processes a dataset using defined tasks, applies evaluators to the outputs, and collects the results. Experiments track progress, create reports, and interface with the Patronus platform. Create experiment instances using the create() class method or through the run_experiment() convenience function. 
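Before the constructor details below, here is a hedged end-to-end sketch using the `run_experiment()` convenience function mentioned above. The task, evaluator, and dataset contents are illustrative; the call shape follows the `FuncEvaluatorAdapter` examples earlier in this reference.

```python
from patronus import evaluator
from patronus.datasets import Row
from patronus.experiments import FuncEvaluatorAdapter, run_experiment


def answer_task(row: Row, parent, tags) -> str:
    # A real task would call an LLM; here we just echo a canned answer.
    return f"The answer to '{row.task_input}' is {row.gold_answer}."


@evaluator()
def contains_gold_answer(row: Row, task_result, **kwargs) -> bool:
    # task_result.output carries the task's output for this row.
    return bool(task_result) and row.gold_answer in task_result.output


experiment = run_experiment(
    dataset=[{"task_input": "What is 2 + 2?", "gold_answer": "4"}],
    task=answer_task,
    evaluators=[FuncEvaluatorAdapter(contains_gold_answer)],
)
```

Assuming `run_experiment()` returns the finished experiment (as `Experiment.run()` returns the instance), helpers such as `to_dataframe()` described below can then be used to inspect the results.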
Source code in `src/patronus/experiments/experiment.py` ```python def __init__( self, *, dataset: typing.Any, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[dict[str, str]] = None, metadata: Optional[dict[str, Any]] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[typing.Any]] = None, **kwargs, ): if chain and evaluators: raise ValueError("Cannot specify both chain and evaluators") self._raw_dataset = dataset if not chain: chain = [{"task": task, "evaluators": evaluators}] self._chain = [ {"task": _trace_task(link["task"]), "evaluators": _adapt_evaluators(link["evaluators"])} for link in chain ] self._started = False self._finished = False self._project_name = project_name self.project = None self._experiment_name = experiment_name self.experiment = None self.tags = tags or {} self.metadata = metadata self.max_concurrency = max_concurrency self._service = service self._api_key = api_key self._api_url = api_url self._otel_endpoint = otel_endpoint self._otel_exporter_otlp_protocol = otel_exporter_otlp_protocol self._ui_url = ui_url self._timeout_s = timeout_s self._prepared = False self.reporter = Reporter() self._integrations = integrations ``` ##### create ```python create( dataset: ExperimentDataset, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[Tags] = None, metadata: Optional[dict[str, Any]] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[Any]] = None, **kwargs: Any, ) -> te.Self ``` Creates an instance of the class asynchronously with the specified parameters while performing necessary preparations. This method initializes various attributes including dataset, task, evaluators, chain, and additional configurations for managing concurrency, project details, service information, API keys, timeout settings, and integrations. Use run_experiment for more convenient usage. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `dataset` | `ExperimentDataset` | The dataset to run evaluations against. | *required* | | `task` | `Optional[Task]` | A function that processes each dataset row and produces output for evaluation. Mutually exclusive with the chain parameter. | `None` | | `evaluators` | `Optional[list[AdaptableEvaluators]]` | A list of evaluators to assess the task output. Mutually exclusive with the chain parameter. | `None` | | `chain` | `Optional[list[ChainLink]]` | A list of processing stages, each containing a task and associated evaluators. Use this for multi-stage evaluation pipelines. | `None` | | `tags` | `Optional[Tags]` | Key-value pairs. All evaluations created by the experiment will contain these tags. | `None` | | `metadata` | `Optional[dict[str, Any]]` | Arbitrary dict. 
Metadata associated with the experiment. | `None` | | `max_concurrency` | `int` | Maximum number of concurrent task and evaluation operations. | `10` | | `project_name` | `Optional[str]` | Name of the project to create or use. Falls back to configuration or environment variables if not provided. | `None` | | `experiment_name` | `Optional[str]` | Custom name for this experiment run. A timestamp will be appended. | `None` | | `service` | `Optional[str]` | OpenTelemetry service name for tracing. Falls back to configuration or environment variables if not provided. | `None` | | `api_key` | `Optional[str]` | API key for Patronus services. Falls back to configuration or environment variables if not provided. | `None` | | `api_url` | `Optional[str]` | URL for the Patronus API. Falls back to configuration or environment variables if not provided. | `None` | | `otel_endpoint` | `Optional[str]` | OpenTelemetry collector endpoint. Falls back to configuration or environment variables if not provided. | `None` | | `otel_exporter_otlp_protocol` | `Optional[str]` | OpenTelemetry exporter protocol (grpc or http/protobuf). Falls back to configuration or environment variables if not provided. | `None` | | `ui_url` | `Optional[str]` | URL for the Patronus UI. Falls back to configuration or environment variables if not provided. | `None` | | `timeout_s` | `Optional[int]` | Timeout in seconds for API operations. Falls back to configuration or environment variables if not provided. | `None` | | `integrations` | `Optional[list[Any]]` | A list of OpenTelemetry instrumentors for additional tracing capabilities. | `None` | | `**kwargs` | `Any` | Additional keyword arguments passed to the experiment. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `Experiment` | `Self` | ... | Source code in `src/patronus/experiments/experiment.py` ```python @classmethod async def create( cls, dataset: ExperimentDataset, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[Tags] = None, metadata: Optional[dict[str, Any]] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[typing.Any]] = None, **kwargs: typing.Any, ) -> te.Self: """ Creates an instance of the class asynchronously with the specified parameters while performing necessary preparations. This method initializes various attributes including dataset, task, evaluators, chain, and additional configurations for managing concurrency, project details, service information, API keys, timeout settings, and integrations. Use [run_experiment][patronus.experiments.experiment.run_experiment] for more convenient usage. Args: dataset: The dataset to run evaluations against. task: A function that processes each dataset row and produces output for evaluation. Mutually exclusive with the `chain` parameter. evaluators: A list of evaluators to assess the task output. Mutually exclusive with the `chain` parameter. chain: A list of processing stages, each containing a task and associated evaluators. Use this for multi-stage evaluation pipelines. tags: Key-value pairs. All evaluations created by the experiment will contain these tags. metadata: Arbitrary dict. 
Metadata associated with the experiment. max_concurrency: Maximum number of concurrent task and evaluation operations. project_name: Name of the project to create or use. Falls back to configuration or environment variables if not provided. experiment_name: Custom name for this experiment run. A timestamp will be appended. service: OpenTelemetry service name for tracing. Falls back to configuration or environment variables if not provided. api_key: API key for Patronus services. Falls back to configuration or environment variables if not provided. api_url: URL for the Patronus API. Falls back to configuration or environment variables if not provided. otel_endpoint: OpenTelemetry collector endpoint. Falls back to configuration or environment variables if not provided. otel_exporter_otlp_protocol: OpenTelemetry exporter protocol (grpc or http/protobuf). Falls back to configuration or environment variables if not provided. ui_url: URL for the Patronus UI. Falls back to configuration or environment variables if not provided. timeout_s: Timeout in seconds for API operations. Falls back to configuration or environment variables if not provided. integrations: A list of OpenTelemetry instrumentors for additional tracing capabilities. **kwargs: Additional keyword arguments passed to the experiment. Returns: Experiment: ... """ ex = cls( dataset=dataset, task=task, evaluators=evaluators, chain=chain, tags=tags, metadata=metadata, max_concurrency=max_concurrency, project_name=project_name, experiment_name=experiment_name, service=service, api_key=api_key, api_url=api_url, otel_endpoint=otel_endpoint, otel_exporter_otlp_protocol=otel_exporter_otlp_protocol, ui_url=ui_url, timeout_s=timeout_s, integrations=integrations, **kwargs, ) ex._ctx = await ex._prepare() return ex ``` ##### run ```python run() -> te.Self ``` Executes the experiment by processing all dataset items. Runs the experiment's task chain on each dataset row, applying evaluators to the results and collecting metrics. Progress is displayed with a progress bar and results are logged to the Patronus platform. Returns: | Type | Description | | --- | --- | | `Self` | The experiment instance. | Source code in `src/patronus/experiments/experiment.py` ```python async def run(self) -> te.Self: """ Executes the experiment by processing all dataset items. Runs the experiment's task chain on each dataset row, applying evaluators to the results and collecting metrics. Progress is displayed with a progress bar and results are logged to the Patronus platform. Returns: The experiment instance. """ if self._started: raise RuntimeError("Experiment already started") if self._prepared is False: raise ValueError( "Experiment must be prepared before starting. " "Seems that Experiment was not created using Experiment.create() classmethod." ) self._started = True with context._CTX_PAT.using(self._ctx): await self._run() self._finished = True self.reporter.summary() await asyncio.to_thread(self._ctx.exporter.force_flush) await asyncio.to_thread(self._ctx.tracer_provider.force_flush) return self ``` ##### to_dataframe ```python to_dataframe() -> pd.DataFrame ``` Converts experiment results to a pandas DataFrame. Creates a tabular representation of all evaluation results with dataset identifiers, task information, evaluation scores, and metadata. Returns: | Type | Description | | --- | --- | | `DataFrame` | A pandas DataFrame containing all experiment results. 
| Source code in `src/patronus/experiments/experiment.py` ```python def to_dataframe(self) -> pd.DataFrame: """ Converts experiment results to a pandas DataFrame. Creates a tabular representation of all evaluation results with dataset identifiers, task information, evaluation scores, and metadata. Returns: A pandas DataFrame containing all experiment results. """ if self._finished is not True: raise RuntimeError("Experiment has to be in finished state") return self.reporter.to_dataframe() ``` ##### to_csv ```python to_csv( path_or_buf: Union[str, Path, IO[AnyStr]], **kwargs: Any ) -> Optional[str] ``` Saves experiment results to a CSV file. Converts experiment results to a DataFrame and saves them as a CSV file. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `path_or_buf` | `Union[str, Path, IO[AnyStr]]` | String path or file-like object where the CSV will be saved. | *required* | | `**kwargs` | `Any` | Additional arguments passed to pandas.DataFrame.to_csv(). | `{}` | Returns: | Type | Description | | --- | --- | | `Optional[str]` | String path if a path was specified and return_path is True, otherwise None. | Source code in `src/patronus/experiments/experiment.py` ```python def to_csv( self, path_or_buf: Union[str, pathlib.Path, typing.IO[typing.AnyStr]], **kwargs: typing.Any ) -> Optional[str]: """ Saves experiment results to a CSV file. Converts experiment results to a DataFrame and saves them as a CSV file. Args: path_or_buf: String path or file-like object where the CSV will be saved. **kwargs: Additional arguments passed to pandas.DataFrame.to_csv(). Returns: String path if a path was specified and return_path is True, otherwise None. """ return self.to_dataframe().to_csv(path_or_buf, **kwargs) ``` #### run_experiment ```python run_experiment( dataset: ExperimentDataset, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[Tags] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[Any]] = None, **kwargs, ) -> Union[Experiment, typing.Awaitable[Experiment]] ``` Create and run an experiment. This function creates an experiment with the specified configuration and runs it to completion. The execution handling is context-aware: - When called from an asynchronous context (with a running event loop), it returns an awaitable that must be awaited. - When called from a synchronous context (no running event loop), it blocks until the experiment completes and returns the Experiment object. **Examples:** Synchronous execution: ```python experiment = run_experiment(dataset, task=some_task) # Blocks until the experiment finishes. ``` Asynchronous execution (e.g., in a Jupyter Notebook): ```python experiment = await run_experiment(dataset, task=some_task) # Must be awaited within an async function or event loop. ``` **Parameters:** See Experiment.create for list of arguments. Returns: | Name | Type | Description | | --- | --- | --- | | `Experiment` | `Experiment` | In a synchronous context: the completed Experiment object. | | `Experiment` | `Awaitable[Experiment]` | In an asynchronous context: an awaitable that resolves to the Experiment object. 
| Notes For manual control of the event loop, you can create and run the experiment as follows: ```python experiment = await Experiment.create(...) await experiment.run() ``` Source code in `src/patronus/experiments/experiment.py` ````python def run_experiment( dataset: ExperimentDataset, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[Tags] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[typing.Any]] = None, **kwargs, ) -> Union["Experiment", typing.Awaitable["Experiment"]]: """ Create and run an experiment. This function creates an experiment with the specified configuration and runs it to completion. The execution handling is context-aware: - When called from an asynchronous context (with a running event loop), it returns an awaitable that must be awaited. - When called from a synchronous context (no running event loop), it blocks until the experiment completes and returns the Experiment object. **Examples:** Synchronous execution: ```python experiment = run_experiment(dataset, task=some_task) # Blocks until the experiment finishes. ``` Asynchronous execution (e.g., in a Jupyter Notebook): ```python experiment = await run_experiment(dataset, task=some_task) # Must be awaited within an async function or event loop. ``` **Parameters:** See [Experiment.create][patronus.experiments.experiment.Experiment.create] for list of arguments. Returns: Experiment (Experiment): In a synchronous context: the completed Experiment object. Experiment (Awaitable[Experiment]): In an asynchronous context: an awaitable that resolves to the Experiment object. Notes: For manual control of the event loop, you can create and run the experiment as follows: ```python experiment = await Experiment.create(...) await experiment.run() ``` """ async def _run_experiment() -> Union[Experiment, typing.Awaitable[Experiment]]: ex = await Experiment.create( dataset=dataset, task=task, evaluators=evaluators, chain=chain, tags=tags, max_concurrency=max_concurrency, project_name=project_name, experiment_name=experiment_name, service=service, api_key=api_key, api_url=api_url, otel_endpoint=otel_endpoint, otel_exporter_otlp_protocol=otel_exporter_otlp_protocol, ui_url=ui_url, timeout_s=timeout_s, integrations=integrations, **kwargs, ) return await ex.run() return run_until_complete(_run_experiment()) ```` ### types #### EvalParent ```python EvalParent = Optional[_EvalParent] ``` Type alias representing an optional reference to an evaluation parent, used to track the hierarchy of evaluations and their results #### TaskResult Bases: `BaseModel` Represents the result of a task with optional output, metadata, and tags. This class is used to encapsulate the result of a task, including optional fields for the output of the task, metadata related to the task, and any tags that can provide additional information or context about the task. Attributes: | Name | Type | Description | | --- | --- | --- | | `output` | `Optional[str]` | The output of the task, if any. | | `metadata` | `Optional[dict[str, Any]]` | Additional information or metadata associated with the task. 
| | `tags` | `Optional[dict[str, str]]` | Key-value pairs used to tag and describe the task. | #### EvalsMap Bases: `dict` A specialized dictionary for storing evaluation results with flexible key handling. This class extends dict to provide automatic key normalization for evaluation results, allowing lookup by evaluator objects, strings, or any object with a canonical_name attribute. # Init ## patronus.init ### init ```python init( project_name: Optional[str] = None, app: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, api_key: Optional[str] = None, service: Optional[str] = None, resource_dir: Optional[str] = None, prompt_providers: Optional[list[str]] = None, prompt_templating_engine: Optional[str] = None, integrations: Optional[list[Any]] = None, **kwargs: Any, ) -> context.PatronusContext ``` Initializes the Patronus SDK with the specified configuration. This function sets up the SDK with project details, API connections, and telemetry. It must be called before using evaluators or experiments to ensure proper recording of results and metrics. Note `init()` should not be used for running experiments. Experiments have their own initialization process. You can configure them by passing configuration options to run_experiment() or by using a configuration file. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `project_name` | `Optional[str]` | Name of the project for organizing evaluations and experiments. Falls back to configuration file, then defaults to "Global" if not provided. | `None` | | `app` | `Optional[str]` | Name of the application within the project. Falls back to configuration file, then defaults to "default" if not provided. | `None` | | `api_url` | `Optional[str]` | URL for the Patronus API service. Falls back to configuration file or environment variables if not provided. | `None` | | `otel_endpoint` | `Optional[str]` | Endpoint for OpenTelemetry data collection. Falls back to configuration file or environment variables if not provided. | `None` | | `otel_exporter_otlp_protocol` | `Optional[str]` | OpenTelemetry exporter protocol (grpc or http/protobuf). Falls back to configuration file or environment variables if not provided. | `None` | | `api_key` | `Optional[str]` | Authentication key for Patronus services. Falls back to configuration file or environment variables if not provided. | `None` | | `service` | `Optional[str]` | Service name for OpenTelemetry traces. Falls back to configuration file or environment variables if not provided. | `None` | | `integrations` | `Optional[list[Any]]` | List of integrations to use. | `None` | | `**kwargs` | `Any` | Additional configuration options for the SDK. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `PatronusContext` | `PatronusContext` | The initialized context object. 
| Example ```python import patronus # Load configuration from configuration file or environment variables patronus.init() # Custom initialization patronus.init( project_name="my-project", app="recommendation-service", api_key="your-api-key" ) ``` Source code in `src/patronus/init.py` ````python def init( project_name: Optional[str] = None, app: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, api_key: Optional[str] = None, service: Optional[str] = None, resource_dir: Optional[str] = None, prompt_providers: Optional[list[str]] = None, prompt_templating_engine: Optional[str] = None, integrations: Optional[list[typing.Any]] = None, **kwargs: typing.Any, ) -> context.PatronusContext: """ Initializes the Patronus SDK with the specified configuration. This function sets up the SDK with project details, API connections, and telemetry. It must be called before using evaluators or experiments to ensure proper recording of results and metrics. Note: `init()` should not be used for running experiments. Experiments have its own initialization process. You can configure them by passing configuration options to [`run_experiment()`][patronus.experiments.experiment.run_experiment] or using configuration file. Args: project_name: Name of the project for organizing evaluations and experiments. Falls back to configuration file, then defaults to "Global" if not provided. app: Name of the application within the project. Falls back to configuration file, then defaults to "default" if not provided. api_url: URL for the Patronus API service. Falls back to configuration file or environment variables if not provided. otel_endpoint: Endpoint for OpenTelemetry data collection. Falls back to configuration file or environment variables if not provided. otel_exporter_otlp_protocol: OpenTelemetry exporter protocol (grpc or http/protobuf). Falls back to configuration file or environment variables if not provided. api_key: Authentication key for Patronus services. Falls back to configuration file or environment variables if not provided. service: Service name for OpenTelemetry traces. Falls back to configuration file or environment variables if not provided. integrations: List of integration to use. **kwargs: Additional configuration options for the SDK. Returns: PatronusContext: The initialized context object. Example: ```python import patronus # Load configuration from configuration file or environment variables patronus.init() # Custom initialization patronus.init( project_name="my-project", app="recommendation-service", api_key="your-api-key" ) ``` """ if api_url != config.DEFAULT_API_URL and otel_endpoint == config.DEFAULT_OTEL_ENDPOINT: raise ValueError( "'api_url' is set to non-default value, " "but 'otel_endpoint' is a default. 
Change 'otel_endpoint' to point to the same environment as 'api_url'" ) def build_and_set(): cfg = config.config() ctx = build_context( service=service or cfg.service, project_name=project_name or cfg.project_name, app=app or cfg.app, experiment_id=None, experiment_name=None, api_url=api_url or cfg.api_url, otel_endpoint=otel_endpoint or cfg.otel_endpoint, otel_exporter_otlp_protocol=otel_exporter_otlp_protocol or cfg.otel_exporter_otlp_protocol, api_key=api_key or cfg.api_key, resource_dir=resource_dir or cfg.resource_dir, prompt_providers=prompt_providers or cfg.prompt_providers, prompt_templating_engine=cfg.prompt_templating_engine, timeout_s=cfg.timeout_s, integrations=integrations, **kwargs, ) context.set_global_patronus_context(ctx) inited_now = _INIT_ONCE.do_once(build_and_set) if not inited_now: warnings.warn( ("The Patronus SDK has already been initialized. Duplicate initialization attempts are ignored."), UserWarning, stacklevel=2, ) return context.get_current_context() ```` ### build_context ```python build_context( service: str, project_name: str, app: Optional[str], experiment_id: Optional[str], experiment_name: Optional[str], api_url: Optional[str], otel_endpoint: str, otel_exporter_otlp_protocol: Optional[str], api_key: str, resource_dir: Optional[str] = None, prompt_providers: Optional[list[str]] = None, prompt_templating_engine: Optional[str] = None, client_http: Optional[Client] = None, client_http_async: Optional[AsyncClient] = None, timeout_s: int = 60, integrations: Optional[list[Any]] = None, **kwargs: Any, ) -> context.PatronusContext ``` Builds a Patronus context with the specified configuration parameters. This function creates the context object that contains all necessary components for the SDK operation, including loggers, tracers, and API clients. It is used internally by the init() function but can also be used directly for more advanced configuration scenarios. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `service` | `str` | Service name for OpenTelemetry traces. | *required* | | `project_name` | `str` | Name of the project for organizing evaluations and experiments. | *required* | | `app` | `Optional[str]` | Name of the application within the project. | *required* | | `experiment_id` | `Optional[str]` | Unique identifier for an experiment when running in experiment mode. | *required* | | `experiment_name` | `Optional[str]` | Display name for an experiment when running in experiment mode. | *required* | | `api_url` | `Optional[str]` | URL for the Patronus API service. | *required* | | `otel_endpoint` | `str` | Endpoint for OpenTelemetry data collection. | *required* | | `otel_exporter_otlp_protocol` | `Optional[str]` | OpenTelemetry exporter protocol (grpc or http/protobuf). | *required* | | `api_key` | `str` | Authentication key for Patronus services. | *required* | | `client_http` | `Optional[Client]` | Custom HTTP client for synchronous API requests. If not provided, a new client will be created. | `None` | | `client_http_async` | `Optional[AsyncClient]` | Custom HTTP client for asynchronous API requests. If not provided, a new client will be created. | `None` | | `timeout_s` | `int` | Timeout in seconds for HTTP requests (default: 60). | `60` | | `integrations` | `Optional[list[Any]]` | List of PatronusIntegrator instances. | `None` | | `**kwargs` | `Any` | Additional configuration options, including: - integrations: List of OpenTelemetry instrumentors to enable. 
| `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `PatronusContext` | `PatronusContext` | The initialized context object containing all necessary components for SDK operation. | Source code in `src/patronus/init.py` ```python def build_context( service: str, project_name: str, app: Optional[str], experiment_id: Optional[str], experiment_name: Optional[str], api_url: Optional[str], otel_endpoint: str, otel_exporter_otlp_protocol: Optional[str], api_key: str, resource_dir: Optional[str] = None, prompt_providers: Optional[list[str]] = None, prompt_templating_engine: Optional[str] = None, client_http: Optional[httpx.Client] = None, client_http_async: Optional[httpx.AsyncClient] = None, timeout_s: int = 60, integrations: Optional[list[typing.Any]] = None, **kwargs: typing.Any, ) -> context.PatronusContext: """ Builds a Patronus context with the specified configuration parameters. This function creates the context object that contains all necessary components for the SDK operation, including loggers, tracers, and API clients. It is used internally by the [`init()`][patronus.init.init] function but can also be used directly for more advanced configuration scenarios. Args: service: Service name for OpenTelemetry traces. project_name: Name of the project for organizing evaluations and experiments. app: Name of the application within the project. experiment_id: Unique identifier for an experiment when running in experiment mode. experiment_name: Display name for an experiment when running in experiment mode. api_url: URL for the Patronus API service. otel_endpoint: Endpoint for OpenTelemetry data collection. otel_exporter_otlp_protocol: OpenTelemetry exporter protocol (grpc or http/protobuf). api_key: Authentication key for Patronus services. client_http: Custom HTTP client for synchronous API requests. If not provided, a new client will be created. client_http_async: Custom HTTP client for asynchronous API requests. If not provided, a new client will be created. timeout_s: Timeout in seconds for HTTP requests (default: 60). integrations: List of PatronusIntegrator instances. **kwargs: Additional configuration options, including: - integrations: List of OpenTelemetry instrumentors to enable. Returns: PatronusContext: The initialized context object containing all necessary components for SDK operation. 
""" if client_http is None: client_http = httpx.Client(timeout=timeout_s) if client_http_async is None: client_http_async = httpx.AsyncClient(timeout=timeout_s) integrations = prepare_integrations(integrations) scope = context.PatronusScope( service=service, project_name=project_name, app=app, experiment_id=experiment_id, experiment_name=experiment_name, ) api_deprecated = PatronusAPIClient( client_http_async=client_http_async, client_http=client_http, base_url=api_url, api_key=api_key, ) api_client = patronus_api.Client(api_key=api_key, base_url=api_url) async_api_client = patronus_api.AsyncClient(api_key=api_key, base_url=api_url) logger_provider = create_logger_provider( exporter_endpoint=otel_endpoint, api_key=api_key, scope=scope, protocol=otel_exporter_otlp_protocol, ) tracer_provider = create_tracer_provider( exporter_endpoint=otel_endpoint, api_key=api_key, scope=scope, protocol=otel_exporter_otlp_protocol, ) eval_exporter = BatchEvaluationExporter(client=api_deprecated) ctx = context.PatronusContext( scope=scope, tracer_provider=tracer_provider, logger_provider=logger_provider, api_client_deprecated=api_deprecated, api_client=api_client, async_api_client=async_api_client, exporter=eval_exporter, prompts=context.PromptsConfig( directory=resource_dir and pathlib.Path(resource_dir, "prompts"), providers=prompt_providers, templating_engine=prompt_templating_engine, ), ) apply_integrations(ctx, integrations) return ctx ``` # Integrations ## patronus.integrations This package provides integration points for connecting various third-party libraries and tools with the Patronus SDK. ### instrumenter #### BasePatronusIntegrator Bases: `ABC` Abstract base class for Patronus integrations. This class defines the interface for integrating external libraries and tools with the Patronus context. All specific integrators should inherit from this class and implement the required methods. ##### apply ```python apply(ctx: PatronusContext, **kwargs: Any) ``` Apply the integration to the given Patronus context. This method must be implemented by subclasses to define how the integration is applied to a Patronus context instance. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `PatronusContext` | The Patronus context to apply the integration to. | *required* | | `**kwargs` | `Any` | Additional keyword arguments specific to the implementation. | `{}` | Source code in `src/patronus/integrations/instrumenter.py` ```python @abc.abstractmethod def apply(self, ctx: "context.PatronusContext", **kwargs: typing.Any): """ Apply the integration to the given Patronus context. This method must be implemented by subclasses to define how the integration is applied to a Patronus context instance. Args: ctx: The Patronus context to apply the integration to. **kwargs: Additional keyword arguments specific to the implementation. """ ``` ### otel #### OpenTelemetryIntegrator ```python OpenTelemetryIntegrator(instrumentor: BaseInstrumentor) ``` Bases: `BasePatronusIntegrator` Integration for OpenTelemetry instrumentors with Patronus. This class provides an adapter between OpenTelemetry instrumentors and the Patronus context, allowing for easy integration of OpenTelemetry instrumentation in Patronus-managed applications. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `instrumentor` | `BaseInstrumentor` | An OpenTelemetry instrumentor instance that will be applied to the Patronus context. 
| *required* | Source code in `src/patronus/integrations/otel.py` ```python def __init__(self, instrumentor: "BaseInstrumentor"): """ Initialize the OpenTelemetry integrator. Args: instrumentor: An OpenTelemetry instrumentor instance that will be applied to the Patronus context. """ self.instrumentor = instrumentor ``` ##### apply ```python apply(ctx: PatronusContext, **kwargs: Any) ``` Apply OpenTelemetry instrumentation to the Patronus context. This method configures the OpenTelemetry instrumentor with the tracer provider from the Patronus context. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `PatronusContext` | The Patronus context containing the tracer provider. | *required* | | `**kwargs` | `Any` | Additional keyword arguments (unused). | `{}` | Source code in `src/patronus/integrations/otel.py` ```python def apply(self, ctx: "context.PatronusContext", **kwargs: typing.Any): """ Apply OpenTelemetry instrumentation to the Patronus context. This method configures the OpenTelemetry instrumentor with the tracer provider from the Patronus context. Args: ctx: The Patronus context containing the tracer provider. **kwargs: Additional keyword arguments (unused). """ self.instrumentor.instrument(tracer_provider=ctx.tracer_provider) ``` ### pydantic_ai #### PydanticAIIntegrator ```python PydanticAIIntegrator( event_mode: Literal["attributes", "logs"] = "logs", ) ``` Bases: `BasePatronusIntegrator` Integration for Pydantic-AI with Patronus. This class provides integration between Pydantic-AI agents and the Patronus observability stack, enabling tracing and logging of Pydantic-AI agent operations. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `event_mode` | `Literal['attributes', 'logs']` | The mode for capturing events, either as span attributes or as logs. Default is "logs". | `'logs'` | Source code in `src/patronus/integrations/pydantic_ai.py` ```python def __init__(self, event_mode: Literal["attributes", "logs"] = "logs"): """ Initialize the Pydantic-AI integrator. Args: event_mode: The mode for capturing events, either as span attributes or as logs. Default is "logs". """ self._instrumentation_settings = {"event_mode": event_mode} ``` ##### apply ```python apply(ctx: PatronusContext, **kwargs: Any) ``` Apply Pydantic-AI instrumentation to the Patronus context. This method configures all Pydantic-AI agents to use the tracer and logger providers from the Patronus context. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `PatronusContext` | The Patronus context containing the tracer and logger providers. | *required* | | `**kwargs` | `Any` | Additional keyword arguments (unused). | `{}` | Source code in `src/patronus/integrations/pydantic_ai.py` ```python def apply(self, ctx: "context.PatronusContext", **kwargs: Any): """ Apply Pydantic-AI instrumentation to the Patronus context. This method configures all Pydantic-AI agents to use the tracer and logger providers from the Patronus context. Args: ctx: The Patronus context containing the tracer and logger providers. **kwargs: Additional keyword arguments (unused). 
""" from pydantic_ai.agent import Agent, InstrumentationSettings settings_kwargs = { **self._instrumentation_settings, "tracer_provider": ctx.tracer_provider, "event_logger_provider": EventLoggerProvider(ctx.logger_provider), } settings = InstrumentationSettings(**settings_kwargs) Agent.instrument_all(instrument=settings) ``` # Patronus Objects ## client_async ### AsyncPatronus ```python AsyncPatronus(max_workers: int = 10) ``` Source code in `src/patronus/pat_client/client_async.py` ```python def __init__(self, max_workers: int = 10): self._pending_tasks = collections.deque() self._executor = ThreadPoolExecutor(max_workers=max_workers) self._semaphore = asyncio.Semaphore(max_workers) ``` #### evaluate ```python evaluate( evaluators: Union[List[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict] = None, return_exceptions: bool = False, ) -> EvaluationContainer ``` Run multiple evaluators in parallel. Source code in `src/patronus/pat_client/client_async.py` ```python async def evaluate( self, evaluators: Union[List[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict] = None, return_exceptions: bool = False, ) -> EvaluationContainer: """ Run multiple evaluators in parallel. """ singular_eval = not isinstance(evaluators, list) if singular_eval: evaluators = [evaluators] evaluators = self._map_evaluators(evaluators) def into_coro(fn, **kwargs): if inspect.iscoroutinefunction(fn): coro = fn(**kwargs) else: coro = asyncio.to_thread(fn, **kwargs) return with_semaphore(self._semaphore, coro) with bundled_eval(): results = await asyncio.gather( *( into_coro( ev.evaluate, system_prompt=system_prompt, task_context=task_context, task_input=task_input, task_output=task_output, gold_answer=gold_answer, task_metadata=task_metadata, ) for ev in evaluators ), return_exceptions=return_exceptions, ) return EvaluationContainer(results) ``` #### evaluate_bg ```python evaluate_bg( evaluators: Union[List[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict] = None, ) -> Task[EvaluationContainer] ``` Run multiple evaluators in parallel. The returned task will be a background task. Source code in `src/patronus/pat_client/client_async.py` ```python def evaluate_bg( self, evaluators: Union[List[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict] = None, ) -> Task[EvaluationContainer]: """ Run multiple evaluators in parallel. The returned task will be a background task. 
""" loop = asyncio.get_running_loop() task = loop.create_task( self.evaluate( evaluators=evaluators, system_prompt=system_prompt, task_context=task_context, task_input=task_input, task_output=task_output, gold_answer=gold_answer, task_metadata=task_metadata, return_exceptions=True, ), name="evaluate_bg", ) self._pending_tasks.append(task) task.add_done_callback(self._consume_tasks) return task ``` #### close ```python close() ``` Gracefully close the client. This will wait for all background tasks to finish. Source code in `src/patronus/pat_client/client_async.py` ```python async def close(self): """ Gracefully close the client. This will wait for all background tasks to finish. """ while len(self._pending_tasks) != 0: await self._pending_tasks.popleft() ``` ## client_sync ### Patronus ```python Patronus(workers: int = 10, shutdown_on_exit: bool = True) ``` Source code in `src/patronus/pat_client/client_sync.py` ```python def __init__(self, workers: int = 10, shutdown_on_exit: bool = True): self._worker_pool = ThreadPool(workers) self._supervisor_pool = ThreadPool(workers) self._at_exit_handler = None if shutdown_on_exit: self._at_exit_handler = atexit.register(self.close) ``` #### evaluate ```python evaluate( evaluators: Union[list[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict[str, Any]] = None, return_exceptions: bool = False, ) -> EvaluationContainer ``` Run multiple evaluators in parallel. Source code in `src/patronus/pat_client/client_sync.py` ```python def evaluate( self, evaluators: typing.Union[list[Evaluator], Evaluator], *, system_prompt: typing.Optional[str] = None, task_context: typing.Union[list[str], str, None] = None, task_input: typing.Optional[str] = None, task_output: typing.Optional[str] = None, gold_answer: typing.Optional[str] = None, task_metadata: typing.Optional[dict[str, typing.Any]] = None, return_exceptions: bool = False, ) -> EvaluationContainer: """ Run multiple evaluators in parallel. """ if not isinstance(evaluators, list): evaluators = [evaluators] evaluators = self._map_evaluators(evaluators) with bundled_eval(): callables = [ _into_thread_run_fn( ev.evaluate, system_prompt=system_prompt, task_context=task_context, task_input=task_input, task_output=task_output, gold_answer=gold_answer, task_metadata=task_metadata, ) for ev in evaluators ] results = self._process_batch(callables, return_exceptions=return_exceptions) return EvaluationContainer(results) ``` #### evaluate_bg ```python evaluate_bg( evaluators: list[StructuredEvaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict[str, Any]] = None, ) -> TypedAsyncResult[EvaluationContainer] ``` Run multiple evaluators in parallel. The returned task will be a background task. 
Source code in `src/patronus/pat_client/client_sync.py` ```python def evaluate_bg( self, evaluators: list[StructuredEvaluator], *, system_prompt: typing.Optional[str] = None, task_context: typing.Union[list[str], str, None] = None, task_input: typing.Optional[str] = None, task_output: typing.Optional[str] = None, gold_answer: typing.Optional[str] = None, task_metadata: typing.Optional[dict[str, typing.Any]] = None, ) -> TypedAsyncResult[EvaluationContainer]: """ Run multiple evaluators in parallel. The returned task will be a background task. """ def _run(): with bundled_eval(): callables = [ _into_thread_run_fn( ev.evaluate, system_prompt=system_prompt, task_context=task_context, task_input=task_input, task_output=task_output, gold_answer=gold_answer, task_metadata=task_metadata, ) for ev in evaluators ] results = self._process_batch(callables, return_exceptions=True) return EvaluationContainer(results) return typing.cast( TypedAsyncResult[EvaluationContainer], self._supervisor_pool.apply_async(_into_thread_run_fn(_run)) ) ``` #### close ```python close() ``` Gracefully close the client. This will wait for all background tasks to finish. Source code in `src/patronus/pat_client/client_sync.py` ```python def close(self): """ Gracefully close the client. This will wait for all background tasks to finish. """ self._close() if self._at_exit_handler: atexit.unregister(self._at_exit_handler) ``` ## container ### EvaluationContainer ```python EvaluationContainer( results: list[Union[EvaluationResult, None, Exception]], ) ``` #### format ```python format() -> str ``` Format the evaluation results into a readable summary. Source code in `src/patronus/pat_client/container.py` ```python def format(self) -> str: """ Format the evaluation results into a readable summary. """ buf = StringIO() total = len(self.results) exceptions_count = sum(1 for r in self.results if isinstance(r, Exception)) successes_count = sum(1 for r in self.results if isinstance(r, EvaluationResult) and r.pass_ is True) failures_count = sum(1 for r in self.results if isinstance(r, EvaluationResult) and r.pass_ is False) buf.write(f"Total evaluations: {total}\n") buf.write(f"Successes: {successes_count}\n") buf.write(f"Failures: {failures_count}\n") buf.write(f"Exceptions: {exceptions_count}\n\n") buf.write("Evaluation Details:\n") buf.write("---\n") # Add detailed evaluation results for result in self.results: if result is None: buf.write("None\n") elif isinstance(result, Exception): buf.write(str(result)) buf.write("\n") else: buf.write(result.format()) buf.write("---\n") return buf.getvalue() ``` #### pretty_print ```python pretty_print(file: Optional[IO] = None) -> None ``` Formats and prints the current object in a human-readable form. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file` | `Optional[IO]` | | `None` | Source code in `src/patronus/pat_client/container.py` ```python def pretty_print(self, file: Optional[IO] = None) -> None: """ Formats and prints the current object in a human-readable form. Args: file: """ f = self.format() print(f, file=file) ``` #### has_exception ```python has_exception() -> bool ``` Checks if the results contain any exception. Source code in `src/patronus/pat_client/container.py` ```python def has_exception(self) -> bool: """ Checks if the results contain any exception. 
""" return any(isinstance(r, Exception) for r in self.results) ``` #### raise_on_exception ```python raise_on_exception() -> None ``` Checks the results for any exceptions and raises them accordingly. Source code in `src/patronus/pat_client/container.py` ```python def raise_on_exception(self) -> None: """ Checks the results for any exceptions and raises them accordingly. """ if not self.has_exception(): return None exceptions = list(r for r in self.results if isinstance(r, Exception)) if len(exceptions) == 1: raise exceptions[0] raise MultiException(exceptions) ``` #### all_succeeded ```python all_succeeded(ignore_exceptions: bool = False) -> bool ``` Check if all evaluations that were actually evaluated passed. Evaluations are only considered if they: - Have a non-None pass\_ flag set - Are not None (skipped) - Are not exceptions (unless ignore_exceptions=True) Note: Returns True if no evaluations met the above criteria (empty case). Source code in `src/patronus/pat_client/container.py` ```python def all_succeeded(self, ignore_exceptions: bool = False) -> bool: """ Check if all evaluations that were actually evaluated passed. Evaluations are only considered if they: - Have a non-None pass_ flag set - Are not None (skipped) - Are not exceptions (unless ignore_exceptions=True) Note: Returns True if no evaluations met the above criteria (empty case). """ for r in self.results: if isinstance(r, Exception) and not ignore_exceptions: self.raise_on_exception() if r is not None and r.pass_ is False: return False return True ``` #### any_failed ```python any_failed(ignore_exceptions: bool = False) -> bool ``` Check if any evaluation that was actually evaluated failed. Evaluations are only considered if they: - Have a non-None pass\_ flag set - Are not None (skipped) - Are not exceptions (unless ignore_exceptions=True) Note: Returns False if no evaluations met the above criteria (empty case). Source code in `src/patronus/pat_client/container.py` ```python def any_failed(self, ignore_exceptions: bool = False) -> bool: """ Check if any evaluation that was actually evaluated failed. Evaluations are only considered if they: - Have a non-None pass_ flag set - Are not None (skipped) - Are not exceptions (unless ignore_exceptions=True) Note: Returns False if no evaluations met the above criteria (empty case). """ for r in self.results: if isinstance(r, Exception) and not ignore_exceptions: self.raise_on_exception() if r is not None and r.pass_ is False: return True return False ``` #### failed_evaluations ```python failed_evaluations() -> Generator[ EvaluationResult, None, None ] ``` Generates all failed evaluations from the results. Source code in `src/patronus/pat_client/container.py` ```python def failed_evaluations(self) -> Generator[EvaluationResult, None, None]: """ Generates all failed evaluations from the results. """ return (r for r in self.results if not isinstance(r, (Exception, type(None))) and r.pass_ is False) ``` #### succeeded_evaluations ```python succeeded_evaluations() -> Generator[ EvaluationResult, None, None ] ``` Generates all successfully passed evaluations from the `results` attribute. Source code in `src/patronus/pat_client/container.py` ```python def succeeded_evaluations(self) -> Generator[EvaluationResult, None, None]: """ Generates all successfully passed evaluations from the `results` attribute. 
""" return (r for r in self.results if not isinstance(r, (Exception, type(None))) and r.pass_ is True) ``` # Prompts ## patronus.prompts ### clients #### load_prompt ```python load_prompt = get ``` Alias for PromptClient.get. #### aload_prompt ```python aload_prompt = get ``` Alias for AsyncPromptClient.get. #### push_prompt ```python push_prompt = push ``` Alias for PromptClient.push. #### apush_prompt ```python apush_prompt = push ``` Alias for AsyncPromptClient.push. #### PromptNotFoundError ```python PromptNotFoundError( name: str, project: Optional[str] = None, revision: Optional[int] = None, label: Optional[str] = None, ) ``` Bases: `Exception` Raised when a prompt could not be found. Source code in `src/patronus/prompts/clients.py` ```python def __init__( self, name: str, project: Optional[str] = None, revision: Optional[int] = None, label: Optional[str] = None ): self.name = name self.project = project self.revision = revision self.label = label message = f"Prompt not found (name={name!r}, project={project!r}, revision={revision!r}, label={label!r})" super().__init__(message) ``` #### PromptProviderError Bases: `Exception` Base class for prompt provider errors. #### PromptProviderConnectionError Bases: `PromptProviderError` Raised when there's a connectivity issue with the prompt provider. #### PromptProviderAuthenticationError Bases: `PromptProviderError` Raised when there's an authentication issue with the prompt provider. #### PromptProvider Bases: `ABC` ##### get_prompt ```python get_prompt( name: str, revision: Optional[int], label: Optional[str], project: str, engine: TemplateEngine, ) -> Optional[LoadedPrompt] ``` Get prompts, returns None if prompt was not found Source code in `src/patronus/prompts/clients.py` ```python @abc.abstractmethod def get_prompt( self, name: str, revision: Optional[int], label: Optional[str], project: str, engine: TemplateEngine ) -> Optional[LoadedPrompt]: """Get prompts, returns None if prompt was not found""" ``` ##### aget_prompt ```python aget_prompt( name: str, revision: Optional[int], label: Optional[str], project: str, engine: TemplateEngine, ) -> Optional[LoadedPrompt] ``` Get prompts, returns None if prompt was not found Source code in `src/patronus/prompts/clients.py` ```python @abc.abstractmethod async def aget_prompt( self, name: str, revision: Optional[int], label: Optional[str], project: str, engine: TemplateEngine ) -> Optional[LoadedPrompt]: """Get prompts, returns None if prompt was not found""" ``` #### PromptClientMixin #### PromptClient ```python PromptClient( provider_factory: Optional[ProviderFactory] = None, ) ``` Bases: `PromptClientMixin` Source code in `src/patronus/prompts/clients.py` ```python def __init__(self, provider_factory: Optional[ProviderFactory] = None) -> None: self._cache: PromptCache = PromptCache() self._provider_factory: ProviderFactory = provider_factory or { "local": lambda: LocalPromptProvider(), "api": lambda: APIPromptProvider(), } self._api_provider = APIPromptProvider() ``` ##### get ```python get( name: str, revision: Optional[int] = None, label: Optional[str] = None, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, disable_cache: bool = False, provider: Union[ PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN], ] = NOT_GIVEN, engine: Union[ TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN], ] = NOT_GIVEN, ) -> LoadedPrompt ``` Get the prompt. If neither revision nor label is specified then the prompt with latest revision is returned. 
Project is loaded from the config by default. You can specify the project name of the prompt if you want to override the value from the config. By default, once a prompt is retrieved it's cached. You can disable caching. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `name` | `str` | The name of the prompt to retrieve. | *required* | | `revision` | `Optional[int]` | Optional specific revision number to retrieve. If not specified, the latest revision is used. | `None` | | `label` | `Optional[str]` | Optional label to filter by. If specified, only prompts with this label will be returned. | `None` | | `project` | `Union[str, Type[NOT_GIVEN]]` | Optional project name override. If not specified, the project name from config is used. | `NOT_GIVEN` | | `disable_cache` | `bool` | If True, bypasses the cache for both reading and writing. | `False` | | `provider` | `Union[PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN]]` | The provider(s) to use for retrieving prompts. Can be a string identifier ('local', 'api'), a PromptProvider instance, or a sequence of these. If not specified, defaults to config setting. | `NOT_GIVEN` | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]]` | The template engine to use for rendering prompts. Can be a string identifier ('f-string', 'mustache', 'jinja2') or a TemplateEngine instance. If not specified, defaults to config setting. | `NOT_GIVEN` | Returns: | Name | Type | Description | | --- | --- | --- | | `LoadedPrompt` | `LoadedPrompt` | The retrieved prompt object. | Raises: | Type | Description | | --- | --- | | `PromptNotFoundError` | If the prompt could not be found with the specified parameters. | | `ValueError` | If the provided provider or engine is invalid. | | `PromptProviderError` | If there was an error communicating with the prompt provider. | Source code in `src/patronus/prompts/clients.py` ```python def get( self, name: str, revision: Optional[int] = None, label: Optional[str] = None, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, disable_cache: bool = False, provider: Union[ PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN], ] = NOT_GIVEN, engine: Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]] = NOT_GIVEN, ) -> LoadedPrompt: """ Get the prompt. If neither revision nor label is specified then the prompt with latest revision is returned. Project is loaded from the config by default. You can specify the project name of the prompt if you want to override the value from the config. By default, once a prompt is retrieved it's cached. You can disable caching. Args: name: The name of the prompt to retrieve. revision: Optional specific revision number to retrieve. If not specified, the latest revision is used. label: Optional label to filter by. If specified, only prompts with this label will be returned. project: Optional project name override. If not specified, the project name from config is used. disable_cache: If True, bypasses the cache for both reading and writing. provider: The provider(s) to use for retrieving prompts. Can be a string identifier ('local', 'api'), a PromptProvider instance, or a sequence of these. If not specified, defaults to config setting. engine: The template engine to use for rendering prompts. Can be a string identifier ('f-string', 'mustache', 'jinja2') or a TemplateEngine instance. If not specified, defaults to config setting. 
Returns: LoadedPrompt: The retrieved prompt object. Raises: PromptNotFoundError: If the prompt could not be found with the specified parameters. ValueError: If the provided provider or engine is invalid. PromptProviderError: If there was an error communicating with the prompt provider. """ project_name: str = self._resolve_project(project) resolved_providers: list[PromptProvider] = self._resolve_providers(provider, self._provider_factory) resolved_engine: TemplateEngine = self._resolve_engine(engine) cache_key: _CacheKey = _CacheKey(project_name=project_name, prompt_name=name, revision=revision, label=label) if not disable_cache: cached_prompt: Optional[LoadedPrompt] = self._cache.get(cache_key) if cached_prompt is not None: return cached_prompt prompt: Optional[LoadedPrompt] = None provider_errors: list[str] = [] for i, prompt_provider in enumerate(resolved_providers): log.debug("Trying prompt provider %d (%s)", i + 1, prompt_provider.__class__.__name__) try: prompt = prompt_provider.get_prompt(name, revision, label, project_name, engine=resolved_engine) if prompt is not None: log.debug("Prompt found using provider %s", prompt_provider.__class__.__name__) break except PromptProviderConnectionError as e: provider_errors.append(str(e)) continue except PromptProviderAuthenticationError as e: provider_errors.append(str(e)) continue except Exception as e: provider_errors.append(f"Unexpected error from provider {prompt_provider.__class__.__name__}: {str(e)}") continue if prompt is None: if provider_errors: error_msg: str = self._format_provider_errors(provider_errors) raise PromptNotFoundError( name=name, project=project_name, revision=revision, label=label ) from Exception(error_msg) else: raise PromptNotFoundError(name=name, project=project_name, revision=revision, label=label) if not disable_cache: self._cache.put(cache_key, prompt) return prompt ``` ##### push ```python push( prompt: Prompt, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, engine: Union[ TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN], ] = NOT_GIVEN, ) -> LoadedPrompt ``` Push a prompt to the API, creating a new revision only if needed. If a prompt revision with the same normalized body and metadata already exists, the existing revision will be returned. If the metadata differs, a new revision will be created. The engine parameter is only used to set a property on the returned LoadedPrompt object. It is not persisted in any way and doesn't affect how the prompt is stored in the Patronus AI Platform. Note that when a new prompt definition is created, the description is used as provided. However, when creating a new revision for an existing prompt definition, the description parameter doesn't update the existing prompt definition's description. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `prompt` | `Prompt` | The prompt to push. | *required* | | `project` | `Union[str, Type[NOT_GIVEN]]` | Optional project name override. If not specified, the project name from config is used. | `NOT_GIVEN` | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]]` | The template engine to use for rendering the returned prompt. If not specified, defaults to config setting. | `NOT_GIVEN` | Returns: | Name | Type | Description | | --- | --- | --- | | `LoadedPrompt` | `LoadedPrompt` | The created or existing prompt revision. | Raises: | Type | Description | | --- | --- | | `PromptProviderError` | If there was an error communicating with the prompt provider. 
| Source code in `src/patronus/prompts/clients.py` ```python def push( self, prompt: Prompt, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, engine: Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]] = NOT_GIVEN, ) -> LoadedPrompt: """ Push a prompt to the API, creating a new revision only if needed. If a prompt revision with the same normalized body and metadata already exists, the existing revision will be returned. If the metadata differs, a new revision will be created. The engine parameter is only used to set property on output LoadedPrompt object. It is not persisted in any way and doesn't affect how the prompt is stored in Patronus AI Platform. Note that when a new prompt definition is created, the description is used as provided. However, when creating a new revision for an existing prompt definition, the description parameter doesn't update the existing prompt definition's description. Args: prompt: The prompt to push project: Optional project name override. If not specified, the project name from config is used. engine: The template engine to use for rendering the returned prompt. If not specified, defaults to config setting. Returns: LoadedPrompt: The created or existing prompt revision Raises: PromptProviderError: If there was an error communicating with the prompt provider. """ project_name: str = self._resolve_project(project) resolved_engine: TemplateEngine = self._resolve_engine(engine) normalized_body_sha256 = calculate_normalized_body_hash(prompt.body) cli = context.get_api_client().prompts # Try to find existing revision with same hash resp = cli.list_revisions( prompt_name=prompt.name, project_name=project_name, normalized_body_sha256=normalized_body_sha256, ) # Variables for create_revision parameters prompt_id = patronus_api.NOT_GIVEN prompt_name = prompt.name create_new_prompt = True prompt_def = None # If we found a matching revision, check if metadata is the same if resp.prompt_revisions: log.debug("Found %d revisions with matching body hash", len(resp.prompt_revisions)) prompt_id = resp.prompt_revisions[0].prompt_definition_id create_new_prompt = False resp_pd = cli.list_definitions(prompt_id=prompt_id, limit=1) if not resp_pd.prompt_definitions: raise PromptProviderError( "Prompt revision has been found but prompt definition was not found. This should not happen" ) prompt_def = resp_pd.prompt_definitions[0] # Check if the provided description is different from existing one and warn if so if prompt.description is not None and prompt.description != prompt_def.description: warnings.warn( f"Prompt description ({prompt.description!r}) differs from the existing one " f"({prompt_def.description!r}). The description won't be updated." 
) new_metadata_cmp = json.dumps(prompt.metadata, sort_keys=True) for rev in resp.prompt_revisions: metadata_cmp = json.dumps(rev.metadata, sort_keys=True) if new_metadata_cmp == metadata_cmp: log.debug("Found existing revision with matching metadata, returning revision %d", rev.revision) return self._api_provider._create_loaded_prompt( prompt_revision=rev, prompt_def=prompt_def, engine=resolved_engine, ) # For existing prompt, don't need name/project prompt_name = patronus_api.NOT_GIVEN project_name = patronus_api.NOT_GIVEN else: # No matching revisions found, will create new prompt log.debug("No revisions with matching body hash found, creating new prompt and revision") # Create a new revision with appropriate parameters log.debug( "Creating new revision (new_prompt=%s, prompt_id=%s, prompt_name=%s)", create_new_prompt, prompt_id if prompt_id != patronus_api.NOT_GIVEN else "NOT_GIVEN", prompt_name if prompt_name != patronus_api.NOT_GIVEN else "NOT_GIVEN", ) resp = cli.create_revision( body=prompt.body, prompt_id=prompt_id, prompt_name=prompt_name, project_name=project_name if create_new_prompt else patronus_api.NOT_GIVEN, prompt_description=prompt.description, metadata=prompt.metadata, ) prompt_revision = resp.prompt_revision # If we created a new prompt, we need to fetch the definition if create_new_prompt: resp_pd = cli.list_definitions(prompt_id=prompt_revision.prompt_definition_id, limit=1) if not resp_pd.prompt_definitions: raise PromptProviderError( "Prompt revision has been created but prompt definition was not found. This should not happen" ) prompt_def = resp_pd.prompt_definitions[0] return self._api_provider._create_loaded_prompt(prompt_revision, prompt_def, resolved_engine) ``` #### AsyncPromptClient ```python AsyncPromptClient( provider_factory: Optional[ProviderFactory] = None, ) ``` Bases: `PromptClientMixin` Source code in `src/patronus/prompts/clients.py` ```python def __init__(self, provider_factory: Optional[ProviderFactory] = None) -> None: self._cache: AsyncPromptCache = AsyncPromptCache() self._provider_factory: ProviderFactory = provider_factory or { "local": lambda: LocalPromptProvider(), "api": lambda: APIPromptProvider(), } self._api_provider = APIPromptProvider() ``` ##### get ```python get( name: str, revision: Optional[int] = None, label: Optional[str] = None, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, disable_cache: bool = False, provider: Union[ PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN], ] = NOT_GIVEN, engine: Union[ TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN], ] = NOT_GIVEN, ) -> LoadedPrompt ``` Get the prompt asynchronously. If neither revision nor label is specified then the prompt with latest revision is returned. Project is loaded from the config by default. You can specify the project name of the prompt if you want to override the value from the config. By default, once a prompt is retrieved it's cached. You can disable caching. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `name` | `str` | The name of the prompt to retrieve. | *required* | | `revision` | `Optional[int]` | Optional specific revision number to retrieve. If not specified, the latest revision is used. | `None` | | `label` | `Optional[str]` | Optional label to filter by. If specified, only prompts with this label will be returned. | `None` | | `project` | `Union[str, Type[NOT_GIVEN]]` | Optional project name override. If not specified, the project name from config is used. 
| `NOT_GIVEN` | | `disable_cache` | `bool` | If True, bypasses the cache for both reading and writing. | `False` | | `provider` | `Union[PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN]]` | The provider(s) to use for retrieving prompts. Can be a string identifier ('local', 'api'), a PromptProvider instance, or a sequence of these. If not specified, defaults to config setting. | `NOT_GIVEN` | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]]` | The template engine to use for rendering prompts. Can be a string identifier ('f-string', 'mustache', 'jinja2') or a TemplateEngine instance. If not specified, defaults to config setting. | `NOT_GIVEN` | Returns: | Name | Type | Description | | --- | --- | --- | | `LoadedPrompt` | `LoadedPrompt` | The retrieved prompt object. | Raises: | Type | Description | | --- | --- | | `PromptNotFoundError` | If the prompt could not be found with the specified parameters. | | `ValueError` | If the provided provider or engine is invalid. | | `PromptProviderError` | If there was an error communicating with the prompt provider. | Source code in `src/patronus/prompts/clients.py` ```python async def get( self, name: str, revision: Optional[int] = None, label: Optional[str] = None, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, disable_cache: bool = False, provider: Union[ PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN] ] = NOT_GIVEN, engine: Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]] = NOT_GIVEN, ) -> LoadedPrompt: """ Get the prompt asynchronously. If neither revision nor label is specified then the prompt with latest revision is returned. Project is loaded from the config by default. You can specify the project name of the prompt if you want to override the value from the config. By default, once a prompt is retrieved it's cached. You can disable caching. Args: name: The name of the prompt to retrieve. revision: Optional specific revision number to retrieve. If not specified, the latest revision is used. label: Optional label to filter by. If specified, only prompts with this label will be returned. project: Optional project name override. If not specified, the project name from config is used. disable_cache: If True, bypasses the cache for both reading and writing. provider: The provider(s) to use for retrieving prompts. Can be a string identifier ('local', 'api'), a PromptProvider instance, or a sequence of these. If not specified, defaults to config setting. engine: The template engine to use for rendering prompts. Can be a string identifier ('f-string', 'mustache', 'jinja2') or a TemplateEngine instance. If not specified, defaults to config setting. Returns: LoadedPrompt: The retrieved prompt object. Raises: PromptNotFoundError: If the prompt could not be found with the specified parameters. ValueError: If the provided provider or engine is invalid. PromptProviderError: If there was an error communicating with the prompt provider. 
""" project_name: str = self._resolve_project(project) resolved_providers: list[PromptProvider] = self._resolve_providers(provider, self._provider_factory) resolved_engine: TemplateEngine = self._resolve_engine(engine) cache_key: _CacheKey = _CacheKey(project_name=project_name, prompt_name=name, revision=revision, label=label) if not disable_cache: cached_prompt: Optional[LoadedPrompt] = await self._cache.get(cache_key) if cached_prompt is not None: return cached_prompt prompt: Optional[LoadedPrompt] = None provider_errors: list[str] = [] for i, prompt_provider in enumerate(resolved_providers): log.debug("Trying prompt provider %d (%s) async", i + 1, prompt_provider.__class__.__name__) try: prompt = await prompt_provider.aget_prompt(name, revision, label, project_name, engine=resolved_engine) if prompt is not None: log.debug("Prompt found using async provider %s", prompt_provider.__class__.__name__) break except PromptProviderConnectionError as e: provider_errors.append(str(e)) continue except PromptProviderAuthenticationError as e: provider_errors.append(str(e)) continue except Exception as e: provider_errors.append(f"Unexpected error from provider {prompt_provider.__class__.__name__}: {str(e)}") continue if prompt is None: if provider_errors: error_msg: str = self._format_provider_errors(provider_errors) raise PromptNotFoundError( name=name, project=project_name, revision=revision, label=label ) from Exception(error_msg) else: raise PromptNotFoundError(name=name, project=project_name, revision=revision, label=label) if not disable_cache: await self._cache.put(cache_key, prompt) return prompt ``` ##### push ```python push( prompt: Prompt, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, engine: Union[ TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN], ] = NOT_GIVEN, ) -> LoadedPrompt ``` Push a prompt to the API asynchronously, creating a new revision only if needed. If a prompt revision with the same normalized body and metadata already exists, the existing revision will be returned. If the metadata differs, a new revision will be created. The engine parameter is only used to set property on output LoadedPrompt object. It is not persisted in any way and doesn't affect how the prompt is stored in Patronus AI Platform. Note that when a new prompt definition is created, the description is used as provided. However, when creating a new revision for an existing prompt definition, the description parameter doesn't update the existing prompt definition's description. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `prompt` | `Prompt` | The prompt to push | *required* | | `project` | `Union[str, Type[NOT_GIVEN]]` | Optional project name override. If not specified, the project name from config is used. | `NOT_GIVEN` | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]]` | The template engine to use for rendering the returned prompt. If not specified, defaults to config setting. | `NOT_GIVEN` | Returns: | Name | Type | Description | | --- | --- | --- | | `LoadedPrompt` | `LoadedPrompt` | The created or existing prompt revision | Raises: | Type | Description | | --- | --- | | `PromptProviderError` | If there was an error communicating with the prompt provider. 
| Source code in `src/patronus/prompts/clients.py` ```python async def push( self, prompt: Prompt, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, engine: Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]] = NOT_GIVEN, ) -> LoadedPrompt: """ Push a prompt to the API asynchronously, creating a new revision only if needed. If a prompt revision with the same normalized body and metadata already exists, the existing revision will be returned. If the metadata differs, a new revision will be created. The engine parameter is only used to set property on output LoadedPrompt object. It is not persisted in any way and doesn't affect how the prompt is stored in Patronus AI Platform. Note that when a new prompt definition is created, the description is used as provided. However, when creating a new revision for an existing prompt definition, the description parameter doesn't update the existing prompt definition's description. Args: prompt: The prompt to push project: Optional project name override. If not specified, the project name from config is used. engine: The template engine to use for rendering the returned prompt. If not specified, defaults to config setting. Returns: LoadedPrompt: The created or existing prompt revision Raises: PromptProviderError: If there was an error communicating with the prompt provider. """ project_name: str = self._resolve_project(project) resolved_engine: TemplateEngine = self._resolve_engine(engine) normalized_body_sha256 = calculate_normalized_body_hash(prompt.body) cli = context.get_async_api_client().prompts # Try to find existing revision with same hash resp = await cli.list_revisions( prompt_name=prompt.name, project_name=project_name, normalized_body_sha256=normalized_body_sha256, ) # Variables for create_revision parameters prompt_id = patronus_api.NOT_GIVEN prompt_name = prompt.name create_new_prompt = True prompt_def = None # If we found a matching revision, check if metadata is the same if resp.prompt_revisions: log.debug("Found %d revisions with matching body hash", len(resp.prompt_revisions)) prompt_id = resp.prompt_revisions[0].prompt_definition_id create_new_prompt = False resp_pd = await cli.list_definitions(prompt_id=prompt_id, limit=1) if not resp_pd.prompt_definitions: raise PromptProviderError( "Prompt revision has been found but prompt definition was not found. This should not happen" ) prompt_def = resp_pd.prompt_definitions[0] # Check if the provided description is different from existing one and warn if so if prompt.description is not None and prompt.description != prompt_def.description: warnings.warn( f"Prompt description ({prompt.description!r}) differs from the existing one " f"({prompt_def.description!r}). The description won't be updated." 
) new_metadata_cmp = json.dumps(prompt.metadata, sort_keys=True) for rev in resp.prompt_revisions: metadata_cmp = json.dumps(rev.metadata, sort_keys=True) if new_metadata_cmp == metadata_cmp: log.debug("Found existing revision with matching metadata, returning revision %d", rev.revision) return self._api_provider._create_loaded_prompt( prompt_revision=rev, prompt_def=prompt_def, engine=resolved_engine, ) # For existing prompt, don't need name/project prompt_name = patronus_api.NOT_GIVEN project_name = patronus_api.NOT_GIVEN else: # No matching revisions found, will create new prompt log.debug("No revisions with matching body hash found, creating new prompt and revision") # Create a new revision with appropriate parameters log.debug( "Creating new revision (new_prompt=%s, prompt_id=%s, prompt_name=%s)", create_new_prompt, prompt_id if prompt_id != patronus_api.NOT_GIVEN else "NOT_GIVEN", prompt_name if prompt_name != patronus_api.NOT_GIVEN else "NOT_GIVEN", ) resp = await cli.create_revision( body=prompt.body, prompt_id=prompt_id, prompt_name=prompt_name, project_name=project_name if create_new_prompt else patronus_api.NOT_GIVEN, prompt_description=prompt.description, metadata=prompt.metadata, ) prompt_revision = resp.prompt_revision # If we created a new prompt, we need to fetch the definition if create_new_prompt: resp_pd = await cli.list_definitions(prompt_id=prompt_revision.prompt_definition_id, limit=1) if not resp_pd.prompt_definitions: raise PromptProviderError( "Prompt revision has been created but prompt definition was not found. This should not happen" ) prompt_def = resp_pd.prompt_definitions[0] return self._api_provider._create_loaded_prompt(prompt_revision, prompt_def, resolved_engine) ``` ### models #### BasePrompt ##### with_engine ```python with_engine( engine: Union[TemplateEngine, DefaultTemplateEngines], ) -> typing.Self ``` Create a new prompt with the specified template engine. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines]` | Either a TemplateEngine instance or a string identifier ('f-string', 'mustache', 'jinja2') | *required* | Returns: | Type | Description | | --- | --- | | `Self` | A new prompt instance with the specified engine | Source code in `src/patronus/prompts/models.py` ```python def with_engine(self, engine: Union[TemplateEngine, DefaultTemplateEngines]) -> typing.Self: """ Create a new prompt with the specified template engine. Args: engine: Either a TemplateEngine instance or a string identifier ('f-string', 'mustache', 'jinja2') Returns: A new prompt instance with the specified engine """ resolved_engine = get_template_engine(engine) return dataclasses.replace(self, _engine=resolved_engine) ``` ##### render ```python render(**kwargs: Any) -> str ``` Render the prompt template with the provided arguments. If no engine is set on the prompt, the default engine from context/config will be used. If no arguments are provided, the template body is returned as-is. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `**kwargs` | `Any` | Template arguments to be rendered in the prompt body | `{}` | Returns: | Type | Description | | --- | --- | | `str` | The rendered prompt | Source code in `src/patronus/prompts/models.py` ```python def render(self, **kwargs: Any) -> str: """ Render the prompt template with the provided arguments. If no engine is set on the prompt, the default engine from context/config will be used. 
If no arguments are provided, the template body is returned as-is. Args: **kwargs: Template arguments to be rendered in the prompt body Returns: The rendered prompt """ if not kwargs: return self.body engine = self._engine if engine is None: # Get default engine from context engine_name = context.get_prompts_config().templating_engine engine = get_template_engine(engine_name) return engine.render(self.body, **kwargs) ``` #### calculate_normalized_body_hash ```python calculate_normalized_body_hash(body: str) -> str ``` Calculate the SHA-256 hash of normalized prompt body. Normalization is done by stripping whitespace from the start and end of the body. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `body` | `str` | The prompt body | *required* | Returns: | Type | Description | | --- | --- | | `str` | SHA-256 hash of the normalized body | Source code in `src/patronus/prompts/models.py` ```python def calculate_normalized_body_hash(body: str) -> str: """Calculate the SHA-256 hash of normalized prompt body. Normalization is done by stripping whitespace from the start and end of the body. Args: body: The prompt body Returns: SHA-256 hash of the normalized body """ normalized_body = body.strip() return hashlib.sha256(normalized_body.encode()).hexdigest() ``` ### templating #### TemplateEngine Bases: `ABC` ##### render ```python render(template: str, **kwargs) -> str ``` Render the template with the given arguments. Source code in `src/patronus/prompts/templating.py` ```python @abc.abstractmethod def render(self, template: str, **kwargs) -> str: """Render the template with the given arguments.""" ``` #### get_template_engine ```python get_template_engine( engine: Union[TemplateEngine, DefaultTemplateEngines], ) -> TemplateEngine ``` Convert a template engine name to an actual engine instance. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines]` | Either a template engine instance or a string identifier ('f-string', 'mustache', 'jinja2') | *required* | Returns: | Type | Description | | --- | --- | | `TemplateEngine` | A template engine instance | Raises: | Type | Description | | --- | --- | | `ValueError` | If the provided engine string is not recognized | Source code in `src/patronus/prompts/templating.py` ```python def get_template_engine(engine: Union[TemplateEngine, DefaultTemplateEngines]) -> TemplateEngine: """ Convert a template engine name to an actual engine instance. Args: engine: Either a template engine instance or a string identifier ('f-string', 'mustache', 'jinja2') Returns: A template engine instance Raises: ValueError: If the provided engine string is not recognized """ if isinstance(engine, TemplateEngine): return engine if engine == "f-string": return FStringTemplateEngine() elif engine == "mustache": return MustacheTemplateEngine() elif engine == "jinja2": return Jinja2TemplateEngine() raise ValueError( "Provided engine must be an instance of TemplateEngine or " "one of the default engines ('f-string', 'mustache', 'jinja2'). " f"Instead got {engine!r}" ) ``` # Tracing ## patronus.tracing ### decorators #### start_span ```python start_span( name: str, *, record_exception: bool = True, attributes: Optional[Attributes] = None, ) -> Iterator[Optional[typing.Any]] ``` Context manager for creating and managing a trace span. 
This function is used to create a span within the current context using the tracer, allowing you to track execution timing or events within a specific block of code. The context is set by `patronus.init()` function. If SDK was not initialized, yielded value will be None. Example: ```python import patronus patronus.init() # Use context manager for finer-grained tracing def complex_operation(): with patronus.start_span("Data preparation"): # Prepare data pass ``` Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `name` | `str` | The name of the span. | *required* | | `record_exception` | `bool` | Whether to record exceptions that occur within the span. Default is True. | `True` | | `attributes` | `Optional[Attributes]` | Attributes to associate with the span, providing additional metadata. | `None` | Source code in `src/patronus/tracing/decorators.py` ````python @contextlib.contextmanager def start_span( name: str, *, record_exception: bool = True, attributes: Optional[Attributes] = None ) -> Iterator[Optional[typing.Any]]: """ Context manager for creating and managing a trace span. This function is used to create a span within the current context using the tracer, allowing you to track execution timing or events within a specific block of code. The context is set by `patronus.init()` function. If SDK was not initialized, yielded value will be None. Example: ```python import patronus patronus.init() # Use context manager for finer-grained tracing def complex_operation(): with patronus.start_span("Data preparation"): # Prepare data pass ``` Args: name (str): The name of the span. record_exception (bool): Whether to record exceptions that occur within the span. Default is True. attributes (Optional[Attributes]): Attributes to associate with the span, providing additional metadata. """ tracer = context.get_tracer_or_none() if tracer is None: yield return with tracer.start_as_current_span( name, record_exception=record_exception, attributes=attributes, ) as span: yield span ```` #### traced ```python traced( span_name: Optional[str] = None, *, log_args: bool = True, log_results: bool = True, log_exceptions: bool = True, disable_log: bool = False, attributes: Attributes = None, **kwargs: Any, ) ``` A decorator to trace function execution by recording a span for the traced function. Example: ```python import patronus patronus.init() # Trace a function with the @traced decorator @patronus.traced() def process_input(user_query): # Process the input ``` Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `span_name` | `Optional[str]` | The name of the traced span. Defaults to the function name if not provided. | `None` | | `log_args` | `bool` | Whether to log the arguments passed to the function. Default is True. | `True` | | `log_results` | `bool` | Whether to log the function's return value. Default is True. | `True` | | `log_exceptions` | `bool` | Whether to log any exceptions raised while executing the function. Default is True. | `True` | | `disable_log` | `bool` | Whether to disable logging the trace information. Default is False. | `False` | | `attributes` | `Attributes` | Attributes to attach to the traced span. Default is None. | `None` | | `**kwargs` | `Any` | Additional arguments for the decorator. | `{}` | Source code in `src/patronus/tracing/decorators.py` ````python def traced( # Give name for the traced span. Defaults to a function name if not provided. span_name: Optional[str] = None, *, # Whether to log function arguments. 
log_args: bool = True, # Whether to log function output. log_results: bool = True, # Whether to log an exception if one was raised. log_exceptions: bool = True, # Whether to prevent a log message to be created. disable_log: bool = False, attributes: Attributes = None, **kwargs: typing.Any, ): """ A decorator to trace function execution by recording a span for the traced function. Example: ```python import patronus patronus.init() # Trace a function with the @traced decorator @patronus.traced() def process_input(user_query): # Process the input ``` Args: span_name (Optional[str]): The name of the traced span. Defaults to the function name if not provided. log_args (bool): Whether to log the arguments passed to the function. Default is True. log_results (bool): Whether to log the function's return value. Default is True. log_exceptions (bool): Whether to log any exceptions raised while executing the function. Default is True. disable_log (bool): Whether to disable logging the trace information. Default is False. attributes (Attributes): Attributes to attach to the traced span. Default is None. **kwargs: Additional arguments for the decorator. """ def decorator(func): name = span_name or func.__qualname__ sig = inspect.signature(func) record_exception = not disable_log and log_exceptions def log_call(fn_args: typing.Any, fn_kwargs: typing.Any, ret: typing.Any, exc: Exception): if disable_log: return logger = context.get_pat_logger() severity = SeverityNumber.INFO body = {"function.name": name} if log_args: bound_args = sig.bind(*fn_args, **fn_kwargs) body["function.arguments"] = {**bound_args.arguments, **bound_args.arguments} if log_results is not None and exc is None: body["function.output"] = ret if log_exceptions and exc is not None: module = type(exc).__module__ qualname = type(exc).__qualname__ exception_type = f"{module}.{qualname}" if module and module != "builtins" else qualname body["exception.type"] = exception_type body["exception.message"] = str(exc) severity = SeverityNumber.ERROR logger.log(body, log_type=LogTypes.trace, severity=severity) @functools.wraps(func) def wrapper_sync(*f_args, **f_kwargs): tracer = context.get_tracer_or_none() if tracer is None: return func(*f_args, **f_kwargs) exc = None ret = None with tracer.start_as_current_span(name, record_exception=record_exception, attributes=attributes): try: ret = func(*f_args, **f_kwargs) except Exception as e: exc = e raise exc finally: log_call(f_args, f_kwargs, ret, exc) return ret @functools.wraps(func) async def wrapper_async(*f_args, **f_kwargs): tracer = context.get_tracer_or_none() if tracer is None: return await func(*f_args, **f_kwargs) exc = None ret = None with tracer.start_as_current_span(name, record_exception=record_exception, attributes=attributes): try: ret = await func(*f_args, **f_kwargs) except Exception as e: exc = e raise exc finally: log_call(f_args, f_kwargs, ret, exc) return ret if inspect.iscoroutinefunction(func): wrapper_async._pat_traced = True return wrapper_async else: wrapper_async._pat_traced = True return wrapper_sync return decorator ```` ### exporters This module provides exporter selection functionality for OpenTelemetry traces and logs. It handles protocol resolution based on Patronus configuration and standard OTEL environment variables. #### create_trace_exporter ```python create_trace_exporter( endpoint: str, api_key: str, protocol: Optional[str] = None, ) -> SpanExporter ``` Create a configured trace exporter instance. 
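Example: a minimal sketch of wiring the returned exporter into a standard OpenTelemetry pipeline by hand (normally `patronus.init()` does this for you). The endpoint and API key below are placeholders, and the import path is assumed from the source location shown below.

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

from patronus.tracing.exporters import create_trace_exporter

# Placeholder endpoint and key. "http/protobuf" selects the HTTP exporter;
# other resolved protocols fall back to the gRPC exporter (see the source below).
exporter = create_trace_exporter(
    endpoint="https://otel.example.com",
    api_key="your-api-key",
    protocol="http/protobuf",
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
```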
Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `endpoint` | `str` | The OTLP endpoint URL | *required* | | `api_key` | `str` | Authentication key for Patronus services | *required* | | `protocol` | `Optional[str]` | OTLP protocol override from Patronus configuration | `None` | Returns: | Type | Description | | --- | --- | | `SpanExporter` | Configured trace exporter instance | Source code in `src/patronus/tracing/exporters.py` ```python def create_trace_exporter(endpoint: str, api_key: str, protocol: Optional[str] = None) -> SpanExporter: """ Create a configured trace exporter instance. Args: endpoint: The OTLP endpoint URL api_key: Authentication key for Patronus services protocol: OTLP protocol override from Patronus configuration Returns: Configured trace exporter instance """ resolved_protocol = _resolve_otlp_protocol(protocol) if resolved_protocol == "http/protobuf": # For HTTP exporter, ensure endpoint has the correct path if not endpoint.endswith("/v1/traces"): endpoint = endpoint.rstrip("/") + "/v1/traces" return OTLPSpanExporterHTTP(endpoint=endpoint, headers={"x-api-key": api_key}) else: # For gRPC exporter, determine if connection should be insecure based on URL scheme is_insecure = endpoint.startswith("http://") return OTLPSpanExporterGRPC(endpoint=endpoint, headers={"x-api-key": api_key}, insecure=is_insecure) ``` #### create_log_exporter ```python create_log_exporter( endpoint: str, api_key: str, protocol: Optional[str] = None, ) -> LogExporter ``` Create a configured log exporter instance. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `endpoint` | `str` | The OTLP endpoint URL | *required* | | `api_key` | `str` | Authentication key for Patronus services | *required* | | `protocol` | `Optional[str]` | OTLP protocol override from Patronus configuration | `None` | Returns: | Type | Description | | --- | --- | | `LogExporter` | Configured log exporter instance | Source code in `src/patronus/tracing/exporters.py` ```python def create_log_exporter(endpoint: str, api_key: str, protocol: Optional[str] = None) -> LogExporter: """ Create a configured log exporter instance. Args: endpoint: The OTLP endpoint URL api_key: Authentication key for Patronus services protocol: OTLP protocol override from Patronus configuration Returns: Configured log exporter instance """ resolved_protocol = _resolve_otlp_protocol(protocol) if resolved_protocol == "http/protobuf": # For HTTP exporter, ensure endpoint has the correct path if not endpoint.endswith("/v1/logs"): endpoint = endpoint.rstrip("/") + "/v1/logs" return OTLPLogExporterHTTP(endpoint=endpoint, headers={"x-api-key": api_key}) else: # For gRPC exporter, determine if connection should be insecure based on URL scheme is_insecure = endpoint.startswith("http://") return OTLPLogExporterGRPC(endpoint=endpoint, headers={"x-api-key": api_key}, insecure=is_insecure) ``` ### tracer This module provides the implementation for tracing support using the OpenTelemetry SDK. #### PatronusAttributesSpanProcessor ```python PatronusAttributesSpanProcessor( project_name: str, app: Optional[str] = None, experiment_id: Optional[str] = None, ) ``` Bases: `SpanProcessor` Processor that adds Patronus-specific attributes to all spans. This processor ensures that each span includes the mandatory attributes: `project_name`, and optionally adds `app` or `experiment_id` attributes if they are provided during initialization. 
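Example: a usage sketch, assuming the processor is imported from the source module shown below and attached to your own `TracerProvider`; the project and app names are placeholders.

```python
from opentelemetry.sdk.trace import TracerProvider

from patronus.tracing.tracer import PatronusAttributesSpanProcessor

provider = TracerProvider()
# Placeholder values. Note that app is ignored when experiment_id is provided
# (see __init__ below), so pass one or the other.
provider.add_span_processor(
    PatronusAttributesSpanProcessor(project_name="Global", app="default")
)

tracer = provider.get_tracer("my-app")
with tracer.start_as_current_span("demo"):
    pass  # spans created here carry the project_name/app attributes
```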
Source code in `src/patronus/tracing/tracer.py` ```python def __init__(self, project_name: str, app: Optional[str] = None, experiment_id: Optional[str] = None): self.project_name = project_name self.experiment_id = None self.app = None if experiment_id is not None: self.experiment_id = experiment_id else: self.app = app ``` #### create_tracer_provider ```python create_tracer_provider( exporter_endpoint: str, api_key: str, scope: PatronusScope, protocol: Optional[str] = None, ) -> TracerProvider ``` Creates and returns a cached TracerProvider configured with the specified exporter. The function utilizes an OpenTelemetry BatchSpanProcessor and an OTLPSpanExporter to initialize the tracer. The configuration is cached for reuse. Source code in `src/patronus/tracing/tracer.py` ```python @functools.lru_cache() def create_tracer_provider( exporter_endpoint: str, api_key: str, scope: context.PatronusScope, protocol: Optional[str] = None, ) -> TracerProvider: """ Creates and returns a cached TracerProvider configured with the specified exporter. The function utilizes an OpenTelemetry BatchSpanProcessor and an OTLPSpanExporter to initialize the tracer. The configuration is cached for reuse. """ resource = None if scope.service is not None: resource = Resource.create({"service.name": scope.service}) provider = TracerProvider(resource=resource) provider.add_span_processor( PatronusAttributesSpanProcessor( project_name=scope.project_name, app=scope.app, experiment_id=scope.experiment_id, ) ) provider.add_span_processor( BatchSpanProcessor(_create_exporter(endpoint=exporter_endpoint, api_key=api_key, protocol=protocol)) ) return provider ``` #### create_tracer ```python create_tracer( scope: PatronusScope, exporter_endpoint: str, api_key: str, protocol: Optional[str] = None, ) -> trace.Tracer ``` Creates an OpenTelemetry (OTeL) tracer tied to the specified scope. Source code in `src/patronus/tracing/tracer.py` ```python def create_tracer( scope: context.PatronusScope, exporter_endpoint: str, api_key: str, protocol: Optional[str] = None, ) -> trace.Tracer: """ Creates an OpenTelemetry (OTeL) tracer tied to the specified scope. """ provider = create_tracer_provider( exporter_endpoint=exporter_endpoint, api_key=api_key, scope=scope, protocol=protocol, ) return provider.get_tracer("patronus.sdk") ```
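For reference, a usage sketch of calling `create_tracer` directly (normally `patronus.init()` performs this wiring). The endpoint and API key below are placeholders, and reusing `ctx.scope` assumes the scope attribute on the initialized context is the `PatronusScope` expected here.

```python
import patronus
from patronus.tracing.tracer import create_tracer

ctx = patronus.init()

# Placeholder endpoint and key; ctx.scope is assumed to be a PatronusScope.
tracer = create_tracer(
    scope=ctx.scope,
    exporter_endpoint="https://otel.example.com",
    api_key="your-api-key",
    protocol="http/protobuf",
)

with tracer.start_as_current_span("manual-span"):
    ...  # spans emitted through the configured OTLP exporter
```

Because `create_tracer_provider` is wrapped in `functools.lru_cache`, repeated calls with the same endpoint, API key, scope, and protocol reuse the same provider rather than creating a new export pipeline each time.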