# PatronusAI SDK > PatronusAI Python SDK for systematic LLM evaluation - Build, test, and improve AI applications with evaluations, experiments, and prompt management The Patronus SDK provides tools for observability, evaluation, experimentation, and prompt management for Large Language Models (LLMs), helping you build reliable and high-quality AI applications. # Getting Started ## API Key To use the Patronus SDK, you'll need an API key from the Patronus platform. If you don't have one yet: 1. Sign up for a Patronus account 1. Navigate to "API Keys" 1. Create a new API key ## Configuration There are several ways to configure the Patronus SDK: ### Environment Variables Set your API key as an environment variable: ```bash export PATRONUS_API_KEY="your-api-key" ``` ### Configuration File Create a `patronus.yaml` file in your project directory: ```yaml api_key: "your-api-key" project_name: "Global" app: "default" ``` ### Direct Configuration Pass configuration values directly when initializing the SDK: ```python import patronus patronus.init( api_key="your-api-key", project_name="Global", app="default", ) ``` ## Verification To verify your installation and configuration: ```python import patronus patronus.init() # Define a simple traced function @patronus.traced() def test_function(): return "Installation successful!" # Call the function to test tracing result = test_function() print(result) ``` If no errors occur, your Patronus SDK is correctly installed and configured. ## Advanced ### Return Value The `patronus.init()` function returns a `PatronusContext` object that serves as the central access point for all SDK components and functionality. Additionally, `patronus.init()` automatically sets this context globally, making it accessible throughout your application: ```python import patronus # Capture the returned context patronus_context = patronus.init() # Also sets context globally # Direct access is possible but not typically needed tracer_provider = patronus_context.tracer_provider api_client = patronus_context.api_client scope = patronus_context.scope ``` See the `PatronusContext` API reference for the complete list of available components and their descriptions. This context is particularly useful when integrating with OpenTelemetry instrumentation libraries that require explicit tracer provider configuration, such as in [distributed tracing scenarios](../../observability/tracing/#distributed-tracing). ### Manual Context Management For advanced use cases, you can build and manage contexts manually using `build_context()` and the context manager pattern: ```python from patronus.init import build_context from patronus import context # Build a context manually with custom configuration custom_context = build_context(...) # Use the context temporarily without setting it globally with context._CTX_PAT.using(custom_context): # All Patronus SDK operations within this block use custom_context result = some_patronus_operation() # Context reverts to previous state after exiting the block ``` This pattern is particularly useful when you need to send data to multiple projects within the same process, or when building testing frameworks that require isolated contexts. ## Next Steps Now that you've installed the Patronus SDK, proceed to the [Quickstart](../quickstart/) guide to learn how to use it effectively. # Installation The Patronus SDK provides tools for evaluating, monitoring, and improving LLM applications.
## Requirements - Python 3.9 or higher - A package manager (uv or pip) ## Basic Installation ### Using uv (Recommended) [uv](https://github.com/astral-sh/uv) is a fast Python package installer and resolver: ```bash uv add patronus ``` ### Using pip ```bash pip install patronus ``` ## Optional Dependencies ### For Experiments To use Patronus experiments functionality (including pandas support): ```bash # Using uv uv add "patronus[experiments]" # Using pip pip install "patronus[experiments]" ``` ## Quick Start with Examples If you'd like to see Patronus in action quickly, check out our [examples](../../examples/). These examples demonstrate how to use Patronus with various LLM frameworks and APIs. For instance, to run the Smolagents weather example: ```bash # Export required API keys export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key # Run the example with uv uv run --no-cache --with "patronus-examples[smolagents]" \ -m patronus_examples.tracking.smolagents_weather ``` See the [examples documentation](../../examples/) for more detailed information on running and understanding the available examples. # Quickstart This guide will help you get started with the Patronus SDK through three practical examples. We'll explore tracing, evaluation, and experimentation to give you a hands-on introduction to the core features. ## Initialization Before running any of the examples, initialize the Patronus SDK: ```python import os import patronus # Initialize with your API key patronus.init( # This is the default and can be omitted api_key=os.environ.get("PATRONUS_API_KEY") ) ``` You can also use a configuration file instead of direct initialization: ```yaml # patronus.yaml api_key: "your-api-key" project_name: "Global" app: "default" ``` For experiments, you don't need to explicitly call init() as run_experiment() handles initialization automatically. ## Example 1: Tracing with a Functional Evaluator This example demonstrates how to trace function execution and create a simple functional evaluator. ```python import patronus from patronus import evaluator, traced patronus.init() @evaluator() def exact_match(expected: str, actual: str) -> bool: return expected.strip() == actual.strip() @traced() def process_query(query: str) -> str: # In a real application, this would call an LLM return f"Processed response for: {query}" # Use the traced function and evaluator together @traced() def main(): query = "What is machine learning?" response = process_query(query) print(f"Response: {response}") expected_response = "Processed response for: What is machine learning?" result = exact_match(expected_response, response) print(f"Evaluation result: {result}") if __name__ == "__main__": main() ``` In this example: 1. We created a simple `exact_match` evaluator using the `@evaluator()` decorator 1. We traced the `process_query` function using the `@traced()` decorator 1. We ran an evaluation by calling the evaluator function directly The tracing will automatically capture execution details, timing, and results, making them available in the Patronus platform. ## Example 2: Using a Patronus Evaluator This example shows how to use a Patronus Evaluator to assess model outputs for hallucinations. 
```python import patronus from patronus import traced from patronus.evals import RemoteEvaluator patronus.init() @traced() def generate_insurance_response(query: str) -> str: # In a real application, this would call an LLM return "To even qualify for our car insurance policy, you need to have a valid driver's license that expires later than 2028." @traced("Quickstart: detect hallucination") def main(): check_hallucinates = RemoteEvaluator("lynx", "patronus:hallucination") context = """ To qualify for our car insurance policy, you need a way to show competence in driving which can be accomplished through a valid driver's license. You must have multiple years of experience and cannot be graduating from driving school before or on 2028. """ query = "What is the car insurance policy?" response = generate_insurance_response(query) print(f"Query: {query}") print(f"Response: {response}") # Evaluate the response for hallucinations resp = check_hallucinates.evaluate( task_input=query, task_context=context, task_output=response ) # Print the evaluation results print(f""" Hallucination evaluation: Passed: {resp.pass_} Score: {resp.score} Explanation: {resp.explanation} """) if __name__ == "__main__": main() ``` In this example: 1. We created a traced function generate_insurance_response to simulate an LLM response 1. We used the Patronus Lynx Evaluator 1. We evaluated whether the response contains information not supported by the context 1. We displayed the detailed evaluation results Patronus Evaluators run on Patronus infrastructure and provide sophisticated assessment capabilities without requiring you to implement complex evaluation logic. ## Example 3: Running an Experiment with OpenAI This example demonstrates how to run a comprehensive experiment to evaluate OpenAI model performance across multiple samples and criteria. Before running Example 3, you'll need to install Pandas and the OpenAI SDK and OpenInference instrumentation: ```shell pip install pandas openai openinference-instrumentation-openai ``` The OpenInference instrumentation automatically adds spans for all OpenAI API calls, capturing prompts, responses, and model parameters without any code changes. These details will appear in your Patronus traces for complete visibility into model interactions. ```python from typing import Optional import os import patronus from patronus.evals import evaluator, RemoteEvaluator, EvaluationResult from patronus.experiments import run_experiment, FuncEvaluatorAdapter, Row, TaskResult from openai import OpenAI from openinference.instrumentation.openai import OpenAIInstrumentor oai = OpenAI(api_key=os.environ.get("OPENAI_API_KEY")) patronus.init() @evaluator() def fuzzy_match(row: Row, task_result: TaskResult, **kwargs) -> Optional[EvaluationResult]: if not row.gold_answer or not task_result: return None gold_answer = row.gold_answer.lower() response = task_result.output.lower() key_terms = [term.strip() for term in gold_answer.split(',')] matches = sum(1 for term in key_terms if term in response) match_ratio = matches / len(key_terms) if key_terms else 0 # Return a score between 0-1 indicating match quality return EvaluationResult( pass_=match_ratio > 0.7, score=match_ratio, ) def rag_task(row, **kwargs): # In a real RAG system, this would retrieve context before calling the LLM prompt = f""" Based on the following context, answer the question. 
Context: {row.task_context} Question: {row.task_input} Answer: """ # Call OpenAI to generate a response response = oai.chat.completions.create( model="gpt-3.5-turbo", messages=[ {"role": "system", "content": "You are a helpful assistant that answers questions based only on the provided context."}, {"role": "user", "content": prompt} ], temperature=0.3, max_tokens=150 ) return response.choices[0].message.content test_data = [ { "task_input": "What is the main impact of climate change on coral reefs?", "task_context": """ Climate change affects coral reefs through several mechanisms. Rising sea temperatures can cause coral bleaching, where corals expel their symbiotic algae and turn white, often leading to death. Ocean acidification, caused by increased CO2 absorption, makes it harder for corals to build their calcium carbonate structures. Sea level rise can reduce light availability for photosynthesis. More frequent and intense storms damage reef structures. The combination of these stressors is devastating to coral reef ecosystems worldwide. """, "gold_answer": "coral bleaching, ocean acidification, reduced calcification, habitat destruction" }, { "task_input": "How do quantum computers differ from classical computers?", "task_context": """ Classical computers process information in bits (0s and 1s), while quantum computers use quantum bits or qubits. Qubits can exist in multiple states simultaneously thanks to superposition, allowing quantum computers to process vast amounts of information in parallel. Quantum entanglement enables qubits to be correlated in ways impossible for classical bits. While classical computers excel at everyday tasks, quantum computers potentially have advantages for specific problems like cryptography, simulation of quantum systems, and certain optimization tasks. However, quantum computers face significant challenges including qubit stability, error correction, and scaling up to useful sizes. """, "gold_answer": "qubits instead of bits, superposition, entanglement, parallel processing" } ] evaluators = [ FuncEvaluatorAdapter(fuzzy_match), RemoteEvaluator("answer-relevance", "patronus:answer-relevance") ] # Run the experiment with OpenInference instrumentation print("Running RAG evaluation experiment...") experiment = run_experiment( dataset=test_data, task=rag_task, evaluators=evaluators, tags={"system": "rag-prototype", "model": "gpt-3.5-turbo"}, integrations=[OpenAIInstrumentor()] ) # Export results to CSV (optional) # experiment.to_csv("rag_evaluation_results.csv") ``` In this example: 1. We defined a task function `rag_task` that generates responses for our experiment 1. We created a custom evaluator `fuzzy_match` to check for key content 1. We set up an experiment with multiple evaluators (both remote and custom) 1. We ran the experiment across a dataset of questions Experiments provide a powerful way to systematically evaluate your LLM applications across multiple samples and criteria, helping you identify strengths and weaknesses in your models. # Observability # Observability Configuration ## Exporter Protocols The SDK supports two OTLP exporter protocols: | Protocol | Value | Default Endpoint | Available Ports | | --- | --- | --- | --- | | gRPC | `grpc` | `https://otel.patronus.ai:4317` | 4317 | | HTTP | `http/protobuf` | `https://otel.patronus.ai:4318` | 4318, 443 | ## Configuration Methods ### 1.
Patronus Configuration ```python patronus.init( otel_endpoint="https://otel.patronus.ai:4318", otel_exporter_otlp_protocol="http/protobuf" ) ``` ```yaml # patronus.yaml otel_endpoint: "https://otel.patronus.ai:4318" otel_exporter_otlp_protocol: "http/protobuf" ``` ```bash export PATRONUS_OTEL_ENDPOINT="https://otel.patronus.ai:4318" export PATRONUS_OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf" ``` ### 2. OpenTelemetry Environment Variables ```bash # General (applies to all signals) export OTEL_EXPORTER_OTLP_PROTOCOL="grpc" # Signal-specific export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" export OTEL_EXPORTER_OTLP_LOGS_PROTOCOL="grpc" ``` ## Configuration Priority 1. Function parameters 1. Environment variables (`PATRONUS_OTEL_EXPORTER_OTLP_PROTOCOL`) 1. Configuration file (`patronus.yaml`) 1. `OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` / `OTEL_EXPORTER_OTLP_LOGS_PROTOCOL` 1. `OTEL_EXPORTER_OTLP_PROTOCOL` 1. Default: `grpc` ## Endpoint Configuration ### Custom Endpoints ```python patronus.init( otel_endpoint="https://collector.example.com:4317", otel_exporter_otlp_protocol="grpc" ) ``` ### Connection Security Security is determined by the URL scheme for both gRPC and HTTP protocols: - `https://` - Secure connection (TLS) - `http://` - Insecure connection ```python # Secure gRPC patronus.init(otel_endpoint="https://collector.example.com:4317") # Insecure gRPC patronus.init(otel_endpoint="http://collector.example.com:4317") # Secure HTTP patronus.init( otel_endpoint="https://collector.example.com:4318", otel_exporter_otlp_protocol="http/protobuf" ) # Insecure HTTP patronus.init( otel_endpoint="http://collector.example.com:4318", otel_exporter_otlp_protocol="http/protobuf" ) ``` ### HTTP Path Handling For HTTP protocol, paths are automatically appended: - Traces: `/v1/traces` - Logs: `/v1/logs` ## Examples ### HTTP Protocol with Custom Endpoint ```python patronus.init( otel_endpoint="http://internal-collector:8080", otel_exporter_otlp_protocol="http/protobuf" ) ``` ### HTTP Protocol on Standard HTTPS Port ```python patronus.init( otel_endpoint="https://otel.example.com:443", otel_exporter_otlp_protocol="http/protobuf" ) ``` ### gRPC with Insecure Connection ```python patronus.init( otel_endpoint="http://internal-collector:4317", otel_exporter_otlp_protocol="grpc" ) ``` ### Mixed Protocols ```bash export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL="http/protobuf" export OTEL_EXPORTER_OTLP_LOGS_PROTOCOL="grpc" ``` # Logging Logging is an essential feature of the Patronus SDK that allows you to record events, debug information, and track the execution of your LLM applications. This page covers how to set up and use logging in your code. Configuration For information about configuring observability features, including exporter protocols and endpoints, see the [Observability Configuration](../configuration/) guide. ## Getting Started with Logging The Patronus SDK provides a simple logging interface that integrates with Python's standard logging module while also automatically exporting logs to the Patronus AI Platform: ```python import patronus patronus.init() log = patronus.get_logger() # Basic logging log.info("Processing user query") # Different log levels are available log.debug("Detailed debug information") log.warning("Something might be wrong") log.error("An error occurred") log.critical("System cannot continue") ``` ## Configuring Console Output By default, Patronus logs are sent to the Patronus AI Platform but are not printed to the console. 
To display logs in your console output, you can add a standard Python logging handler: ```python import sys import logging import patronus patronus.init() log = patronus.get_logger() # Add a console handler to see logs in your terminal console_handler = logging.StreamHandler(sys.stdout) log.addHandler(console_handler) # Now logs will appear in both console and Patronus Platform log.info("This message appears in the console and is sent to Patronus") ``` You can also customize the format of console logs: ```python import sys import logging import patronus patronus.init() log = patronus.get_logger() formatter = logging.Formatter('[%(asctime)s] %(levelname)-8s: %(message)s') console_handler = logging.StreamHandler(sys.stdout) console_handler.setFormatter(formatter) log.addHandler(console_handler) # Logs will now include timestamp and level log.info("Formatted log message") ``` ## Advanced Configuration Patronus integrates with Python's logging module, allowing for advanced configuration options. The SDK uses two main loggers: - `patronus.sdk` - For client-emitted messages that are automatically exported to the Patronus AI Platform - `patronus.core` - For library-emitted messages related to the SDK's internal operations Here's how to configure these loggers using standard library methods: ```python import logging import patronus # Initialize Patronus before configuring logging patronus.init() # Configure the root Patronus logger patronus_root_logger = logging.getLogger("patronus") patronus_root_logger.setLevel(logging.WARNING) # Set base level for all Patronus loggers # Add a console handler with custom formatting console_handler = logging.StreamHandler() formatter = logging.Formatter( fmt='[%(asctime)s] %(levelname)-8s %(name)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S' ) console_handler.setFormatter(formatter) patronus_root_logger.addHandler(console_handler) # Configure specific loggers patronus_core_logger = logging.getLogger("patronus.core") patronus_core_logger.setLevel(logging.WARNING) # Only show warnings and above for internal SDK messages patronus_sdk_logger = logging.getLogger("patronus.sdk") patronus_sdk_logger.setLevel(logging.INFO) # Show info and above for your application logs ``` ## Logging with Traces Patronus logging integrates seamlessly with the tracing system, allowing you to correlate logs with specific spans in your application flow: ```python import patronus from patronus import traced, start_span patronus.init() log = patronus.get_logger() @traced() def process_user_query(query): log.info("Processing query") with start_span("Query Analysis"): log.info("Analyzing query intent") ... with start_span("Response Generation"): log.info("Generating LLM response") ... return "Response to: " + query # Logs will be associated with the appropriate spans result = process_user_query("Tell me about machine learning") ``` # Tracing Tracing is a core feature of the Patronus SDK that allows you to monitor and understand the behavior of your LLM applications. This page covers how to set up and use tracing in your code. Configuration For information about configuring observability features, including exporter protocols and endpoints, see the [Observability Configuration](../configuration/) guide. ## Getting Started with Tracing Tracing in Patronus works through two main mechanisms: 1. **Function decorators**: Easily trace entire functions 1. 
**Context managers**: Trace specific code blocks within functions ## Using the `@traced()` Decorator The simplest way to add tracing is with the `@traced()` decorator: ```python import patronus from patronus import traced patronus.init() @traced() def generate_response(prompt: str) -> str: # Your LLM call or processing logic here return f"Response to: {prompt}" # Call the traced function result = generate_response("Tell me about machine learning") ``` ### Decorator Options The `@traced()` decorator accepts several parameters for customization: ```python @traced( span_name="Custom span name", # Default: function name log_args=True, # Whether to log function arguments log_results=True, # Whether to log function return values log_exceptions=True, # Whether to log exceptions disable_log=False, # Completely disable logging (maintains spans) attributes={"key": "value"} # Custom attributes to add to the span ) def my_function(): pass ``` See the API Reference for complete details. ## Using the `start_span()` Context Manager For more granular control, use the `start_span()` context manager to trace specific blocks of code: ```python import patronus from patronus.tracing import start_span patronus.init() def complex_workflow(data): # First phase with start_span("Data preparation", attributes={"data_size": len(data)}): prepared_data = preprocess(data) # Second phase with start_span("Model inference"): results = run_model(prepared_data) # Third phase with start_span("Post-processing"): final_results = postprocess(results) return final_results ``` ### Context Manager Options The `start_span()` context manager accepts these parameters: ```python with start_span( "Span name", # Name of the span (required) record_exception=False, # Whether to record exceptions attributes={"custom": "attribute"} # Custom attributes to add ) as span: # Your code here # You can also add attributes during execution: span.set_attribute("dynamic_value", 42) ``` See the API Reference for complete details. ## Custom Attributes Both tracing methods allow you to add custom attributes that provide additional context for your traces: ```python @traced(attributes={ "model": "gpt-4", "version": "1.0", "temperature": 0.7 }) def generate_with_gpt4(prompt): # Function implementation pass # Or with context manager with start_span("Query processing", attributes={ "query_type": "search", "filters_applied": True, "result_limit": 10 }): # Processing code pass ``` ## Distributed Tracing The Patronus SDK is built on OpenTelemetry and automatically supports context propagation across distributed services. This enables you to trace requests as they flow through multiple services in your application architecture. The [OpenTelemetry Python Contrib](https://github.com/open-telemetry/opentelemetry-python-contrib) repository provides instrumentation for many popular frameworks and libraries. 
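Many of these instrumentors can be enabled by passing them to `patronus.init()` through the `integrations` parameter. The sketch below is illustrative only: it assumes the `opentelemetry-instrumentation-requests` package is installed and that a standard OpenTelemetry instrumentor can be supplied this way, just as the HTTPX and OpenAI instrumentors are in the examples elsewhere in these docs.

```python
import requests

import patronus
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Register the instrumentor at init time; spans for outgoing `requests` calls
# are then exported together with Patronus traces and carry the trace context.
patronus.init(integrations=[RequestsInstrumentor()])


@patronus.traced()
def fetch_status(url: str) -> int:
    # This HTTP call is captured automatically by the instrumentation.
    response = requests.get(url, timeout=10)
    return response.status_code


if __name__ == "__main__":
    print(fetch_status("https://example.com"))
```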
### Example: FastAPI Services with Context Propagation First, install the required dependencies: ```bash uv add opentelemetry-instrumentation-httpx \ opentelemetry-instrumentation-fastapi \ fastapi[all] \ patronus ``` Here's a complete example showing two FastAPI services with automatic trace context propagation: **Backend Service (`service_backend.py`):** ```python import patronus from fastapi import FastAPI from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor # Initialize Patronus SDK patronus_context = patronus.init(service="backend") app = FastAPI(title="Backend Service") @app.get("/hello/{name}") async def hello_backend(name: str): return { "message": f"Hello {name} from Backend Service!", "service": "backend" } # Instrument FastAPI after Patronus initialization FastAPIInstrumentor.instrument_app(app, tracer_provider=patronus_context.tracer_provider) if __name__ == "__main__": import uvicorn uvicorn.run(app, host="0.0.0.0", port=8001) ``` **Gateway Service (`service_gateway.py`):** ```python import httpx import patronus from fastapi import FastAPI from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor # Initialize Patronus SDK with HTTPX instrumentation patronus_context = patronus.init( service="gateway", integrations=[ HTTPXClientInstrumentor(), ] ) app = FastAPI(title="Gateway Service") @app.get("/hello/{name}") async def hello_gateway(name: str): # This HTTP call will automatically propagate trace context async with httpx.AsyncClient() as client: response = await client.get(f"http://localhost:8001/hello/{name}") backend_data = response.json() return { "gateway_message": f"Gateway received request for {name}", "backend_response": backend_data } # Instrument FastAPI after Patronus initialization FastAPIInstrumentor.instrument_app(app, tracer_provider=patronus_context.tracer_provider) if __name__ == "__main__": import uvicorn uvicorn.run(app, host="0.0.0.0", port=8000) ``` ### Running the Example First, export your Patronus API key: ```bash export PATRONUS_API_KEY="your-api-key" ``` Then run the services: 1. Start the backend: `python service_backend.py` 1. Start the gateway: `python service_gateway.py` 1. Make a request: `curl http://localhost:8000/hello/world` After making the request, you should see the connected traces in the Patronus Platform showing the complete request flow from gateway to backend service. ### Important Notes - FastAPI instrumenter requires manual setup with `FastAPIInstrumentor.instrument_app()` after Patronus initialization - Pass the `tracer_provider` from Patronus context to ensure proper integration - Trace context is automatically propagated through HTTP headers when services are properly instrumented # Evaluations # Batch Evaluations When evaluating multiple outputs or using multiple evaluators, Patronus provides efficient batch evaluation capabilities. This page covers how to perform batch evaluations and manage evaluation groups. 
## Using Patronus Client For more advanced batch evaluation needs, use the `Patronus` client: ```python from patronus import init from patronus.pat_client import Patronus from patronus.evals import RemoteEvaluator init() with Patronus() as client: # Run multiple evaluators in parallel results = client.evaluate( evaluators=[ RemoteEvaluator("judge", "patronus:is-helpful"), RemoteEvaluator("lynx", "patronus:hallucination") ], task_input="What is quantum computing?", task_output="Quantum computing uses quantum bits or qubits to perform computations...", gold_answer="Computing that uses quantum phenomena like superposition and entanglement" ) # Check if all evaluations passed if results.all_succeeded(): print("All evaluations passed!") else: print("Some evaluations failed:") for failed in results.failed_evaluations(): print(f" - {failed.text_output}") ``` The `Patronus` client provides: - Parallel evaluation execution - Connection pooling - Error handling - Result aggregation ### Asynchronous Evaluation For asynchronous workflows, use `AsyncPatronus`: ```python import asyncio from patronus import init from patronus.pat_client import AsyncPatronus from patronus.evals import AsyncRemoteEvaluator init() async def evaluate_responses(): async with AsyncPatronus() as client: # Run evaluations asynchronously results = await client.evaluate( evaluators=[ AsyncRemoteEvaluator("judge", "patronus:is-helpful"), AsyncRemoteEvaluator("lynx", "patronus:hallucination") ], task_input="What is quantum computing?", task_output="Quantum computing uses quantum bits or qubits to perform computations...", gold_answer="Computing that uses quantum phenomena like superposition and entanglement" ) print(f"Number of evaluations: {len(results.results)}") print(f"All passed: {results.all_succeeded()}") # Run the async function asyncio.run(evaluate_responses()) ``` ## Background Evaluation For non-blocking evaluation, use the `evaluate_bg()` method: ```python from patronus import init from patronus.pat_client import Patronus from patronus.evals import RemoteEvaluator init() with Patronus() as client: # Start background evaluation future = client.evaluate_bg( evaluators=[ RemoteEvaluator("judge", "factual-accuracy"), RemoteEvaluator("judge", "patronus:helpfulness") ], task_input="Explain how vaccines work.", task_output="Vaccines work by training the immune system to recognize and combat pathogens..." ) # Do other work while evaluation happens in background print("Continuing with other tasks...") results = future.get() # Blocks until complete print(f"Evaluation complete: {results.all_succeeded()}") ``` The async version works similarly: ```python async with AsyncPatronus() as client: # Start background evaluation task = client.evaluate_bg( evaluators=[...], task_input="...", task_output="..." 
) # Do other async work await some_other_async_function() # Get results when needed results = await task ``` ## Working with Evaluation Results The `evaluate()` method returns an `EvaluationContainer` with several useful methods: ```python results = client.evaluate(evaluators=[...], task_input="...", task_output="...") if results.any_failed(): print("Some evaluations failed") if results.all_succeeded(): print("All evaluations passed") for failed in results.failed_evaluations(): print(f"Failed: {failed.text_output}") for success in results.succeeded_evaluations(): print(f"Passed: {success.text_output}") if results.has_exception(): results.raise_on_exception() # Re-raise any exceptions that occurred ``` ## Example: Comprehensive Quality Check Here's a complete example of batch evaluation for content quality: ```python from patronus import init from patronus.pat_client import Patronus from patronus.evals import RemoteEvaluator init() def check_content_quality(question, answer): with Patronus() as client: results = client.evaluate( evaluators=[ RemoteEvaluator("judge", "factual-accuracy"), RemoteEvaluator("judge", "helpfulness"), RemoteEvaluator("judge", "coherence"), RemoteEvaluator("judge", "grammar"), RemoteEvaluator("lynx", "patronus:hallucination") ], task_input=question, task_output=answer ) if results.any_failed(): print("Content quality check failed") for failed in results.failed_evaluations(): print(f"- Failed check: {failed.text_output}") print(f" Explanation: {failed.explanation}") return False print("Content passed all quality checks") return True check_content_quality( "What is the capital of France?", "The capital of France is Paris, which is located on the Seine River." ) ``` ## Using the `bundled_eval()` Context Manager The `bundled_eval()` is a lower-level context manager that groups multiple evaluations together based on their arguments. This is particularly useful when working with multiple user-defined evaluators that don't conform to the Patronus structured evaluator format. ```python import patronus from patronus.evals import bundled_eval, evaluator patronus.init() @evaluator() def exact_match(actual, expected) -> bool: return actual == expected @evaluator() def iexact_match(actual: str, expected: str) -> bool: return actual.strip().lower() == expected.strip().lower() # Group these evaluations together in a single trace and single log record with bundled_eval(): exact_match("string", "string") iexact_match("string", "string") ``` # User-Defined Evaluators Evaluators are the core building blocks of Patronus's evaluation system. This page covers how to create and use your own custom evaluators to assess LLM outputs according to your specific criteria. ## Creating Basic Evaluators The simplest way to create an evaluator is with the `@evaluator()` decorator: ```python from patronus import evaluator @evaluator() def keyword_match(text: str, keywords: list[str]) -> float: """ Evaluates whether the text contains the specified keywords. Returns a score between 0.0 and 1.0 based on the percentage of matched keywords. 
""" matches = sum(keyword.lower() in text.lower() for keyword in keywords) return matches / len(keywords) if keywords else 0.0 ``` This decorator automatically: - Integrates with the Patronus tracing - Exports evaluation results to the Patronus Platform ### Flexible Input and Output User-defined evaluators can accept any parameters and return several types of results: ```python # Boolean evaluator (pass/fail) @evaluator() def contains_answer(text: str, answer: str) -> bool: return answer.lower() in text.lower() # Numeric evaluator (score) @evaluator() def semantic_similarity(text1: str, text2: str) -> float: # Simple example - in practice use proper semantic similarity words1, words2 = set(text1.lower().split()), set(text2.lower().split()) intersection = words1.intersection(words2) union = words1.union(words2) return len(intersection) / len(union) if union else 0.0 # String evaluator @evaluator() def tone_classifier(text: str) -> str: positive = ['good', 'excellent', 'great', 'helpful'] negative = ['bad', 'poor', 'unhelpful', 'wrong'] pos_count = sum(word in text.lower() for word in positive) neg_count = sum(word in text.lower() for word in negative) if pos_count > neg_count: return "positive" elif neg_count > pos_count: return "negative" else: return "neutral" ``` ### Return Types Evaluators can return different types which are automatically converted to `EvaluationResult` objects: - **Boolean**: `True`/`False` indicating pass/fail - **Float/Integer**: Numerical scores (typically between 0-1) - **String**: Text output categorizing the result - **EvaluationResult**: Complete evaluation with scores, explanations, etc. ## Using EvaluationResult For more detailed evaluations, return an `EvaluationResult` object: ```python from patronus import evaluator from patronus.evals import EvaluationResult @evaluator() def comprehensive_evaluation(response: str, reference: str) -> EvaluationResult: # Example implementation - replace with actual logic has_keywords = all(word in response.lower() for word in ["important", "key", "concept"]) accuracy = 0.85 # Calculated accuracy score return EvaluationResult( score=accuracy, # Numeric score (typically 0-1) pass_=accuracy >= 0.7, # Boolean pass/fail text_output="Satisfactory" if accuracy >= 0.7 else "Needs improvement", # Category explanation=f"Response {'contains' if has_keywords else 'is missing'} key terms. Accuracy: {accuracy:.2f}", metadata={ # Additional structured data "has_required_keywords": has_keywords, "response_length": len(response), "accuracy": accuracy } ) ``` The `EvaluationResult` object can include: - **score**: Numerical assessment (typically 0-1) - **pass\_**: Boolean pass/fail status - **text_output**: Categorical or textual result - **explanation**: Human-readable explanation of the result - **metadata**: Additional structured data for analysis - **tags**: Key-value pairs for filtering and organization ## Using Evaluators Once defined, evaluators can be used directly: ```python # Use evaluators as normal function result = keyword_match("The capital of France is Paris", ["capital", "France", "Paris"]) print(f"Score: {result}") # Output: Score: 1.0 # Using class-based evaluator safety_check = ContentSafetyEvaluator() result = safety_check.evaluate( task_output="This is a helpful and safe response." ) print(f"Safety check passed: {result.pass_}") # Output: Safety check passed: True ``` # Patronus Evaluators Patronus provides a suite of evaluators that help you assess LLM outputs without writing complex evaluation logic. 
These managed evaluators run on Patronus infrastructure. Visit Patronus Platform console to define your own criteria. ## Using Patronus Evaluators You can use Patronus evaluators through the `RemoteEvaluator` class: ```python from patronus import init from patronus.evals import RemoteEvaluator init() factual_accuracy = RemoteEvaluator("judge", "factual-accuracy") # Evaluate an LLM output result = factual_accuracy.evaluate( task_input="What is the capital of France?", task_output="The capital of France is Paris, which is located on the Seine River.", gold_answer="Paris" ) print(f"Passed: {result.pass_}") print(f"Score: {result.score}") print(f"Explanation: {result.explanation}") ``` ## Synchronous and Asynchronous Versions Patronus evaluators are available in both synchronous and asynchronous versions: ```python # Synchronous usage (as shown above) factual_accuracy = RemoteEvaluator("judge", "factual-accuracy") result = factual_accuracy.evaluate(...) # Asynchronous usage from patronus.evals import AsyncRemoteEvaluator async_factual_accuracy = AsyncRemoteEvaluator("judge", "factual-accuracy") result = await async_factual_accuracy.evaluate(...) ``` # Experiments # Advanced Experiment Features This page covers advanced features of the Patronus Experimentation Framework that help you build more sophisticated evaluation workflows. ## Multi-Stage Processing with Chains For complex workflows, you can use chains to create multi-stage processing and evaluation pipelines. Chains connect multiple processing stages where the output of one stage becomes the input to the next. ### Basic Chain Structure ```python from patronus.experiments import run_experiment from patronus.evals import RemoteEvaluator experiment = run_experiment( dataset=dataset, chain=[ # Stage 1: Generate summaries { "task": generate_summary, "evaluators": [ RemoteEvaluator("judge", "conciseness"), RemoteEvaluator("judge", "coherence") ] }, # Stage 2: Generate questions from summaries { "task": generate_questions, "evaluators": [ RemoteEvaluator("judge", "relevance"), QuestionDiversityEvaluator() ] }, # Stage 3: Answer questions { "task": answer_questions, "evaluators": [ RemoteEvaluator("judge", "factual-accuracy"), RemoteEvaluator("judge", "helpfulness") ] } ] ) ``` Each stage in the chain can: 1. Apply its own task function (or no task if set to `None`) 1. Use its own set of evaluators 1. Access results from previous stages ### Accessing Previous Results in Chain Tasks Tasks in later chain stages can access outputs and evaluations from earlier stages through the `parent` parameter: ```python def generate_questions(row, parent, **kwargs): """Generate questions based on a summary from the previous stage.""" # Get the summary from the previous task summary = parent.task.output if parent and parent.task else None if not summary: return None # Check if summary evaluations are available if parent and parent.evals: coherence = parent.evals.get("judge:coherence") # Use previous evaluation results to guide question generation if coherence and coherence.score > 0.8: return "Here are three detailed questions based on the summary..." else: return "Here are three basic questions about the summary..." # Default questions if no evaluations available return "Here are some standard questions about the topic..." ``` This example demonstrates how a task can adapt its behavior based on previous outputs and evaluations. ## Concurrency Controls For better performance, the framework automatically processes dataset examples concurrently. 
You can control this behavior to prevent rate limiting or resource exhaustion: ```python experiment = run_experiment( dataset=large_dataset, task=api_intensive_task, evaluators=[evaluator1, evaluator2], # Limit the number of concurrent tasks and evaluations max_concurrency=5 ) ``` This is particularly important for: - Tasks that make API calls with rate limits - Resource-intensive processing - Large datasets with many examples ## OpenTelemetry Integrations The framework supports OpenTelemetry instrumentation for enhanced tracing and monitoring: ```python from openinference.instrumentation.openai import OpenAIInstrumentor experiment = run_experiment( dataset=dataset, task=openai_task, evaluators=[evaluator1, evaluator2], # Add OpenTelemetry instrumentors integrations=[OpenAIInstrumentor()] ) ``` Benefits of OpenTelemetry integration include: - Automatic capture of API calls and parameters - Detailed timing information for performance analysis - Integration with observability platforms ## Organizing Experiments ### Custom Experiment Names and Projects Organize your experiments into projects with descriptive names for better management: ```python experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[evaluator1, evaluator2], # Organize experiments project_name="RAG System Evaluation", experiment_name="baseline-gpt4-retrieval" ) ``` The framework automatically appends a timestamp to experiment names for uniqueness. ### Tags for Filtering and Organization Tags help organize and filter experiment results: ```python experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[evaluator1, evaluator2], # Add tags for filtering and organization tags={ "model": "gpt-4", "version": "2.0", "retrieval_method": "bm25", "environment": "staging" } ) ``` Important notes about tags: - Tags are propagated to all evaluation results in the experiment - They cannot be overridden by tasks or evaluators - Use a small set of consistent values for each tag (avoid having too many unique values) - Tags are powerful for filtering and grouping in analysis ### Experiment Metadata Experiments automatically capture important metadata, including evaluator weights when specified: ```python from patronus.experiments import run_experiment, FuncEvaluatorAdapter from patronus.evals import RemoteEvaluator from patronus import evaluator @evaluator() def custom_check(row, **kwargs): return True # Experiment with weighted evaluators experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[ RemoteEvaluator("judge", "patronus:is-concise", weight=0.6), FuncEvaluatorAdapter(custom_check, weight="0.4") ] ) # Weights are automatically stored in experiment metadata # as "evaluator_weights": { # "judge:patronus:is-concise": "0.6", # "custom_check:": "0.4" # } ``` Evaluator weights are automatically collected and stored in the experiment's metadata under the `evaluator_weights` key. This provides a permanent record of how evaluators were weighted in each experiment for reproducibility and analysis. For more details on using evaluator weights, see the [Using Evaluators](../evaluators/#evaluator-weights-experiments-only) page. 
## Custom API Configuration For on-prem environments, you can customize the API configuration: ```python experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[evaluator1, evaluator2], # Custom API configuration api_key="your-api-key", api_url="https://custom-endpoint.patronus.ai", otel_endpoint="https://custom-telemetry.patronus.ai", timeout_s=120 ) ``` ## Manual Experiment Control For fine-grained control over the experiment lifecycle, you can create and run experiments manually: ```python from patronus.experiments import Experiment # Create the experiment experiment = await Experiment.create( dataset=dataset, task=task, evaluators=evaluators, # Additional configuration... ) # Perform custom setup if needed # ... # Run the experiment when ready await experiment.run() # Export results experiment.to_csv("results.csv") ``` This pattern is useful when you need to: - Perform additional setup after experiment creation - Control exactly when execution starts - Implement custom pre- or post-processing ## Best Practices When using advanced experiment features: 1. **Start simple**: Begin with basic experiments before adding chain complexity 1. **Test incrementally**: Validate each stage before combining them 1. **Monitor resources**: Watch for memory usage with large datasets 1. **Set appropriate concurrency**: Balance throughput against rate limits 1. **Use consistent tags**: Create a standard tagging system across experiments # Working with Datasets Datasets provide the foundation for Patronus experiments, containing the examples that your tasks and evaluators will process. This page explains how to create, load, and work with datasets effectively. ## Dataset Structure and Evaluator Compatibility Patronus experiments are designed to work with `StructuredEvaluator` classes, which expect specific input parameters. The standard dataset fields map directly to these parameters, making integration seamless: - `system_prompt`: System instruction for LLM-based tasks - `task_context`: Additional information or context (string or list of strings) - `task_metadata`: Additional structured information about the task - `task_attachments`: Files or other binary data - `task_input`: The primary input query or text - `task_output`: The model's response or output to evaluate - `gold_answer`: The expected correct answer or reference output - `tags`: Key-value pairs - `sid`: A unique identifier for the example (automatically generated if not provided) While you can include any custom fields in your dataset, using these standard field names ensures compatibility with structured evaluators without additional configuration. 
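To make the mapping concrete, here is a minimal sketch. The evaluator class and record below are hypothetical illustrations (not part of the SDK); they show how a record that uses the standard field names feeds keyword arguments of the same names into a `StructuredEvaluator`.

```python
from patronus import StructuredEvaluator, EvaluationResult


class GoldAnswerCheck(StructuredEvaluator):
    """Hypothetical evaluator: passes when the gold answer appears in the output."""

    def evaluate(self, *, task_output: str = "", gold_answer: str = "", **kwargs) -> EvaluationResult:
        # task_output and gold_answer arrive directly from the dataset row
        # because the record uses the standard field names.
        matched = bool(gold_answer) and gold_answer.lower() in (task_output or "").lower()
        return EvaluationResult(pass_=matched, score=1.0 if matched else 0.0)


# A record that uses the standard field names needs no adapter or custom field mapping.
record = {
    "task_input": "What is the capital of France?",
    "task_output": "The capital of France is Paris.",
    "gold_answer": "Paris",
}
```

Passing `dataset=[record]` and `evaluators=[GoldAnswerCheck()]` to `run_experiment()` would then route these fields straight into `evaluate()` with no adapter or field mapping.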
## Creating Datasets Patronus accepts datasets in several formats: ### List of Dictionaries ```python dataset = [ { "task_input": "What is machine learning?", "gold_answer": "Machine learning is a subfield of artificial intelligence...", "tags": {"category": "ai", "difficulty": "beginner"}, "difficulty": "beginner" # Custom field }, { "task_input": "Explain quantum computing", "gold_answer": "Quantum computing uses quantum phenomena...", "tags": {"category": "physics", "difficulty": "advanced"}, "difficulty": "advanced" # Custom field } ] experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[my_evaluator] ) ``` ### Pandas DataFrame ```python import pandas as pd df = pd.DataFrame({ "task_input": ["What is Python?", "What is JavaScript?"], "gold_answer": ["Python is a programming language...", "JavaScript is a programming language..."], "tags": [{"type": "backend"}, {"type": "frontend"}], "language_type": ["backend", "frontend"] # Custom field }) experiment = run_experiment(dataset=df, ...) ``` ### CSV or JSONL Files ```python from patronus.datasets import read_csv, read_jsonl # Load with default field mappings dataset = read_csv("questions.csv") # Load with custom field mappings dataset = read_jsonl( "custom.jsonl", task_input_field="question", # Map "question" field to "task_input" gold_answer_field="answer", # Map "answer" field to "gold_answer" system_prompt_field="instruction", # Map "instruction" field to "system_prompt" tags_field="metadata" # Map "metadata" field to "tags" ) ``` ### Remote Datasets Patronus allows you to work with datasets stored remotely on the Patronus platform. This is useful for sharing standard datasets across your organization or utilizing pre-built evaluation datasets. ```python from patronus.datasets import RemoteDatasetLoader # Load a dataset from the Patronus platform using its name remote_dataset = RemoteDatasetLoader("financebench") # Load a dataset from the Patronus platform using its ID remote_dataset = RemoteDatasetLoader(by_id="d-eo6a5zy3nwach69b") experiment = run_experiment( dataset=remote_dataset, task=my_task, evaluators=[my_evaluator], ) ``` The `RemoteDatasetLoader` asynchronously fetches the dataset from the Patronus API when the experiment runs. It handles the data mapping automatically, transforming the API response into the standard dataset structure with all the expected fields (`system_prompt`, `task_input`, `gold_answer`, etc.). Remote datasets follow the same structure and field conventions as local datasets, making them interchangeable in your experiment code. ## Accessing Dataset Fields During experiment execution, dataset examples are provided as `Row` objects: ```python def my_task(row, **kwargs): # Access standard fields question = row.task_input reference = row.gold_answer context = row.task_context # Access tags if row.tags: category = row.tags.get("category") # Access custom fields directly difficulty = row.difficulty # Access custom field by name # Access row ID sample_id = row.sid return f"Answering {difficulty} question (ID: {sample_id}): {question}" ``` The `Row` object automatically provides attributes for all fields in your dataset, making access straightforward for both standard and custom fields. ## Using Custom Dataset Schemas If your dataset uses a different schema than the standard field names, you have two options: 1. 
**Map fields during loading**: Use field mapping parameters when loading data ```python from patronus.datasets import read_csv dataset = read_csv("data.csv", task_input_field="question", gold_answer_field="answer", tags_field="metadata") ``` 1. **Use evaluator adapters**: Create adapters that transform your data structure to match what evaluators expect ```python from patronus import evaluator from patronus.experiments import run_experiment, FuncEvaluatorAdapter @evaluator() def my_evaluator_function(*, expected, actual, context): ... class CustomAdapter(FuncEvaluatorAdapter): def transform(self, row, task_result, parent, **kwargs): # Transform dataset fields to evaluator parameters. # The first value is list of positional arguments (*args) passed to the evaluator function. # The second value is named arguments (**kwargs) passed to the evaluator function. return [], { "expected": row.reference_answer, # Map custom field to expected parameter "actual": task_result.output if task_result else None, "context": row.additional_info # Map custom field to context parameter } experiment = run_experiment( dataset=custom_dataset, evaluators=[CustomAdapter(my_evaluator_function)] ) ``` This adapter approach is particularly important for function-based evaluators, which need to be explicitly adapted for use in experiments. ## Dataset IDs and Sample IDs Each dataset and row can have identifiers that are used for organization and tracing: ```python from patronus.datasets import Dataset # Dataset with explicit ID dataset = Dataset.from_records( records=[...], dataset_id="qa-dataset-v1" ) # Dataset with explicit sample IDs dataset = Dataset.from_records([ {"sid": "q1", "task_input": "Question 1", "gold_answer": "Answer 1"}, {"sid": "q2", "task_input": "Question 2", "gold_answer": "Answer 2"} ]) ``` If not provided, sample IDs (`sid`) are automatically generated. ## Best Practices 1. **Use standard field names when possible**: This minimizes the need for custom adapters 1. **Include gold answers**: This enables more comprehensive evaluation 1. **Use tags for organization**: Tags provide a flexible way to categorize examples 1. **Keep task inputs focused**: Clear, concise inputs lead to better evaluations 1. **Add relevant metadata**: Additional context helps with result analysis 1. **Normalize data before experiments**: Pre-process data to ensure consistent format 1. **Consider remote datasets for team collaboration**: Use the Patronus platform to share standardized datasets In the next section, we'll explore how to create tasks that process your dataset examples. # Using Evaluators in Experiments Evaluators are the core assessment tools in Patronus experiments, measuring the quality of task outputs against defined criteria. This page covers how to use various types of evaluators in the Patronus Experimentation Framework. ## Evaluator Types The framework supports several types of evaluators: - **Remote Evaluators**: Use Patronus's managed evaluation services - **Custom Evaluators**: Your own evaluation logic. - **Function-based**: Simple functions decorated with @evaluator() that need to be wrapped with FuncEvaluatorAdapter when used in experiments. - **Class-based**: More powerful evaluators created by extending `StructuredEvaluator` (synchronous) or `AsyncStructuredEvaluator` (asynchronous) base classes with predefined interfaces. Each type has different capabilities and use cases. 
## Remote Evaluators Remote evaluators run on Patronus infrastructure and provide standardized, high-quality assessments: ```python from patronus.evals import RemoteEvaluator from patronus.experiments import run_experiment experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[ RemoteEvaluator("judge", "patronus:is-concise"), RemoteEvaluator("lynx", "patronus:hallucination"), RemoteEvaluator("judge", "patronus:is-helpful") ] ) ``` ## Class-Based Evaluators You can create custom evaluator classes by inheriting from the Patronus base classes: > **Note**: The following example uses the `transformers` library from Hugging Face. Install it with `pip install transformers` before running this code. ```python import numpy as np from transformers import BertTokenizer, BertModel from patronus import StructuredEvaluator, EvaluationResult from patronus.experiments import run_experiment class BERTScore(StructuredEvaluator): def __init__(self, pass_threshold: float): self.pass_threshold = pass_threshold self.tokenizer = BertTokenizer.from_pretrained("bert-base-uncased") self.model = BertModel.from_pretrained("bert-base-uncased") def evaluate(self, *, task_output: str, gold_answer: str, **kwargs) -> EvaluationResult: output_toks = self.tokenizer(task_output, return_tensors="pt", padding=True, truncation=True) gold_answer_toks = self.tokenizer(gold_answer, return_tensors="pt", padding=True, truncation=True) output_embeds = self.model(**output_toks).last_hidden_state.mean(dim=1).detach().numpy() gold_answer_embeds = self.model(**gold_answer_toks).last_hidden_state.mean(dim=1).detach().numpy() score = np.dot(output_embeds, gold_answer_embeds.T) / ( np.linalg.norm(output_embeds) * np.linalg.norm(gold_answer_embeds) ) return EvaluationResult( score=score, pass_=score >= self.pass_threshold, tags={"pass_threshold": str(self.pass_threshold)}, ) experiment = run_experiment( dataset=[ { "task_output": "Translate 'Goodbye' to Spanish.", "gold_answer": "Adiós", } ], evaluators=[BERTScore(pass_threshold=0.8)], ) ``` Class-based evaluators that inherit from `StructuredEvaluator` or `AsyncStructuredEvaluator` are automatically adapted for use in experiments. ## Function Evaluators For simpler evaluation logic, you can use function-based evaluators. When using function evaluators in experiments, you must wrap them with `FuncEvaluatorAdapter`. ### Standard Function Adapter By default, `FuncEvaluatorAdapter` expects functions that follow this interface: ```python from typing import Optional from patronus import evaluator from patronus.datasets import Row from patronus.experiments.types import TaskResult, EvalParent from patronus.evals import EvaluationResult from patronus.experiments import run_experiment, FuncEvaluatorAdapter @evaluator() def standard_evaluator( row: Row, task_result: TaskResult, parent: EvalParent, **kwargs ) -> Optional[EvaluationResult]: """ Standard interface for function evaluators used with FuncEvaluatorAdapter. 
""" if not task_result or not task_result.output: # Skip the evaluation return None if row.gold_answer and row.gold_answer.lower() in task_result.output.lower(): return EvaluationResult(score=1.0, pass_=True, text_output="Contains answer") else: return EvaluationResult(score=0.0, pass_=False, text_output="Missing answer") # Use with standard adapter experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[ FuncEvaluatorAdapter(standard_evaluator) ] ) ``` ### Custom Function Adapters If your evaluator function doesn't match the standard interface, you can create a custom adapter: ```python from patronus import evaluator from patronus.datasets import Row from patronus.experiments.types import TaskResult, EvalParent from patronus.experiments.adapters import FuncEvaluatorAdapter # An evaluator function with a different interface @evaluator() def exact_match(expected: str, actual: str, case_sensitive: bool = False) -> bool: """ Checks if actual text exactly matches expected text. """ if not case_sensitive: return expected.lower() == actual.lower() return expected == actual # Custom adapter to transform experiment arguments to evaluator arguments class ExactMatchAdapter(FuncEvaluatorAdapter): def __init__(self, case_sensitive=False): super().__init__(exact_match) self.case_sensitive = case_sensitive def transform( self, row: Row, task_result: TaskResult, parent: EvalParent, **kwargs ) -> tuple[list, dict]: # Create arguments list and dict for the evaluator function args = [] # No positional arguments in this case # Create keyword arguments matching the evaluator's parameters evaluator_kwargs = { "expected": row.gold_answer, "actual": task_result.output if task_result else "", "case_sensitive": self.case_sensitive } return args, evaluator_kwargs # Use custom adapter in an experiment experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[ ExactMatchAdapter(case_sensitive=False) ] ) ``` The `transform()` method is the key to adapting any function to the experiment framework. It takes the standard arguments provided by the framework and transforms them into the format your evaluator function expects. 
## Combining Evaluator Types

You can use multiple types of evaluators in a single experiment:

```python
experiment = run_experiment(
    dataset=dataset,
    task=my_task,
    evaluators=[
        # Remote evaluator
        RemoteEvaluator("judge", "factual-accuracy", weight=0.4),

        # Class-based evaluator
        BERTScore(pass_threshold=0.7, weight=0.3),

        # Function evaluator with standard adapter
        FuncEvaluatorAdapter(standard_evaluator, weight=0.2),

        # Function evaluator with custom adapter
        ExactMatchAdapter(case_sensitive=False, weight=0.1)
    ]
)
```

The `weight` arguments shown here assume each evaluator accepts one: `RemoteEvaluator` and `FuncEvaluatorAdapter` do so directly, while custom classes such as `BERTScore` and `ExactMatchAdapter` need their constructors extended to accept a `weight` and forward it to the parent class, as described in the Evaluator Weights section below.

## Evaluator Chains

In multi-stage evaluation chains, evaluators from one stage can see the results of previous stages:

```python
# A function evaluator that aggregates results from the previous stage
@evaluator()
def final_aggregate_evaluator(row, task_result, parent, **kwargs):
    # Check if we have previous evaluation results
    if not parent or not parent.evals:
        return None

    # Access evaluations from the previous stage
    conciseness = parent.evals.get("judge:conciseness")
    coherence = parent.evals.get("judge:coherence")
    if not conciseness or not coherence:
        return None

    # Use the previous results
    avg_score = ((conciseness.score or 0) + (coherence.score or 0)) / 2
    return EvaluationResult(score=avg_score, pass_=avg_score > 0.7)


experiment = run_experiment(
    dataset=dataset,
    chain=[
        # First stage
        {
            "task": generate_summary,
            "evaluators": [
                RemoteEvaluator("judge", "conciseness"),
                RemoteEvaluator("judge", "coherence")
            ]
        },
        # Second stage - evaluating based on first stage results
        {
            "task": None,  # No additional processing
            "evaluators": [
                # This evaluator can see previous evaluations
                FuncEvaluatorAdapter(final_aggregate_evaluator)
            ]
        }
    ]
)
```

## Evaluator Weights (Experiments Only)

> **Note**: Evaluator weights are only supported when using evaluators within the experiment framework. This feature is not available for standalone evaluator usage.

You can assign weights to evaluators to indicate their relative importance in your evaluation strategy. Weights can be provided as either strings or floats representing valid decimal numbers and are automatically stored as experiment metadata. Weights work consistently across all evaluator types but are configured differently depending on whether you're using remote evaluators, function-based evaluators, or class-based evaluators.
### Weight Support by Evaluator Type Each evaluator type handles weight configuration differently: #### Remote Evaluators For remote evaluators, pass the `weight` parameter directly to the `RemoteEvaluator` constructor: ```python from patronus.evals import RemoteEvaluator from patronus.experiments import run_experiment # Remote evaluator with weight (string or float) pii_evaluator = RemoteEvaluator("pii", "patronus:pii:1", weight="0.6") conciseness_evaluator = RemoteEvaluator("judge", "patronus:is-concise", weight=0.4) experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[pii_evaluator, conciseness_evaluator] ) ``` #### Function-Based Evaluators For function-based evaluators, pass the `weight` parameter to the `FuncEvaluatorAdapter` that wraps your evaluator function: ```python from patronus import evaluator from patronus.experiments import FuncEvaluatorAdapter, run_experiment from patronus.datasets import Row @evaluator() def exact_match(row: Row, **kwargs) -> bool: return row.task_output.lower().strip() == row.gold_answer.lower().strip() # Function evaluator with weight (string or float) exact_match_weighted = FuncEvaluatorAdapter(exact_match, weight=0.7) experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[exact_match_weighted] ) ``` #### Class-Based Evaluators For class-based evaluators, pass the `weight` parameter to your evaluator's constructor and ensure it's passed to the parent class: ```python from typing import Union from patronus import StructuredEvaluator, EvaluationResult from patronus.experiments import run_experiment class CustomEvaluator(StructuredEvaluator): def __init__(self, threshold: float, weight: Union[str, float] = None): super().__init__(weight=weight) # Pass to parent class self.threshold = threshold def evaluate(self, *, task_output: str, **kwargs) -> EvaluationResult: score = len(task_output) / 100 # Simple length-based scoring return EvaluationResult( score=score, pass_=score >= self.threshold ) # Class-based evaluator with weight (string or float) custom_evaluator = CustomEvaluator(threshold=0.5, weight=0.3) experiment = run_experiment( dataset=dataset, task=my_task, evaluators=[custom_evaluator] ) ``` ### Complete Example Here's a comprehensive example demonstrating weighted evaluators of all three types, based on the patterns shown in the experiment framework: ```python from patronus.experiments import FuncEvaluatorAdapter, run_experiment from patronus import RemoteEvaluator, EvaluationResult, StructuredEvaluator, evaluator from patronus.datasets import Row class DummyEvaluator(StructuredEvaluator): def evaluate(self, task_output: str, gold_answer: str, **kwargs) -> EvaluationResult: return EvaluationResult(score_raw=1, pass_=True) @evaluator def exact_match(row: Row, **kwargs) -> bool: return row.task_output.lower().strip() == row.gold_answer.lower().strip() experiment = run_experiment( project_name="Weighted Evaluation Example", dataset=[ { "task_input": "Please provide your contact details.", "task_output": "My email is john.doe@example.com and my phone number is 123-456-7890.", "gold_answer": "My email is john.doe@example.com and my phone number is 123-456-7890.", }, { "task_input": "Share your personal information.", "task_output": "My name is Jane Doe and I live at 123 Elm Street.", "gold_answer": "My name is Jane Doe and I live at 123 Elm Street.", }, ], evaluators=[ RemoteEvaluator("pii", "patronus:pii:1", weight="0.3"), # Remote evaluator with string weight FuncEvaluatorAdapter(exact_match, weight="0.3"), # 
Function evaluator with string weight DummyEvaluator(weight="0.4"), # Class evaluator with string weight ], experiment_name="Weighted Evaluators Demo" ) ``` ### Weight Validation and Rules 1. **Experiments Only**: Weights are exclusively available within the experiment framework - they cannot be used with standalone evaluator calls 1. **Valid Format**: Weights must be valid decimal numbers provided as either strings or floats (e.g., "0.3", 1.0, 0.7) 1. **Consistency**: The same evaluator (identified by its canonical name) cannot have different weights within the same experiment 1. **Automatic Storage**: Weights are automatically collected and stored in the experiment's metadata under the "evaluator_weights" key 1. **Optional**: Weights are optional - evaluators without weights will simply not have weight metadata stored 1. **Best Practice**: Consider making weights sum to 1.0 for clearer interpretation of relative importance ### Error Examples ```python # Invalid weight format - will raise TypeError RemoteEvaluator("judge", "patronus:is-concise", weight="invalid") RemoteEvaluator("judge", "patronus:is-concise", weight=[1, 2, 3]) # Lists not supported # Inconsistent weights for same evaluator - will raise TypeError during experiment run_experiment( dataset=dataset, task=my_task, evaluators=[ RemoteEvaluator("judge", "patronus:is-concise", weight=0.7), RemoteEvaluator("judge", "patronus:is-concise", weight="0.3"), # Different weight! ] ) ``` ## Best Practices When using evaluators in experiments: 1. **Use the right evaluator type for the job**: Remote evaluators for standardized assessments, custom evaluators for specialized logic 1. **Focus each evaluator on one aspect**: Create multiple focused evaluators rather than one complex evaluator 1. **Provide detailed explanations**: Include explanations to help understand evaluation results 1. **Create custom adapters when needed**: Don't force your evaluator functions to match the standard interface if there's a more natural way to express them 1. **Handle edge cases gracefully**: Consider what happens with empty inputs, very long texts, etc. 1. **Reuse evaluators across experiments**: Create a library of evaluators for consistent assessment 1. **Weight consistency across evaluator types**: When using evaluator weights, maintain consistency across experiments regardless of whether you're using remote, function-based, or class-based evaluators 1. **Consider weight distribution**: When using weights, consider making them sum to 1.0 for clearer interpretation of relative importance (e.g., "0.4", "0.3", "0.3" rather than "0.1", "0.1", "0.1") 1. **Document weight rationale**: Consider documenting why specific weights were chosen for your evaluation strategy, especially when mixing different evaluator types Next, we'll explore advanced features of the Patronus Experimentation Framework. # Introduction to Experiments The Patronus Experimentation Framework provides a systematic way to evaluate, compare, and improve Large Language Model (LLM) applications. By standardizing the evaluation process, the framework enables consistent testing across model versions, prompting strategies, and data inputs. ## What are Experiments? In Patronus, an experiment is a structured evaluation that: 1. Processes a **dataset** of examples 1. Runs each example through a **task** function (optional) 1. Evaluates the output using one or more **evaluators** 1. 
Records and analyzes the results This approach provides a comprehensive view of how your LLM application performs across different inputs, making it easier to identify strengths, weaknesses, and areas for improvement. ## Key Concepts ### Dataset A dataset in Patronus consists of examples that your models or systems will process. Each example, represented as a `Row` object, can contain: - Input data - Context information - Expected outputs (gold answers) - Metadata - And more... Datasets can be loaded from various sources including JSON files, CSV files, Pandas DataFrames, or defined directly in your code. ### Task A task is a function that processes each dataset example. Tasks typically: - Receive a `Row` object from the dataset - Perform some processing (like calling an LLM) - Return a `TaskResult` containing the output Tasks are optional - you can evaluate pre-existing outputs by including them directly in your dataset. ### Evaluators Evaluators assess the quality of task outputs based on specific criteria. Patronus supports various types of evaluators: - **Remote Evaluators**: Use Patronus's managed evaluation services - **Custom Evaluators**: Your own evaluation logic. - **Function-based**: Simple functions decorated with @evaluator() that need to be wrapped with FuncEvaluatorAdapter when used in experiments. - **Class-based**: More powerful evaluators created by extending `StructuredEvaluator` (synchronous) or `AsyncStructuredEvaluator` (asynchronous) base classes with predefined interfaces. Each evaluator produces an `EvaluationResult` containing scores, pass/fail status, explanations, and other metadata. **Evaluator Weights**: You can assign weights to evaluators to indicate their relative importance in your evaluation strategy. Weights are stored as experiment metadata and can be provided as either strings or floats representing valid decimal numbers. See the [Using Evaluators](../evaluators/#evaluator-weights-experiments-only) page for detailed information. ### Chains For more complex workflows, Patronus supports multi-stage evaluation chains where the output of one evaluation stage becomes the input for the next. This allows for pipeline-based approaches to LLM evaluation. ## Why Use the Experimentation Framework? 
The Patronus Experimentation Framework offers several advantages over ad-hoc evaluation approaches:

- **Consistency**: Standardized evaluation across models and time
- **Reproducibility**: Experiments can be re-run with the same configuration
- **Scalability**: Process large datasets efficiently with concurrent execution
- **Comprehensive Analysis**: Collect detailed metrics and explanations
- **Integration**: Built-in tracing and logging with the broader Patronus ecosystem

## Example: Basic Experiment

Here's a simple example of a Patronus experiment:

```python
# experiment.py
from patronus.evals import RemoteEvaluator
from patronus.experiments import run_experiment


# Define a simple task function
def my_task(row, **kwargs):
    return f"The answer is: {row.task_input}"


# Run the experiment
experiment = run_experiment(
    dataset=[
        {"task_input": "What is 2+2?", "gold_answer": "4"},
        {"task_input": "Who wrote Hamlet?", "gold_answer": "Shakespeare"}
    ],
    task=my_task,
    evaluators=[
        RemoteEvaluator("judge", "patronus:fuzzy-match")
    ]
)

experiment.to_csv("./experiment-result.csv")
```

You can run the experiment by simply executing the Python file:

```shell
python ./experiment.py
```

The output of the script should look similar to this:

```text
==================================
Experiment Global/root-1742834029: 100%|ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ| 2/2 [00:04<00:00, 2.44s/sample]

patronus:fuzzy-match (judge) [link_idx=0]
-----------------------------------------
Count     : 2
Pass rate : 0
Mean      : 0.0
Min       : 0.0
25%       : 0.0
50%       : 0.0
75%       : 0.0
Max       : 0.0

Score distribution
Score Range          Count      Histogram
0.00 - 0.20          2          ####################
0.20 - 0.40          0
0.40 - 0.60          0
0.60 - 0.80          0
0.80 - 1.00          0
```

In the following sections, we'll explore how to set up, run, and analyze experiments in detail.

# Running Experiments

This page covers how to set up and run experiments using the Patronus Experimentation Framework.

## Basic Experiment Structure

A Patronus experiment requires at minimum:

- A dataset to process
- One or more evaluators to assess outputs

Additionally, most experiments will include:

- A task function that processes each dataset example
- Configuration options for tracing, logging, and concurrency

## Setting Up an Experiment

### The `run_experiment` Function

The main entry point for the framework is the `run_experiment()` function:

```python
from patronus.experiments import run_experiment

experiment = run_experiment(
    dataset=my_dataset,                 # Required: What to evaluate
    task=my_task_function,              # Optional: How to process inputs
    evaluators=[my_evaluator],          # Required: How to assess outputs
    tags={"dataset-version": "v1.0"},   # Optional: Tags for the experiment
    max_concurrency=10,                 # Optional: Control parallel execution
    project_name="My Project",          # Optional: Override the global project name
    experiment_name="Test Run"          # Optional: Name this experiment run
)
```

## Creating a Simple Experiment

Let's walk through a complete example:

```python
from patronus import evaluator, RemoteEvaluator
from patronus.experiments import run_experiment, FuncEvaluatorAdapter

dataset = [
    {
        "task_input": "What is the capital of France?",
        "gold_answer": "Paris"
    },
    {
        "task_input": "Who wrote Romeo and Juliet?",
        "gold_answer": "William Shakespeare"
    }
]


# Define a task (in a real scenario, this would call an LLM)
def answer_question(row, **kwargs):
    if "France" in row.task_input:
        return "The capital of France is Paris."
    elif "Romeo and Juliet" in row.task_input:
        return "Romeo and Juliet was written by William Shakespeare."
return "I don't know the answer to that question." @evaluator() def contains_answer(task_result, row, **kwargs) -> bool: if not task_result or not row.gold_answer: return False return row.gold_answer.lower() in task_result.output.lower() run_experiment( dataset=dataset, task=answer_question, evaluators=[ # Use a Patronus-managed evaluator RemoteEvaluator("judge", "patronus:fuzzy-match"), # Use our custom evaluator FuncEvaluatorAdapter(contains_answer) ], tags={"model": "simulated", "version": "v1"} ) ``` ## Experiment Execution Flow When you call `run_experiment()`, the framework follows these steps: 1. **Preparation**: Initializes the experiment context and prepares the dataset 1. **Processing**: For each dataset row: 1. Runs the task function if provided 1. Passes the task output to the evaluators 1. Collects evaluation results 1. **Reporting**: Generates a summary of evaluation results 1. **Return**: Returns an `Experiment` object with the complete results ## Synchronous vs. Asynchronous Execution The `run_experiment()` function detects whether it's being called from an async context: - In a synchronous context, it will block until the experiment completes - In an async context, it returns an awaitable that can be awaited ```python # Synchronous usage: experiment = run_experiment(dataset, task, evaluators) # Asynchronous usage: experiment = await run_experiment(dataset, task, evaluators) ``` ## Manual Experiment Control For more control over the experiment lifecycle, you can create and run an experiment manually: ```python from patronus.experiments import Experiment # Create the experiment experiment = await Experiment.create( dataset=dataset, task=task, evaluators=evaluators, # Additional configuration options... ) # Run the experiment when ready experiment = await experiment.run() ``` This approach is useful when you need to perform additional setup between experiment creation and execution. ## Experiment Results After an experiment completes, you can access the results in several ways: ```python # Get a Pandas DataFrame df = experiment.to_dataframe() # Save to CSV experiment.to_csv("results.csv") # Access the built-in summary # (This is automatically printed at the end of the experiment) ``` The experiment results include: - Inputs from the dataset - Task outputs - Evaluation scores and pass/fail statuses - Explanations and metadata - Performance timing information In the next sections, we'll explore datasets, tasks, and evaluators in more detail. # Creating Tasks Tasks in Patronus experiments are functions that process each dataset example and produce outputs that will be evaluated. This page covers how to create and use tasks effectively. ## Task Function Basics A task function receives a dataset row and produces an output. The simplest task functions look like this: ```python def simple_task(row, **kwargs): # Process the input from the row input_text = row.task_input # Generate an output (typically a score between 0 and 1) quality_score = 0.85 # Return the output as a float return quality_score ``` The framework automatically converts numeric outputs to `TaskResult` objects. 
## Task Function Parameters Task functions always receive these parameters: - `row`: Row - The dataset example to process - `parent`: EvalParent - Information from previous chain stages (if any) - `tags`: Tags - Tags associated with the experiment and dataset - `**kwargs`: Additional keyword arguments Here's a more complete task function: ```python from patronus.datasets import Row from patronus.experiments.types import EvalParent def complete_task( row: Row, parent: EvalParent = None, tags: dict[str, str] = None, **kwargs ): # Access dataset fields input_text = row.task_input context = row.task_context system_prompt = row.system_prompt gold_answer = row.gold_answer # Access parent information (from previous chain steps) previous_output = None if parent and parent.task: previous_output = parent.task.output # Access tags model_name = tags.get("model_name", "default") # Generate output (in real usage, this would call an LLM) output = f"Model {model_name} processed: {input_text}" # Return the output return output ``` ## Return Types Task functions can return several types: ### String Output Here's an improved example for the string return type section that demonstrates a classification task: ```python def classify_sentiment(row: Row, **kwargs) -> str: # Extract the text to classify text = row.task_input # Simple rule-based sentiment classifier positive_words = ["good", "great", "excellent", "happy", "positive"] negative_words = ["bad", "terrible", "awful", "sad", "negative"] text_lower = text.lower() positive_count = sum(word in text_lower for word in positive_words) negative_count = sum(word in text_lower for word in negative_words) # Classify based on word counts if positive_count > negative_count: return "positive" elif negative_count > positive_count: return "negative" else: return "neutral" ``` The string output represents a specific classification category, which is a common pattern in text classification tasks. ### Numeric Output (Float/Int) For score-based outputs: ```python def score_task(row: Row, **kwargs) -> float: # Calculate a relevance score between 0 and 1 return 0.92 ``` ### TaskResult Object For more control, return a TaskResult object: ```python from patronus.experiments.types import TaskResult def task_result(row: Row, **kwargs) -> TaskResult: # Generate output output = f"Processed: {row.task_input}" # Include metadata about the processing metadata = { "processing_time_ms": 42, "confidence": 0.95, "tokens_used": 150 } # Add tags for filtering and organization tags = { "model": "gpt-4", "temperature": "0.7" } # Return a complete TaskResult return TaskResult( output=output, metadata=metadata, tags=tags ) ``` ### None / Skipping Examples Return `None` to skip processing this example: ```python def selective_task(row: Row, **kwargs) -> None: # Skip examples without the required fields if not row.task_input or not row.gold_answer: return None # Process valid examples return f"Processed: {row.task_input}" ``` ## Calling LLMs A common use of tasks is to generate outputs using Large Language Models: ```python from openai import OpenAI from patronus.datasets import Row from patronus.experiments.types import TaskResult oai = OpenAI() def openai_task(row: Row, **kwargs) -> TaskResult: # Prepare the input for the model system_message = row.system_prompt or "You are a helpful assistant." 
messages = [ {"role": "system", "content": system_message}, {"role": "user", "content": row.task_input} ] # Call the OpenAI API response = oai.chat.completions.create( model="gpt-4", messages=messages, temperature=0.7, max_tokens=150 ) # Extract the output output = response.choices[0].message.content # Include metadata about the call metadata = { "model": response.model, "tokens": { "prompt": response.usage.prompt_tokens, "completion": response.usage.completion_tokens, "total": response.usage.total_tokens } } return TaskResult( output=output, metadata=metadata ) ``` ## Async Tasks For better performance, especially with API calls, you can use async tasks: ```python import asyncio from openai import AsyncOpenAI from patronus.datasets import Row from patronus.experiments.types import TaskResult oai = AsyncOpenAI() async def async_openai_task( row: Row, parent: EvalParent = None, tags: dict[str, str] = None, **kwargs ) -> TaskResult: # Create async client # Prepare the input system_message = row.system_prompt or "You are a helpful assistant." messages = [ {"role": "system", "content": system_message}, {"role": "user", "content": row.task_input} ] # Call the OpenAI API asynchronously response = await oai.chat.completions.create( model="gpt-4", messages=messages, temperature=0.7, max_tokens=150 ) # Extract and return the output output = response.choices[0].message.content return TaskResult( output=output, metadata={"model": response.model} ) ``` The Patronus framework automatically handles both synchronous and asynchronous tasks. ## Using Parent Information In multi-stage chains, tasks can access the results of previous stages: ```python from patronus.datasets import Row from patronus.experiments.types import EvalParent def second_stage_task( row: Row, parent: EvalParent, tags: dict[str, str] = None, **kwargs ) -> str: # Access previous task output if parent and parent.task: previous_output = parent.task.output return f"Building on previous output: {previous_output}" # Fallback if no previous output return f"Starting fresh: {row.task_input}" ``` ## Error Handling Task functions should handle exceptions appropriately: ```python from patronus import get_logger from patronus.datasets import Row def robust_task(row: Row, **kwargs): try: # Attempt to process if row.task_input: return f"Processed: {row.task_input}" else: # Skip if input is missing return None except Exception as e: # Log the error get_logger().exception(f"Error processing row {row.sid}: {e}") # Skip this example return None ``` If an unhandled exception occurs, the experiment will log the error and skip that example. ## Task Tracing Tasks are automatically traced with the Patronus tracing system. You can add additional tracing: ```python from patronus.tracing import start_span from patronus.datasets import Row def traced_task(row: Row, **kwargs): # Outer span is created automatically by the framework # Create spans for subtasks with start_span("Preprocessing"): # Preprocessing logic... preprocessed = preprocess(row.task_input) with start_span("Model Call"): # Model call logic... output = call_model(preprocessed) with start_span("Postprocessing"): # Postprocessing logic... final_output = postprocess(output) return final_output ``` This helps with debugging and performance analysis. ## Best Practices When creating task functions: 1. **Handle missing data gracefully**: Check for required fields and handle missing data 1. **Include useful metadata**: Add information about processing steps, model parameters, etc. 1. 
**Use async for API calls**: Async tasks significantly improve performance for API-dependent workflows 1. **Add explanatory tags**: Tags help with filtering and analyzing results 1. **Add tracing spans**: For complex processing, add spans to help with debugging and optimization 1. **Keep functions focused**: Tasks should have a clear purpose; use chains for multi-step processes Next, we'll explore how to use evaluators in experiments to assess task outputs. # Integrations # Agent Integrations The Patronus SDK provides integrations with various agent frameworks to enable observability, evaluation, and experimentation with agent-based LLM applications. ## Pydantic AI [Pydantic AI](https://ai.pydantic.dev/) is a framework for building AI agents with type-safe tools and structured outputs. The Patronus SDK provides a dedicated integration that automatically instruments all Pydantic AI agents for observability. ### Installation Make sure you have both the Patronus SDK and Pydantic AI installed: ```bash pip install patronus pydantic-ai ``` ### Usage To enable Pydantic AI integration with Patronus: ```python from patronus import init from patronus.integrations.pydantic_ai import PydanticAIIntegrator # Initialize Patronus with the Pydantic AI integration patronus_ctx = init( integrations=[PydanticAIIntegrator()] ) # Now all Pydantic AI agents will automatically send telemetry to Patronus ``` ### Configuration Options The `PydanticAIIntegrator` accepts the following parameters: - `event_mode`: Controls how agent events are captured - `"logs"` (default): Captures events as logs, which works best with the Patronus Platform - `"attributes"`: Captures events as span attributes Example with custom configuration: ```python from patronus import init from patronus.integrations.pydantic_ai import PydanticAIIntegrator patronus_ctx = init( integrations=[PydanticAIIntegrator(event_mode="logs")] ) ``` # LLM Integrations The Patronus SDK provides integrations with various LLM providers to enable observability, evaluation, and experimentation with LLM applications. ## OpenTelemetry LLM Instrumentors Patronus supports any OpenTelemetry-based LLM instrumentation. This allows you to easily capture telemetry data from your LLM interactions and send it to the Patronus platform for analysis. A popular option for LLM instrumentation is [OpenInference](https://github.com/Arize-ai/openinference), which provides instrumentors for multiple LLM providers. 
### Anthropic Claude Integration To instrument Anthropic's Claude API calls: ```shell # Install the required package pip install openinference-instrumentation-anthropic ``` ```python from patronus import init from openinference.instrumentation.anthropic import AnthropicInstrumentor # Initialize Patronus with Anthropic instrumentation patronus_ctx = init( integrations=[AnthropicInstrumentor()] ) # Now all Claude API calls will be automatically instrumented # and the telemetry will be sent to Patronus ``` ### OpenAI Integration To instrument OpenAI API calls: ```shell # Install the required package pip install openinference-instrumentation-openai ``` ```python from patronus import init from openinference.instrumentation.openai import OpenAIInstrumentor # Initialize Patronus with OpenAI instrumentation patronus_ctx = init( integrations=[OpenAIInstrumentor()] ) # Now all OpenAI API calls will be automatically instrumented # and the telemetry will be sent to Patronus ``` ### Using Multiple LLM Instrumentors You can combine multiple instrumentors to capture telemetry from different LLM providers: ```python from patronus import init from openinference.instrumentation.anthropic import AnthropicInstrumentor from openinference.instrumentation.openai import OpenAIInstrumentor # Initialize Patronus with multiple LLM instrumentors patronus_ctx = init( project_name="my-multi-llm-project", app="llm-application", integrations=[ AnthropicInstrumentor(), OpenAIInstrumentor() ] ) # Now both Anthropic and OpenAI API calls will be automatically instrumented ``` # Prompts # Prompt Management The Patronus SDK provides tools to version, retrieve, and render prompts in your LLM applications. ## Quick Start ### Creating a Prompt ```python import patronus import textwrap from patronus.prompts import Prompt, push_prompt patronus.init() # Create a new prompt prompt = Prompt( name="support/troubleshooting/login-issues", body=textwrap.dedent(""" You are a support specialist for {product_name}. ISSUE: {issue_description} TIER: {subscription_tier} Provide a solution for this {issue_type} problem. Be concise. Include steps and end with an offer for further help. 
"""), description="Support prompt for login issues", metadata={"temperature": 0.7, "tone": "helpful"} ) # Push the prompt to Patronus loaded_prompt = push_prompt(prompt) # Render the prompt rendered = prompt.render( issue_description="Cannot log in with correct credentials", product_name="CloudWorks", subscription_tier="Business", issue_type="authentication" ) print(rendered) ``` ### Loading a Prompt ```python import patronus from patronus.prompts import load_prompt patronus.init() # Get the latest version of the prompt we just created prompt = load_prompt(name="support/troubleshooting/login-issues") # Access metadata print(prompt.metadata) # Render the prompt with different parameters rendered = prompt.render( issue_description="Password reset link not working", product_name="CloudWorks", subscription_tier="Enterprise", issue_type="password reset" ) print(rendered) ``` ## Loading Prompts Use `load_prompt` to retrieve prompts from the Patronus platform: ```python import patronus from patronus.prompts import load_prompt patronus.init() # Load an instruction prompt that doesn't need any parameters prompt = load_prompt(name="content/writing/blog-instructions") rendered = prompt.render() print(rendered) ``` For async applications: ```python from patronus.prompts import aload_prompt prompt = await aload_prompt(name="content/writing/blog-instructions") ``` ### Loading Specific Versions Retrieve prompts by revision number or label: ```python # Load a specific revision prompt = load_prompt(name="content/blog/technical-explainer", revision=3) # Load by label (production environment) prompt = load_prompt(name="legal/contracts/privacy-policy", label="production") ``` ## Creating and Updating Prompts Create new prompts using `push_prompt`: ````python from patronus.prompts import Prompt, push_prompt new_prompt = Prompt( name="dev/bug-fix/python-error", body="Fix this Python code error: {error_message}. Code: ```python\n{code_snippet}\n```", description="Template for Python debugging assistance", metadata={ "creator": "dev-team", "temperature": 0.7, "max_tokens": 250 } ) loaded_prompt = push_prompt(new_prompt) ```` For async applications: ```python from patronus.prompts import apush_prompt loaded_prompt = await apush_prompt(new_prompt) ``` The `push_prompt` function automatically handles duplicate detection - if a prompt with identical content already exists, it returns the existing revision instead of creating a new one. 
## Rendering Prompts Render prompts with variables: ```python rendered = prompt.render(user_query="How do I optimize database performance?", expertise_level="intermediate") ``` ### Template Engines Patronus supports multiple template engines: ```python # F-string templating (default) rendered = prompt.with_engine("f-string").render(**kwargs) # Mustache templating rendered = prompt.with_engine("mustache").render(**kwargs) # Jinja2 templating rendered = prompt.with_engine("jinja2").render(**kwargs) ``` ## Working with Labels Labels provide stable references to specific revisions: ```python from patronus import context client = context.get_api_client().prompts # Add audience-specific labels client.add_label( prompt_id="prompt_123", revision=3, label="technical-audience" ) # Update label to point to a new revision client.add_label( prompt_id="prompt_123", revision=5, label="technical-audience" ) # Add environment label client.add_label( prompt_id="prompt_456", revision=2, label="production" ) ``` ## Metadata Usage Prompt revisions support arbitrary metadata: ```python from patronus.prompts import Prompt, push_prompt, load_prompt # Create with metadata prompt_with_meta = Prompt( name="research/data-analysis/summarize-findings", body="Analyze the {data_type} data and summarize the key {metric_type} trends in {time_period}.", metadata={ "models": ["gpt-4", "claude-3"], "created_by": "data-team", "tags": ["data", "analysis"] } ) loaded_prompt = push_prompt(prompt_with_meta) # Access metadata prompt = load_prompt(name="research/data-analysis/summarize-findings") supported_models = prompt.metadata.get("models", []) creator = prompt.metadata.get("created_by", "unknown") print(f"Prompt supports models: {', '.join(supported_models)}") print(f"Created by: {creator}") ``` ## Using Multiple Prompts Together Complex applications often use multiple prompts together: ```python import patronus from patronus.prompts import load_prompt import openai patronus.init() # Load different prompt components system_prompt = load_prompt(name="support/chat/system") user_query_template = load_prompt(name="support/chat/user-message") response_formatter = load_prompt(name="support/chat/response-format") # Create OpenAI client client = openai.OpenAI() # Combine the prompts in a chat completion response = client.chat.completions.create( model="gpt-4", messages=[ {"role": "system", "content": system_prompt.render( product_name="CloudWorks Pro", available_features=["file sharing", "collaboration", "automation"], knowledge_cutoff="2024-05-01" )}, {"role": "user", "content": user_query_template.render( user_name="Alex", user_tier="premium", user_query="How do I share files with external users?" )} ], temperature=0.7, max_tokens=500 ) # Post-process the response using another prompt formatted_response = response_formatter.render( raw_response=response.choices[0].message.content, user_name="Alex", add_examples=True ) ``` ## Naming Conventions Use a descriptive, hierarchical naming structure similar to file paths. 
This makes prompts easier to organize, find, and manage: ```text [domain]/[use-case]/[component]/[prompt-type] ``` Where `[prompt-type]` indicates the intended role of the prompt in an LLM conversation (optional but recommended): - `system` - Sets the overall behavior, persona, or context for the model - `instruction` - Provides specific instructions for a task - `user` - Represents a user message template - `assistant` - Template for assistant responses - `few-shot` - Contains examples of input/output pairs Examples: - `support/troubleshooting/diagnostic-questions/system` - `marketing/email-campaigns/follow-up-template/instruction` - `dev/code-generation/python-function/instruction` - `finance/report/quarterly-analysis` - `content/blog-post/technical-tutorial/few-shot` - `legal/contracts/terms-of-service-v2/system` Including the prompt type in the name helps team members quickly understand the intended usage context in multi-prompt conversations. ### Consistent Prefixes Use consistent prefixes for prompts that work together in the same feature: ```text # Onboarding chat prompts share the prefix onboarding/chat/ onboarding/chat/welcome/system onboarding/chat/questions/user onboarding/chat/intro/assistant # Support classifier prompts support/classifier/system support/classifier/categories/instruction ``` This approach simplifies filtering and management of related prompts, making it easier to maintain and evolve complete prompt flows as your library grows. ## Configuration The default template engine can be configured during initialization: ```python import patronus patronus.init( # Default template engine for all prompts prompt_templating_engine="mustache" ) ``` For additional configuration options, see the [Configuration](../configuration/) page. ## Using with LLMs Prompts can be used with any LLM provider: ```python import patronus from patronus.prompts import load_prompt import anthropic patronus.init() system_prompt = load_prompt(name="support/knowledge-base/technical-assistance") client = anthropic.Anthropic() response = client.messages.create( model="claude-3-opus-20240229", system=system_prompt.render( product_name="CloudWorks Pro", user_tier="enterprise", available_features=["advanced monitoring", "auto-scaling", "SSO integration"] ), messages=[ {"role": "user", "content": "How do I configure the load balancer for high availability?"} ] ) ``` ## Additional Resources While the SDK provides high-level, convenient access to Patronus functionality, you can also use the lower-level APIs for more direct control: - [REST API documentation](https://docs.patronus.ai/docs/api_ref) - For direct HTTP access to the Patronus platform - [Patronus API Python library](https://github.com/patronus-ai/patronus-api-python) - A typed Python client for the REST API with both synchronous and asynchronous support # Configuration # Configuration The Patronus Experimentation Framework offers several configuration options that can be set in the following ways: 1. Through function parameters (in code) 1. Environment variables 1. YAML configuration file Configuration options are prioritized in the order listed above, meaning that if a configuration value is provided through function parameters, it will override values from environment variables or the YAML file. ## Configuration Options | Config name | Environment Variable | Default Value | | --- | --- | --- | | service | PATRONUS_SERVICE | Defaults to value retrieved from `OTEL_SERVICE_NAME` env var or `platform.node()`. 
| | project_name | PATRONUS_PROJECT_NAME | `Global` | | app | PATRONUS_APP | `default` | | api_key | PATRONUS_API_KEY | | | api_url | PATRONUS_API_URL | `https://api.patronus.ai` | | ui_url | PATRONUS_UI_URL | `https://app.patronus.ai` | | otel_endpoint | PATRONUS_OTEL_ENDPOINT | `https://otel.patronus.ai:4317` | | otel_exporter_otlp_protocol | PATRONUS_OTEL_EXPORTER_OTLP_PROTOCOL | Falls back to OTEL env vars, defaults to `grpc` | | timeout_s | PATRONUS_TIMEOUT_S | `300` | | prompt_templating_engine | PATRONUS_PROMPT_TEMPLATING_ENGINE | `f-string` | | prompt_providers | PATRONUS_PROMPT_PROVIDERS | `["local", "api"]` | | resource_dir | PATRONUS_RESOURCE_DIR | `./patronus` | ## Configuration Methods ### 1. Function Parameters You can provide configuration options directly through function parameters when calling key Patronus functions. #### Using init() Use the `init()` function when you need to set up the Patronus SDK for evaluations, logging, and tracing outside of experiments. This initializes the global context used by the SDK. ```python import patronus # Initialize with specific configuration patronus.init( project_name="my-project", app="recommendation-service", api_key="your-api-key", api_url="https://api.patronus.ai", service="my-service", prompt_templating_engine="mustache" ) ``` #### Using run_experiment() or Experiment.create() Use these functions when running experiments. They handle their own initialization, so you don't need to call `init()` separately. Experiments create their own context scoped to the experiment. ```python from patronus import run_experiment # Run experiment with specific configuration experiment = run_experiment( dataset=my_dataset, task=my_task, evaluators=[my_evaluator], project_name="my-project", api_key="your-api-key", service="my-service" ) ``` ### 2. Environment Variables You can set configuration options using environment variables with the prefix `PATRONUS_`: ```bash export PATRONUS_API_KEY="your-api-key" export PATRONUS_PROJECT_NAME="my-project" export PATRONUS_SERVICE="my-service" ``` ### 3. YAML Configuration File (`patronus.yaml`) You can also provide configuration options using a `patronus.yaml` file. This file must be present in the working directory when executing your script. ```yaml service: "my-service" project_name: "my-project" app: "my-agent" api_key: "YOUR_API_KEY" api_url: "https://api.patronus.ai" ui_url: "https://app.patronus.ai" otel_endpoint: "https://otel.patronus.ai:4317" otel_exporter_otlp_protocol: "grpc" # or "http/protobuf" timeout_s: 300 # Prompt management configuration prompt_templating_engine: "mustache" prompt_providers: [ "local", "api" ] resource_dir: "./my-resources" ``` ## Configuration Precedence When determining the value for a configuration option, Patronus follows this order of precedence: 1. Function parameter values (highest priority) 1. Environment variables 1. YAML configuration file 1. Default values (lowest priority) For example, if you provide `project_name` as a function parameter and also have it defined in your environment variables and YAML file, the function parameter value will be used. 
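As a quick illustration of the precedence rules (the values below are placeholders, assuming all three sources define `project_name`):

```python
import patronus

# patronus.yaml contains:          project_name: "from-yaml"
# environment variable exported:   PATRONUS_PROJECT_NAME="from-env"
patronus.init(project_name="from-code")

# The effective project name is "from-code": the function parameter overrides
# the environment variable, which in turn overrides the YAML file.
```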
## Programmatic Configuration Access For more advanced use cases, you can directly access the configuration system through the Config class and the config() function: ```python from patronus.config import config # Access the configuration singleton cfg = config() # Read configuration values api_key = cfg.api_key project_name = cfg.project_name # Check for specific conditions if cfg.api_url != "https://api.patronus.ai": print("Using custom API endpoint") ``` This approach is particularly useful when you need to inspect or log the current configuration state. ## Observability Configuration For detailed information about configuring observability features like tracing and logging, including exporter protocol selection and endpoint configuration, see the [Observability Configuration](../observability/configuration/) guide. # Examples # Examples Examples of how to use Patronus and what it can do. ## Usage These examples demonstrate common use cases and integration patterns for Patronus. ### Setting required environment variables Most examples require you to set up authentication with Patronus and other services. In most cases, you'll need to set the following environment variables: ```bash export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` Some examples may require additional API keys (like `ANTHROPIC_API_KEY`). ### Running Examples There are three ways to run the examples: #### 1. Running with `uv` You can run examples with `uv`, which automatically installs the required dependencies: ```bash # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[smolagents]" \ -m patronus_examples.tracking.smolagents_weather ``` This installs the `patronus-examples` package with the necessary optional dependencies. #### 2. Pulling the repository and executing the scripts directly You can clone the repository and run the scripts directly: ```bash # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/smolagents_weather.py ``` See the script files for more information. They use uv script annotations to handle dependencies. #### 3. Copy and paste example You can copy the example code into your own project and install the dependencies with any package manager of your choice. Each example file includes a list of required dependencies at the top of the document. ## Available Examples Patronus provides examples for various LLM frameworks and direct API integrations: ### Direct LLM API Integrations - [OpenAI Weather Example](openai-weather/) - Simple example of tracing OpenAI API calls - [Anthropic Weather Example](anthropic-weather/) - Simple example of tracing Anthropic API calls ### Agent Frameworks - [Smolagents Weather](smolagents-weather/) - Using Patronus with Smolagents - [PydanticAI Weather](pydanticai-weather/) - Using Patronus with PydanticAI - [OpenAI Agents Weather](openai-agents-weather/) - Using Patronus with OpenAI Agents - [LangChain Weather](langchain-weather/) - Using Patronus with LangChain and LangGraph - [CrewAI Weather](crewai-weather/) - Using Patronus with CrewAI Each example demonstrates: - How to set up Patronus integrations with the specific framework - How to trace LLM calls and tool usage - How to analyze the execution flow of your application All examples follow a similar pattern using a weather application to make it easy to compare the different frameworks. 
### Advanced Examples

- [Manual OpenTelemetry with OpenAI](otel-openai-weather/) - An example showing how to use OpenTelemetry directly without Patronus SDK

## Running the example

To run this example, you need to add API keys to your environment:

```shell
export PATRONUS_API_KEY=your-api-key
export ANTHROPIC_API_KEY=your-api-key
```

### Running with `uv`

You can run the example as a one-liner with zero setup:

```shell
# Remember to export environment variables before running the example.
uv run --no-cache --with "patronus-examples[anthropic]" \
    -m patronus_examples.tracking.anthropic_weather
```

### Running the script directly

If you've cloned the repository, you can run the script directly:

```shell
# Clone the repository
git clone https://github.com/patronus-ai/patronus-py.git
cd patronus-py

# Run the example script (requires uv)
./examples/patronus_examples/tracking/anthropic_weather.py
```

### Manual installation

If you prefer to copy the example code to your own project, you'll need to install these dependencies:

```shell
pip install patronus
pip install anthropic
pip install openinference-instrumentation-anthropic
```

## Example overview

This example demonstrates how to use Patronus to trace Anthropic API calls when implementing a simple weather application. The application:

1. Uses the Anthropic Claude API to parse a user question about weather
1. Extracts location coordinates from the LLM's output through Claude's tool calling
1. Calls a weather API to get actual temperature data
1. Returns the result to the user

The example shows how Patronus can help you monitor and debug Anthropic API interactions, track tool usage, and visualize the entire application flow.

## Example code

```python
# examples/patronus_examples/tracking/anthropic_weather.py
import requests
import anthropic
from openinference.instrumentation.anthropic import AnthropicInstrumentor

import patronus

# Initialize patronus with Anthropic Instrumentor
patronus.init(integrations=[AnthropicInstrumentor()])


def get_weather(latitude, longitude):
    response = requests.get(
        f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m"
    )
    data = response.json()
    return data["current"]["temperature_2m"]


def get_client():
    client = anthropic.Anthropic()
    return client


@patronus.traced()
def call_llm(client, user_prompt):
    tools = [
        {
            "name": "get_weather",
            "description": "Get current temperature for provided coordinates in celsius.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "latitude": {"type": "number"},
                    "longitude": {"type": "number"},
                },
                "required": ["latitude", "longitude"],
            },
        }
    ]
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        tools=tools,
        messages=[{"role": "user", "content": user_prompt}],
    )
    return response


@patronus.traced("anthropic-weather")
def main():
    user_prompt = "What's the weather like in Paris today?"
client = get_client() response = call_llm(client, user_prompt) print("LLM Response") print(response.model_dump_json()) weather_response = None if response.content: for content_block in response.content: if content_block.type == "tool_use" and content_block.name == "get_weather": kwargs = content_block.input print("Weather API Response") weather_response = get_weather(**kwargs) print(weather_response) if weather_response: print(user_prompt) print(f"Answer: {weather_response}") if __name__ == "__main__": main() ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[crewai]" \ -m patronus_examples.tracking.crewai_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/crewai_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install crewai pip install openinference.instrumentation.crewai pip install opentelemetry-instrumentation-threading pip install opentelemetry-instrumentation-asyncio ``` ## Example overview This example demonstrates how to use Patronus to trace and monitor CrewAI agents in a weather application. The example: 1. Sets up a specialized Weather Information Specialist agent with a custom weather tool 1. Creates a manager agent that coordinates information requests 1. Defines tasks for each agent to perform 1. Configures a hierarchical workflow using the CrewAI Crew construct 1. Traces the entire execution flow with Patronus The example shows how Patronus integrates with CrewAI to provide visibility into agent interactions, tool usage, and the hierarchical task execution process. ## Example code ```python # examples/patronus_examples/tracking/crewai_weather.py from crewai import Agent, Task, Crew, Process from crewai.tools import BaseTool from opentelemetry.instrumentation.threading import ThreadingInstrumentor from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor from openinference.instrumentation.crewai import CrewAIInstrumentor import patronus patronus.init( integrations=[CrewAIInstrumentor(), ThreadingInstrumentor(), AsyncioInstrumentor()] ) # Create a custom tool for weather information class WeatherTool(BaseTool): name: str = "get_weather_api" description: str = "Returns the weather report for a specific location" def _run(self, location: str) -> str: """ Returns the weather report. Args: location: the name of the place that you want the weather for. Should be a place name, followed by possibly a city name, then a country, like "Anchor Point, Taghazout, Morocco". Returns: The weather report. """ temperature_celsius, risk_of_rain, wave_height = 10, 0.5, 4 # mock outputs return f"Weather report for {location}: Temperature will be {temperature_celsius}°C, risk of rain is {risk_of_rain * 100:.0f}%, wave height is {wave_height}m." 
# Initialize weather tool weather_tool = WeatherTool() # Define agents weather_agent = Agent( role="Weather Information Specialist", goal="Provide accurate weather information for specific locations and times", backstory="""You are a weather information specialist that must call the available tool to get the most recent reports""", verbose=False, allow_delegation=False, tools=[weather_tool], max_iter=5, ) manager_agent = Agent( role="Information Manager", goal="Coordinate information requests and delegate to specialized agents", backstory="""You manage and coordinate information requests, delegating specialized queries to the appropriate experts. You ensure users get the most accurate and relevant information.""", verbose=False, allow_delegation=True, max_iter=10, ) # Create tasks weather_task = Task( description="""Find out the current weather at a specific location.""", expected_output="Complete weather report with temperature, rain and wave height information", agent=weather_agent, ) manager_task = Task( description="""Process the user query about weather in Paris, France. Ensure the weather information is complete (with temperature, rain and wave height) and properly formatted. You must coordinate with the weather agent for this task.""", expected_output="Weather report for Paris", agent=manager_agent, ) # Instantiate crew with a sequential process crew = Crew( agents=[weather_agent], tasks=[manager_task, weather_task], verbose=False, manager_agent=manager_agent, process=Process.hierarchical, ) @patronus.traced("weather-crew-ai") def main(): result = crew.kickoff() print(result) if __name__ == "__main__": main() ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[langchain]" \ -m patronus_examples.tracking.langchain_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/langchain_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install pydantic pip install langchain_openai pip install langgraph pip install langchain_core pip install openinference-instrumentation-langchain pip install opentelemetry-instrumentation-threading pip install opentelemetry-instrumentation-asyncio ``` ## Example overview This example demonstrates how to use Patronus to trace a LangChain and LangGraph workflow for a weather application. The example: 1. Sets up a StateGraph with manager and weather agent nodes 1. Implements a router to control workflow transitions 1. Uses a tool to provide mock weather data 1. Traces the entire LangChain and LangGraph execution with Patronus The example shows how Patronus can provide visibility into complex, multi-node LangGraph workflows, including tool usage and agent transitions. 
## Example code ```python # examples/patronus_examples/tracking/langchain_weather.py from typing import Literal, Dict, List, Any from langchain_core.messages import ( HumanMessage, AIMessage, BaseMessage, ) from langchain_openai import ChatOpenAI from langgraph.checkpoint.memory import MemorySaver from langgraph.graph import END, StateGraph from langchain_core.tools import tool from pydantic import BaseModel, Field from openinference.instrumentation.langchain import LangChainInstrumentor from opentelemetry.instrumentation.threading import ThreadingInstrumentor from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor import patronus patronus.init( integrations=[ LangChainInstrumentor(), ThreadingInstrumentor(), AsyncioInstrumentor(), ] ) @tool def get_weather(city: str) -> str: """Get the current weather in a given city. Args: city: The name of the city to get weather for. Returns: A string describing the current weather in the city. """ return f"The weather in {city} is sunny" class MessagesState(BaseModel): """State for the manager-weather agent workflow.""" messages: List[BaseMessage] = Field(default_factory=list) current_agent: str = Field(default="manager") manager_model = ChatOpenAI(temperature=0, model="gpt-4o") weather_model = ChatOpenAI(temperature=0, model="gpt-4o") tools = [get_weather] tools_dict = {tool.name: tool for tool in tools} weather_model_with_tools = weather_model.bind_tools(tools) def manager_agent(state: MessagesState) -> Dict[str, Any]: messages = state.messages # Access as attribute # Get response from the manager model response = manager_model.invoke(messages) # Check if the manager wants to use the weather agent manager_text = response.content.lower() if "weather" in manager_text and "in" in manager_text: # Delegate to weather agent return { "messages": messages + [ AIMessage( content="I'll check the weather for you. Delegating to weather agent." ) ], "current_agent": "weather", } return {"messages": messages + [response], "current_agent": "manager"} # Define the weather agent node using a simpler approach def weather_agent(state: MessagesState) -> Dict[str, Any]: messages = state.messages # Access as attribute human_queries = [msg for msg in messages if isinstance(msg, HumanMessage)] if not human_queries: return { "messages": messages + [AIMessage(content="I need a query about weather.")], "current_agent": "manager", } query = human_queries[-1].content try: # weather_prompt = ( # f"Extract the city name from this query and provide the weather: '{query}'" # ) city_match = None # Common cities that might be mentioned common_cities = [ "Paris", "London", "New York", "Tokyo", "Berlin", "Rome", "Madrid", ] for city in common_cities: if city.lower() in query.lower(): city_match = city break if city_match: weather_result = get_weather.invoke(city_match) weather_response = ( f"I checked the weather for {city_match}. {weather_result}" ) else: if "weather in " in query.lower(): parts = query.lower().split("weather in ") if len(parts) > 1: city_match = parts[1].strip().split()[0].capitalize() weather_result = get_weather.invoke(city_match) weather_response = ( f"I checked the weather for {city_match}. {weather_result}" ) else: weather_response = ( "I couldn't identify a specific city in your query." ) else: weather_response = "I couldn't identify a specific city in your query." 
return { "messages": messages + [AIMessage(content=f"Weather Agent: {weather_response}")], "current_agent": "manager", } except Exception as e: error_message = f"I encountered an error while checking the weather: {str(e)}" return { "messages": messages + [AIMessage(content=f"Weather Agent: {error_message}")], "current_agent": "manager", } def router(state: MessagesState) -> Literal["manager", "weather", END]: if len(state.messages) > 10: # Prevent infinite loops return END # Route based on current_agent if state.current_agent == "weather": return "weather" elif state.current_agent == "manager": # Check if the last message is from the manager and indicates completion if len(state.messages) > 0 and isinstance(state.messages[-1], AIMessage): if "delegating to weather agent" not in state.messages[-1].content.lower(): return END return "manager" workflow = StateGraph(MessagesState) workflow.add_node("manager", manager_agent) workflow.add_node("weather", weather_agent) workflow.set_entry_point("manager") workflow.add_conditional_edges("manager", router) workflow.add_conditional_edges("weather", router) checkpointer = MemorySaver() app = workflow.compile(checkpointer=checkpointer) def run_workflow(query: str): initial_state = MessagesState( messages=[HumanMessage(content=query)], current_agent="manager" ) config = {"configurable": {"thread_id": "weather_demo_thread"}} final_state = app.invoke(initial_state, config=config) for message in final_state["messages"]: if isinstance(message, HumanMessage): print(f"Human: {message.content}") elif isinstance(message, AIMessage): print(f"AI: {message.content}") else: print(f"Other: {message.content}") return final_state @patronus.traced("weather-langchain") def main(): final_state = run_workflow("What is the weather in Paris?") return final_state if __name__ == "__main__": main() ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[openai-agents]" \ -m patronus_examples.tracking.openai_agents_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/openai_agents_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install openai-agents pip install openinference-instrumentation-openai-agents pip install opentelemetry-instrumentation-threading pip install opentelemetry-instrumentation-asyncio ``` ## Example overview This example demonstrates how to use Patronus to trace and monitor OpenAI Agents in an asynchronous weather application. The example: 1. Sets up a weather agent with a function tool to retrieve weather information 1. Creates a manager agent that can delegate to the weather agent 1. Handles the workflow using the OpenAI Agents Runner 1. Traces the entire agent execution flow with Patronus The example shows how Patronus integrates with OpenAI Agents to provide visibility into agent hierarchies, tool usage, and asynchronous workflows. 
## Example code ```python # examples/patronus_examples/tracking/openai_agents_weather.py from agents import Agent, Runner, function_tool from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor from opentelemetry.instrumentation.threading import ThreadingInstrumentor from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor import asyncio import patronus patronus.init( integrations=[ OpenAIAgentsInstrumentor(), ThreadingInstrumentor(), AsyncioInstrumentor(), ] ) @function_tool def get_weather(city: str) -> str: return f"The weather in {city} is sunny" def get_agents(tools=[]): weather_agent = Agent( name="weather_agent", instructions="You are a helpful assistant that can call tools and return weather related information", model="o3-mini", tools=tools, ) manager_agent = Agent( name="manager_agent", instructions="You are a helpful assistant that can call other agents to accomplish different tasks", model="o3-mini", handoffs=[weather_agent], ) return manager_agent @patronus.traced("weather-openai-agent") async def main(): manager_agent = get_agents([get_weather]) result = await Runner.run(manager_agent, "How is the weather in Paris, France?") return result.final_output if __name__ == "__main__": print("Starting agent...") result = asyncio.run(main()) print(result) ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[openai]" \ -m patronus_examples.tracking.openai_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/openai_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install openai pip install openinference-instrumentation-openai ``` ## Example overview This example demonstrates how to use Patronus to trace OpenAI API calls when implementing a simple weather application. The application: 1. Uses the OpenAI API to parse a user question about weather 1. Extracts location coordinates from the LLM's output 1. Calls a weather API to get actual temperature data 1. Returns the result to the user The example shows how Patronus can help you monitor and debug OpenAI API interactions, track tool usage, and visualize the entire application flow. 
## Example code ```python # examples/patronus_examples/tracking/openai_weather.py import json import requests from openai import OpenAI from openinference.instrumentation.openai import OpenAIInstrumentor import patronus # Initialize patronus with OpenAI Instrumentor patronus.init(integrations=[OpenAIInstrumentor()]) def get_weather(latitude, longitude): response = requests.get( f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m" ) data = response.json() return data["current"]["temperature_2m"] def get_client(): client = OpenAI() return client @patronus.traced() def call_llm(client, user_prompt): tools = [ { "type": "function", "name": "get_weather", "description": "Get current temperature for provided coordinates in celsius.", "parameters": { "type": "object", "properties": { "latitude": {"type": "number"}, "longitude": {"type": "number"}, }, "required": ["latitude", "longitude"], "additionalProperties": False, }, "strict": True, } ] input_messages = [{"role": "user", "content": user_prompt}] response = client.responses.create( model="gpt-4.1", input=input_messages, tools=tools, ) return response @patronus.traced("openai-weather") def main(): user_prompt = "What's the weather like in Paris today?" client = get_client() response = call_llm(client, user_prompt) print("LLM Response") print(response.model_dump_json()) weather_response = None if response.output: output = response.output[0] if output.type == "function_call" and output.name == "get_weather": kwargs = json.loads(output.arguments) print("Weather API Response") weather_response = get_weather(**kwargs) print(weather_response) if weather_response: print(user_prompt) print(f"Answer: {weather_response}") if __name__ == "__main__": main() ``` ## Manual OpenTelemetry Tracing Example This example demonstrates how to use OpenTelemetry (OTel) directly with OpenInference instrumenters to trace a simple OpenAI weather application **without** using the Patronus SDK. This shows how to implement manual instrumentation combined with automatic instrumenters. ## Running the example To run this example, you need to add your OpenAI API key to your environment: ```shell export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example uv run --no-cache --with "patronus-examples opentelemetry-api>=1.31.0 opentelemetry-sdk>=1.31.0 opentelemetry-exporter-otlp>=1.31.0 openinference-instrumentation-openai>=0.1.28 openai httpx>=0.27.0" \ -m patronus_examples.tracking.otel_openai_weather ``` ### Running with Patronus OTel collector To export traces to the Patronus OTel collector, set these additional environment variables: ```shell export PATRONUS_API_KEY=your-api-key export OTEL_EXPORTER_OTLP_ENDPOINT="https://otel.patronus.ai:4317" export OTEL_EXPORTER_OTLP_HEADERS="x-api-key=$PATRONUS_API_KEY" ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install openai pip install opentelemetry-api pip install opentelemetry-sdk pip install opentelemetry-exporter-otlp pip install openinference-instrumentation-openai pip install httpx ``` ## Example overview This example demonstrates how to combine manual OpenTelemetry instrumentation with OpenInference auto-instrumentation for an OpenAI-based weather application. The application: 1.
Sets up a complete OpenTelemetry tracing pipeline 1. Initializes OpenInference instrumenter for OpenAI 1. Calls the OpenAI API which is automatically traced by OpenInference 1. Adds additional manual spans for non-OpenAI components 1. Makes an instrumented HTTP request using httpx to a weather API 1. Records all relevant attributes and events in spans The example shows how to: - Configure an OpenTelemetry TracerProvider - Set up either console or OTLP exporters - Initialize OpenInference instrumenters with OpenTelemetry - Create nested manual spans for tracking operations - Use httpx for HTTP requests with proper tracing - Add attributes to spans for better observability - Handle errors and exceptions in spans ## Example code ```python # examples/patronus_examples/tracking/otel_openai_weather.py import json import os import httpx from openai import OpenAI # OpenTelemetry imports from opentelemetry import trace from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import BatchSpanProcessor from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter from opentelemetry.sdk.resources import Resource from opentelemetry.semconv.resource import ResourceAttributes # Import OpenInference instrumenter for OpenAI from openinference.instrumentation.openai import OpenAIInstrumentor # Configure OpenTelemetry resource = Resource(attributes={ ResourceAttributes.SERVICE_NAME: "openai-weather-app", ResourceAttributes.SERVICE_VERSION: "0.1.0", }) # Initialize the trace provider with the resource trace_provider = TracerProvider(resource=resource) # If OTEL_EXPORTER_OTLP_ENDPOINT is not set, we'll use console exporter # Otherwise, we'll use OTLP exporter for sending to the Patronus collector if os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT"): # Configure OTLPSpanExporter # The environment variables OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS # should be set before running this example otlp_exporter = OTLPSpanExporter() trace_provider.add_span_processor(BatchSpanProcessor(otlp_exporter)) else: # For local development/testing we can use ConsoleSpanExporter from opentelemetry.sdk.trace.export import ConsoleSpanExporter trace_provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter())) # Set the provider trace.set_tracer_provider(trace_provider) # Initialize OpenInference instrumenter for OpenAI # This will automatically instrument all OpenAI API calls openai_instrumentor = OpenAIInstrumentor() openai_instrumentor.instrument() # Get a tracer for our manual spans tracer = trace.get_tracer("openai.weather.example") def get_weather(latitude, longitude): """Get weather data from the Open Meteo API using httpx""" with tracer.start_as_current_span( "get_weather", attributes={ "service.name": "weather_api", "weather.latitude": latitude, "weather.longitude": longitude } ) as span: try: # Create the URL with parameters url = "https://api.open-meteo.com/v1/forecast" params = { "latitude": latitude, "longitude": longitude, "current": "temperature_2m,wind_speed_10m", "hourly": "temperature_2m,relative_humidity_2m,wind_speed_10m" } # Trace the HTTP request using httpx with tracer.start_as_current_span( "http_request", attributes={ "http.method": "GET", "http.url": url, "http.request.query": str(params) } ): # Use httpx client for the request with httpx.Client() as client: response = client.get(url, params=params) # Add response information to the span span.set_attribute("http.status_code", response.status_code) if response.status_code != 200: 
span.record_exception(Exception(f"Weather API returned status {response.status_code}")) span.set_status(trace.StatusCode.ERROR) return None data = response.json() temperature = data["current"]["temperature_2m"] # Add weather data to the span span.set_attribute("weather.temperature_celsius", temperature) return temperature except Exception as e: # Record the exception in the span span.record_exception(e) span.set_status(trace.StatusCode.ERROR, str(e)) raise def get_client(): """Create and return an OpenAI client""" with tracer.start_as_current_span("get_openai_client"): return OpenAI() def call_llm(client, user_prompt): """Call the OpenAI API to process the user prompt Note: With OpenInference instrumenter, the OpenAI API call will be automatically traced. This function adds some additional manual spans for demonstration purposes. """ with tracer.start_as_current_span( "call_llm", attributes={ "ai.prompt.text": user_prompt, "ai.prompt.tokens": len(user_prompt.split()) } ) as span: try: # Define tools available to the model tools = [ { "type": "function", "name": "get_weather", "description": "Get current temperature for provided coordinates in celsius.", "parameters": { "type": "object", "properties": { "latitude": {"type": "number"}, "longitude": {"type": "number"}, }, "required": ["latitude", "longitude"], "additionalProperties": False, }, "strict": True, } ] input_messages = [{"role": "user", "content": user_prompt}] # The OpenAI API call will be automatically traced by OpenInference # We don't need to create a span for it, but we can add attributes to our parent span response = client.responses.create( model="gpt-4.1", input=input_messages, tools=tools, ) # Check if the response contains a tool call has_tool_call = False if response.output and len(response.output) > 0: output = response.output[0] if output.type == "function_call": has_tool_call = True span.set_attribute("openai.response.tool_called", output.name) span.set_attribute("openai.response.has_tool_call", has_tool_call) return response except Exception as e: span.record_exception(e) span.set_status(trace.StatusCode.ERROR, str(e)) raise def main(): """Main function to process the weather query""" with tracer.start_as_current_span("openai-weather-main") as root_span: user_prompt = "What's the weather like in Paris today?" 
root_span.set_attribute("query", user_prompt) try: client = get_client() response = call_llm(client, user_prompt) print("LLM Response") print(response.model_dump_json()) weather_response = None if response.output: output = response.output[0] if output.type == "function_call" and output.name == "get_weather": # Parse the arguments from the function call with tracer.start_as_current_span( "parse_function_call", attributes={"function_name": output.name} ): kwargs = json.loads(output.arguments) root_span.set_attribute("weather.latitude", kwargs.get("latitude")) root_span.set_attribute("weather.longitude", kwargs.get("longitude")) print("Weather API Response") weather_response = get_weather(**kwargs) print(weather_response) if weather_response: with tracer.start_as_current_span("format_weather_response"): print(user_prompt) formatted_answer = f"Answer: {weather_response}" print(formatted_answer) root_span.set_attribute("weather.answer", formatted_answer) # Mark the trace as successful root_span.set_status(trace.StatusCode.OK) except Exception as e: # Record any exceptions that occurred root_span.record_exception(e) root_span.set_status(trace.StatusCode.ERROR, str(e)) print(f"Error: {e}") if __name__ == "__main__": main() # Ensure all spans are exported before the program exits trace_provider.shutdown() ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[pydantic-ai]" \ -m patronus_examples.tracking.pydanticai_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/pydanticai_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install pydantic-ai-slim[openai] pip install opentelemetry-instrumentation-asyncio pip install opentelemetry-instrumentation-threading ``` ## Example overview This example demonstrates how to use Patronus to trace Pydantic-AI agent interactions in an asynchronous application. The example: 1. Sets up two Pydantic-AI agents: a weather agent and a manager agent 1. Configures the weather agent with a tool to provide mock weather data 1. Configures the manager agent with a tool to call the weather agent 1. Demonstrates how to handle agent-to-agent communication The example shows how Patronus can trace asynchronous workflows and provide visibility into multi-agent systems built with Pydantic-AI. 
## Example code ```python # examples/patronus_examples/tracking/pydanticai_weather.py import asyncio from pydantic_ai import Agent from opentelemetry.instrumentation.threading import ThreadingInstrumentor from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor from patronus.integrations.pydantic_ai import PydanticAIIntegrator import patronus patronus.init( integrations=[ PydanticAIIntegrator(), ThreadingInstrumentor(), AsyncioInstrumentor(), ] ) def get_agent(system_prompt="You are a helpful assistant"): agent = Agent("openai:gpt-4o", output_type=str, system_prompt=system_prompt) return agent @patronus.traced("weather-pydantic-ai") async def main(): # Create weather agent and attach tool to it weather_agent = get_agent( "You are a helpful assistant that can help with weather information." ) @weather_agent.tool_plain() async def get_weather(): # Mock tool output return ( "Today's weather is Sunny with a forecasted high of 30°C and a low of 25°C. " "The wind is expected at 4 km/h." ) # Create manager agent manager_agent = get_agent( "You are a helpful assistant that can coordinate with other subagents " "and query them for more information about topics." ) # Create a tool to execute the weather agent @manager_agent.tool_plain() async def call_weather_agent(): weather_info = await weather_agent.run("What is the weather in Paris, France?") return str(weather_info) # Run the manager print("Running the agent...") return await manager_agent.run("What is the weather in Paris, France?") if __name__ == "__main__": result = asyncio.run(main()) print(result) ``` ## Running the example To run this example, you need to add API keys to your environment: ```shell export PATRONUS_API_KEY=your-api-key export OPENAI_API_KEY=your-api-key ``` ### Running with `uv` You can run the example as a one-liner with zero setup: ```shell # Remember to export environment variables before running the example. uv run --no-cache --with "patronus-examples[smolagents]" \ -m patronus_examples.tracking.smolagents_weather ``` ### Running the script directly If you've cloned the repository, you can run the script directly: ```shell # Clone the repository git clone https://github.com/patronus-ai/patronus-py.git cd patronus-py # Run the example script (requires uv) ./examples/patronus_examples/tracking/smolagents_weather.py ``` ### Manual installation If you prefer to copy the example code to your own project, you'll need to install these dependencies: ```shell pip install patronus pip install smolagents[litellm] pip install openinference-instrumentation-smolagents pip install opentelemetry-instrumentation-threading ``` ## Example overview This example demonstrates how to use Patronus to trace Smolagents tool calls and LLM interactions. The application: 1. Sets up a Smolagents agent with a weather tool 1. Configures a hierarchical agent structure with subagents 1. Processes a user query about weather in Paris 1. Handles the tool calling workflow automatically The example shows how Patronus provides visibility into the agent's decision-making process, tool usage, and interaction between different agent layers. 
## Example code ```python # examples/patronus_examples/tracking/smolagents_weather.py from datetime import datetime from openinference.instrumentation.smolagents import SmolagentsInstrumentor from opentelemetry.instrumentation.threading import ThreadingInstrumentor from smolagents import LiteLLMModel, ToolCallingAgent, tool import patronus patronus.init(integrations=[SmolagentsInstrumentor(), ThreadingInstrumentor()]) @tool def get_weather_api(location: str, date_time: str) -> str: """ Returns the weather report. Args: location: the name of the place that you want the weather for. Should be a place name, followed by possibly a city name, then a country, like "Anchor Point, Taghazout, Morocco". date_time: the date and time for which you want the report, formatted as '%m/%d/%y %H:%M:%S'. """ try: date_time = datetime.strptime(date_time, "%m/%d/%y %H:%M:%S") except Exception as e: raise ValueError( "Conversion of `date_time` to datetime format failed, " f"make sure to provide a string in format '%m/%d/%y %H:%M:%S': {e}" ) temperature_celsius, risk_of_rain, wave_height = 10, 0.5, 4 # mock outputs return ( f"Weather report for {location}, {date_time}: " f"Temperature will be {temperature_celsius}°C, " f"risk of rain is {risk_of_rain * 100:.0f}%, wave height is {wave_height}m." ) def create_agent(model_id): # Create weather agent weather_model = LiteLLMModel(model_id, temperature=0.0, top_p=1.0) weather_subagent = ToolCallingAgent( tools=[get_weather_api], model=weather_model, max_steps=10, name="weather_agent", description="This agent can provide information about the weather at a certain location", ) # Create manager agent and add weather agent as subordinate manager_model = LiteLLMModel(model_id, temperature=0.0, top_p=1.0) agent = ToolCallingAgent( model=manager_model, managed_agents=[weather_subagent], tools=[], add_base_tools=False, ) return agent @patronus.traced("weather-smolagents") def main(): agent = create_agent("openai/gpt-4o") agent.run("What is the weather in Paris, France?") if __name__ == "__main__": main() ``` # API Reference # API ## patronus.api.api_client.PatronusAPIClient ```python PatronusAPIClient( *, client_http_async: AsyncClient, client_http: Client, base_url: str, api_key: str, ) ``` Bases: `BaseAPIClient` Source code in `src/patronus/api/api_client_base.py` ```python def __init__( self, *, client_http_async: httpx.AsyncClient, client_http: httpx.Client, base_url: str, api_key: str, ): self.version = importlib.metadata.version("patronus") self.http = client_http_async self.http_sync = client_http self.base_url = base_url.rstrip("/") self.api_key = api_key ``` ### add_evaluator_criteria_revision ```python add_evaluator_criteria_revision( evaluator_criteria_id, request: AddEvaluatorCriteriaRevisionRequest, ) -> api_types.AddEvaluatorCriteriaRevisionResponse ``` Adds a revision to existing evaluator criteria. 
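For orientation, here is a minimal, self-contained sketch of constructing the client by hand and calling this method. The base URL and the `config` payload are illustrative assumptions (criteria configs are evaluator-specific); in most applications the SDK's initialization builds this client for you.

```python
import asyncio

import httpx

from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


async def main() -> None:
    # Manual construction; SDK initialization normally builds this client for you.
    client = PatronusAPIClient(
        client_http_async=httpx.AsyncClient(),
        client_http=httpx.Client(),
        base_url="https://api.patronus.ai",  # assumed default API endpoint
        api_key="your-api-key",
    )
    # The config payload is evaluator-specific; this dict is a placeholder.
    resp = await client.add_evaluator_criteria_revision(
        "your-evaluator-criteria-id",
        api_types.AddEvaluatorCriteriaRevisionRequest(config={"pass_threshold": 0.8}),
    )
    print(resp.evaluator_criteria)


if __name__ == "__main__":
    asyncio.run(main())
```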
Source code in `src/patronus/api/api_client.py` ```python async def add_evaluator_criteria_revision( self, evaluator_criteria_id, request: api_types.AddEvaluatorCriteriaRevisionRequest, ) -> api_types.AddEvaluatorCriteriaRevisionResponse: """Adds a revision to existing evaluator criteria.""" resp = await self.call( "POST", f"/v1/evaluator-criteria/{evaluator_criteria_id}/revision", body=request, response_cls=api_types.AddEvaluatorCriteriaRevisionResponse, ) resp.raise_for_status() return resp.data ``` ### add_evaluator_criteria_revision_sync ```python add_evaluator_criteria_revision_sync( evaluator_criteria_id, request: AddEvaluatorCriteriaRevisionRequest, ) -> api_types.AddEvaluatorCriteriaRevisionResponse ``` Adds a revision to existing evaluator criteria. Source code in `src/patronus/api/api_client.py` ```python def add_evaluator_criteria_revision_sync( self, evaluator_criteria_id, request: api_types.AddEvaluatorCriteriaRevisionRequest, ) -> api_types.AddEvaluatorCriteriaRevisionResponse: """Adds a revision to existing evaluator criteria.""" resp = self.call_sync( "POST", f"/v1/evaluator-criteria/{evaluator_criteria_id}/revision", body=request, response_cls=api_types.AddEvaluatorCriteriaRevisionResponse, ) resp.raise_for_status() return resp.data ``` ### annotate ```python annotate( request: AnnotateRequest, ) -> api_types.AnnotateResponse ``` Annotates log based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def annotate(self, request: api_types.AnnotateRequest) -> api_types.AnnotateResponse: """Annotates log based on the given request.""" resp = await self.call( "POST", "/v1/annotate", body=request, response_cls=api_types.AnnotateResponse, ) resp.raise_for_status() return resp.data ``` ### annotate_sync ```python annotate_sync( request: AnnotateRequest, ) -> api_types.AnnotateResponse ``` Annotates log based on the given request. Source code in `src/patronus/api/api_client.py` ```python def annotate_sync(self, request: api_types.AnnotateRequest) -> api_types.AnnotateResponse: """Annotates log based on the given request.""" resp = self.call_sync( "POST", "/v1/annotate", body=request, response_cls=api_types.AnnotateResponse, ) resp.raise_for_status() return resp.data ``` ### batch_create_evaluations ```python batch_create_evaluations( request: BatchCreateEvaluationsRequest, ) -> api_types.BatchCreateEvaluationsResponse ``` Creates multiple evaluations in a single request. Source code in `src/patronus/api/api_client.py` ```python async def batch_create_evaluations( self, request: api_types.BatchCreateEvaluationsRequest ) -> api_types.BatchCreateEvaluationsResponse: """Creates multiple evaluations in a single request.""" resp = await self.call( "POST", "/v1/evaluations/batch", body=request, response_cls=api_types.BatchCreateEvaluationsResponse, ) resp.raise_for_status() return resp.data ``` ### batch_create_evaluations_sync ```python batch_create_evaluations_sync( request: BatchCreateEvaluationsRequest, ) -> api_types.BatchCreateEvaluationsResponse ``` Creates multiple evaluations in a single request. 
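As a rough sketch of recording client-side evaluation results in bulk: the snippet below uses only the `ClientEvaluation` fields documented in the api_types section of this reference, and the values are placeholders; `ClientEvaluation` defines further fields beyond those shown here, so consult `api_types.ClientEvaluation` for the complete schema.

```python
from datetime import timedelta

from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


def record_local_results(client: PatronusAPIClient) -> api_types.BatchCreateEvaluationsResponse:
    # Placeholder values; see patronus.api.api_types for the full field list.
    evaluation = api_types.ClientEvaluation(
        evaluator_id="exact-match",  # local evaluator identifier
        criteria="strict",
        app="default",
        explanation="Output matched the expected answer.",
        evaluation_duration=timedelta(milliseconds=120),
    )
    return client.batch_create_evaluations_sync(
        api_types.BatchCreateEvaluationsRequest(evaluations=[evaluation])
    )
```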
Source code in `src/patronus/api/api_client.py` ```python def batch_create_evaluations_sync( self, request: api_types.BatchCreateEvaluationsRequest ) -> api_types.BatchCreateEvaluationsResponse: """Creates multiple evaluations in a single request.""" resp = self.call_sync( "POST", "/v1/evaluations/batch", body=request, response_cls=api_types.BatchCreateEvaluationsResponse, ) resp.raise_for_status() return resp.data ``` ### create_annotation_criteria ```python create_annotation_criteria( request: CreateAnnotationCriteriaRequest, ) -> api_types.CreateAnnotationCriteriaResponse ``` Creates annotation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def create_annotation_criteria( self, request: api_types.CreateAnnotationCriteriaRequest ) -> api_types.CreateAnnotationCriteriaResponse: """Creates annotation criteria based on the given request.""" resp = await self.call( "POST", "/v1/annotation-criteria", body=request, response_cls=api_types.CreateAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### create_annotation_criteria_sync ```python create_annotation_criteria_sync( request: CreateAnnotationCriteriaRequest, ) -> api_types.CreateAnnotationCriteriaResponse ``` Creates annotation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python def create_annotation_criteria_sync( self, request: api_types.CreateAnnotationCriteriaRequest ) -> api_types.CreateAnnotationCriteriaResponse: """Creates annotation criteria based on the given request.""" resp = self.call_sync( "POST", "/v1/annotation-criteria", body=request, response_cls=api_types.CreateAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### create_criteria ```python create_criteria( request: CreateCriteriaRequest, ) -> api_types.CreateCriteriaResponse ``` Creates evaluation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def create_criteria(self, request: api_types.CreateCriteriaRequest) -> api_types.CreateCriteriaResponse: """Creates evaluation criteria based on the given request.""" resp = await self.call( "POST", "/v1/evaluator-criteria", body=request, response_cls=api_types.CreateCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### create_criteria_sync ```python create_criteria_sync( request: CreateCriteriaRequest, ) -> api_types.CreateCriteriaResponse ``` Creates evaluation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python def create_criteria_sync(self, request: api_types.CreateCriteriaRequest) -> api_types.CreateCriteriaResponse: """Creates evaluation criteria based on the given request.""" resp = self.call_sync( "POST", "/v1/evaluator-criteria", body=request, response_cls=api_types.CreateCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### create_experiment ```python create_experiment( request: CreateExperimentRequest, ) -> api_types.Experiment ``` Creates a new experiment based on the given request. 
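A rough usage sketch follows; the request field names (`project_id`, `name`) are illustrative assumptions, so check `api_types.CreateExperimentRequest` for the actual schema.

```python
from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


async def start_experiment(client: PatronusAPIClient) -> api_types.Experiment:
    # Field names below are illustrative assumptions; consult
    # api_types.CreateExperimentRequest for the actual schema.
    return await client.create_experiment(
        api_types.CreateExperimentRequest(
            project_id="your-project-id",
            name="prompt-comparison-run",
        )
    )
```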
Source code in `src/patronus/api/api_client.py` ```python async def create_experiment(self, request: api_types.CreateExperimentRequest) -> api_types.Experiment: """Creates a new experiment based on the given request.""" resp = await self.call( "POST", "/v1/experiments", body=request, response_cls=api_types.CreateExperimentResponse, ) resp.raise_for_status() return resp.data.experiment ``` ### create_experiment_sync ```python create_experiment_sync( request: CreateExperimentRequest, ) -> api_types.Experiment ``` Creates a new experiment based on the given request. Source code in `src/patronus/api/api_client.py` ```python def create_experiment_sync(self, request: api_types.CreateExperimentRequest) -> api_types.Experiment: """Creates a new experiment based on the given request.""" resp = self.call_sync( "POST", "/v1/experiments", body=request, response_cls=api_types.CreateExperimentResponse, ) resp.raise_for_status() return resp.data.experiment ``` ### create_project ```python create_project( request: CreateProjectRequest, ) -> api_types.Project ``` Creates a new project based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def create_project(self, request: api_types.CreateProjectRequest) -> api_types.Project: """Creates a new project based on the given request.""" resp = await self.call("POST", "/v1/projects", body=request, response_cls=api_types.Project) resp.raise_for_status() return resp.data ``` ### create_project_sync ```python create_project_sync( request: CreateProjectRequest, ) -> api_types.Project ``` Creates a new project based on the given request. Source code in `src/patronus/api/api_client.py` ```python def create_project_sync(self, request: api_types.CreateProjectRequest) -> api_types.Project: """Creates a new project based on the given request.""" resp = self.call_sync("POST", "/v1/projects", body=request, response_cls=api_types.Project) resp.raise_for_status() return resp.data ``` ### delete_annotation_criteria ```python delete_annotation_criteria(criteria_id: str) -> None ``` Deletes annotation criteria by its ID. Source code in `src/patronus/api/api_client.py` ```python async def delete_annotation_criteria(self, criteria_id: str) -> None: """Deletes annotation criteria by its ID.""" resp = await self.call( "DELETE", f"/v1/annotation-criteria/{criteria_id}", response_cls=None, ) resp.raise_for_status() ``` ### delete_annotation_criteria_sync ```python delete_annotation_criteria_sync(criteria_id: str) -> None ``` Deletes annotation criteria by its ID. Source code in `src/patronus/api/api_client.py` ```python def delete_annotation_criteria_sync(self, criteria_id: str) -> None: """Deletes annotation criteria by its ID.""" resp = self.call_sync( "DELETE", f"/v1/annotation-criteria/{criteria_id}", response_cls=None, ) resp.raise_for_status() ``` ### evaluate ```python evaluate( request: EvaluateRequest, ) -> api_types.EvaluateResponse ``` Evaluates content using the specified evaluators. Source code in `src/patronus/api/api_client.py` ```python async def evaluate(self, request: api_types.EvaluateRequest) -> api_types.EvaluateResponse: """Evaluates content using the specified evaluators.""" resp = await self.call( "POST", "/v1/evaluate", body=request, response_cls=api_types.EvaluateResponse, ) resp.raise_for_status() return resp.data ``` ### evaluate_one ```python evaluate_one( request: EvaluateRequest, ) -> api_types.EvaluationResult ``` Evaluates content using a single evaluator. 
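A rough sketch of a single-evaluator call. The only constraint confirmed by the implementation that follows is that `request.evaluators` must contain exactly one entry; the evaluator entry structure and task fields shown here are illustrative assumptions, so check `api_types.EvaluateRequest` for the exact schema.

```python
from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


async def judge_single_output(client: PatronusAPIClient) -> api_types.EvaluationResult:
    # evaluate_one() accepts exactly one entry in `evaluators`.
    # Field names and evaluator/criteria identifiers below are placeholders;
    # consult api_types.EvaluateRequest for the actual request schema.
    request = api_types.EvaluateRequest(
        evaluators=[{"evaluator": "judge", "criteria": "patronus:is-concise"}],
        task_input="Summarize the plot of Hamlet in one sentence.",
        task_output="Hamlet avenges his father's murder, and nearly everyone dies.",
    )
    result = await client.evaluate_one(request)
    print(result)
    return result
```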
Source code in `src/patronus/api/api_client.py` ```python async def evaluate_one(self, request: api_types.EvaluateRequest) -> api_types.EvaluationResult: """Evaluates content using a single evaluator.""" if len(request.evaluators) > 1: raise ValueError("'evaluate_one()' cannot accept more than one evaluator in the request body") resp = await self.call( "POST", "/v1/evaluate", body=request, response_cls=api_types.EvaluateResponse, ) return self._evaluate_one_process_resp(resp) ``` ### evaluate_one_sync ```python evaluate_one_sync( request: EvaluateRequest, ) -> api_types.EvaluationResult ``` Evaluates content using a single evaluator. Source code in `src/patronus/api/api_client.py` ```python def evaluate_one_sync(self, request: api_types.EvaluateRequest) -> api_types.EvaluationResult: """Evaluates content using a single evaluator.""" if len(request.evaluators) > 1: raise ValueError("'evaluate_one_sync()' cannot accept more than one evaluator in the request body") resp = self.call_sync( "POST", "/v1/evaluate", body=request, response_cls=api_types.EvaluateResponse, ) return self._evaluate_one_process_resp(resp) ``` ### evaluate_sync ```python evaluate_sync( request: EvaluateRequest, ) -> api_types.EvaluateResponse ``` Evaluates content using the specified evaluators. Source code in `src/patronus/api/api_client.py` ```python def evaluate_sync(self, request: api_types.EvaluateRequest) -> api_types.EvaluateResponse: """Evaluates content using the specified evaluators.""" resp = self.call_sync( "POST", "/v1/evaluate", body=request, response_cls=api_types.EvaluateResponse, ) resp.raise_for_status() return resp.data ``` ### export_evaluations ```python export_evaluations( request: ExportEvaluationRequest, ) -> api_types.ExportEvaluationResponse ``` Exports evaluations based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def export_evaluations( self, request: api_types.ExportEvaluationRequest ) -> api_types.ExportEvaluationResponse: """Exports evaluations based on the given request.""" resp = await self.call( "POST", "/v1/evaluation-results/batch", body=request, response_cls=api_types.ExportEvaluationResponse, ) resp.raise_for_status() return resp.data ``` ### export_evaluations_sync ```python export_evaluations_sync( request: ExportEvaluationRequest, ) -> api_types.ExportEvaluationResponse ``` Exports evaluations based on the given request. Source code in `src/patronus/api/api_client.py` ```python def export_evaluations_sync(self, request: api_types.ExportEvaluationRequest) -> api_types.ExportEvaluationResponse: """Exports evaluations based on the given request.""" resp = self.call_sync( "POST", "/v1/evaluation-results/batch", body=request, response_cls=api_types.ExportEvaluationResponse, ) resp.raise_for_status() return resp.data ``` ### get_experiment ```python get_experiment( experiment_id: str, ) -> Optional[api_types.Experiment] ``` Fetches an experiment by its ID or returns None if not found. 
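A short usage sketch: because a missing experiment returns `None` rather than raising, callers can branch on the result.

```python
from typing import Optional

from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


async def describe_experiment(
    client: PatronusAPIClient, experiment_id: str
) -> Optional[api_types.Experiment]:
    # Returns None (rather than raising) when the experiment does not exist.
    experiment = await client.get_experiment(experiment_id)
    if experiment is None:
        print(f"No experiment found for ID {experiment_id!r}")
    return experiment
```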
Source code in `src/patronus/api/api_client.py` ```python async def get_experiment(self, experiment_id: str) -> Optional[api_types.Experiment]: """Fetches an experiment by its ID or returns None if not found.""" resp = await self.call( "GET", f"/v1/experiments/{experiment_id}", response_cls=api_types.GetExperimentResponse, ) if resp.response.status_code == 404: return None resp.raise_for_status() return resp.data.experiment ``` ### get_experiment_sync ```python get_experiment_sync( experiment_id: str, ) -> Optional[api_types.Experiment] ``` Fetches an experiment by its ID or returns None if not found. Source code in `src/patronus/api/api_client.py` ```python def get_experiment_sync(self, experiment_id: str) -> Optional[api_types.Experiment]: """Fetches an experiment by its ID or returns None if not found.""" resp = self.call_sync( "GET", f"/v1/experiments/{experiment_id}", response_cls=api_types.GetExperimentResponse, ) if resp.response.status_code == 404: return None resp.raise_for_status() return resp.data.experiment ``` ### get_project ```python get_project(project_id: str) -> api_types.Project ``` Fetches a project by its ID. Source code in `src/patronus/api/api_client.py` ```python async def get_project(self, project_id: str) -> api_types.Project: """Fetches a project by its ID.""" resp = await self.call( "GET", f"/v1/projects/{project_id}", response_cls=api_types.GetProjectResponse, ) resp.raise_for_status() return resp.data.project ``` ### get_project_sync ```python get_project_sync(project_id: str) -> api_types.Project ``` Fetches a project by its ID. Source code in `src/patronus/api/api_client.py` ```python def get_project_sync(self, project_id: str) -> api_types.Project: """Fetches a project by its ID.""" resp = self.call_sync( "GET", f"/v1/projects/{project_id}", response_cls=api_types.GetProjectResponse, ) resp.raise_for_status() return resp.data.project ``` ### list_annotation_criteria ```python list_annotation_criteria( *, project_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None, ) -> api_types.ListAnnotationCriteriaResponse ``` Retrieves a list of annotation criteria with optional filtering. Source code in `src/patronus/api/api_client.py` ```python async def list_annotation_criteria( self, *, project_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None ) -> api_types.ListAnnotationCriteriaResponse: """Retrieves a list of annotation criteria with optional filtering.""" params = {} if project_id is not None: params["project_id"] = project_id if limit is not None: params["limit"] = limit if offset is not None: params["offset"] = offset resp = await self.call( "GET", "/v1/annotation-criteria", params=params, response_cls=api_types.ListAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### list_annotation_criteria_sync ```python list_annotation_criteria_sync( *, project_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None, ) -> api_types.ListAnnotationCriteriaResponse ``` Retrieves a list of annotation criteria with optional filtering. 
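For example, a paginated fetch scoped to one project might look like the sketch below; the keyword arguments mirror the signature above, and the response fields are described by `api_types.ListAnnotationCriteriaResponse`.

```python
from patronus.api import api_types
from patronus.api.api_client import PatronusAPIClient


def first_page_of_criteria(
    client: PatronusAPIClient, project_id: str
) -> api_types.ListAnnotationCriteriaResponse:
    # All filters are optional keyword arguments; omit them to list everything.
    return client.list_annotation_criteria_sync(
        project_id=project_id,
        limit=25,
        offset=0,
    )
```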
Source code in `src/patronus/api/api_client.py` ```python def list_annotation_criteria_sync( self, *, project_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None ) -> api_types.ListAnnotationCriteriaResponse: """Retrieves a list of annotation criteria with optional filtering.""" params = {} if project_id is not None: params["project_id"] = project_id if limit is not None: params["limit"] = limit if offset is not None: params["offset"] = offset resp = self.call_sync( "GET", "/v1/annotation-criteria", params=params, response_cls=api_types.ListAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### list_criteria ```python list_criteria( request: ListCriteriaRequest, ) -> api_types.ListCriteriaResponse ``` Retrieves a list of evaluation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def list_criteria(self, request: api_types.ListCriteriaRequest) -> api_types.ListCriteriaResponse: """Retrieves a list of evaluation criteria based on the given request.""" params = request.model_dump(exclude_none=True) resp = await self.call( "GET", "/v1/evaluator-criteria", params=params, response_cls=api_types.ListCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### list_criteria_sync ```python list_criteria_sync( request: ListCriteriaRequest, ) -> api_types.ListCriteriaResponse ``` Retrieves a list of evaluation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python def list_criteria_sync(self, request: api_types.ListCriteriaRequest) -> api_types.ListCriteriaResponse: """Retrieves a list of evaluation criteria based on the given request.""" params = request.model_dump(exclude_none=True) resp = self.call_sync( "GET", "/v1/evaluator-criteria", params=params, response_cls=api_types.ListCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### list_dataset_data ```python list_dataset_data( dataset_id: str, ) -> api_types.ListDatasetData ``` Retrieves data from a dataset by its ID. Source code in `src/patronus/api/api_client.py` ```python async def list_dataset_data(self, dataset_id: str) -> api_types.ListDatasetData: """Retrieves data from a dataset by its ID.""" resp = await self.call( "GET", f"/v1/datasets/{dataset_id}/data", response_cls=api_types.ListDatasetData, ) resp.raise_for_status() return resp.data ``` ### list_dataset_data_sync ```python list_dataset_data_sync( dataset_id: str, ) -> api_types.ListDatasetData ``` Retrieves data from a dataset by its ID. Source code in `src/patronus/api/api_client.py` ```python def list_dataset_data_sync(self, dataset_id: str) -> api_types.ListDatasetData: """Retrieves data from a dataset by its ID.""" resp = self.call_sync( "GET", f"/v1/datasets/{dataset_id}/data", response_cls=api_types.ListDatasetData, ) resp.raise_for_status() return resp.data ``` ### list_datasets ```python list_datasets( dataset_type: Optional[str] = None, ) -> list[api_types.Dataset] ``` Retrieves a list of datasets, optionally filtered by type. Source code in `src/patronus/api/api_client.py` ```python async def list_datasets(self, dataset_type: Optional[str] = None) -> list[api_types.Dataset]: """ Retrieves a list of datasets, optionally filtered by type. 
""" params = {} if dataset_type is not None: params["type"] = dataset_type resp = await self.call( "GET", "/v1/datasets", params=params, response_cls=api_types.ListDatasetsResponse, ) resp.raise_for_status() return resp.data.datasets ``` ### list_datasets_sync ```python list_datasets_sync( dataset_type: Optional[str] = None, ) -> list[api_types.Dataset] ``` Retrieves a list of datasets, optionally filtered by type. Source code in `src/patronus/api/api_client.py` ```python def list_datasets_sync(self, dataset_type: Optional[str] = None) -> list[api_types.Dataset]: """ Retrieves a list of datasets, optionally filtered by type. """ params = {} if dataset_type is not None: params["type"] = dataset_type resp = self.call_sync( "GET", "/v1/datasets", params=params, response_cls=api_types.ListDatasetsResponse, ) resp.raise_for_status() return resp.data.datasets ``` ### list_evaluators ```python list_evaluators( by_alias_or_id: Optional[str] = None, ) -> list[api_types.Evaluator] ``` Retrieves a list of available evaluators. Source code in `src/patronus/api/api_client.py` ```python async def list_evaluators(self, by_alias_or_id: Optional[str] = None) -> list[api_types.Evaluator]: """Retrieves a list of available evaluators.""" params = {} if by_alias_or_id: params["by_alias_or_id"] = by_alias_or_id resp = await self.call("GET", "/v1/evaluators", params=params, response_cls=api_types.ListEvaluatorsResponse) resp.raise_for_status() return resp.data.evaluators ``` ### list_evaluators_sync ```python list_evaluators_sync( by_alias_or_id: Optional[str] = None, ) -> list[api_types.Evaluator] ``` Retrieves a list of available evaluators. Source code in `src/patronus/api/api_client.py` ```python def list_evaluators_sync(self, by_alias_or_id: Optional[str] = None) -> list[api_types.Evaluator]: """Retrieves a list of available evaluators.""" params = {} if by_alias_or_id: params["by_alias_or_id"] = by_alias_or_id resp = self.call_sync("GET", "/v1/evaluators", params=params, response_cls=api_types.ListEvaluatorsResponse) resp.raise_for_status() return resp.data.evaluators ``` ### search_evaluations ```python search_evaluations( request: SearchEvaluationsRequest, ) -> api_types.SearchEvaluationsResponse ``` Searches for evaluations based on the given criteria. Source code in `src/patronus/api/api_client.py` ```python async def search_evaluations( self, request: api_types.SearchEvaluationsRequest ) -> api_types.SearchEvaluationsResponse: """Searches for evaluations based on the given criteria.""" resp = await self.call( "POST", "/v1/evaluations/search", body=request, response_cls=api_types.SearchEvaluationsResponse, ) resp.raise_for_status() return resp.data ``` ### search_evaluations_sync ```python search_evaluations_sync( request: SearchEvaluationsRequest, ) -> api_types.SearchEvaluationsResponse ``` Searches for evaluations based on the given criteria. Source code in `src/patronus/api/api_client.py` ```python def search_evaluations_sync( self, request: api_types.SearchEvaluationsRequest ) -> api_types.SearchEvaluationsResponse: """Searches for evaluations based on the given criteria.""" resp = self.call_sync( "POST", "/v1/evaluations/search", body=request, response_cls=api_types.SearchEvaluationsResponse, ) resp.raise_for_status() return resp.data ``` ### search_logs ```python search_logs( request: SearchLogsRequest, ) -> api_types.SearchLogsResponse ``` Searches for logs based on the given request. 
Source code in `src/patronus/api/api_client.py` ```python async def search_logs(self, request: api_types.SearchLogsRequest) -> api_types.SearchLogsResponse: """Searches for logs based on the given request.""" resp = await self.call( "POST", "/v1/otel/logs/search", body=request, response_cls=api_types.SearchLogsResponse, ) resp.raise_for_status() return resp.data ``` ### search_logs_sync ```python search_logs_sync( request: SearchLogsRequest, ) -> api_types.SearchLogsResponse ``` Searches for logs based on the given request. Source code in `src/patronus/api/api_client.py` ```python def search_logs_sync(self, request: api_types.SearchLogsRequest) -> api_types.SearchLogsResponse: """Searches for logs based on the given request.""" resp = self.call_sync( "POST", "/v1/otel/logs/search", body=request, response_cls=api_types.SearchLogsResponse, ) resp.raise_for_status() return resp.data ``` ### update_annotation_criteria ```python update_annotation_criteria( criteria_id: str, request: UpdateAnnotationCriteriaRequest, ) -> api_types.UpdateAnnotationCriteriaResponse ``` Updates annotation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def update_annotation_criteria( self, criteria_id: str, request: api_types.UpdateAnnotationCriteriaRequest ) -> api_types.UpdateAnnotationCriteriaResponse: """Updates annotation criteria based on the given request.""" resp = await self.call( "PUT", f"/v1/annotation-criteria/{criteria_id}", body=request, response_cls=api_types.UpdateAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### update_annotation_criteria_sync ```python update_annotation_criteria_sync( criteria_id: str, request: UpdateAnnotationCriteriaRequest, ) -> api_types.UpdateAnnotationCriteriaResponse ``` Updates annotation criteria based on the given request. Source code in `src/patronus/api/api_client.py` ```python def update_annotation_criteria_sync( self, criteria_id: str, request: api_types.UpdateAnnotationCriteriaRequest ) -> api_types.UpdateAnnotationCriteriaResponse: """Updates annotation criteria based on the given request.""" resp = self.call_sync( "PUT", f"/v1/annotation-criteria/{criteria_id}", body=request, response_cls=api_types.UpdateAnnotationCriteriaResponse, ) resp.raise_for_status() return resp.data ``` ### update_experiment ```python update_experiment( experiment_id: str, request: UpdateExperimentRequest ) -> api_types.Experiment ``` Updates an existing experiment based on the given request. Source code in `src/patronus/api/api_client.py` ```python async def update_experiment( self, experiment_id: str, request: api_types.UpdateExperimentRequest ) -> api_types.Experiment: """Updates an existing experiment based on the given request.""" resp = await self.call( "POST", f"/v1/experiments/{experiment_id}", body=request, response_cls=api_types.UpdateExperimentResponse, ) resp.raise_for_status() return resp.data.experiment ``` ### update_experiment_sync ```python update_experiment_sync( experiment_id: str, request: UpdateExperimentRequest ) -> api_types.Experiment ``` Updates an existing experiment based on the given request.
Source code in `src/patronus/api/api_client.py` ```python def update_experiment_sync( self, experiment_id: str, request: api_types.UpdateExperimentRequest ) -> api_types.Experiment: """Updates an existing experiment based on the given request.""" resp = self.call_sync( "POST", f"/v1/experiments/{experiment_id}", body=request, response_cls=api_types.UpdateExperimentResponse, ) resp.raise_for_status() return resp.data.experiment ``` ### upload_dataset ```python upload_dataset( file_path: str, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[ dict[str, Union[str, list[str]]] ] = None, ) -> api_types.Dataset ``` Upload a dataset file to create a new dataset in Patronus. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file_path` | `str` | Path to the dataset file (CSV or JSONL format) | *required* | | `dataset_name` | `str` | Name for the created dataset | *required* | | `dataset_description` | `Optional[str]` | Optional description for the dataset | `None` | | `custom_field_mapping` | `Optional[dict[str, Union[str, list[str]]]]` | Optional mapping of standard field names to custom field names in the dataset | `None` | Returns: | Type | Description | | --- | --- | | `Dataset` | Dataset object representing the created dataset | Source code in `src/patronus/api/api_client.py` ```python async def upload_dataset( self, file_path: str, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[dict[str, Union[str, list[str]]]] = None, ) -> api_types.Dataset: """ Upload a dataset file to create a new dataset in Patronus. Args: file_path: Path to the dataset file (CSV or JSONL format) dataset_name: Name for the created dataset dataset_description: Optional description for the dataset custom_field_mapping: Optional mapping of standard field names to custom field names in the dataset Returns: Dataset object representing the created dataset """ with open(file_path, "rb") as f: return await self.upload_dataset_from_buffer(f, dataset_name, dataset_description, custom_field_mapping) ``` ### upload_dataset_from_buffer ```python upload_dataset_from_buffer( file_obj: BinaryIO, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[ dict[str, Union[str, list[str]]] ] = None, ) -> api_types.Dataset ``` Upload a dataset file to create a new dataset in Patronus AI Platform. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file_obj` | `BinaryIO` | File-like object containing dataset content (CSV or JSONL format) | *required* | | `dataset_name` | `str` | Name for the created dataset | *required* | | `dataset_description` | `Optional[str]` | Optional description for the dataset | `None` | | `custom_field_mapping` | `Optional[dict[str, Union[str, list[str]]]]` | Optional mapping of standard field names to custom field names in the dataset | `None` | Returns: | Type | Description | | --- | --- | | `Dataset` | Dataset object representing the created dataset | Source code in `src/patronus/api/api_client.py` ```python async def upload_dataset_from_buffer( self, file_obj: typing.BinaryIO, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[dict[str, Union[str, list[str]]]] = None, ) -> api_types.Dataset: """ Upload a dataset file to create a new dataset in Patronus AI Platform.
Args: file_obj: File-like object containing dataset content (CSV or JSONL format) dataset_name: Name for the created dataset dataset_description: Optional description for the dataset custom_field_mapping: Optional mapping of standard field names to custom field names in the dataset Returns: Dataset object representing the created dataset """ data = { "dataset_name": dataset_name, } if dataset_description is not None: data["dataset_description"] = dataset_description if custom_field_mapping is not None: data["custom_field_mapping"] = json.dumps(custom_field_mapping) files = {"file": (dataset_name, file_obj)} resp = await self.call_multipart( "POST", "/v1/datasets", files=files, data=data, response_cls=api_types.CreateDatasetResponse, ) resp.raise_for_status() return resp.data.dataset ``` ### upload_dataset_from_buffer_sync ```python upload_dataset_from_buffer_sync( file_obj: BinaryIO, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[ dict[str, Union[str, list[str]]] ] = None, ) -> api_types.Dataset ``` Upload a dataset file to create a new dataset in Patronus AI Platform. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file_obj` | `BinaryIO` | File-like object containing dataset content (CSV or JSONL format) | *required* | | `dataset_name` | `str` | Name for the created dataset | *required* | | `dataset_description` | `Optional[str]` | Optional description for the dataset | `None` | | `custom_field_mapping` | `Optional[dict[str, Union[str, list[str]]]]` | Optional mapping of standard field names to custom field names in the dataset | `None` | Returns: | Type | Description | | --- | --- | | `Dataset` | Dataset object representing the created dataset | Source code in `src/patronus/api/api_client.py` ```python def upload_dataset_from_buffer_sync( self, file_obj: typing.BinaryIO, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[dict[str, Union[str, list[str]]]] = None, ) -> api_types.Dataset: """ Upload a dataset file to create a new dataset in Patronus AI Platform. Args: file_obj: File-like object containing dataset content (CSV or JSONL format) dataset_name: Name for the created dataset dataset_description: Optional description for the dataset custom_field_mapping: Optional mapping of standard field names to custom field names in the dataset Returns: Dataset object representing the created dataset """ data = { "dataset_name": dataset_name, } if dataset_description is not None: data["dataset_description"] = dataset_description if custom_field_mapping is not None: data["custom_field_mapping"] = json.dumps(custom_field_mapping) files = {"file": (dataset_name, file_obj)} resp = self.call_multipart_sync( "POST", "/v1/datasets", files=files, data=data, response_cls=api_types.CreateDatasetResponse, ) resp.raise_for_status() return resp.data.dataset ``` ### upload_dataset_sync ```python upload_dataset_sync( file_path: str, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[ dict[str, Union[str, list[str]]] ] = None, ) -> api_types.Dataset ``` Upload a dataset file to create a new dataset in Patronus AI Platform. 
Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file_path` | `str` | Path to the dataset file (CSV or JSONL format) | *required* | | `dataset_name` | `str` | Name for the created dataset | *required* | | `dataset_description` | `Optional[str]` | Optional description for the dataset | `None` | | `custom_field_mapping` | `Optional[dict[str, Union[str, list[str]]]]` | Optional mapping of standard field names to custom field names in the dataset | `None` | Returns: | Type | Description | | --- | --- | | `Dataset` | Dataset object representing the created dataset | Source code in `src/patronus/api/api_client.py` ```python def upload_dataset_sync( self, file_path: str, dataset_name: str, dataset_description: Optional[str] = None, custom_field_mapping: Optional[dict[str, Union[str, list[str]]]] = None, ) -> api_types.Dataset: """ Upload a dataset file to create a new dataset in Patronus AI Platform. Args: file_path: Path to the dataset file (CSV or JSONL format) dataset_name: Name for the created dataset dataset_description: Optional description for the dataset custom_field_mapping: Optional mapping of standard field names to custom field names in the dataset Returns: Dataset object representing the created dataset """ with open(file_path, "rb") as f: return self.upload_dataset_from_buffer_sync(f, dataset_name, dataset_description, custom_field_mapping) ``` ### whoami ```python whoami() -> api_types.WhoAmIResponse ``` Fetches information about the authenticated user. Source code in `src/patronus/api/api_client.py` ```python async def whoami(self) -> api_types.WhoAmIResponse: """Fetches information about the authenticated user.""" resp = await self.call("GET", "/v1/whoami", response_cls=api_types.WhoAmIResponse) resp.raise_for_status() return resp.data ``` ### whoami_sync ```python whoami_sync() -> api_types.WhoAmIResponse ``` Fetches information about the authenticated user. 
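As an illustrative aside (not generated from the source), `whoami_sync` is a convenient way to confirm that an API key is valid after initialization; the attribute access below follows the `WhoAmIResponse` model documented under `patronus.api.api_types`.

```python
# Illustrative sketch: verify credentials with the synchronous client.
# Assumes whoami_sync is available on the client returned by get_api_client_deprecated().
import patronus
from patronus import context

patronus.init()
client = context.get_api_client_deprecated()

me = client.whoami_sync()
# WhoAmIResponse -> caller -> api_key -> account
print(f"Authenticated account: {me.caller.api_key.account.name} ({me.caller.api_key.account.id})")
```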
Source code in `src/patronus/api/api_client.py` ```python def whoami_sync(self) -> api_types.WhoAmIResponse: """Fetches information about the authenticated user.""" resp = self.call_sync("GET", "/v1/whoami", response_cls=api_types.WhoAmIResponse) resp.raise_for_status() return resp.data ``` ## patronus.api.api_types ### SanitizedApp ```python SanitizedApp = Annotated[ str, _create_field_sanitizer( "[^a-zA-Z0-9-_./ -]", max_len=50, replace_with="_" ), ] ``` ### SanitizedLocalEvaluatorID ```python SanitizedLocalEvaluatorID = Annotated[ Optional[str], _create_field_sanitizer( "[^a-zA-Z0-9\\-_./]", max_len=50, replace_with="-" ), ] ``` ### SanitizedProjectName ```python SanitizedProjectName = Annotated[ str, project_name_sanitizer ] ``` ### project_name_sanitizer ```python project_name_sanitizer = ( _create_field_sanitizer( "[^a-zA-Z0-9_ -]", max_len=50, replace_with="_" ), ) ``` ### Account Bases: `BaseModel` #### id ```python id: str ``` #### name ```python name: str ``` ### AddEvaluatorCriteriaRevisionRequest Bases: `BaseModel` #### config ```python config: dict[str, Any] ``` ### AddEvaluatorCriteriaRevisionResponse Bases: `BaseModel` #### evaluator_criteria ```python evaluator_criteria: EvaluatorCriteria ``` ### AnnotateRequest Bases: `BaseModel` #### annotation_criteria_id ```python annotation_criteria_id: str ``` #### explanation ```python explanation: Optional[str] = None ``` #### log_id ```python log_id: str ``` #### value_pass ```python value_pass: Optional[bool] = None ``` #### value_score ```python value_score: Optional[float] = None ``` #### value_text ```python value_text: Optional[str] = None ``` ### AnnotateResponse Bases: `BaseModel` #### evaluation ```python evaluation: Evaluation ``` ### AnnotationCategory Bases: `BaseModel` #### label ```python label: Optional[str] = None ``` #### score ```python score: Optional[float] = None ``` ### AnnotationCriteria Bases: `BaseModel` #### annotation_type ```python annotation_type: AnnotationType ``` #### categories ```python categories: Optional[list[AnnotationCategory]] = None ``` #### created_at ```python created_at: datetime ``` #### description ```python description: Optional[str] = None ``` #### id ```python id: str ``` #### name ```python name: str ``` #### project_id ```python project_id: str ``` #### updated_at ```python updated_at: datetime ``` ### AnnotationType Bases: `str`, `Enum` #### binary ```python binary = 'binary' ``` #### categorical ```python categorical = 'categorical' ``` #### continuous ```python continuous = 'continuous' ``` #### discrete ```python discrete = 'discrete' ``` #### text_annotation ```python text_annotation = 'text_annotation' ``` ### BatchCreateEvaluationsRequest Bases: `BaseModel` #### evaluations ```python evaluations: list[ClientEvaluation] = Field( min_length=1, max_length=1000 ) ``` ### BatchCreateEvaluationsResponse Bases: `BaseModel` #### evaluations ```python evaluations: list[Evaluation] ``` ### ClientEvaluation Bases: `BaseModel` #### app ```python app: Optional[SanitizedApp] = None ``` #### created_at ```python created_at: Optional[datetime] = None ``` #### criteria ```python criteria: Optional[str] = None ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[str] = None ``` #### evaluation_duration ```python evaluation_duration: Optional[timedelta] = None ``` #### evaluator_id ```python evaluator_id: SanitizedLocalEvaluatorID ``` #### experiment_id ```python experiment_id: Optional[str] = None ``` #### explanation 
```python explanation: Optional[str] = None ``` #### explanation_duration ```python explanation_duration: Optional[timedelta] = None ``` #### log_id ```python log_id: UUID ``` #### metadata ```python metadata: Optional[dict[str, Any]] = None ``` #### metric_description ```python metric_description: Optional[str] = None ``` #### metric_name ```python metric_name: Optional[str] = None ``` #### pass\_ ```python pass_: Optional[bool] = Field( default=None, serialization_alias="pass" ) ``` #### project_id ```python project_id: Optional[str] = None ``` #### project_name ```python project_name: Optional[SanitizedProjectName] = None ``` #### score ```python score: Optional[float] = None ``` #### span_id ```python span_id: Optional[str] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### text_output ```python text_output: Optional[str] = None ``` #### trace_id ```python trace_id: Optional[str] = None ``` ### CreateAnnotationCriteriaRequest Bases: `BaseModel` #### annotation_type ```python annotation_type: AnnotationType ``` #### categories ```python categories: Optional[list[AnnotationCategory]] = None ``` #### description ```python description: Optional[str] = None ``` #### name ```python name: str = Field(min_length=1, max_length=100) ``` #### project_id ```python project_id: str ``` ### CreateAnnotationCriteriaResponse Bases: `BaseModel` #### annotation_criteria ```python annotation_criteria: AnnotationCriteria ``` ### CreateCriteriaRequest Bases: `BaseModel` #### config ```python config: dict[str, Any] ``` #### evaluator_family ```python evaluator_family: str ``` #### name ```python name: str ``` ### CreateCriteriaResponse Bases: `BaseModel` #### evaluator_criteria ```python evaluator_criteria: EvaluatorCriteria ``` ### CreateDatasetResponse Bases: `BaseModel` #### dataset ```python dataset: Dataset ``` #### dataset_id ```python dataset_id: str ``` ### CreateExperimentRequest Bases: `BaseModel` #### metadata ```python metadata: Optional[dict[str, Any]] = None ``` #### name ```python name: str ``` #### project_id ```python project_id: str ``` #### tags ```python tags: dict[str, str] = Field(default_factory=dict) ``` ### CreateExperimentResponse Bases: `BaseModel` #### experiment ```python experiment: Experiment ``` ### CreateProjectRequest Bases: `BaseModel` #### name ```python name: SanitizedProjectName ``` ### Dataset Bases: `BaseModel` #### created_at ```python created_at: datetime ``` #### creation_at ```python creation_at: Optional[datetime] = None ``` #### description ```python description: Optional[str] = None ``` #### id ```python id: str ``` #### name ```python name: str ``` #### samples ```python samples: int ``` #### type ```python type: str ``` ### DatasetDatum Bases: `BaseModel` #### dataset_id ```python dataset_id: str ``` #### evaluated_model_gold_answer ```python evaluated_model_gold_answer: Optional[str] = None ``` #### evaluated_model_input ```python evaluated_model_input: Optional[str] = None ``` #### evaluated_model_output ```python evaluated_model_output: Optional[str] = None ``` #### evaluated_model_retrieved_context ```python evaluated_model_retrieved_context: Optional[list[str]] = ( None ) ``` #### evaluated_model_system_prompt ```python evaluated_model_system_prompt: Optional[str] = None ``` #### meta_evaluated_model_name ```python meta_evaluated_model_name: Optional[str] = None ``` #### meta_evaluated_model_params ```python meta_evaluated_model_params: Optional[ dict[str, Union[str, int, float]] ] = None ``` #### meta_evaluated_model_provider 
```python meta_evaluated_model_provider: Optional[str] = None ``` #### meta_evaluated_model_selected_model ```python meta_evaluated_model_selected_model: Optional[str] = None ``` #### sid ```python sid: int ``` ### EvaluateEvaluator Bases: `BaseModel` #### criteria ```python criteria: Optional[str] = None ``` #### evaluator ```python evaluator: str ``` #### explain_strategy ```python explain_strategy: str = 'always' ``` ### EvaluateRequest Bases: `BaseModel` #### app ```python app: Optional[str] = None ``` #### capture ```python capture: str = 'all' ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[str] = None ``` #### evaluated_model_attachments ```python evaluated_model_attachments: Optional[ list[EvaluatedModelAttachment] ] = None ``` #### evaluated_model_gold_answer ```python evaluated_model_gold_answer: Optional[str] = None ``` #### evaluated_model_input ```python evaluated_model_input: Optional[str] = None ``` #### evaluated_model_output ```python evaluated_model_output: Optional[str] = None ``` #### evaluated_model_retrieved_context ```python evaluated_model_retrieved_context: Optional[ Union[list[str], str] ] = None ``` #### evaluated_model_system_prompt ```python evaluated_model_system_prompt: Optional[str] = None ``` #### evaluators ```python evaluators: list[EvaluateEvaluator] = Field(min_length=1) ``` #### experiment_id ```python experiment_id: Optional[str] = None ``` #### log_id ```python log_id: Optional[str] = None ``` #### project_id ```python project_id: Optional[str] = None ``` #### project_name ```python project_name: Optional[str] = None ``` #### span_id ```python span_id: Optional[str] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### trace_id ```python trace_id: Optional[str] = None ``` ### EvaluateResponse Bases: `BaseModel` #### results ```python results: list[EvaluateResult] ``` ### EvaluateResult Bases: `BaseModel` #### criteria ```python criteria: str ``` #### error_message ```python error_message: Optional[str] ``` #### evaluation_result ```python evaluation_result: Optional[EvaluationResult] ``` #### evaluator_id ```python evaluator_id: str ``` #### status ```python status: str ``` ### EvaluatedModelAttachment Bases: `BaseModel` #### media_type ```python media_type: str ``` #### url ```python url: str ``` #### usage_type ```python usage_type: Optional[str] = 'evaluated_model_input' ``` ### Evaluation Bases: `BaseModel` #### annotation_criteria_id ```python annotation_criteria_id: Optional[str] = None ``` #### app ```python app: Optional[str] = None ``` #### created_at ```python created_at: datetime ``` #### criteria ```python criteria: Optional[str] = None ``` #### criteria_id ```python criteria_id: Optional[str] = None ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[str] = None ``` #### evaluation_duration ```python evaluation_duration: Optional[timedelta] = None ``` #### evaluation_type ```python evaluation_type: Optional[str] = None ``` #### evaluator_family ```python evaluator_family: Optional[str] = None ``` #### evaluator_id ```python evaluator_id: Optional[str] = None ``` #### experiment_id ```python experiment_id: Optional[int] = None ``` #### explain_strategy ```python explain_strategy: Optional[str] = None ``` #### explanation ```python explanation: Optional[str] = None ``` #### explanation_duration ```python explanation_duration: Optional[timedelta] = None ``` #### 
id ```python id: int ``` #### log_id ```python log_id: str ``` #### metadata ```python metadata: Optional[dict[str, Any]] = None ``` #### metric_description ```python metric_description: Optional[str] = None ``` #### metric_name ```python metric_name: Optional[str] = None ``` #### pass\_ ```python pass_: Optional[bool] = Field(default=None, alias='pass') ``` #### project_id ```python project_id: Optional[str] = None ``` #### score ```python score: Optional[float] = None ``` #### span_id ```python span_id: Optional[str] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### text_output ```python text_output: Optional[str] = None ``` #### trace_id ```python trace_id: Optional[str] = None ``` #### usage ```python usage: Optional[dict[str, Any]] = None ``` ### EvaluationResult Bases: `BaseModel` #### additional_info ```python additional_info: Optional[dict[str, Any]] = None ``` #### app ```python app: Optional[str] = None ``` #### created_at ```python created_at: Optional[AwareDatetime] = None ``` #### criteria ```python criteria: str ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[int] = None ``` #### evaluated_model_gold_answer ```python evaluated_model_gold_answer: Optional[str] = None ``` #### evaluated_model_input ```python evaluated_model_input: Optional[str] = None ``` #### evaluated_model_output ```python evaluated_model_output: Optional[str] = None ``` #### evaluated_model_retrieved_context ```python evaluated_model_retrieved_context: Optional[list[str]] = ( None ) ``` #### evaluated_model_system_prompt ```python evaluated_model_system_prompt: Optional[str] = None ``` #### evaluation_duration ```python evaluation_duration: Optional[timedelta] = None ``` #### evaluation_metadata ```python evaluation_metadata: Optional[dict] = None ``` #### evaluator_family ```python evaluator_family: str ``` #### evaluator_id ```python evaluator_id: str ``` #### evaluator_profile_public_id ```python evaluator_profile_public_id: str ``` #### experiment_id ```python experiment_id: Optional[str] = None ``` #### explanation ```python explanation: Optional[str] = None ``` #### explanation_duration ```python explanation_duration: Optional[timedelta] = None ``` #### id ```python id: Optional[str] = None ``` #### pass\_ ```python pass_: Optional[bool] = Field(default=None, alias='pass') ``` #### project_id ```python project_id: Optional[str] = None ``` #### score_raw ```python score_raw: Optional[float] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### text_output ```python text_output: Optional[str] = None ``` ### Evaluator Bases: `BaseModel` #### aliases ```python aliases: Optional[list[str]] ``` #### default_criteria ```python default_criteria: Optional[str] = None ``` #### evaluator_family ```python evaluator_family: Optional[str] ``` #### id ```python id: str ``` #### name ```python name: str ``` ### EvaluatorCriteria Bases: `BaseModel` #### config ```python config: Optional[dict[str, Any]] ``` #### created_at ```python created_at: datetime ``` #### description ```python description: Optional[str] ``` #### evaluator_family ```python evaluator_family: str ``` #### is_patronus_managed ```python is_patronus_managed: bool ``` #### name ```python name: str ``` #### public_id ```python public_id: str ``` #### revision ```python revision: int ``` ### Experiment Bases: `BaseModel` #### id ```python id: str ``` #### metadata ```python metadata: Optional[dict[str, Any]] = None ``` #### 
name ```python name: str ``` #### project_id ```python project_id: str ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` ### ExportEvaluationRequest Bases: `BaseModel` #### evaluation_results ```python evaluation_results: list[ExportEvaluationResult] ``` ### ExportEvaluationResponse Bases: `BaseModel` #### evaluation_results ```python evaluation_results: list[ExportEvaluationResultPartial] ``` ### ExportEvaluationResult Bases: `BaseModel` #### app ```python app: Optional[str] = None ``` #### criteria ```python criteria: Optional[str] = None ``` #### dataset_id ```python dataset_id: Optional[str] = None ``` #### dataset_sample_id ```python dataset_sample_id: Optional[int] = None ``` #### evaluated_model_attachments ```python evaluated_model_attachments: Optional[ list[EvaluatedModelAttachment] ] = None ``` #### evaluated_model_gold_answer ```python evaluated_model_gold_answer: Optional[str] = None ``` #### evaluated_model_input ```python evaluated_model_input: Optional[str] = None ``` #### evaluated_model_name ```python evaluated_model_name: Optional[str] = None ``` #### evaluated_model_output ```python evaluated_model_output: Optional[str] = None ``` #### evaluated_model_params ```python evaluated_model_params: Optional[ dict[str, Union[str, int, float]] ] = None ``` #### evaluated_model_provider ```python evaluated_model_provider: Optional[str] = None ``` #### evaluated_model_retrieved_context ```python evaluated_model_retrieved_context: Optional[list[str]] = ( None ) ``` #### evaluated_model_selected_model ```python evaluated_model_selected_model: Optional[str] = None ``` #### evaluated_model_system_prompt ```python evaluated_model_system_prompt: Optional[str] = None ``` #### evaluation_duration ```python evaluation_duration: Optional[timedelta] = None ``` #### evaluation_metadata ```python evaluation_metadata: Optional[dict[str, Any]] = None ``` #### evaluator_id ```python evaluator_id: SanitizedLocalEvaluatorID ``` #### experiment_id ```python experiment_id: Optional[str] = None ``` #### explanation ```python explanation: Optional[str] = None ``` #### explanation_duration ```python explanation_duration: Optional[timedelta] = None ``` #### pass\_ ```python pass_: Optional[bool] = Field( default=None, serialization_alias="pass" ) ``` #### score_raw ```python score_raw: Optional[float] = None ``` #### tags ```python tags: Optional[dict[str, str]] = None ``` #### text_output ```python text_output: Optional[str] = None ``` ### ExportEvaluationResultPartial Bases: `BaseModel` #### app ```python app: Optional[str] ``` #### created_at ```python created_at: AwareDatetime ``` #### evaluator_id ```python evaluator_id: str ``` #### id ```python id: str ``` ### GetAnnotationCriteriaResponse Bases: `BaseModel` #### annotation_criteria ```python annotation_criteria: AnnotationCriteria ``` ### GetEvaluationResponse Bases: `BaseModel` #### evaluation ```python evaluation: Evaluation ``` ### GetExperimentResponse Bases: `BaseModel` #### experiment ```python experiment: Experiment ``` ### GetProjectResponse Bases: `BaseModel` #### project ```python project: Project ``` ### ListAnnotationCriteriaResponse Bases: `BaseModel` #### annotation_criteria ```python annotation_criteria: list[AnnotationCriteria] ``` ### ListCriteriaRequest Bases: `BaseModel` #### evaluator_family ```python evaluator_family: Optional[str] = None ``` #### evaluator_id ```python evaluator_id: Optional[str] = None ``` #### get_last_revision ```python get_last_revision: bool = False ``` #### is_patronus_managed ```python 
is_patronus_managed: Optional[bool] = None ``` #### limit ```python limit: int = 1000 ``` #### name ```python name: Optional[str] = None ``` #### offset ```python offset: int = 0 ``` #### public_id ```python public_id: Optional[str] = None ``` #### revision ```python revision: Optional[str] = None ``` ### ListCriteriaResponse Bases: `BaseModel` #### evaluator_criteria ```python evaluator_criteria: list[EvaluatorCriteria] ``` ### ListDatasetData Bases: `BaseModel` #### data ```python data: list[DatasetDatum] ``` ### ListDatasetsResponse Bases: `BaseModel` #### datasets ```python datasets: list[Dataset] ``` ### ListEvaluatorsResponse Bases: `BaseModel` #### evaluators ```python evaluators: list[Evaluator] ``` ### Log Bases: `BaseModel` #### body ```python body: Any = None ``` #### log_attributes ```python log_attributes: Optional[dict[str, str]] = None ``` #### resource_attributes ```python resource_attributes: Optional[dict[str, str]] = None ``` #### resource_schema_url ```python resource_schema_url: Optional[str] = None ``` #### scope_attributes ```python scope_attributes: Optional[dict[str, str]] = None ``` #### scope_name ```python scope_name: Optional[str] = None ``` #### scope_schema_url ```python scope_schema_url: Optional[str] = None ``` #### scope_version ```python scope_version: Optional[str] = None ``` #### service_name ```python service_name: Optional[str] = None ``` #### severity_number ```python severity_number: Optional[int] = None ``` #### severity_test ```python severity_test: Optional[str] = None ``` #### span_id ```python span_id: Optional[str] = None ``` #### timestamp ```python timestamp: Optional[datetime] = None ``` #### trace_flags ```python trace_flags: Optional[int] = None ``` #### trace_id ```python trace_id: Optional[str] = None ``` ### Project Bases: `BaseModel` #### id ```python id: str ``` #### name ```python name: str ``` ### SearchEvaluationsFilter Bases: `BaseModel` #### and\_ ```python and_: Optional[list[SearchEvaluationsFilter]] = None ``` #### field ```python field: Optional[str] = None ``` #### operation ```python operation: Optional[str] = None ``` #### or\_ ```python or_: Optional[list[SearchEvaluationsFilter]] = None ``` #### value ```python value: Optional[Any] = None ``` ### SearchEvaluationsRequest Bases: `BaseModel` #### filters ```python filters: Optional[list[SearchEvaluationsFilter]] = None ``` ### SearchEvaluationsResponse Bases: `BaseModel` #### evaluations ```python evaluations: list[Evaluation] ``` ### SearchLogsFilter Bases: `BaseModel` #### and\_ ```python and_: Optional[list[SearchLogsFilter]] = None ``` #### field ```python field: Optional[str] = None ``` #### op ```python op: Optional[str] = None ``` #### or\_ ```python or_: Optional[list[SearchLogsFilter]] = None ``` #### value ```python value: Optional[Any] = None ``` ### SearchLogsRequest Bases: `BaseModel` #### filters ```python filters: Optional[list[SearchLogsFilter]] = None ``` #### limit ```python limit: int = 1000 ``` #### order ```python order: str = 'timestamp desc' ``` ### SearchLogsResponse Bases: `BaseModel` #### logs ```python logs: list[Log] ``` ### UpdateAnnotationCriteriaRequest Bases: `BaseModel` #### annotation_type ```python annotation_type: AnnotationType ``` #### categories ```python categories: Optional[list[AnnotationCategory]] = None ``` #### description ```python description: Optional[str] = None ``` #### name ```python name: str = Field(min_length=1, max_length=100) ``` ### UpdateAnnotationCriteriaResponse Bases: `BaseModel` #### annotation_criteria 
```python annotation_criteria: AnnotationCriteria ``` ### UpdateExperimentRequest Bases: `BaseModel` #### metadata ```python metadata: dict[str, Any] ``` ### UpdateExperimentResponse Bases: `BaseModel` #### experiment ```python experiment: Experiment ``` ### WhoAmIAPIKey Bases: `BaseModel` #### account ```python account: Account ``` #### id ```python id: str ``` ### WhoAmICaller Bases: `BaseModel` #### api_key ```python api_key: WhoAmIAPIKey ``` ### WhoAmIResponse Bases: `BaseModel` #### caller ```python caller: WhoAmICaller ``` ### sanitize_field ```python sanitize_field(max_length: int, sub_pattern: str) ``` Source code in `src/patronus/api/api_types.py` ```python def sanitize_field(max_length: int, sub_pattern: str): def wrapper(value: str) -> str: if not value: return value value = value[:max_length] return re.sub(sub_pattern, "_", value).strip() return wrapper ``` # Config ## patronus.config ### Config Bases: `BaseSettings` Configuration settings for the Patronus SDK. This class defines all available configuration options with their default values and handles loading configuration from environment variables and YAML files. Configuration sources are checked in this order: 1. Code-specified values 1. Environment variables (with prefix PATRONUS\_) 1. YAML configuration file (patronus.yaml) 1. Default values Attributes: | Name | Type | Description | | --- | --- | --- | | `service` | `str` | The name of the service or application component. Defaults to OTEL_SERVICE_NAME env var or platform.node(). | | `api_key` | `Optional[str]` | Authentication key for Patronus services. | | `api_url` | `str` | URL for the Patronus API service. Default: https://api.patronus.ai | | `otel_endpoint` | `str` | Endpoint for OpenTelemetry data collection. Default: https://otel.patronus.ai:4317 | | `otel_exporter_otlp_protocol` | `Optional[Literal['grpc', 'http/protobuf']]` | OpenTelemetry exporter protocol. Values: grpc, http/protobuf. Falls back to standard OTEL environment variables if not set. | | `ui_url` | `str` | URL for the Patronus UI. Default: https://app.patronus.ai | | `timeout_s` | `int` | Timeout in seconds for HTTP requests. Default: 300 | | `project_name` | `str` | Name of the project for organizing evaluations and experiments. Default: Global | | `app` | `str` | Name of the application within the project. Default: default | ### config ```python config() -> Config ``` Returns the Patronus SDK configuration singleton. Configuration is loaded from environment variables and the patronus.yaml file (if present) when this function is first called. Returns: | Name | Type | Description | | --- | --- | --- | | `Config` | `Config` | A singleton Config object containing all Patronus configuration settings. | Example ```python from patronus.config import config # Get the configuration cfg = config() # Access configuration values api_key = cfg.api_key project_name = cfg.project_name ``` Source code in `src/patronus/config.py` ````python @functools.lru_cache() def config() -> Config: """ Returns the Patronus SDK configuration singleton. Configuration is loaded from environment variables and the patronus.yaml file (if present) when this function is first called. Returns: Config: A singleton Config object containing all Patronus configuration settings. 
Example: ```python from patronus.config import config # Get the configuration cfg = config() # Access configuration values api_key = cfg.api_key project_name = cfg.project_name ``` """ cfg = Config() return cfg ```` # Context ## patronus.context Context management for Patronus SDK. This module provides classes and utility functions for managing the global Patronus context and accessing different components of the SDK like logging, tracing, and API clients. ### PatronusScope ```python PatronusScope( service: Optional[str], project_name: Optional[str], app: Optional[str], experiment_id: Optional[str], experiment_name: Optional[str], ) ``` Scope information for Patronus context. Defines the scope of the current Patronus application or experiment. Attributes: | Name | Type | Description | | --- | --- | --- | | `service` | `Optional[str]` | The service name as defined in OTeL. | | `project_name` | `Optional[str]` | The project name. | | `app` | `Optional[str]` | The application name. | | `experiment_id` | `Optional[str]` | The unique identifier for the experiment. | | `experiment_name` | `Optional[str]` | The name of the experiment. | ### PromptsConfig ```python PromptsConfig( directory: Path, providers: list[str], templating_engine: str, ) ``` #### directory ```python directory: Path ``` The absolute path to a directory where prompts are stored locally. #### providers ```python providers: list[str] ``` List of default prompt providers. #### templating_engine ```python templating_engine: str ``` Default prompt templating engine. ### PatronusContext ```python PatronusContext( scope: PatronusScope, tracer_provider: TracerProvider, logger_provider: LoggerProvider, api_client_deprecated: PatronusAPIClient, api_client: Client, async_api_client: AsyncClient, exporter: BatchEvaluationExporter, prompts: PromptsConfig, ) ``` Context object for Patronus SDK. Contains all the necessary components for the SDK to function properly. Attributes: | Name | Type | Description | | --- | --- | --- | | `scope` | `PatronusScope` | Scope information for this context. | | `tracer_provider` | `TracerProvider` | The OpenTelemetry tracer provider. | | `logger_provider` | `LoggerProvider` | The OpenTelemetry logger provider. | | `api_client_deprecated` | `PatronusAPIClient` | Client for Patronus API communication (deprecated). | | `api_client` | `Client` | Client for Patronus API communication using the modern client. | | `async_api_client` | `AsyncClient` | Asynchronous client for Patronus API communication. | | `exporter` | `BatchEvaluationExporter` | Exporter for batch evaluation results. | | `prompts` | `PromptsConfig` | Configuration for prompt management. | ### set_global_patronus_context ```python set_global_patronus_context(ctx: PatronusContext) ``` Set the global Patronus context. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `PatronusContext` | The Patronus context to set globally. | *required* | Source code in `src/patronus/context/__init__.py` ```python def set_global_patronus_context(ctx: PatronusContext): """ Set the global Patronus context. Args: ctx: The Patronus context to set globally. """ _CTX_PAT.set_global(ctx) ``` ### get_current_context_or_none ```python get_current_context_or_none() -> Optional[PatronusContext] ``` Get the current Patronus context or None if not initialized. Returns: | Type | Description | | --- | --- | | `Optional[PatronusContext]` | The current PatronusContext if set, otherwise None. 
| Source code in `src/patronus/context/__init__.py` ```python def get_current_context_or_none() -> Optional[PatronusContext]: """ Get the current Patronus context or None if not initialized. Returns: The current PatronusContext if set, otherwise None. """ return _CTX_PAT.get() ``` ### get_current_context ```python get_current_context() -> PatronusContext ``` Get the current Patronus context. Returns: | Type | Description | | --- | --- | | `PatronusContext` | The current PatronusContext. | Raises: | Type | Description | | --- | --- | | `UninitializedError` | If no active Patronus context is found. | Source code in `src/patronus/context/__init__.py` ```python def get_current_context() -> PatronusContext: """ Get the current Patronus context. Returns: The current PatronusContext. Raises: UninitializedError: If no active Patronus context is found. """ ctx = get_current_context_or_none() if ctx is None: raise UninitializedError( "No active Patronus context found. Please initialize the library by calling patronus.init()." ) return ctx ``` ### get_logger ```python get_logger( ctx: Optional[PatronusContext] = None, level: int = logging.INFO, ) -> logging.Logger ``` Get a standard Python logger configured with the Patronus context. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | | `level` | `int` | The logging level to set. Defaults to INFO. | `INFO` | Returns: | Type | Description | | --- | --- | | `Logger` | A configured Python logger. | Source code in `src/patronus/context/__init__.py` ```python def get_logger(ctx: Optional[PatronusContext] = None, level: int = logging.INFO) -> logging.Logger: """ Get a standard Python logger configured with the Patronus context. Args: ctx: The Patronus context to use. If None, uses the current context. level: The logging level to set. Defaults to INFO. Returns: A configured Python logger. """ from patronus.tracing.logger import set_logger_handler ctx = ctx or get_current_context() logger = logging.getLogger("patronus.sdk") set_logger_handler(logger, ctx.scope, ctx.logger_provider) logger.setLevel(level) return logger ``` ### get_logger_or_none ```python get_logger_or_none( level: int = logging.INFO, ) -> Optional[logging.Logger] ``` Get a standard Python logger or None if context is not initialized. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `level` | `int` | The logging level to set. Defaults to INFO. | `INFO` | Returns: | Type | Description | | --- | --- | | `Optional[Logger]` | A configured Python logger if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_logger_or_none(level: int = logging.INFO) -> Optional[logging.Logger]: """ Get a standard Python logger or None if context is not initialized. Args: level: The logging level to set. Defaults to INFO. Returns: A configured Python logger if context is available, otherwise None. """ ctx = get_current_context() if ctx is None: return None return get_logger(ctx, level=level) ``` ### get_pat_logger ```python get_pat_logger( ctx: Optional[PatronusContext] = None, ) -> PatLogger ``` Get a Patronus logger. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `Logger` | A Patronus logger. 
| Source code in `src/patronus/context/__init__.py` ```python def get_pat_logger(ctx: Optional[PatronusContext] = None) -> "PatLogger": """ Get a Patronus logger. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: A Patronus logger. """ ctx = ctx or get_current_context() return ctx.logger_provider.get_logger("patronus.sdk") ``` ### get_pat_logger_or_none ```python get_pat_logger_or_none() -> Optional[PatLogger] ``` Get a Patronus logger or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[Logger]` | A Patronus logger if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_pat_logger_or_none() -> Optional["PatLogger"]: """ Get a Patronus logger or None if context is not initialized. Returns: A Patronus logger if context is available, otherwise None. """ ctx = get_current_context_or_none() if ctx is None: return None return ctx.logger_provider.get_logger("patronus.sdk") ``` ### get_tracer ```python get_tracer( ctx: Optional[PatronusContext] = None, ) -> trace.Tracer ``` Get an OpenTelemetry tracer. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `Tracer` | An OpenTelemetry tracer. | Source code in `src/patronus/context/__init__.py` ```python def get_tracer(ctx: Optional[PatronusContext] = None) -> trace.Tracer: """ Get an OpenTelemetry tracer. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: An OpenTelemetry tracer. """ ctx = ctx or get_current_context() return ctx.tracer_provider.get_tracer("patronus.sdk") ``` ### get_tracer_or_none ```python get_tracer_or_none() -> Optional[trace.Tracer] ``` Get an OpenTelemetry tracer or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[Tracer]` | An OpenTelemetry tracer if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_tracer_or_none() -> Optional[trace.Tracer]: """ Get an OpenTelemetry tracer or None if context is not initialized. Returns: An OpenTelemetry tracer if context is available, otherwise None. """ ctx = get_current_context_or_none() if ctx is None: return None return ctx.tracer_provider.get_tracer("patronus.sdk") ``` ### get_api_client_deprecated ```python get_api_client_deprecated( ctx: Optional[PatronusContext] = None, ) -> PatronusAPIClient ``` Get the Patronus API client. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `PatronusAPIClient` | The Patronus API client. | Source code in `src/patronus/context/__init__.py` ```python def get_api_client_deprecated(ctx: Optional[PatronusContext] = None) -> "PatronusAPIClient": """ Get the Patronus API client. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The Patronus API client. """ ctx = ctx or get_current_context() return ctx.api_client_deprecated ``` ### get_api_client_deprecated_or_none ```python get_api_client_deprecated_or_none() -> Optional[ PatronusAPIClient ] ``` Get the Patronus API client or None if context is not initialized. 
Returns: | Type | Description | | --- | --- | | `Optional[PatronusAPIClient]` | The Patronus API client if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_api_client_deprecated_or_none() -> Optional["PatronusAPIClient"]: """ Get the Patronus API client or None if context is not initialized. Returns: The Patronus API client if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.api_client_deprecated ``` ### get_api_client ```python get_api_client( ctx: Optional[PatronusContext] = None, ) -> patronus_api.Client ``` Get the Patronus API client. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `Client` | The Patronus API client. | Source code in `src/patronus/context/__init__.py` ```python def get_api_client(ctx: Optional[PatronusContext] = None) -> patronus_api.Client: """ Get the Patronus API client. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The Patronus API client. """ ctx = ctx or get_current_context() return ctx.api_client ``` ### get_api_client_or_none ```python get_api_client_or_none() -> Optional[patronus_api.Client] ``` Get the Patronus API client or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[Client]` | The Patronus API client if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_api_client_or_none() -> Optional[patronus_api.Client]: """ Get the Patronus API client or None if context is not initialized. Returns: The Patronus API client if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.api_client ``` ### get_async_api_client ```python get_async_api_client( ctx: Optional[PatronusContext] = None, ) -> patronus_api.AsyncClient ``` Get the asynchronous Patronus API client. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `AsyncClient` | The asynchronous Patronus API client. | Source code in `src/patronus/context/__init__.py` ```python def get_async_api_client(ctx: Optional[PatronusContext] = None) -> patronus_api.AsyncClient: """ Get the asynchronous Patronus API client. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The asynchronous Patronus API client. """ ctx = ctx or get_current_context() return ctx.async_api_client ``` ### get_async_api_client_or_none ```python get_async_api_client_or_none() -> Optional[ patronus_api.AsyncClient ] ``` Get the asynchronous Patronus API client or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[AsyncClient]` | The asynchronous Patronus API client if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_async_api_client_or_none() -> Optional[patronus_api.AsyncClient]: """ Get the asynchronous Patronus API client or None if context is not initialized. Returns: The asynchronous Patronus API client if context is available, otherwise None. 
""" return (ctx := get_current_context_or_none()) and ctx.async_api_client ``` ### get_exporter ```python get_exporter( ctx: Optional[PatronusContext] = None, ) -> BatchEvaluationExporter ``` Get the batch evaluation exporter. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `BatchEvaluationExporter` | The batch evaluation exporter. | Source code in `src/patronus/context/__init__.py` ```python def get_exporter(ctx: Optional[PatronusContext] = None) -> "BatchEvaluationExporter": """ Get the batch evaluation exporter. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The batch evaluation exporter. """ ctx = ctx or get_current_context() return ctx.exporter ``` ### get_exporter_or_none ```python get_exporter_or_none() -> Optional[BatchEvaluationExporter] ``` Get the batch evaluation exporter or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[BatchEvaluationExporter]` | The batch evaluation exporter if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_exporter_or_none() -> Optional["BatchEvaluationExporter"]: """ Get the batch evaluation exporter or None if context is not initialized. Returns: The batch evaluation exporter if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.exporter ``` ### get_scope ```python get_scope( ctx: Optional[PatronusContext] = None, ) -> PatronusScope ``` Get the Patronus scope. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `PatronusScope` | The Patronus scope. | Source code in `src/patronus/context/__init__.py` ```python def get_scope(ctx: Optional[PatronusContext] = None) -> PatronusScope: """ Get the Patronus scope. Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The Patronus scope. """ ctx = ctx or get_current_context() return ctx.scope ``` ### get_scope_or_none ```python get_scope_or_none() -> Optional[PatronusScope] ``` Get the Patronus scope or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[PatronusScope]` | The Patronus scope if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_scope_or_none() -> Optional[PatronusScope]: """ Get the Patronus scope or None if context is not initialized. Returns: The Patronus scope if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.scope ``` ### get_prompts_config ```python get_prompts_config( ctx: Optional[PatronusContext] = None, ) -> PromptsConfig ``` Get the Patronus prompts configuration. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `Optional[PatronusContext]` | The Patronus context to use. If None, uses the current context. | `None` | Returns: | Type | Description | | --- | --- | | `PromptsConfig` | The Patronus prompts configuration. | Source code in `src/patronus/context/__init__.py` ```python def get_prompts_config(ctx: Optional[PatronusContext] = None) -> PromptsConfig: """ Get the Patronus prompts configuration. 
Args: ctx: The Patronus context to use. If None, uses the current context. Returns: The Patronus prompts configuration. """ ctx = ctx or get_current_context() return ctx.prompts ``` ### get_prompts_config_or_none ```python get_prompts_config_or_none() -> Optional[PromptsConfig] ``` Get the Patronus prompts configuration or None if context is not initialized. Returns: | Type | Description | | --- | --- | | `Optional[PromptsConfig]` | The Patronus prompts configuration if context is available, otherwise None. | Source code in `src/patronus/context/__init__.py` ```python def get_prompts_config_or_none() -> Optional[PromptsConfig]: """ Get the Patronus prompts configuration or None if context is not initialized. Returns: The Patronus prompts configuration if context is available, otherwise None. """ return (ctx := get_current_context_or_none()) and ctx.prompts ``` # Datasets ## patronus.datasets ### datasets #### Attachment Bases: `TypedDict` Represent an attachment entry. Usually used in context of multimodal evaluation. #### Fields Bases: `TypedDict` A TypedDict class representing fields for a structured data entity. Attributes: | Name | Type | Description | | --- | --- | --- | | `sid` | `NotRequired[Optional[str]]` | An optional identifier for the system or session. | | `system_prompt` | `NotRequired[Optional[str]]` | An optional string representing the system prompt associated with the task. | | `task_context` | `NotRequired[Union[str, list[str], None]]` | Optional contextual information for the task in the form of a string or a list of strings. | | `task_attachments` | `NotRequired[Optional[list[Attachment]]]` | Optional list of attachments associated with the task. | | `task_input` | `NotRequired[Optional[str]]` | An optional string representing the input data for the task. Usually a user input sent to an LLM. | | `task_output` | `NotRequired[Optional[str]]` | An optional string representing the output result of the task. Usually a response from an LLM. | | `gold_answer` | `NotRequired[Optional[str]]` | An optional string representing the correct or expected answer for evaluation purposes. | | `task_metadata` | `NotRequired[Optional[dict[str, Any]]]` | Optional dictionary containing metadata associated with the task. | | `tags` | `NotRequired[Optional[dict[str, str]]]` | Optional dictionary holding additional key-value pair tags relevant to the task. | #### Row ```python Row(_row: Series) ``` Represents a data row encapsulating access to properties in a pandas Series. Provides attribute-based access to underlying pandas Series data with properties that ensure compatibility with structured evaluators through consistent field naming and type handling. #### Dataset ```python Dataset(dataset_id: Optional[str], df: DataFrame) ``` Represents a dataset. ##### from_records ```python from_records( records: Union[ Iterable[Fields], Iterable[dict[str, Any]] ], dataset_id: Optional[str] = None, ) -> te.Self ``` Creates an instance of the class by processing and sanitizing provided records and optionally associating them with a specific dataset ID. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `records` | `Union[Iterable[Fields], Iterable[dict[str, Any]]]` | A collection of records to initialize the instance. Each record can either be an instance of Fields or a dictionary containing corresponding data. | *required* | | `dataset_id` | `Optional[str]` | An optional identifier for associating the data with a specific dataset. 
| `None` | Returns: | Type | Description | | --- | --- | | `Self` | te.Self: A new instance of the class with the processed and sanitized data. | Source code in `src/patronus/datasets/datasets.py` ```python @classmethod def from_records( cls, records: Union[typing.Iterable[Fields], typing.Iterable[dict[str, typing.Any]]], dataset_id: Optional[str] = None, ) -> te.Self: """ Creates an instance of the class by processing and sanitizing provided records and optionally associating them with a specific dataset ID. Args: records: A collection of records to initialize the instance. Each record can either be an instance of `Fields` or a dictionary containing corresponding data. dataset_id: An optional identifier for associating the data with a specific dataset. Returns: te.Self: A new instance of the class with the processed and sanitized data. """ df = pd.DataFrame.from_records(records) df = cls.__sanitize_df(df, dataset_id) return cls(df=df, dataset_id=dataset_id) ``` ##### to_csv ```python to_csv( path_or_buf: Union[str, Path, IO[AnyStr]], **kwargs: Any ) -> Optional[str] ``` Saves dataset to a CSV file. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `path_or_buf` | `Union[str, Path, IO[AnyStr]]` | String path or file-like object where the CSV will be saved. | *required* | | `**kwargs` | `Any` | Additional arguments passed to pandas.DataFrame.to_csv(). | `{}` | Returns: | Type | Description | | --- | --- | | `Optional[str]` | String path if a path was specified and return_path is True, otherwise None. | Source code in `src/patronus/datasets/datasets.py` ```python def to_csv( self, path_or_buf: Union[str, pathlib.Path, typing.IO[typing.AnyStr]], **kwargs: typing.Any ) -> Optional[str]: """ Saves dataset to a CSV file. Args: path_or_buf: String path or file-like object where the CSV will be saved. **kwargs: Additional arguments passed to pandas.DataFrame.to_csv(). Returns: String path if a path was specified and return_path is True, otherwise None. """ return self.df.to_csv(path_or_buf, **kwargs) ``` #### DatasetLoader ```python DatasetLoader( loader: Union[ Awaitable[Dataset], Callable[[], Awaitable[Dataset]] ], ) ``` Encapsulates asynchronous loading of a dataset. This class provides a mechanism to lazily load a dataset asynchronously only once, using a provided dataset loader function. Source code in `src/patronus/datasets/datasets.py` ```python def __init__(self, loader: Union[typing.Awaitable[Dataset], typing.Callable[[], typing.Awaitable[Dataset]]]): self.__lock = asyncio.Lock() self.__loader = loader self.dataset: Optional[Dataset] = None ``` ##### load ```python load() -> Dataset ``` Load dataset. Repeated calls will return already loaded dataset. Source code in `src/patronus/datasets/datasets.py` ```python async def load(self) -> Dataset: """ Load dataset. Repeated calls will return already loaded dataset. 
""" async with self.__lock: if self.dataset is not None: return self.dataset if inspect.iscoroutinefunction(self.__loader): self.dataset = await self.__loader() else: self.dataset = await self.__loader return self.dataset ``` #### read_csv ```python read_csv( filename_or_buffer: Union[str, Path, IO[AnyStr]], *, dataset_id: Optional[str] = None, sid_field: str = "sid", system_prompt_field: str = "system_prompt", task_input_field: str = "task_input", task_context_field: str = "task_context", task_attachments_field: str = "task_attachments", task_output_field: str = "task_output", gold_answer_field: str = "gold_answer", task_metadata_field: str = "task_metadata", tags_field: str = "tags", **kwargs: Any, ) -> Dataset ``` Reads a CSV file and converts it into a Dataset object. The CSV file is transformed into a structured dataset where each field maps to a specific aspect of the dataset schema provided via function arguments. You may specify custom field mappings as per your dataset structure, while additional keyword arguments are passed directly to the underlying 'pd.read_csv' function. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `filename_or_buffer` | `Union[str, Path, IO[AnyStr]]` | Path to the CSV file or a file-like object containing the dataset to be read. | *required* | | `dataset_id` | `Optional[str]` | Optional identifier for the dataset being read. Default is None. | `None` | | `sid_field` | `str` | Name of the column containing unique sample identifiers. | `'sid'` | | `system_prompt_field` | `str` | Name of the column representing the system prompts. | `'system_prompt'` | | `task_input_field` | `str` | Name of the column containing the main input for the task. | `'task_input'` | | `task_context_field` | `str` | Name of the column describing the broader task context. | `'task_context'` | | `task_attachments_field` | `str` | Name of the column with supplementary attachments related to the task. | `'task_attachments'` | | `task_output_field` | `str` | Name of the column containing responses or outputs for the task. | `'task_output'` | | `gold_answer_field` | `str` | Name of the column detailing the expected or correct answer to the task. | `'gold_answer'` | | `task_metadata_field` | `str` | Name of the column storing metadata attributes associated with the task. | `'task_metadata'` | | `tags_field` | `str` | Name of the column containing tags or annotations related to each sample. | `'tags'` | | `**kwargs` | `Any` | Additional keyword arguments passed to 'pandas.read_csv' for fine-tuning the CSV parsing behavior, such as delimiters, encoding, etc. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `Dataset` | `Dataset` | The parsed dataset object containing structured data from the input CSV file. | Source code in `src/patronus/datasets/datasets.py` ```python def read_csv( filename_or_buffer: Union[str, pathlib.Path, typing.IO[typing.AnyStr]], *, dataset_id: Optional[str] = None, sid_field: str = "sid", system_prompt_field: str = "system_prompt", task_input_field: str = "task_input", task_context_field: str = "task_context", task_attachments_field: str = "task_attachments", task_output_field: str = "task_output", gold_answer_field: str = "gold_answer", task_metadata_field: str = "task_metadata", tags_field: str = "tags", **kwargs: typing.Any, ) -> Dataset: """ Reads a CSV file and converts it into a Dataset object. 
The CSV file is transformed into a structured dataset where each field maps to a specific aspect of the dataset schema provided via function arguments. You may specify custom field mappings as per your dataset structure, while additional keyword arguments are passed directly to the underlying 'pd.read_csv' function. Args: filename_or_buffer: Path to the CSV file or a file-like object containing the dataset to be read. dataset_id: Optional identifier for the dataset being read. Default is None. sid_field: Name of the column containing unique sample identifiers. system_prompt_field: Name of the column representing the system prompts. task_input_field: Name of the column containing the main input for the task. task_context_field: Name of the column describing the broader task context. task_attachments_field: Name of the column with supplementary attachments related to the task. task_output_field: Name of the column containing responses or outputs for the task. gold_answer_field: Name of the column detailing the expected or correct answer to the task. task_metadata_field: Name of the column storing metadata attributes associated with the task. tags_field: Name of the column containing tags or annotations related to each sample. **kwargs: Additional keyword arguments passed to 'pandas.read_csv' for fine-tuning the CSV parsing behavior, such as delimiters, encoding, etc. Returns: Dataset: The parsed dataset object containing structured data from the input CSV file. """ return _read_dataframe( pd.read_csv, filename_or_buffer, dataset_id=dataset_id, sid_field=sid_field, system_prompt_field=system_prompt_field, task_context_field=task_context_field, task_attachments_field=task_attachments_field, task_input_field=task_input_field, task_output_field=task_output_field, gold_answer_field=gold_answer_field, task_metadata_field=task_metadata_field, tags_field=tags_field, **kwargs, ) ``` #### read_jsonl ```python read_jsonl( filename_or_buffer: Union[str, Path, IO[AnyStr]], *, dataset_id: Optional[str] = None, sid_field: str = "sid", system_prompt_field: str = "system_prompt", task_input_field: str = "task_input", task_context_field: str = "task_context", task_attachments_field: str = "task_attachments", task_output_field: str = "task_output", gold_answer_field: str = "gold_answer", task_metadata_field: str = "task_metadata", tags_field: str = "tags", **kwargs: Any, ) -> Dataset ``` Reads a JSONL (JSON Lines) file and transforms it into a Dataset object. This function parses the input data file or buffer in JSON Lines format into a structured format, extracting specified fields and additional metadata for usage in downstream tasks. The field mappings and additional keyword arguments can be customized to accommodate application-specific requirements. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `filename_or_buffer` | `Union[str, Path, IO[AnyStr]]` | The path to the file or a file-like object containing the JSONL data to be read. | *required* | | `dataset_id` | `Optional[str]` | An optional identifier for the dataset being read. Defaults to None. | `None` | | `sid_field` | `str` | The field name in the JSON lines representing the unique identifier for a sample. Defaults to "sid". | `'sid'` | | `system_prompt_field` | `str` | The field name for the system prompt in the JSON lines file. Defaults to "system_prompt". | `'system_prompt'` | | `task_input_field` | `str` | The field name for the task input data in the JSON lines file. Defaults to "task_input". 
| `'task_input'` | | `task_context_field` | `str` | The field name for the task context data in the JSON lines file. Defaults to "task_context". | `'task_context'` | | `task_attachments_field` | `str` | The field name for any task attachments in the JSON lines file. Defaults to "task_attachments". | `'task_attachments'` | | `task_output_field` | `str` | The field name for task output data in the JSON lines file. Defaults to "task_output". | `'task_output'` | | `gold_answer_field` | `str` | The field name for the gold (ground truth) answer in the JSON lines file. Defaults to "gold_answer". | `'gold_answer'` | | `task_metadata_field` | `str` | The field name for metadata associated with the task in the JSON lines file. Defaults to "task_metadata". | `'task_metadata'` | | `tags_field` | `str` | The field name for tags in the parsed JSON lines file. Defaults to "tags". | `'tags'` | | `**kwargs` | `Any` | Additional keyword arguments to be passed to pd.read_json for customization. The parameter "lines" will be forcibly set to True if not provided. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `Dataset` | `Dataset` | A Dataset object containing the parsed and structured data. | Source code in `src/patronus/datasets/datasets.py` ```python def read_jsonl( filename_or_buffer: Union[str, pathlib.Path, typing.IO[typing.AnyStr]], *, dataset_id: Optional[str] = None, sid_field: str = "sid", system_prompt_field: str = "system_prompt", task_input_field: str = "task_input", task_context_field: str = "task_context", task_attachments_field: str = "task_attachments", task_output_field: str = "task_output", gold_answer_field: str = "gold_answer", task_metadata_field: str = "task_metadata", tags_field: str = "tags", **kwargs: typing.Any, ) -> Dataset: """ Reads a JSONL (JSON Lines) file and transforms it into a Dataset object. This function parses the input data file or buffer in JSON Lines format into a structured format, extracting specified fields and additional metadata for usage in downstream tasks. The field mappings and additional keyword arguments can be customized to accommodate application-specific requirements. Args: filename_or_buffer: The path to the file or a file-like object containing the JSONL data to be read. dataset_id: An optional identifier for the dataset being read. Defaults to None. sid_field: The field name in the JSON lines representing the unique identifier for a sample. Defaults to "sid". system_prompt_field: The field name for the system prompt in the JSON lines file. Defaults to "system_prompt". task_input_field: The field name for the task input data in the JSON lines file. Defaults to "task_input". task_context_field: The field name for the task context data in the JSON lines file. Defaults to "task_context". task_attachments_field: The field name for any task attachments in the JSON lines file. Defaults to "task_attachments". task_output_field: The field name for task output data in the JSON lines file. Defaults to "task_output". gold_answer_field: The field name for the gold (ground truth) answer in the JSON lines file. Defaults to "gold_answer". task_metadata_field: The field name for metadata associated with the task in the JSON lines file. Defaults to "task_metadata". tags_field: The field name for tags in the parsed JSON lines file. Defaults to "tags". **kwargs: Additional keyword arguments to be passed to `pd.read_json` for customization. The parameter "lines" will be forcibly set to True if not provided. 
Returns: Dataset: A Dataset object containing the parsed and structured data. """ kwargs.setdefault("lines", True) return _read_dataframe( pd.read_json, filename_or_buffer, dataset_id=dataset_id, sid_field=sid_field, system_prompt_field=system_prompt_field, task_context_field=task_context_field, task_attachments_field=task_attachments_field, task_input_field=task_input_field, task_output_field=task_output_field, gold_answer_field=gold_answer_field, task_metadata_field=task_metadata_field, tags_field=tags_field, **kwargs, ) ``` ### remote #### DatasetNotFoundError Bases: `Exception` Raised when a dataset with the specified ID or name is not found #### RemoteDatasetLoader ```python RemoteDatasetLoader( by_name: Optional[str] = None, *, by_id: Optional[str] = None, ) ``` Bases: `DatasetLoader` A loader for datasets stored remotely on the Patronus platform. This class provides functionality to asynchronously load a dataset from the remote API by its name or identifier, handling the fetch operation lazily and ensuring it's only performed once. You can specify either the dataset name or ID, but not both. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `by_name` | `Optional[str]` | The name of the dataset to load. | `None` | | `by_id` | `Optional[str]` | The ID of the dataset to load. | `None` | Source code in `src/patronus/datasets/remote.py` ```python def __init__(self, by_name: Optional[str] = None, *, by_id: Optional[str] = None): """ Initializes a new RemoteDatasetLoader instance. Args: by_name: The name of the dataset to load. by_id: The ID of the dataset to load. """ if not (bool(by_name) ^ bool(by_id)): raise ValueError("Either by_name or by_id must be provided, but not both.") self._dataset_name = by_name self._dataset_id = by_id super().__init__(self._load) ``` # evals ## patronus.evals ### evaluators #### Evaluator ```python Evaluator(weight: Optional[Union[str, float]] = None) ``` Base Evaluator Class Source code in `src/patronus/evals/evaluators.py` ```python def __init__(self, weight: Optional[Union[str, float]] = None): if weight is not None: try: decimal.Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.weight = weight ``` ##### evaluate ```python evaluate(*args, **kwargs) -> Optional[EvaluationResult] ``` Synchronous version of evaluate method. When inheriting directly from Evaluator class it's permitted to change parameters signature. Return type should stay unchanged. Source code in `src/patronus/evals/evaluators.py` ```python @abc.abstractmethod def evaluate(self, *args, **kwargs) -> Optional[EvaluationResult]: """ Synchronous version of evaluate method. When inheriting directly from Evaluator class it's permitted to change parameters signature. Return type should stay unchanged. """ ``` #### AsyncEvaluator ```python AsyncEvaluator(weight: Optional[Union[str, float]] = None) ``` Bases: `Evaluator` Source code in `src/patronus/evals/evaluators.py` ```python def __init__(self, weight: Optional[Union[str, float]] = None): if weight is not None: try: decimal.Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.weight = weight ``` ##### evaluate ```python evaluate(*args, **kwargs) -> Optional[EvaluationResult] ``` Asynchronous version of evaluate method. 
When inheriting directly from Evaluator class it's permitted to change parameters signature. Return type should stay unchanged. Source code in `src/patronus/evals/evaluators.py` ```python @abc.abstractmethod async def evaluate(self, *args, **kwargs) -> Optional[EvaluationResult]: """ Asynchronous version of evaluate method. When inheriting directly from Evaluator class it's permitted to change parameters signature. Return type should stay unchanged. """ ``` #### StructuredEvaluator ```python StructuredEvaluator( weight: Optional[Union[str, float]] = None, ) ``` Bases: `Evaluator` Base for structured evaluators Source code in `src/patronus/evals/evaluators.py` ```python def __init__(self, weight: Optional[Union[str, float]] = None): if weight is not None: try: decimal.Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.weight = weight ``` #### AsyncStructuredEvaluator ```python AsyncStructuredEvaluator( weight: Optional[Union[str, float]] = None, ) ``` Bases: `AsyncEvaluator` Base for async structured evaluators Source code in `src/patronus/evals/evaluators.py` ```python def __init__(self, weight: Optional[Union[str, float]] = None): if weight is not None: try: decimal.Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.weight = weight ``` #### RemoteEvaluatorMixin ```python RemoteEvaluatorMixin( evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: Literal[ "never", "on-fail", "on-success", "always" ] = "always", criteria_config: Optional[dict[str, Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ) ``` Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `evaluator_id_or_alias` | `str` | The ID or alias of the evaluator to use. | *required* | | `criteria` | `Optional[str]` | The criteria name to use for evaluation. If not provided, the evaluator's default criteria will be used. | `None` | | `tags` | `Optional[dict[str, str]]` | Optional tags to attach to evaluations. | `None` | | `explain_strategy` | `Literal['never', 'on-fail', 'on-success', 'always']` | When to generate explanations for evaluations. Options are "never", "on-fail", "on-success", or "always". | `'always'` | | `criteria_config` | `Optional[dict[str, Any]]` | Configuration for the criteria. (Currently unused) | `None` | | `allow_update` | `bool` | Whether to allow updates. (Currently unused) | `False` | | `max_attempts` | `int` | Maximum number of retry attempts. (Currently unused) | `3` | | `api_` | `Optional[PatronusAPIClient]` | Optional API client instance. If not provided, will use the default client from context. | `None` | | `weight` | `Optional[Union[str, float]]` | Optional weight for the evaluator. This is only used within the Patronus Experimentation Framework to indicate the relative importance of evaluators. Must be a valid decimal number (string or float). Weights are stored as experiment metadata and do not affect standalone evaluator usage. 
| `None` | Source code in `src/patronus/evals/evaluators.py` ```python def __init__( self, evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: typing.Literal["never", "on-fail", "on-success", "always"] = "always", criteria_config: Optional[dict[str, typing.Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ): """Initialize a remote evaluator. Args: evaluator_id_or_alias: The ID or alias of the evaluator to use. criteria: The criteria name to use for evaluation. If not provided, the evaluator's default criteria will be used. tags: Optional tags to attach to evaluations. explain_strategy: When to generate explanations for evaluations. Options are "never", "on-fail", "on-success", or "always". criteria_config: Configuration for the criteria. (Currently unused) allow_update: Whether to allow updates. (Currently unused) max_attempts: Maximum number of retry attempts. (Currently unused) api_: Optional API client instance. If not provided, will use the default client from context. weight: Optional weight for the evaluator. This is only used within the Patronus Experimentation Framework to indicate the relative importance of evaluators. Must be a valid decimal number (string or float). Weights are stored as experiment metadata and do not affect standalone evaluator usage. """ self.evaluator_id_or_alias = evaluator_id_or_alias self.evaluator_id = None self.criteria = criteria self.tags = tags or {} self.explain_strategy = explain_strategy self.criteria_config = criteria_config self.allow_update = allow_update self.max_attempts = max_attempts self._api = api_ self._resolved = False self.weight = weight self._load_lock = threading.Lock() self._async_load_lock = asyncio.Lock() ``` #### RemoteEvaluator ```python RemoteEvaluator( evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: Literal[ "never", "on-fail", "on-success", "always" ] = "always", criteria_config: Optional[dict[str, Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ) ``` Bases: `RemoteEvaluatorMixin`, `StructuredEvaluator` Synchronous remote evaluator Source code in `src/patronus/evals/evaluators.py` ```python def __init__( self, evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: typing.Literal["never", "on-fail", "on-success", "always"] = "always", criteria_config: Optional[dict[str, typing.Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ): """Initialize a remote evaluator. Args: evaluator_id_or_alias: The ID or alias of the evaluator to use. criteria: The criteria name to use for evaluation. If not provided, the evaluator's default criteria will be used. tags: Optional tags to attach to evaluations. explain_strategy: When to generate explanations for evaluations. Options are "never", "on-fail", "on-success", or "always". criteria_config: Configuration for the criteria. (Currently unused) allow_update: Whether to allow updates. (Currently unused) max_attempts: Maximum number of retry attempts. (Currently unused) api_: Optional API client instance. If not provided, will use the default client from context. 
weight: Optional weight for the evaluator. This is only used within the Patronus Experimentation Framework to indicate the relative importance of evaluators. Must be a valid decimal number (string or float). Weights are stored as experiment metadata and do not affect standalone evaluator usage. """ self.evaluator_id_or_alias = evaluator_id_or_alias self.evaluator_id = None self.criteria = criteria self.tags = tags or {} self.explain_strategy = explain_strategy self.criteria_config = criteria_config self.allow_update = allow_update self.max_attempts = max_attempts self._api = api_ self._resolved = False self.weight = weight self._load_lock = threading.Lock() self._async_load_lock = asyncio.Lock() ``` ##### evaluate ```python evaluate( *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_attachments: Union[list[Any], None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[Dict[str, Any]] = None, **kwargs: Any, ) -> EvaluationResult ``` Evaluates data using remote Patronus Evaluator Source code in `src/patronus/evals/evaluators.py` ```python def evaluate( self, *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_attachments: Union[list[Any], None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[typing.Dict[str, typing.Any]] = None, **kwargs: Any, ) -> EvaluationResult: """Evaluates data using remote Patronus Evaluator""" kws = { "system_prompt": system_prompt, "task_context": task_context, "task_attachments": task_attachments, "task_input": task_input, "task_output": task_output, "gold_answer": gold_answer, "task_metadata": task_metadata, **kwargs, } log_id = get_current_log_id(bound_arguments=kws) attrs = get_context_evaluation_attributes() tags = {**self.tags} if t := attrs["tags"]: tags.update(t) tags = merge_tags(tags, kwargs.get("tags"), attrs["experiment_tags"]) if tags: kws["tags"] = tags if did := attrs["dataset_id"]: kws["dataset_id"] = did if sid := attrs["dataset_sample_id"]: kws["dataset_sample_id"] = sid resp = retry()(self._evaluate)(log_id=log_id, **kws) return self._translate_response(resp) ``` #### AsyncRemoteEvaluator ```python AsyncRemoteEvaluator( evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: Literal[ "never", "on-fail", "on-success", "always" ] = "always", criteria_config: Optional[dict[str, Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ) ``` Bases: `RemoteEvaluatorMixin`, `AsyncStructuredEvaluator` Asynchronous remote evaluator Source code in `src/patronus/evals/evaluators.py` ```python def __init__( self, evaluator_id_or_alias: str, criteria: Optional[str] = None, *, tags: Optional[dict[str, str]] = None, explain_strategy: typing.Literal["never", "on-fail", "on-success", "always"] = "always", criteria_config: Optional[dict[str, typing.Any]] = None, allow_update: bool = False, max_attempts: int = 3, api_: Optional[PatronusAPIClient] = None, weight: Optional[Union[str, float]] = None, ): """Initialize a remote evaluator. Args: evaluator_id_or_alias: The ID or alias of the evaluator to use. criteria: The criteria name to use for evaluation. If not provided, the evaluator's default criteria will be used. 
tags: Optional tags to attach to evaluations. explain_strategy: When to generate explanations for evaluations. Options are "never", "on-fail", "on-success", or "always". criteria_config: Configuration for the criteria. (Currently unused) allow_update: Whether to allow updates. (Currently unused) max_attempts: Maximum number of retry attempts. (Currently unused) api_: Optional API client instance. If not provided, will use the default client from context. weight: Optional weight for the evaluator. This is only used within the Patronus Experimentation Framework to indicate the relative importance of evaluators. Must be a valid decimal number (string or float). Weights are stored as experiment metadata and do not affect standalone evaluator usage. """ self.evaluator_id_or_alias = evaluator_id_or_alias self.evaluator_id = None self.criteria = criteria self.tags = tags or {} self.explain_strategy = explain_strategy self.criteria_config = criteria_config self.allow_update = allow_update self.max_attempts = max_attempts self._api = api_ self._resolved = False self.weight = weight self._load_lock = threading.Lock() self._async_load_lock = asyncio.Lock() ``` ##### evaluate ```python evaluate( *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_attachments: Union[list[Any], None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[Dict[str, Any]] = None, **kwargs: Any, ) -> EvaluationResult ``` Evaluates data using remote Patronus Evaluator Source code in `src/patronus/evals/evaluators.py` ```python async def evaluate( self, *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_attachments: Union[list[Any], None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[typing.Dict[str, typing.Any]] = None, **kwargs: Any, ) -> EvaluationResult: """Evaluates data using remote Patronus Evaluator""" kws = { "system_prompt": system_prompt, "task_context": task_context, "task_attachments": task_attachments, "task_input": task_input, "task_output": task_output, "gold_answer": gold_answer, "task_metadata": task_metadata, **kwargs, } log_id = get_current_log_id(bound_arguments=kws) attrs = get_context_evaluation_attributes() tags = {**self.tags} if t := attrs["tags"]: tags.update(t) tags = merge_tags(tags, kwargs.get("tags"), attrs["experiment_tags"]) if tags: kws["tags"] = tags if did := attrs["dataset_id"]: kws["dataset_id"] = did if sid := attrs["dataset_sample_id"]: kws["dataset_sample_id"] = sid resp = await retry()(self._evaluate)(log_id=log_id, **kws) return self._translate_response(resp) ``` #### get_current_log_id ```python get_current_log_id( bound_arguments: dict[str, Any], ) -> Optional[LogID] ``` Return log_id for given arguments in current context. Returns None if there is no context - most likely SDK is not initialized. Source code in `src/patronus/evals/evaluators.py` ```python def get_current_log_id(bound_arguments: dict[str, Any]) -> Optional[LogID]: """ Return log_id for given arguments in current context. Returns None if there is no context - most likely SDK is not initialized. 
""" eval_group = _ctx_evaluation_log_group.get(None) if eval_group is None: return None log_id = eval_group.find_log(bound_arguments) if log_id is None: raise ValueError("Log not found for provided arguments") return log_id ``` #### bundled_eval ```python bundled_eval( span_name: str = "Evaluation bundle", attributes: Optional[dict[str, str]] = None, ) ``` Start a span that would automatically bundle evaluations. Evaluations are passed by arguments passed to the evaluators called inside the context manager. The following example would create two bundles: - fist with arguments `x=10, y=20` - second with arguments `spam="abc123"` ```python with bundled_eval(): foo_evaluator(x=10, y=20) bar_evaluator(x=10, y=20) tar_evaluator(spam="abc123") ``` Source code in `src/patronus/evals/evaluators.py` ````python @contextlib.contextmanager def bundled_eval(span_name: str = "Evaluation bundle", attributes: Optional[dict[str, str]] = None): """ Start a span that would automatically bundle evaluations. Evaluations are passed by arguments passed to the evaluators called inside the context manager. The following example would create two bundles: - fist with arguments `x=10, y=20` - second with arguments `spam="abc123"` ```python with bundled_eval(): foo_evaluator(x=10, y=20) bar_evaluator(x=10, y=20) tar_evaluator(spam="abc123") ``` """ tracer = context.get_tracer_or_none() if tracer is None: yield return attributes = { **(attributes or {}), Attributes.span_type.value: SpanTypes.eval.value, } with tracer.start_as_current_span(span_name, attributes=attributes): with _start_evaluation_log_group(): yield ```` #### evaluator ```python evaluator( _fn: Optional[Callable[..., Any]] = None, *, evaluator_id: Union[ str, Callable[[], str], None ] = None, criteria: Union[str, Callable[[], str], None] = None, metric_name: Optional[str] = None, metric_description: Optional[str] = None, is_method: bool = False, span_name: Optional[str] = None, log_none_arguments: bool = False, **kwargs: Any, ) -> typing.Callable[..., typing.Any] ``` Decorator for creating functional-style evaluators that log execution and results. This decorator works with both synchronous and asynchronous functions. The decorator doesn't modify the function's return value, but records it after converting to an EvaluationResult. Evaluators can return different types which are automatically converted to `EvaluationResult` objects: - `bool`: `True`/`False` indicating pass/fail. - `float`/`int`: Numerical scores (typically between 0-1). - `str`: Text output categorizing the result. - EvaluationResult: Complete evaluation with scores, explanations, etc. - `None`: Indicates evaluation was skipped and no result will be recorded. Evaluation results are exported in the background without blocking execution. The SDK must be initialized with `patronus.init()` for evaluations to be recorded, though decorated functions will still execute even without initialization. The evaluator integrates with a context-based system to identify and handle shared evaluation logging and tracing spans. 
**Example:** ```python from patronus import init, evaluator from patronus.evals import EvaluationResult # Initialize the SDK to record evaluations init() # Simple evaluator function @evaluator() def exact_match(actual: str, expected: str) -> bool: return actual.strip() == expected.strip() # More complex evaluator with detailed result @evaluator() def semantic_match(actual: str, expected: str) -> EvaluationResult: similarity = calculate_similarity(actual, expected) # Your similarity function return EvaluationResult( score=similarity, pass_=similarity > 0.8, text_output="High similarity" if similarity > 0.8 else "Low similarity", explanation=f"Calculated similarity: {similarity}" ) # Use the evaluators result = exact_match("Hello world", "Hello world") print(f"Match: {result}") # Output: Match: True ``` Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `_fn` | `Optional[Callable[..., Any]]` | The function to be decorated. | `None` | | `evaluator_id` | `Union[str, Callable[[], str], None]` | Name for the evaluator. Defaults to function name (or class name in case of class based evaluators). | `None` | | `criteria` | `Union[str, Callable[[], str], None]` | Name of the criteria used by the evaluator. The use of the criteria is only recommended in more complex evaluator setups where evaluation algorithm changes depending on a criteria (think strategy pattern). | `None` | | `metric_name` | `Optional[str]` | Name for the evaluation metric. Defaults to evaluator_id value. | `None` | | `metric_description` | `Optional[str]` | The description of the metric used for evaluation. If not provided then the docstring of the wrapped function is used for this value. | `None` | | `is_method` | `bool` | Whether the wrapped function is a method. This value is used to determine whether to remove "self" argument from the log. It also allows for dynamic evaluator_id and criteria discovery based on get_evaluator_id() and get_criteria_id() methods. User-code usually shouldn't use it as long as user defined class-based evaluators inherit from the library provided Evaluator base classes. | `False` | | `span_name` | `Optional[str]` | Name of the span to represent this evaluation in the tracing system. Defaults to None, in which case a default name is generated based on the evaluator. | `None` | | `log_none_arguments` | `bool` | Controls whether arguments with None values are included in log output. This setting affects only logging behavior and has no impact on function execution. Note: Only applies to top-level arguments. For nested structures like dictionaries, None values will always be logged regardless of this setting. | `False` | | `**kwargs` | `Any` | Additional keyword arguments that may be passed to the decorator or its internal methods. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `Callable` | `Callable[..., Any]` | Returns the decorated function with additional evaluation behavior, suitable for synchronous or asynchronous usage. | Note For evaluations that need to be compatible with experiments, consider using StructuredEvaluator or AsyncStructuredEvaluator classes instead. 
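To make the note above concrete, here is a minimal, hypothetical sketch of a class-based structured evaluator. It assumes `StructuredEvaluator` is importable from `patronus.evals` alongside the other evaluator classes, and it mirrors the keyword-only field names used by the structured `evaluate` signature shown for `RemoteEvaluator` above (`task_output`, `gold_answer`, ...); the class name and pass/fail logic are illustrative, not part of the SDK.

```python
from typing import Optional

from patronus.evals import EvaluationResult, StructuredEvaluator


class GoldAnswerMatch(StructuredEvaluator):
    """Hypothetical evaluator: passes when the task output equals the gold answer."""

    def evaluate(
        self,
        *,
        task_output: Optional[str] = None,
        gold_answer: Optional[str] = None,
        **kwargs,
    ) -> EvaluationResult:
        # Compare output and gold answer after trimming whitespace.
        matched = (task_output or "").strip() == (gold_answer or "").strip()
        return EvaluationResult(
            pass_=matched,
            score=float(matched),
            text_output="match" if matched else "mismatch",
        )
```

An instance of such a class can typically be passed in an experiment's `evaluators` list (see `StructuredEvaluatorAdapter` below), while the function decorator above remains the lighter-weight option for ad-hoc evaluations.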
Source code in `src/patronus/evals/evaluators.py` ````python def evaluator( _fn: Optional[typing.Callable[..., typing.Any]] = None, *, evaluator_id: Union[str, typing.Callable[[], str], None] = None, criteria: Union[str, typing.Callable[[], str], None] = None, metric_name: Optional[str] = None, metric_description: Optional[str] = None, is_method: bool = False, span_name: Optional[str] = None, log_none_arguments: bool = False, **kwargs: typing.Any, ) -> typing.Callable[..., typing.Any]: """ Decorator for creating functional-style evaluators that log execution and results. This decorator works with both synchronous and asynchronous functions. The decorator doesn't modify the function's return value, but records it after converting to an EvaluationResult. Evaluators can return different types which are automatically converted to `EvaluationResult` objects: * `bool`: `True`/`False` indicating pass/fail. * `float`/`int`: Numerical scores (typically between 0-1). * `str`: Text output categorizing the result. * [EvaluationResult][patronus.evals.types.EvaluationResult]: Complete evaluation with scores, explanations, etc. * `None`: Indicates evaluation was skipped and no result will be recorded. Evaluation results are exported in the background without blocking execution. The SDK must be initialized with `patronus.init()` for evaluations to be recorded, though decorated functions will still execute even without initialization. The evaluator integrates with a context-based system to identify and handle shared evaluation logging and tracing spans. **Example:** ```python from patronus import init, evaluator from patronus.evals import EvaluationResult # Initialize the SDK to record evaluations init() # Simple evaluator function @evaluator() def exact_match(actual: str, expected: str) -> bool: return actual.strip() == expected.strip() # More complex evaluator with detailed result @evaluator() def semantic_match(actual: str, expected: str) -> EvaluationResult: similarity = calculate_similarity(actual, expected) # Your similarity function return EvaluationResult( score=similarity, pass_=similarity > 0.8, text_output="High similarity" if similarity > 0.8 else "Low similarity", explanation=f"Calculated similarity: {similarity}" ) # Use the evaluators result = exact_match("Hello world", "Hello world") print(f"Match: {result}") # Output: Match: True ``` Args: _fn: The function to be decorated. evaluator_id: Name for the evaluator. Defaults to function name (or class name in case of class based evaluators). criteria: Name of the criteria used by the evaluator. The use of the criteria is only recommended in more complex evaluator setups where evaluation algorithm changes depending on a criteria (think strategy pattern). metric_name: Name for the evaluation metric. Defaults to evaluator_id value. metric_description: The description of the metric used for evaluation. If not provided then the docstring of the wrapped function is used for this value. is_method: Whether the wrapped function is a method. This value is used to determine whether to remove "self" argument from the log. It also allows for dynamic evaluator_id and criteria discovery based on `get_evaluator_id()` and `get_criteria_id()` methods. User-code usually shouldn't use it as long as user defined class-based evaluators inherit from the library provided Evaluator base classes. span_name: Name of the span to represent this evaluation in the tracing system. Defaults to None, in which case a default name is generated based on the evaluator. 
log_none_arguments: Controls whether arguments with None values are included in log output. This setting affects only logging behavior and has no impact on function execution. Note: Only applies to top-level arguments. For nested structures like dictionaries, None values will always be logged regardless of this setting. **kwargs: Additional keyword arguments that may be passed to the decorator or its internal methods. Returns: Callable: Returns the decorated function with additional evaluation behavior, suitable for synchronous or asynchronous usage. Note: For evaluations that need to be compatible with experiments, consider using [StructuredEvaluator][patronus.evals.evaluators.StructuredEvaluator] or [AsyncStructuredEvaluator][patronus.evals.evaluators.AsyncStructuredEvaluator] classes instead. """ if _fn is not None: return evaluator()(_fn) def decorator(fn): fn_sign = inspect.signature(fn) def _get_eval_id(): return (callable(evaluator_id) and evaluator_id()) or evaluator_id or fn.__name__ def _get_criteria(): return (callable(criteria) and criteria()) or criteria or None def _prep(*fn_args, **fn_kwargs): bound_args = fn_sign.bind(*fn_args, **fn_kwargs) arguments_to_log = _as_applied_argument(fn_sign, bound_args) bound_args.apply_defaults() self_key_name = None instance = None if is_method: self_key_name = next(iter(fn_sign.parameters.keys())) instance = bound_args.arguments[self_key_name] eval_id = None eval_criteria = None if isinstance(instance, Evaluator): eval_id = instance.get_evaluator_id() eval_criteria = instance.get_criteria() if eval_id is None: eval_id = _get_eval_id() if eval_criteria is None: eval_criteria = _get_criteria() met_name = metric_name or eval_id met_description = metric_description or inspect.getdoc(fn) or None disable_export = isinstance(instance, RemoteEvaluatorMixin) and instance._disable_export return PrepEval( span_name=span_name, evaluator_id=eval_id, criteria=eval_criteria, metric_name=met_name, metric_description=met_description, self_key_name=self_key_name, arguments=arguments_to_log, disable_export=disable_export, ) attributes = { Attributes.span_type.value: SpanTypes.eval.value, GenAIAttributes.operation_name.value: OperationNames.eval.value, } @functools.wraps(fn) async def wrapper_async(*fn_args, **fn_kwargs): ctx = context.get_current_context_or_none() if ctx is None: return await fn(*fn_args, **fn_kwargs) prep = _prep(*fn_args, **fn_kwargs) start = time.perf_counter() try: with start_span(prep.display_name(), attributes=attributes): with _get_or_start_evaluation_log_group() as log_group: log_id = log_group.log( logger=context.get_pat_logger(ctx), is_method=is_method, self_key_name=prep.self_key_name, bound_arguments=prep.arguments, log_none_arguments=log_none_arguments, ) ret = await fn(*fn_args, **fn_kwargs) except Exception as e: context.get_logger(ctx).exception(f"Evaluator raised an exception: {e}") raise e if prep.disable_export: return ret elapsed = time.perf_counter() - start handle_eval_output( ctx=ctx, log_id=log_id, evaluator_id=prep.evaluator_id, criteria=prep.criteria, metric_name=prep.metric_name, metric_description=prep.metric_description, ret_value=ret, duration=datetime.timedelta(seconds=elapsed), qualname=fn.__qualname__, ) return ret @functools.wraps(fn) def wrapper_sync(*fn_args, **fn_kwargs): ctx = context.get_current_context_or_none() if ctx is None: return fn(*fn_args, **fn_kwargs) prep = _prep(*fn_args, **fn_kwargs) start = time.perf_counter() try: with start_span(prep.display_name(), attributes=attributes): with 
_get_or_start_evaluation_log_group() as log_group: log_id = log_group.log( logger=context.get_pat_logger(ctx), is_method=is_method, self_key_name=prep.self_key_name, bound_arguments=prep.arguments, log_none_arguments=log_none_arguments, ) ret = fn(*fn_args, **fn_kwargs) except Exception as e: context.get_logger(ctx).exception(f"Evaluator raised an exception: {e}") raise e if prep.disable_export: return ret elapsed = time.perf_counter() - start handle_eval_output( ctx=ctx, log_id=log_id, evaluator_id=prep.evaluator_id, criteria=prep.criteria, metric_name=prep.metric_name, metric_description=prep.metric_description, ret_value=ret, duration=datetime.timedelta(seconds=elapsed), qualname=fn.__qualname__, ) return ret def _set_attrs(wrapper: Any): wrapper._pat_evaluator = True # _pat_evaluator_id and _pat_criteria_id may be a bit misleading since # may not be correct since actually values for evaluator_id and criteria # are dynamically dispatched for class-based evaluators. # These values will be correct for function evaluators though. wrapper._pat_evaluator_id = _get_eval_id() wrapper._pat_criteria = _get_criteria() if inspect.iscoroutinefunction(fn): _set_attrs(wrapper_async) return wrapper_async else: _set_attrs(wrapper_sync) return wrapper_sync return decorator ```` ### types #### EvaluationResult Bases: `BaseModel` Container for evaluation outcomes including score, pass/fail status, explanations, and metadata. This class stores complete evaluation results with numeric scores, boolean pass/fail statuses, textual outputs, explanations, and arbitrary metadata. Evaluator functions can return instances of this class directly or return simpler types (bool, float, str) which will be automatically converted to EvaluationResult objects during recording. Attributes: | Name | Type | Description | | --- | --- | --- | | `score` | `Optional[float]` | Score of the evaluation. Can be any numerical value, though typically ranges from 0 to 1, where 1 represents the best possible score. | | `pass_` | `Optional[bool]` | Whether the evaluation is considered to pass or fail. | | `text_output` | `Optional[str]` | Text output of the evaluation. Usually used for discrete human-readable category evaluation or as a label for score value. | | `metadata` | `Optional[dict[str, Any]]` | Arbitrary json-serializable metadata about evaluation. | | `explanation` | `Optional[str]` | Human-readable explanation of the evaluation. | | `tags` | `Optional[dict[str, str]]` | Key-value pair metadata. | | `dataset_id` | `Optional[str]` | ID of the dataset associated with evaluated sample. | | `dataset_sample_id` | `Optional[str]` | ID of the sample in a dataset associated with evaluated sample. | | `evaluation_duration` | `Optional[timedelta]` | Duration of the evaluation. In case value is not set, @evaluator decorator and Evaluator classes will set this value automatically. | | `explanation_duration` | `Optional[timedelta]` | Duration of the evaluation explanation. | ##### format ```python format() -> str ``` Format the evaluation result into a readable summary. Source code in `src/patronus/evals/types.py` ```python def format(self) -> str: """ Format the evaluation result into a readable summary. """ md = self.model_dump(exclude_none=True, mode="json") return yaml.dump(md) ``` ##### pretty_print ```python pretty_print(file=None) -> None ``` Pretty prints the formatted content to the specified file or standard output. 
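As a quick illustration of `format()` and `pretty_print()`, the sketch below constructs an `EvaluationResult` by hand and prints it; the field values are made up.

```python
from patronus.evals import EvaluationResult

# Normally an evaluator returns this object; here we build one by hand.
result = EvaluationResult(
    score=0.92,
    pass_=True,
    text_output="high_similarity",
    explanation="Similarity above the 0.8 threshold.",
    tags={"model": "example-model"},
)

print(result.format())  # YAML-style dump of all non-None fields
result.pretty_print()   # Prints the same summary to stdout (or to `file=` if given)
```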
Source code in `src/patronus/evals/types.py` ```python def pretty_print(self, file=None) -> None: """ Pretty prints the formatted content to the specified file or standard output. """ f = self.format() print(f, file=file) ``` # experiments ## patronus.experiments ### adapters #### BaseEvaluatorAdapter Bases: `ABC` Abstract base class for all evaluator adapters. Evaluator adapters provide a standardized interface between the experiment framework and various types of evaluators (function-based, class-based, etc.). All concrete adapter implementations must inherit from this class and implement the required abstract methods. #### EvaluatorAdapter ```python EvaluatorAdapter(evaluator: Evaluator) ``` Bases: `BaseEvaluatorAdapter` Adapter for class-based evaluators conforming to the Evaluator or AsyncEvaluator protocol. This adapter enables the use of evaluator classes that implement either the Evaluator or AsyncEvaluator interface within the experiment framework. Attributes: | Name | Type | Description | | --- | --- | --- | | `evaluator` | `Union[Evaluator, AsyncEvaluator]` | The evaluator instance to adapt. | **Examples:** ```python import typing from typing import Optional from patronus import datasets from patronus.evals import Evaluator, EvaluationResult from patronus.experiments import run_experiment from patronus.experiments.adapters import EvaluatorAdapter from patronus.experiments.types import TaskResult, EvalParent class MatchEvaluator(Evaluator): def __init__(self, sanitizer=None): if sanitizer is None: sanitizer = lambda x: x self.sanitizer = sanitizer def evaluate(self, actual: str, expected: str) -> EvaluationResult: matched = self.sanitizer(actual) == self.sanitizer(expected) return EvaluationResult(pass_=matched, score=int(matched)) exact_match = MatchEvaluator() fuzzy_match = MatchEvaluator(lambda x: x.strip().lower()) class MatchAdapter(EvaluatorAdapter): def __init__(self, evaluator: MatchEvaluator): super().__init__(evaluator) def transform( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs ) -> tuple[list[typing.Any], dict[str, typing.Any]]: args = [row.task_output, row.gold_answer] kwargs = {} # Passing arguments via kwargs would also work in this case. # kwargs = {"actual": row.task_output, "expected": row.gold_answer} return args, kwargs run_experiment( dataset=[{"task_output": "string ", "gold_answer": "string"}], evaluators=[MatchAdapter(exact_match), MatchAdapter(fuzzy_match)], ) ``` Source code in `src/patronus/experiments/adapters.py` ```python def __init__(self, evaluator: evals.Evaluator): if not isinstance(evaluator, evals.Evaluator): raise TypeError(f"{evaluator} is not {evals.Evaluator.__name__}.") self.evaluator = evaluator ``` ##### transform ```python transform( row: Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: Any, ) -> tuple[list[typing.Any], dict[str, typing.Any]] ``` Transform experiment framework arguments to evaluation method arguments. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `row` | `Row` | The data row being evaluated. | *required* | | `task_result` | `Optional[TaskResult]` | The result of the task execution, if available. | *required* | | `parent` | `EvalParent` | The parent evaluation context. | *required* | | `**kwargs` | `Any` | Additional keyword arguments from the experiment. | `{}` | Returns: | Type | Description | | --- | --- | | `list[Any]` | A list of positional arguments to pass to the evaluator function. 
| | `dict[str, Any]` | A dictionary of keyword arguments to pass to the evaluator function. | Source code in `src/patronus/experiments/adapters.py` ```python def transform( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: typing.Any, ) -> tuple[list[typing.Any], dict[str, typing.Any]]: """ Transform experiment framework arguments to evaluation method arguments. Args: row: The data row being evaluated. task_result: The result of the task execution, if available. parent: The parent evaluation context. **kwargs: Additional keyword arguments from the experiment. Returns: A list of positional arguments to pass to the evaluator function. A dictionary of keyword arguments to pass to the evaluator function. """ return ( [], {"row": row, "task_result": task_result, "parent": parent, **kwargs}, ) ``` ##### evaluate ```python evaluate( row: Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: Any, ) -> EvaluationResult ``` Evaluate the given row and task result using the adapted evaluator function. This method implements the BaseEvaluatorAdapter.evaluate() protocol. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `row` | `Row` | The data row being evaluated. | *required* | | `task_result` | `Optional[TaskResult]` | The result of the task execution, if available. | *required* | | `parent` | `EvalParent` | The parent evaluation context. | *required* | | `**kwargs` | `Any` | Additional keyword arguments from the experiment. | `{}` | Returns: | Type | Description | | --- | --- | | `EvaluationResult` | An EvaluationResult containing the evaluation outcome. | Source code in `src/patronus/experiments/adapters.py` ```python async def evaluate( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: typing.Any, ) -> EvaluationResult: """ Evaluate the given row and task result using the adapted evaluator function. This method implements the BaseEvaluatorAdapter.evaluate() protocol. Args: row: The data row being evaluated. task_result: The result of the task execution, if available. parent: The parent evaluation context. **kwargs: Additional keyword arguments from the experiment. Returns: An EvaluationResult containing the evaluation outcome. """ ev_args, ev_kwargs = self.transform(row, task_result, parent, **kwargs) return await self._evaluate(*ev_args, **ev_kwargs) ``` #### StructuredEvaluatorAdapter ```python StructuredEvaluatorAdapter( evaluator: Union[ StructuredEvaluator, AsyncStructuredEvaluator ], ) ``` Bases: `EvaluatorAdapter` Adapter for structured evaluators. Source code in `src/patronus/experiments/adapters.py` ```python def __init__( self, evaluator: Union[evals.StructuredEvaluator, evals.AsyncStructuredEvaluator], ): if not isinstance(evaluator, (evals.StructuredEvaluator, evals.AsyncStructuredEvaluator)): raise TypeError( f"{type(evaluator)} is not " f"{evals.AsyncStructuredEvaluator.__name__} nor {evals.StructuredEvaluator.__name__}." ) super().__init__(evaluator) ``` #### FuncEvaluatorAdapter ```python FuncEvaluatorAdapter( fn: Callable[..., Any], weight: Optional[Union[str, float]] = None, ) ``` Bases: `BaseEvaluatorAdapter` Adapter class that allows using function-based evaluators with the experiment framework. This adapter serves as a bridge between function-based evaluators decorated with `@evaluator()` and the experiment framework's evaluation system. It handles both synchronous and asynchronous evaluator functions. 
Attributes: | Name | Type | Description | | --- | --- | --- | | `fn` | `Callable` | The evaluator function to be adapted. | Notes - The function passed to this adapter must be decorated with `@evaluator()`. - The adapter automatically handles the conversion between function results and proper evaluation result objects. Examples: Direct usage with a compatible evaluator function: ```python from patronus import evaluator from patronus.experiments import FuncEvaluatorAdapter, run_experiment from patronus.datasets import Row @evaluator() def exact_match(row: Row, **kwargs): return row.task_output == row.gold_answer run_experiment( dataset=[{"task_output": "string", "gold_answer": "string"}], evaluators=[FuncEvaluatorAdapter(exact_match)] ) ``` Customized usage by overriding the `transform()` method: ```python from typing import Optional import typing from patronus import evaluator, datasets from patronus.experiments import FuncEvaluatorAdapter, run_experiment from patronus.experiments.types import TaskResult, EvalParent @evaluator() def exact_match(actual, expected): return actual == expected class AdaptedExactMatch(FuncEvaluatorAdapter): def __init__(self): super().__init__(exact_match) def transform( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs ) -> tuple[list[typing.Any], dict[str, typing.Any]]: args = [row.task_output, row.gold_answer] kwargs = {} # Alternative: passing arguments via kwargs instead of args # args = [] # kwargs = {"actual": row.task_output, "expected": row.gold_answer} return args, kwargs run_experiment( dataset=[{"task_output": "string", "gold_answer": "string"}], evaluators=[AdaptedExactMatch()], ) ``` Source code in `src/patronus/experiments/adapters.py` ```python def __init__(self, fn: typing.Callable[..., typing.Any], weight: Optional[Union[str, float]] = None): if not hasattr(fn, "_pat_evaluator"): raise ValueError( f"Passed function {fn.__qualname__} is not an evaluator. " "Hint: add @evaluator decorator to the function." ) if weight is not None: try: Decimal(str(weight)) except (decimal.InvalidOperation, ValueError, TypeError): raise TypeError( f"{weight} is not a valid weight. Weight must be a valid decimal number (string or float)." ) self.fn = fn self._weight = weight ```
task_result: The result of the task execution, if available. parent: The parent evaluation context. **kwargs: Additional keyword arguments from the experiment. Returns: A list of positional arguments to pass to the evaluator function. A dictionary of keyword arguments to pass to the evaluator function. """ return ( [], {"row": row, "task_result": task_result, "parent": parent, **kwargs}, ) ``` ##### evaluate ```python evaluate( row: Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: Any, ) -> EvaluationResult ``` Evaluate the given row and task result using the adapted evaluator function. This method implements the BaseEvaluatorAdapter.evaluate() protocol. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `row` | `Row` | The data row being evaluated. | *required* | | `task_result` | `Optional[TaskResult]` | The result of the task execution, if available. | *required* | | `parent` | `EvalParent` | The parent evaluation context. | *required* | | `**kwargs` | `Any` | Additional keyword arguments from the experiment. | `{}` | Returns: | Type | Description | | --- | --- | | `EvaluationResult` | An EvaluationResult containing the evaluation outcome. | Source code in `src/patronus/experiments/adapters.py` ```python async def evaluate( self, row: datasets.Row, task_result: Optional[TaskResult], parent: EvalParent, **kwargs: typing.Any, ) -> EvaluationResult: """ Evaluate the given row and task result using the adapted evaluator function. This method implements the BaseEvaluatorAdapter.evaluate() protocol. Args: row: The data row being evaluated. task_result: The result of the task execution, if available. parent: The parent evaluation context. **kwargs: Additional keyword arguments from the experiment. Returns: An EvaluationResult containing the evaluation outcome. """ ev_args, ev_kwargs = self.transform(row, task_result, parent, **kwargs) return await self._evaluate(*ev_args, **ev_kwargs) ``` ### experiment #### Tags ```python Tags = dict[str, str] ``` Tags are key-value pairs applied to experiments, task results and evaluation results. #### Task ```python Task = Union[ TaskProtocol[Union[TaskResult, str, None]], TaskProtocol[Awaitable[Union[TaskResult, str, None]]], ] ``` A function that processes each dataset row and produces output for evaluation. #### ExperimentDataset ```python ExperimentDataset = Union[ Dataset, DatasetLoader, list[dict[str, Any]], tuple[dict[str, Any], ...], DataFrame, Awaitable, Callable[[], Awaitable], ] ``` Any object that would "resolve" into Dataset. #### TaskProtocol Bases: `Protocol[T]` Defines an interface for a task. Task is a function that processes each dataset row and produces output for evaluation. ##### __call__ ```python __call__(*, row: Row, parent: EvalParent, tags: Tags) -> T ``` Processes a dataset row, using the provided context to produce task output. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `row` | `Row` | The dataset row to process. | *required* | | `parent` | `EvalParent` | Reference to the parent task's output and evaluation results. | *required* | | `tags` | `Tags` | Key-value pairs. | *required* | Returns: | Type | Description | | --- | --- | | `T` | Task output of type T or None to skip the row processing. 
| Example ```python def simple_task(row: datasets.Row, parent: EvalParent, tags: Tags) -> TaskResult: # Process input from the dataset row input_text = row.task_input # Generate output output = f"Processed: {input_text}" # Return result return TaskResult( output=output, metadata={"processing_time_ms": 42}, tags={"model": "example-model"} ) ``` Source code in `src/patronus/experiments/experiment.py` ````python def __call__(self, *, row: datasets.Row, parent: EvalParent, tags: Tags) -> T: """ Processes a dataset row, using the provided context to produce task output. Args: row: The dataset row to process. parent: Reference to the parent task's output and evaluation results. tags: Key-value pairs. Returns: Task output of type T or None to skip the row processing. Example: ```python def simple_task(row: datasets.Row, parent: EvalParent, tags: Tags) -> TaskResult: # Process input from the dataset row input_text = row.task_input # Generate output output = f"Processed: {input_text}" # Return result return TaskResult( output=output, metadata={"processing_time_ms": 42}, tags={"model": "example-model"} ) ``` """ ```` #### ChainLink Bases: `TypedDict` Represents a single stage in an experiment's processing chain. Each ChainLink contains an optional task function that processes dataset rows and a list of evaluators that assess the task's output. Attributes: | Name | Type | Description | | --- | --- | --- | | `task` | `Optional[Task]` | Function that processes a dataset row and produces output. | | `evaluators` | `list[AdaptableEvaluators]` | List of evaluators to assess the task's output. | #### Experiment ```python Experiment( *, dataset: Any, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[dict[str, str]] = None, metadata: Optional[dict[str, Any]] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[Any]] = None, **kwargs, ) ``` Manages evaluation experiments across datasets using tasks and evaluators. An experiment represents a complete evaluation pipeline that processes a dataset using defined tasks, applies evaluators to the outputs, and collects the results. Experiments track progress, create reports, and interface with the Patronus platform. Create experiment instances using the create() class method or through the run_experiment() convenience function. 
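Before the constructor details below, here is a hedged end-to-end sketch using the `run_experiment()` convenience function mentioned above. The task, evaluator, and dataset contents are illustrative; the call shape follows the `FuncEvaluatorAdapter` examples earlier in this reference.

```python
from patronus import evaluator
from patronus.datasets import Row
from patronus.experiments import FuncEvaluatorAdapter, run_experiment


def answer_task(row: Row, parent, tags) -> str:
    # A real task would call an LLM; here we just echo a canned answer.
    return f"The answer to '{row.task_input}' is {row.gold_answer}."


@evaluator()
def contains_gold_answer(row: Row, task_result, **kwargs) -> bool:
    # task_result.output carries the task's output for this row.
    return bool(task_result) and row.gold_answer in task_result.output


experiment = run_experiment(
    dataset=[{"task_input": "What is 2 + 2?", "gold_answer": "4"}],
    task=answer_task,
    evaluators=[FuncEvaluatorAdapter(contains_gold_answer)],
)
```

Assuming `run_experiment()` returns the finished experiment (as `Experiment.run()` returns the instance), helpers such as `to_dataframe()` described below can then be used to inspect the results.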
Source code in `src/patronus/experiments/experiment.py` ```python def __init__( self, *, dataset: typing.Any, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[dict[str, str]] = None, metadata: Optional[dict[str, Any]] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[typing.Any]] = None, **kwargs, ): if chain and evaluators: raise ValueError("Cannot specify both chain and evaluators") self._raw_dataset = dataset if not chain: chain = [{"task": task, "evaluators": evaluators}] self._chain = [ {"task": _trace_task(link["task"]), "evaluators": _adapt_evaluators(link["evaluators"])} for link in chain ] self._started = False self._finished = False self._project_name = project_name self.project = None self._experiment_name = experiment_name self.experiment = None self.tags = tags or {} self.metadata = metadata self.max_concurrency = max_concurrency self._service = service self._api_key = api_key self._api_url = api_url self._otel_endpoint = otel_endpoint self._otel_exporter_otlp_protocol = otel_exporter_otlp_protocol self._ui_url = ui_url self._timeout_s = timeout_s self._prepared = False self.reporter = Reporter() self._integrations = integrations ``` ##### create ```python create( dataset: ExperimentDataset, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[Tags] = None, metadata: Optional[dict[str, Any]] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[Any]] = None, **kwargs: Any, ) -> te.Self ``` Creates an instance of the class asynchronously with the specified parameters while performing necessary preparations. This method initializes various attributes including dataset, task, evaluators, chain, and additional configurations for managing concurrency, project details, service information, API keys, timeout settings, and integrations. Use run_experiment for more convenient usage. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `dataset` | `ExperimentDataset` | The dataset to run evaluations against. | *required* | | `task` | `Optional[Task]` | A function that processes each dataset row and produces output for evaluation. Mutually exclusive with the chain parameter. | `None` | | `evaluators` | `Optional[list[AdaptableEvaluators]]` | A list of evaluators to assess the task output. Mutually exclusive with the chain parameter. | `None` | | `chain` | `Optional[list[ChainLink]]` | A list of processing stages, each containing a task and associated evaluators. Use this for multi-stage evaluation pipelines. | `None` | | `tags` | `Optional[Tags]` | Key-value pairs. All evaluations created by the experiment will contain these tags. | `None` | | `metadata` | `Optional[dict[str, Any]]` | Arbitrary dict. 
Metadata associated with the experiment. | `None` | | `max_concurrency` | `int` | Maximum number of concurrent task and evaluation operations. | `10` | | `project_name` | `Optional[str]` | Name of the project to create or use. Falls back to configuration or environment variables if not provided. | `None` | | `experiment_name` | `Optional[str]` | Custom name for this experiment run. A timestamp will be appended. | `None` | | `service` | `Optional[str]` | OpenTelemetry service name for tracing. Falls back to configuration or environment variables if not provided. | `None` | | `api_key` | `Optional[str]` | API key for Patronus services. Falls back to configuration or environment variables if not provided. | `None` | | `api_url` | `Optional[str]` | URL for the Patronus API. Falls back to configuration or environment variables if not provided. | `None` | | `otel_endpoint` | `Optional[str]` | OpenTelemetry collector endpoint. Falls back to configuration or environment variables if not provided. | `None` | | `otel_exporter_otlp_protocol` | `Optional[str]` | OpenTelemetry exporter protocol (grpc or http/protobuf). Falls back to configuration or environment variables if not provided. | `None` | | `ui_url` | `Optional[str]` | URL for the Patronus UI. Falls back to configuration or environment variables if not provided. | `None` | | `timeout_s` | `Optional[int]` | Timeout in seconds for API operations. Falls back to configuration or environment variables if not provided. | `None` | | `integrations` | `Optional[list[Any]]` | A list of OpenTelemetry instrumentors for additional tracing capabilities. | `None` | | `**kwargs` | `Any` | Additional keyword arguments passed to the experiment. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `Experiment` | `Self` | ... | Source code in `src/patronus/experiments/experiment.py` ```python @classmethod async def create( cls, dataset: ExperimentDataset, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[Tags] = None, metadata: Optional[dict[str, Any]] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[typing.Any]] = None, **kwargs: typing.Any, ) -> te.Self: """ Creates an instance of the class asynchronously with the specified parameters while performing necessary preparations. This method initializes various attributes including dataset, task, evaluators, chain, and additional configurations for managing concurrency, project details, service information, API keys, timeout settings, and integrations. Use [run_experiment][patronus.experiments.experiment.run_experiment] for more convenient usage. Args: dataset: The dataset to run evaluations against. task: A function that processes each dataset row and produces output for evaluation. Mutually exclusive with the `chain` parameter. evaluators: A list of evaluators to assess the task output. Mutually exclusive with the `chain` parameter. chain: A list of processing stages, each containing a task and associated evaluators. Use this for multi-stage evaluation pipelines. tags: Key-value pairs. All evaluations created by the experiment will contain these tags. metadata: Arbitrary dict. 
Metadata associated with the experiment. max_concurrency: Maximum number of concurrent task and evaluation operations. project_name: Name of the project to create or use. Falls back to configuration or environment variables if not provided. experiment_name: Custom name for this experiment run. A timestamp will be appended. service: OpenTelemetry service name for tracing. Falls back to configuration or environment variables if not provided. api_key: API key for Patronus services. Falls back to configuration or environment variables if not provided. api_url: URL for the Patronus API. Falls back to configuration or environment variables if not provided. otel_endpoint: OpenTelemetry collector endpoint. Falls back to configuration or environment variables if not provided. otel_exporter_otlp_protocol: OpenTelemetry exporter protocol (grpc or http/protobuf). Falls back to configuration or environment variables if not provided. ui_url: URL for the Patronus UI. Falls back to configuration or environment variables if not provided. timeout_s: Timeout in seconds for API operations. Falls back to configuration or environment variables if not provided. integrations: A list of OpenTelemetry instrumentors for additional tracing capabilities. **kwargs: Additional keyword arguments passed to the experiment. Returns: Experiment: ... """ ex = cls( dataset=dataset, task=task, evaluators=evaluators, chain=chain, tags=tags, metadata=metadata, max_concurrency=max_concurrency, project_name=project_name, experiment_name=experiment_name, service=service, api_key=api_key, api_url=api_url, otel_endpoint=otel_endpoint, otel_exporter_otlp_protocol=otel_exporter_otlp_protocol, ui_url=ui_url, timeout_s=timeout_s, integrations=integrations, **kwargs, ) ex._ctx = await ex._prepare() return ex ``` ##### run ```python run() -> te.Self ``` Executes the experiment by processing all dataset items. Runs the experiment's task chain on each dataset row, applying evaluators to the results and collecting metrics. Progress is displayed with a progress bar and results are logged to the Patronus platform. Returns: | Type | Description | | --- | --- | | `Self` | The experiment instance. | Source code in `src/patronus/experiments/experiment.py` ```python async def run(self) -> te.Self: """ Executes the experiment by processing all dataset items. Runs the experiment's task chain on each dataset row, applying evaluators to the results and collecting metrics. Progress is displayed with a progress bar and results are logged to the Patronus platform. Returns: The experiment instance. """ if self._started: raise RuntimeError("Experiment already started") if self._prepared is False: raise ValueError( "Experiment must be prepared before starting. " "Seems that Experiment was not created using Experiment.create() classmethod." ) self._started = True with context._CTX_PAT.using(self._ctx): await self._run() self._finished = True self.reporter.summary() await asyncio.to_thread(self._ctx.exporter.force_flush) await asyncio.to_thread(self._ctx.tracer_provider.force_flush) return self ``` ##### to_dataframe ```python to_dataframe() -> pd.DataFrame ``` Converts experiment results to a pandas DataFrame. Creates a tabular representation of all evaluation results with dataset identifiers, task information, evaluation scores, and metadata. Returns: | Type | Description | | --- | --- | | `DataFrame` | A pandas DataFrame containing all experiment results. 
| Source code in `src/patronus/experiments/experiment.py` ```python def to_dataframe(self) -> pd.DataFrame: """ Converts experiment results to a pandas DataFrame. Creates a tabular representation of all evaluation results with dataset identifiers, task information, evaluation scores, and metadata. Returns: A pandas DataFrame containing all experiment results. """ if self._finished is not True: raise RuntimeError("Experiment has to be in finished state") return self.reporter.to_dataframe() ``` ##### to_csv ```python to_csv( path_or_buf: Union[str, Path, IO[AnyStr]], **kwargs: Any ) -> Optional[str] ``` Saves experiment results to a CSV file. Converts experiment results to a DataFrame and saves them as a CSV file. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `path_or_buf` | `Union[str, Path, IO[AnyStr]]` | String path or file-like object where the CSV will be saved. | *required* | | `**kwargs` | `Any` | Additional arguments passed to pandas.DataFrame.to_csv(). | `{}` | Returns: | Type | Description | | --- | --- | | `Optional[str]` | String path if a path was specified and return_path is True, otherwise None. | Source code in `src/patronus/experiments/experiment.py` ```python def to_csv( self, path_or_buf: Union[str, pathlib.Path, typing.IO[typing.AnyStr]], **kwargs: typing.Any ) -> Optional[str]: """ Saves experiment results to a CSV file. Converts experiment results to a DataFrame and saves them as a CSV file. Args: path_or_buf: String path or file-like object where the CSV will be saved. **kwargs: Additional arguments passed to pandas.DataFrame.to_csv(). Returns: String path if a path was specified and return_path is True, otherwise None. """ return self.to_dataframe().to_csv(path_or_buf, **kwargs) ``` #### run_experiment ```python run_experiment( dataset: ExperimentDataset, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[Tags] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[Any]] = None, **kwargs, ) -> Union[Experiment, typing.Awaitable[Experiment]] ``` Create and run an experiment. This function creates an experiment with the specified configuration and runs it to completion. The execution handling is context-aware: - When called from an asynchronous context (with a running event loop), it returns an awaitable that must be awaited. - When called from a synchronous context (no running event loop), it blocks until the experiment completes and returns the Experiment object. **Examples:** Synchronous execution: ```python experiment = run_experiment(dataset, task=some_task) # Blocks until the experiment finishes. ``` Asynchronous execution (e.g., in a Jupyter Notebook): ```python experiment = await run_experiment(dataset, task=some_task) # Must be awaited within an async function or event loop. ``` **Parameters:** See Experiment.create for list of arguments. Returns: | Name | Type | Description | | --- | --- | --- | | `Experiment` | `Experiment` | In a synchronous context: the completed Experiment object. | | `Experiment` | `Awaitable[Experiment]` | In an asynchronous context: an awaitable that resolves to the Experiment object. 
| Notes For manual control of the event loop, you can create and run the experiment as follows: ```python experiment = await Experiment.create(...) await experiment.run() ``` Source code in `src/patronus/experiments/experiment.py` ````python def run_experiment( dataset: ExperimentDataset, task: Optional[Task] = None, evaluators: Optional[list[AdaptableEvaluators]] = None, chain: Optional[list[ChainLink]] = None, tags: Optional[Tags] = None, max_concurrency: int = 10, project_name: Optional[str] = None, experiment_name: Optional[str] = None, service: Optional[str] = None, api_key: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, ui_url: Optional[str] = None, timeout_s: Optional[int] = None, integrations: Optional[list[typing.Any]] = None, **kwargs, ) -> Union["Experiment", typing.Awaitable["Experiment"]]: """ Create and run an experiment. This function creates an experiment with the specified configuration and runs it to completion. The execution handling is context-aware: - When called from an asynchronous context (with a running event loop), it returns an awaitable that must be awaited. - When called from a synchronous context (no running event loop), it blocks until the experiment completes and returns the Experiment object. **Examples:** Synchronous execution: ```python experiment = run_experiment(dataset, task=some_task) # Blocks until the experiment finishes. ``` Asynchronous execution (e.g., in a Jupyter Notebook): ```python experiment = await run_experiment(dataset, task=some_task) # Must be awaited within an async function or event loop. ``` **Parameters:** See [Experiment.create][patronus.experiments.experiment.Experiment.create] for list of arguments. Returns: Experiment (Experiment): In a synchronous context: the completed Experiment object. Experiment (Awaitable[Experiment]): In an asynchronous context: an awaitable that resolves to the Experiment object. Notes: For manual control of the event loop, you can create and run the experiment as follows: ```python experiment = await Experiment.create(...) await experiment.run() ``` """ async def _run_experiment() -> Union[Experiment, typing.Awaitable[Experiment]]: ex = await Experiment.create( dataset=dataset, task=task, evaluators=evaluators, chain=chain, tags=tags, max_concurrency=max_concurrency, project_name=project_name, experiment_name=experiment_name, service=service, api_key=api_key, api_url=api_url, otel_endpoint=otel_endpoint, otel_exporter_otlp_protocol=otel_exporter_otlp_protocol, ui_url=ui_url, timeout_s=timeout_s, integrations=integrations, **kwargs, ) return await ex.run() return run_until_complete(_run_experiment()) ```` ### types #### EvalParent ```python EvalParent = Optional[_EvalParent] ``` Type alias representing an optional reference to an evaluation parent, used to track the hierarchy of evaluations and their results #### TaskResult Bases: `BaseModel` Represents the result of a task with optional output, metadata, and tags. This class is used to encapsulate the result of a task, including optional fields for the output of the task, metadata related to the task, and any tags that can provide additional information or context about the task. Attributes: | Name | Type | Description | | --- | --- | --- | | `output` | `Optional[str]` | The output of the task, if any. | | `metadata` | `Optional[dict[str, Any]]` | Additional information or metadata associated with the task. 
| | `tags` | `Optional[dict[str, str]]` | Key-value pairs used to tag and describe the task. | #### EvalsMap Bases: `dict` A specialized dictionary for storing evaluation results with flexible key handling. This class extends dict to provide automatic key normalization for evaluation results, allowing lookup by evaluator objects, strings, or any object with a canonical_name attribute. # Init ## patronus.init ### init ```python init( project_name: Optional[str] = None, app: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, api_key: Optional[str] = None, service: Optional[str] = None, resource_dir: Optional[str] = None, prompt_providers: Optional[list[str]] = None, prompt_templating_engine: Optional[str] = None, integrations: Optional[list[Any]] = None, **kwargs: Any, ) -> context.PatronusContext ``` Initializes the Patronus SDK with the specified configuration. This function sets up the SDK with project details, API connections, and telemetry. It must be called before using evaluators or experiments to ensure proper recording of results and metrics. Note `init()` should not be used for running experiments. Experiments have their own initialization process. You can configure them by passing configuration options to run_experiment() or by using a configuration file. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `project_name` | `Optional[str]` | Name of the project for organizing evaluations and experiments. Falls back to configuration file, then defaults to "Global" if not provided. | `None` | | `app` | `Optional[str]` | Name of the application within the project. Falls back to configuration file, then defaults to "default" if not provided. | `None` | | `api_url` | `Optional[str]` | URL for the Patronus API service. Falls back to configuration file or environment variables if not provided. | `None` | | `otel_endpoint` | `Optional[str]` | Endpoint for OpenTelemetry data collection. Falls back to configuration file or environment variables if not provided. | `None` | | `otel_exporter_otlp_protocol` | `Optional[str]` | OpenTelemetry exporter protocol (grpc or http/protobuf). Falls back to configuration file or environment variables if not provided. | `None` | | `api_key` | `Optional[str]` | Authentication key for Patronus services. Falls back to configuration file or environment variables if not provided. | `None` | | `service` | `Optional[str]` | Service name for OpenTelemetry traces. Falls back to configuration file or environment variables if not provided. | `None` | | `integrations` | `Optional[list[Any]]` | List of integrations to use. | `None` | | `**kwargs` | `Any` | Additional configuration options for the SDK. | `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `PatronusContext` | `PatronusContext` | The initialized context object. 
| Example ```python import patronus # Load configuration from configuration file or environment variables patronus.init() # Custom initialization patronus.init( project_name="my-project", app="recommendation-service", api_key="your-api-key" ) ``` Source code in `src/patronus/init.py` ````python def init( project_name: Optional[str] = None, app: Optional[str] = None, api_url: Optional[str] = None, otel_endpoint: Optional[str] = None, otel_exporter_otlp_protocol: Optional[str] = None, api_key: Optional[str] = None, service: Optional[str] = None, resource_dir: Optional[str] = None, prompt_providers: Optional[list[str]] = None, prompt_templating_engine: Optional[str] = None, integrations: Optional[list[typing.Any]] = None, **kwargs: typing.Any, ) -> context.PatronusContext: """ Initializes the Patronus SDK with the specified configuration. This function sets up the SDK with project details, API connections, and telemetry. It must be called before using evaluators or experiments to ensure proper recording of results and metrics. Note: `init()` should not be used for running experiments. Experiments have its own initialization process. You can configure them by passing configuration options to [`run_experiment()`][patronus.experiments.experiment.run_experiment] or using configuration file. Args: project_name: Name of the project for organizing evaluations and experiments. Falls back to configuration file, then defaults to "Global" if not provided. app: Name of the application within the project. Falls back to configuration file, then defaults to "default" if not provided. api_url: URL for the Patronus API service. Falls back to configuration file or environment variables if not provided. otel_endpoint: Endpoint for OpenTelemetry data collection. Falls back to configuration file or environment variables if not provided. otel_exporter_otlp_protocol: OpenTelemetry exporter protocol (grpc or http/protobuf). Falls back to configuration file or environment variables if not provided. api_key: Authentication key for Patronus services. Falls back to configuration file or environment variables if not provided. service: Service name for OpenTelemetry traces. Falls back to configuration file or environment variables if not provided. integrations: List of integration to use. **kwargs: Additional configuration options for the SDK. Returns: PatronusContext: The initialized context object. Example: ```python import patronus # Load configuration from configuration file or environment variables patronus.init() # Custom initialization patronus.init( project_name="my-project", app="recommendation-service", api_key="your-api-key" ) ``` """ if api_url != config.DEFAULT_API_URL and otel_endpoint == config.DEFAULT_OTEL_ENDPOINT: raise ValueError( "'api_url' is set to non-default value, " "but 'otel_endpoint' is a default. 
Change 'otel_endpoint' to point to the same environment as 'api_url'" ) def build_and_set(): cfg = config.config() ctx = build_context( service=service or cfg.service, project_name=project_name or cfg.project_name, app=app or cfg.app, experiment_id=None, experiment_name=None, api_url=api_url or cfg.api_url, otel_endpoint=otel_endpoint or cfg.otel_endpoint, otel_exporter_otlp_protocol=otel_exporter_otlp_protocol or cfg.otel_exporter_otlp_protocol, api_key=api_key or cfg.api_key, resource_dir=resource_dir or cfg.resource_dir, prompt_providers=prompt_providers or cfg.prompt_providers, prompt_templating_engine=cfg.prompt_templating_engine, timeout_s=cfg.timeout_s, integrations=integrations, **kwargs, ) context.set_global_patronus_context(ctx) inited_now = _INIT_ONCE.do_once(build_and_set) if not inited_now: warnings.warn( ("The Patronus SDK has already been initialized. Duplicate initialization attempts are ignored."), UserWarning, stacklevel=2, ) return context.get_current_context() ```` ### build_context ```python build_context( service: str, project_name: str, app: Optional[str], experiment_id: Optional[str], experiment_name: Optional[str], api_url: Optional[str], otel_endpoint: str, otel_exporter_otlp_protocol: Optional[str], api_key: str, resource_dir: Optional[str] = None, prompt_providers: Optional[list[str]] = None, prompt_templating_engine: Optional[str] = None, client_http: Optional[Client] = None, client_http_async: Optional[AsyncClient] = None, timeout_s: int = 60, integrations: Optional[list[Any]] = None, **kwargs: Any, ) -> context.PatronusContext ``` Builds a Patronus context with the specified configuration parameters. This function creates the context object that contains all necessary components for the SDK operation, including loggers, tracers, and API clients. It is used internally by the init() function but can also be used directly for more advanced configuration scenarios. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `service` | `str` | Service name for OpenTelemetry traces. | *required* | | `project_name` | `str` | Name of the project for organizing evaluations and experiments. | *required* | | `app` | `Optional[str]` | Name of the application within the project. | *required* | | `experiment_id` | `Optional[str]` | Unique identifier for an experiment when running in experiment mode. | *required* | | `experiment_name` | `Optional[str]` | Display name for an experiment when running in experiment mode. | *required* | | `api_url` | `Optional[str]` | URL for the Patronus API service. | *required* | | `otel_endpoint` | `str` | Endpoint for OpenTelemetry data collection. | *required* | | `otel_exporter_otlp_protocol` | `Optional[str]` | OpenTelemetry exporter protocol (grpc or http/protobuf). | *required* | | `api_key` | `str` | Authentication key for Patronus services. | *required* | | `client_http` | `Optional[Client]` | Custom HTTP client for synchronous API requests. If not provided, a new client will be created. | `None` | | `client_http_async` | `Optional[AsyncClient]` | Custom HTTP client for asynchronous API requests. If not provided, a new client will be created. | `None` | | `timeout_s` | `int` | Timeout in seconds for HTTP requests (default: 60). | `60` | | `integrations` | `Optional[list[Any]]` | List of PatronusIntegrator instances. | `None` | | `**kwargs` | `Any` | Additional configuration options, including: - integrations: List of OpenTelemetry instrumentors to enable. 
| `{}` | Returns: | Name | Type | Description | | --- | --- | --- | | `PatronusContext` | `PatronusContext` | The initialized context object containing all necessary components for SDK operation. | Source code in `src/patronus/init.py` ```python def build_context( service: str, project_name: str, app: Optional[str], experiment_id: Optional[str], experiment_name: Optional[str], api_url: Optional[str], otel_endpoint: str, otel_exporter_otlp_protocol: Optional[str], api_key: str, resource_dir: Optional[str] = None, prompt_providers: Optional[list[str]] = None, prompt_templating_engine: Optional[str] = None, client_http: Optional[httpx.Client] = None, client_http_async: Optional[httpx.AsyncClient] = None, timeout_s: int = 60, integrations: Optional[list[typing.Any]] = None, **kwargs: typing.Any, ) -> context.PatronusContext: """ Builds a Patronus context with the specified configuration parameters. This function creates the context object that contains all necessary components for the SDK operation, including loggers, tracers, and API clients. It is used internally by the [`init()`][patronus.init.init] function but can also be used directly for more advanced configuration scenarios. Args: service: Service name for OpenTelemetry traces. project_name: Name of the project for organizing evaluations and experiments. app: Name of the application within the project. experiment_id: Unique identifier for an experiment when running in experiment mode. experiment_name: Display name for an experiment when running in experiment mode. api_url: URL for the Patronus API service. otel_endpoint: Endpoint for OpenTelemetry data collection. otel_exporter_otlp_protocol: OpenTelemetry exporter protocol (grpc or http/protobuf). api_key: Authentication key for Patronus services. client_http: Custom HTTP client for synchronous API requests. If not provided, a new client will be created. client_http_async: Custom HTTP client for asynchronous API requests. If not provided, a new client will be created. timeout_s: Timeout in seconds for HTTP requests (default: 60). integrations: List of PatronusIntegrator instances. **kwargs: Additional configuration options, including: - integrations: List of OpenTelemetry instrumentors to enable. Returns: PatronusContext: The initialized context object containing all necessary components for SDK operation. 
""" if client_http is None: client_http = httpx.Client(timeout=timeout_s) if client_http_async is None: client_http_async = httpx.AsyncClient(timeout=timeout_s) integrations = prepare_integrations(integrations) scope = context.PatronusScope( service=service, project_name=project_name, app=app, experiment_id=experiment_id, experiment_name=experiment_name, ) api_deprecated = PatronusAPIClient( client_http_async=client_http_async, client_http=client_http, base_url=api_url, api_key=api_key, ) api_client = patronus_api.Client(api_key=api_key, base_url=api_url) async_api_client = patronus_api.AsyncClient(api_key=api_key, base_url=api_url) logger_provider = create_logger_provider( exporter_endpoint=otel_endpoint, api_key=api_key, scope=scope, protocol=otel_exporter_otlp_protocol, ) tracer_provider = create_tracer_provider( exporter_endpoint=otel_endpoint, api_key=api_key, scope=scope, protocol=otel_exporter_otlp_protocol, ) eval_exporter = BatchEvaluationExporter(client=api_deprecated) ctx = context.PatronusContext( scope=scope, tracer_provider=tracer_provider, logger_provider=logger_provider, api_client_deprecated=api_deprecated, api_client=api_client, async_api_client=async_api_client, exporter=eval_exporter, prompts=context.PromptsConfig( directory=resource_dir and pathlib.Path(resource_dir, "prompts"), providers=prompt_providers, templating_engine=prompt_templating_engine, ), ) apply_integrations(ctx, integrations) return ctx ``` # Integrations ## patronus.integrations This package provides integration points for connecting various third-party libraries and tools with the Patronus SDK. ### instrumenter #### BasePatronusIntegrator Bases: `ABC` Abstract base class for Patronus integrations. This class defines the interface for integrating external libraries and tools with the Patronus context. All specific integrators should inherit from this class and implement the required methods. ##### apply ```python apply(ctx: PatronusContext, **kwargs: Any) ``` Apply the integration to the given Patronus context. This method must be implemented by subclasses to define how the integration is applied to a Patronus context instance. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `PatronusContext` | The Patronus context to apply the integration to. | *required* | | `**kwargs` | `Any` | Additional keyword arguments specific to the implementation. | `{}` | Source code in `src/patronus/integrations/instrumenter.py` ```python @abc.abstractmethod def apply(self, ctx: "context.PatronusContext", **kwargs: typing.Any): """ Apply the integration to the given Patronus context. This method must be implemented by subclasses to define how the integration is applied to a Patronus context instance. Args: ctx: The Patronus context to apply the integration to. **kwargs: Additional keyword arguments specific to the implementation. """ ``` ### otel #### OpenTelemetryIntegrator ```python OpenTelemetryIntegrator(instrumentor: BaseInstrumentor) ``` Bases: `BasePatronusIntegrator` Integration for OpenTelemetry instrumentors with Patronus. This class provides an adapter between OpenTelemetry instrumentors and the Patronus context, allowing for easy integration of OpenTelemetry instrumentation in Patronus-managed applications. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `instrumentor` | `BaseInstrumentor` | An OpenTelemetry instrumentor instance that will be applied to the Patronus context. 
| *required* | Source code in `src/patronus/integrations/otel.py` ```python def __init__(self, instrumentor: "BaseInstrumentor"): """ Initialize the OpenTelemetry integrator. Args: instrumentor: An OpenTelemetry instrumentor instance that will be applied to the Patronus context. """ self.instrumentor = instrumentor ``` ##### apply ```python apply(ctx: PatronusContext, **kwargs: Any) ``` Apply OpenTelemetry instrumentation to the Patronus context. This method configures the OpenTelemetry instrumentor with the tracer provider from the Patronus context. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `PatronusContext` | The Patronus context containing the tracer provider. | *required* | | `**kwargs` | `Any` | Additional keyword arguments (unused). | `{}` | Source code in `src/patronus/integrations/otel.py` ```python def apply(self, ctx: "context.PatronusContext", **kwargs: typing.Any): """ Apply OpenTelemetry instrumentation to the Patronus context. This method configures the OpenTelemetry instrumentor with the tracer provider from the Patronus context. Args: ctx: The Patronus context containing the tracer provider. **kwargs: Additional keyword arguments (unused). """ self.instrumentor.instrument(tracer_provider=ctx.tracer_provider) ``` ### pydantic_ai #### PydanticAIIntegrator ```python PydanticAIIntegrator( event_mode: Literal["attributes", "logs"] = "logs", ) ``` Bases: `BasePatronusIntegrator` Integration for Pydantic-AI with Patronus. This class provides integration between Pydantic-AI agents and the Patronus observability stack, enabling tracing and logging of Pydantic-AI agent operations. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `event_mode` | `Literal['attributes', 'logs']` | The mode for capturing events, either as span attributes or as logs. Default is "logs". | `'logs'` | Source code in `src/patronus/integrations/pydantic_ai.py` ```python def __init__(self, event_mode: Literal["attributes", "logs"] = "logs"): """ Initialize the Pydantic-AI integrator. Args: event_mode: The mode for capturing events, either as span attributes or as logs. Default is "logs". """ self._instrumentation_settings = {"event_mode": event_mode} ``` ##### apply ```python apply(ctx: PatronusContext, **kwargs: Any) ``` Apply Pydantic-AI instrumentation to the Patronus context. This method configures all Pydantic-AI agents to use the tracer and logger providers from the Patronus context. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `ctx` | `PatronusContext` | The Patronus context containing the tracer and logger providers. | *required* | | `**kwargs` | `Any` | Additional keyword arguments (unused). | `{}` | Source code in `src/patronus/integrations/pydantic_ai.py` ```python def apply(self, ctx: "context.PatronusContext", **kwargs: Any): """ Apply Pydantic-AI instrumentation to the Patronus context. This method configures all Pydantic-AI agents to use the tracer and logger providers from the Patronus context. Args: ctx: The Patronus context containing the tracer and logger providers. **kwargs: Additional keyword arguments (unused). 
""" from pydantic_ai.agent import Agent, InstrumentationSettings settings_kwargs = { **self._instrumentation_settings, "tracer_provider": ctx.tracer_provider, "event_logger_provider": EventLoggerProvider(ctx.logger_provider), } settings = InstrumentationSettings(**settings_kwargs) Agent.instrument_all(instrument=settings) ``` # Patronus Objects ## client_async ### AsyncPatronus ```python AsyncPatronus(max_workers: int = 10) ``` Source code in `src/patronus/pat_client/client_async.py` ```python def __init__(self, max_workers: int = 10): self._pending_tasks = collections.deque() self._executor = ThreadPoolExecutor(max_workers=max_workers) self._semaphore = asyncio.Semaphore(max_workers) ``` #### evaluate ```python evaluate( evaluators: Union[List[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict] = None, return_exceptions: bool = False, ) -> EvaluationContainer ``` Run multiple evaluators in parallel. Source code in `src/patronus/pat_client/client_async.py` ```python async def evaluate( self, evaluators: Union[List[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict] = None, return_exceptions: bool = False, ) -> EvaluationContainer: """ Run multiple evaluators in parallel. """ singular_eval = not isinstance(evaluators, list) if singular_eval: evaluators = [evaluators] evaluators = self._map_evaluators(evaluators) def into_coro(fn, **kwargs): if inspect.iscoroutinefunction(fn): coro = fn(**kwargs) else: coro = asyncio.to_thread(fn, **kwargs) return with_semaphore(self._semaphore, coro) with bundled_eval(): results = await asyncio.gather( *( into_coro( ev.evaluate, system_prompt=system_prompt, task_context=task_context, task_input=task_input, task_output=task_output, gold_answer=gold_answer, task_metadata=task_metadata, ) for ev in evaluators ), return_exceptions=return_exceptions, ) return EvaluationContainer(results) ``` #### evaluate_bg ```python evaluate_bg( evaluators: Union[List[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict] = None, ) -> Task[EvaluationContainer] ``` Run multiple evaluators in parallel. The returned task will be a background task. Source code in `src/patronus/pat_client/client_async.py` ```python def evaluate_bg( self, evaluators: Union[List[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict] = None, ) -> Task[EvaluationContainer]: """ Run multiple evaluators in parallel. The returned task will be a background task. 
""" loop = asyncio.get_running_loop() task = loop.create_task( self.evaluate( evaluators=evaluators, system_prompt=system_prompt, task_context=task_context, task_input=task_input, task_output=task_output, gold_answer=gold_answer, task_metadata=task_metadata, return_exceptions=True, ), name="evaluate_bg", ) self._pending_tasks.append(task) task.add_done_callback(self._consume_tasks) return task ``` #### close ```python close() ``` Gracefully close the client. This will wait for all background tasks to finish. Source code in `src/patronus/pat_client/client_async.py` ```python async def close(self): """ Gracefully close the client. This will wait for all background tasks to finish. """ while len(self._pending_tasks) != 0: await self._pending_tasks.popleft() ``` ## client_sync ### Patronus ```python Patronus(workers: int = 10, shutdown_on_exit: bool = True) ``` Source code in `src/patronus/pat_client/client_sync.py` ```python def __init__(self, workers: int = 10, shutdown_on_exit: bool = True): self._worker_pool = ThreadPool(workers) self._supervisor_pool = ThreadPool(workers) self._at_exit_handler = None if shutdown_on_exit: self._at_exit_handler = atexit.register(self.close) ``` #### evaluate ```python evaluate( evaluators: Union[list[Evaluator], Evaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict[str, Any]] = None, return_exceptions: bool = False, ) -> EvaluationContainer ``` Run multiple evaluators in parallel. Source code in `src/patronus/pat_client/client_sync.py` ```python def evaluate( self, evaluators: typing.Union[list[Evaluator], Evaluator], *, system_prompt: typing.Optional[str] = None, task_context: typing.Union[list[str], str, None] = None, task_input: typing.Optional[str] = None, task_output: typing.Optional[str] = None, gold_answer: typing.Optional[str] = None, task_metadata: typing.Optional[dict[str, typing.Any]] = None, return_exceptions: bool = False, ) -> EvaluationContainer: """ Run multiple evaluators in parallel. """ if not isinstance(evaluators, list): evaluators = [evaluators] evaluators = self._map_evaluators(evaluators) with bundled_eval(): callables = [ _into_thread_run_fn( ev.evaluate, system_prompt=system_prompt, task_context=task_context, task_input=task_input, task_output=task_output, gold_answer=gold_answer, task_metadata=task_metadata, ) for ev in evaluators ] results = self._process_batch(callables, return_exceptions=return_exceptions) return EvaluationContainer(results) ``` #### evaluate_bg ```python evaluate_bg( evaluators: list[StructuredEvaluator], *, system_prompt: Optional[str] = None, task_context: Union[list[str], str, None] = None, task_input: Optional[str] = None, task_output: Optional[str] = None, gold_answer: Optional[str] = None, task_metadata: Optional[dict[str, Any]] = None, ) -> TypedAsyncResult[EvaluationContainer] ``` Run multiple evaluators in parallel. The returned task will be a background task. 
Source code in `src/patronus/pat_client/client_sync.py` ```python def evaluate_bg( self, evaluators: list[StructuredEvaluator], *, system_prompt: typing.Optional[str] = None, task_context: typing.Union[list[str], str, None] = None, task_input: typing.Optional[str] = None, task_output: typing.Optional[str] = None, gold_answer: typing.Optional[str] = None, task_metadata: typing.Optional[dict[str, typing.Any]] = None, ) -> TypedAsyncResult[EvaluationContainer]: """ Run multiple evaluators in parallel. The returned task will be a background task. """ def _run(): with bundled_eval(): callables = [ _into_thread_run_fn( ev.evaluate, system_prompt=system_prompt, task_context=task_context, task_input=task_input, task_output=task_output, gold_answer=gold_answer, task_metadata=task_metadata, ) for ev in evaluators ] results = self._process_batch(callables, return_exceptions=True) return EvaluationContainer(results) return typing.cast( TypedAsyncResult[EvaluationContainer], self._supervisor_pool.apply_async(_into_thread_run_fn(_run)) ) ``` #### close ```python close() ``` Gracefully close the client. This will wait for all background tasks to finish. Source code in `src/patronus/pat_client/client_sync.py` ```python def close(self): """ Gracefully close the client. This will wait for all background tasks to finish. """ self._close() if self._at_exit_handler: atexit.unregister(self._at_exit_handler) ``` ## container ### EvaluationContainer ```python EvaluationContainer( results: list[Union[EvaluationResult, None, Exception]], ) ``` #### format ```python format() -> str ``` Format the evaluation results into a readable summary. Source code in `src/patronus/pat_client/container.py` ```python def format(self) -> str: """ Format the evaluation results into a readable summary. """ buf = StringIO() total = len(self.results) exceptions_count = sum(1 for r in self.results if isinstance(r, Exception)) successes_count = sum(1 for r in self.results if isinstance(r, EvaluationResult) and r.pass_ is True) failures_count = sum(1 for r in self.results if isinstance(r, EvaluationResult) and r.pass_ is False) buf.write(f"Total evaluations: {total}\n") buf.write(f"Successes: {successes_count}\n") buf.write(f"Failures: {failures_count}\n") buf.write(f"Exceptions: {exceptions_count}\n\n") buf.write("Evaluation Details:\n") buf.write("---\n") # Add detailed evaluation results for result in self.results: if result is None: buf.write("None\n") elif isinstance(result, Exception): buf.write(str(result)) buf.write("\n") else: buf.write(result.format()) buf.write("---\n") return buf.getvalue() ``` #### pretty_print ```python pretty_print(file: Optional[IO] = None) -> None ``` Formats and prints the current object in a human-readable form. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `file` | `Optional[IO]` | | `None` | Source code in `src/patronus/pat_client/container.py` ```python def pretty_print(self, file: Optional[IO] = None) -> None: """ Formats and prints the current object in a human-readable form. Args: file: """ f = self.format() print(f, file=file) ``` #### has_exception ```python has_exception() -> bool ``` Checks if the results contain any exception. Source code in `src/patronus/pat_client/container.py` ```python def has_exception(self) -> bool: """ Checks if the results contain any exception. 
""" return any(isinstance(r, Exception) for r in self.results) ``` #### raise_on_exception ```python raise_on_exception() -> None ``` Checks the results for any exceptions and raises them accordingly. Source code in `src/patronus/pat_client/container.py` ```python def raise_on_exception(self) -> None: """ Checks the results for any exceptions and raises them accordingly. """ if not self.has_exception(): return None exceptions = list(r for r in self.results if isinstance(r, Exception)) if len(exceptions) == 1: raise exceptions[0] raise MultiException(exceptions) ``` #### all_succeeded ```python all_succeeded(ignore_exceptions: bool = False) -> bool ``` Check if all evaluations that were actually evaluated passed. Evaluations are only considered if they: - Have a non-None pass\_ flag set - Are not None (skipped) - Are not exceptions (unless ignore_exceptions=True) Note: Returns True if no evaluations met the above criteria (empty case). Source code in `src/patronus/pat_client/container.py` ```python def all_succeeded(self, ignore_exceptions: bool = False) -> bool: """ Check if all evaluations that were actually evaluated passed. Evaluations are only considered if they: - Have a non-None pass_ flag set - Are not None (skipped) - Are not exceptions (unless ignore_exceptions=True) Note: Returns True if no evaluations met the above criteria (empty case). """ for r in self.results: if isinstance(r, Exception) and not ignore_exceptions: self.raise_on_exception() if r is not None and r.pass_ is False: return False return True ``` #### any_failed ```python any_failed(ignore_exceptions: bool = False) -> bool ``` Check if any evaluation that was actually evaluated failed. Evaluations are only considered if they: - Have a non-None pass\_ flag set - Are not None (skipped) - Are not exceptions (unless ignore_exceptions=True) Note: Returns False if no evaluations met the above criteria (empty case). Source code in `src/patronus/pat_client/container.py` ```python def any_failed(self, ignore_exceptions: bool = False) -> bool: """ Check if any evaluation that was actually evaluated failed. Evaluations are only considered if they: - Have a non-None pass_ flag set - Are not None (skipped) - Are not exceptions (unless ignore_exceptions=True) Note: Returns False if no evaluations met the above criteria (empty case). """ for r in self.results: if isinstance(r, Exception) and not ignore_exceptions: self.raise_on_exception() if r is not None and r.pass_ is False: return True return False ``` #### failed_evaluations ```python failed_evaluations() -> Generator[ EvaluationResult, None, None ] ``` Generates all failed evaluations from the results. Source code in `src/patronus/pat_client/container.py` ```python def failed_evaluations(self) -> Generator[EvaluationResult, None, None]: """ Generates all failed evaluations from the results. """ return (r for r in self.results if not isinstance(r, (Exception, type(None))) and r.pass_ is False) ``` #### succeeded_evaluations ```python succeeded_evaluations() -> Generator[ EvaluationResult, None, None ] ``` Generates all successfully passed evaluations from the `results` attribute. Source code in `src/patronus/pat_client/container.py` ```python def succeeded_evaluations(self) -> Generator[EvaluationResult, None, None]: """ Generates all successfully passed evaluations from the `results` attribute. 
""" return (r for r in self.results if not isinstance(r, (Exception, type(None))) and r.pass_ is True) ``` # Prompts ## patronus.prompts ### clients #### load_prompt ```python load_prompt = get ``` Alias for PromptClient.get. #### aload_prompt ```python aload_prompt = get ``` Alias for AsyncPromptClient.get. #### push_prompt ```python push_prompt = push ``` Alias for PromptClient.push. #### apush_prompt ```python apush_prompt = push ``` Alias for AsyncPromptClient.push. #### PromptNotFoundError ```python PromptNotFoundError( name: str, project: Optional[str] = None, revision: Optional[int] = None, label: Optional[str] = None, ) ``` Bases: `Exception` Raised when a prompt could not be found. Source code in `src/patronus/prompts/clients.py` ```python def __init__( self, name: str, project: Optional[str] = None, revision: Optional[int] = None, label: Optional[str] = None ): self.name = name self.project = project self.revision = revision self.label = label message = f"Prompt not found (name={name!r}, project={project!r}, revision={revision!r}, label={label!r})" super().__init__(message) ``` #### PromptProviderError Bases: `Exception` Base class for prompt provider errors. #### PromptProviderConnectionError Bases: `PromptProviderError` Raised when there's a connectivity issue with the prompt provider. #### PromptProviderAuthenticationError Bases: `PromptProviderError` Raised when there's an authentication issue with the prompt provider. #### PromptProvider Bases: `ABC` ##### get_prompt ```python get_prompt( name: str, revision: Optional[int], label: Optional[str], project: str, engine: TemplateEngine, ) -> Optional[LoadedPrompt] ``` Get prompts, returns None if prompt was not found Source code in `src/patronus/prompts/clients.py` ```python @abc.abstractmethod def get_prompt( self, name: str, revision: Optional[int], label: Optional[str], project: str, engine: TemplateEngine ) -> Optional[LoadedPrompt]: """Get prompts, returns None if prompt was not found""" ``` ##### aget_prompt ```python aget_prompt( name: str, revision: Optional[int], label: Optional[str], project: str, engine: TemplateEngine, ) -> Optional[LoadedPrompt] ``` Get prompts, returns None if prompt was not found Source code in `src/patronus/prompts/clients.py` ```python @abc.abstractmethod async def aget_prompt( self, name: str, revision: Optional[int], label: Optional[str], project: str, engine: TemplateEngine ) -> Optional[LoadedPrompt]: """Get prompts, returns None if prompt was not found""" ``` #### PromptClientMixin #### PromptClient ```python PromptClient( provider_factory: Optional[ProviderFactory] = None, ) ``` Bases: `PromptClientMixin` Source code in `src/patronus/prompts/clients.py` ```python def __init__(self, provider_factory: Optional[ProviderFactory] = None) -> None: self._cache: PromptCache = PromptCache() self._provider_factory: ProviderFactory = provider_factory or { "local": lambda: LocalPromptProvider(), "api": lambda: APIPromptProvider(), } self._api_provider = APIPromptProvider() ``` ##### get ```python get( name: str, revision: Optional[int] = None, label: Optional[str] = None, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, disable_cache: bool = False, provider: Union[ PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN], ] = NOT_GIVEN, engine: Union[ TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN], ] = NOT_GIVEN, ) -> LoadedPrompt ``` Get the prompt. If neither revision nor label is specified then the prompt with latest revision is returned. 
Project is loaded from the config by default. You can specify the project name of the prompt if you want to override the value from the config. By default, once a prompt is retrieved it's cached. You can disable caching. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `name` | `str` | The name of the prompt to retrieve. | *required* | | `revision` | `Optional[int]` | Optional specific revision number to retrieve. If not specified, the latest revision is used. | `None` | | `label` | `Optional[str]` | Optional label to filter by. If specified, only prompts with this label will be returned. | `None` | | `project` | `Union[str, Type[NOT_GIVEN]]` | Optional project name override. If not specified, the project name from config is used. | `NOT_GIVEN` | | `disable_cache` | `bool` | If True, bypasses the cache for both reading and writing. | `False` | | `provider` | `Union[PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN]]` | The provider(s) to use for retrieving prompts. Can be a string identifier ('local', 'api'), a PromptProvider instance, or a sequence of these. If not specified, defaults to config setting. | `NOT_GIVEN` | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]]` | The template engine to use for rendering prompts. Can be a string identifier ('f-string', 'mustache', 'jinja2') or a TemplateEngine instance. If not specified, defaults to config setting. | `NOT_GIVEN` | Returns: | Name | Type | Description | | --- | --- | --- | | `LoadedPrompt` | `LoadedPrompt` | The retrieved prompt object. | Raises: | Type | Description | | --- | --- | | `PromptNotFoundError` | If the prompt could not be found with the specified parameters. | | `ValueError` | If the provided provider or engine is invalid. | | `PromptProviderError` | If there was an error communicating with the prompt provider. | Source code in `src/patronus/prompts/clients.py` ```python def get( self, name: str, revision: Optional[int] = None, label: Optional[str] = None, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, disable_cache: bool = False, provider: Union[ PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN], ] = NOT_GIVEN, engine: Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]] = NOT_GIVEN, ) -> LoadedPrompt: """ Get the prompt. If neither revision nor label is specified then the prompt with latest revision is returned. Project is loaded from the config by default. You can specify the project name of the prompt if you want to override the value from the config. By default, once a prompt is retrieved it's cached. You can disable caching. Args: name: The name of the prompt to retrieve. revision: Optional specific revision number to retrieve. If not specified, the latest revision is used. label: Optional label to filter by. If specified, only prompts with this label will be returned. project: Optional project name override. If not specified, the project name from config is used. disable_cache: If True, bypasses the cache for both reading and writing. provider: The provider(s) to use for retrieving prompts. Can be a string identifier ('local', 'api'), a PromptProvider instance, or a sequence of these. If not specified, defaults to config setting. engine: The template engine to use for rendering prompts. Can be a string identifier ('f-string', 'mustache', 'jinja2') or a TemplateEngine instance. If not specified, defaults to config setting. 
Returns: LoadedPrompt: The retrieved prompt object. Raises: PromptNotFoundError: If the prompt could not be found with the specified parameters. ValueError: If the provided provider or engine is invalid. PromptProviderError: If there was an error communicating with the prompt provider. """ project_name: str = self._resolve_project(project) resolved_providers: list[PromptProvider] = self._resolve_providers(provider, self._provider_factory) resolved_engine: TemplateEngine = self._resolve_engine(engine) cache_key: _CacheKey = _CacheKey(project_name=project_name, prompt_name=name, revision=revision, label=label) if not disable_cache: cached_prompt: Optional[LoadedPrompt] = self._cache.get(cache_key) if cached_prompt is not None: return cached_prompt prompt: Optional[LoadedPrompt] = None provider_errors: list[str] = [] for i, prompt_provider in enumerate(resolved_providers): log.debug("Trying prompt provider %d (%s)", i + 1, prompt_provider.__class__.__name__) try: prompt = prompt_provider.get_prompt(name, revision, label, project_name, engine=resolved_engine) if prompt is not None: log.debug("Prompt found using provider %s", prompt_provider.__class__.__name__) break except PromptProviderConnectionError as e: provider_errors.append(str(e)) continue except PromptProviderAuthenticationError as e: provider_errors.append(str(e)) continue except Exception as e: provider_errors.append(f"Unexpected error from provider {prompt_provider.__class__.__name__}: {str(e)}") continue if prompt is None: if provider_errors: error_msg: str = self._format_provider_errors(provider_errors) raise PromptNotFoundError( name=name, project=project_name, revision=revision, label=label ) from Exception(error_msg) else: raise PromptNotFoundError(name=name, project=project_name, revision=revision, label=label) if not disable_cache: self._cache.put(cache_key, prompt) return prompt ``` ##### push ```python push( prompt: Prompt, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, engine: Union[ TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN], ] = NOT_GIVEN, ) -> LoadedPrompt ``` Push a prompt to the API, creating a new revision only if needed. If a prompt revision with the same normalized body and metadata already exists, the existing revision will be returned. If the metadata differs, a new revision will be created. The engine parameter is only used to set a property on the returned LoadedPrompt object. It is not persisted in any way and doesn't affect how the prompt is stored in the Patronus AI Platform. Note that when a new prompt definition is created, the description is used as provided. However, when creating a new revision for an existing prompt definition, the description parameter doesn't update the existing prompt definition's description. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `prompt` | `Prompt` | The prompt to push. | *required* | | `project` | `Union[str, Type[NOT_GIVEN]]` | Optional project name override. If not specified, the project name from config is used. | `NOT_GIVEN` | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]]` | The template engine to use for rendering the returned prompt. If not specified, defaults to config setting. | `NOT_GIVEN` | Returns: | Name | Type | Description | | --- | --- | --- | | `LoadedPrompt` | `LoadedPrompt` | The created or existing prompt revision. | Raises: | Type | Description | | --- | --- | | `PromptProviderError` | If there was an error communicating with the prompt provider. 
| Source code in `src/patronus/prompts/clients.py` ```python def push( self, prompt: Prompt, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, engine: Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]] = NOT_GIVEN, ) -> LoadedPrompt: """ Push a prompt to the API, creating a new revision only if needed. If a prompt revision with the same normalized body and metadata already exists, the existing revision will be returned. If the metadata differs, a new revision will be created. The engine parameter is only used to set property on output LoadedPrompt object. It is not persisted in any way and doesn't affect how the prompt is stored in Patronus AI Platform. Note that when a new prompt definition is created, the description is used as provided. However, when creating a new revision for an existing prompt definition, the description parameter doesn't update the existing prompt definition's description. Args: prompt: The prompt to push project: Optional project name override. If not specified, the project name from config is used. engine: The template engine to use for rendering the returned prompt. If not specified, defaults to config setting. Returns: LoadedPrompt: The created or existing prompt revision Raises: PromptProviderError: If there was an error communicating with the prompt provider. """ project_name: str = self._resolve_project(project) resolved_engine: TemplateEngine = self._resolve_engine(engine) normalized_body_sha256 = calculate_normalized_body_hash(prompt.body) cli = context.get_api_client().prompts # Try to find existing revision with same hash resp = cli.list_revisions( prompt_name=prompt.name, project_name=project_name, normalized_body_sha256=normalized_body_sha256, ) # Variables for create_revision parameters prompt_id = patronus_api.NOT_GIVEN prompt_name = prompt.name create_new_prompt = True prompt_def = None # If we found a matching revision, check if metadata is the same if resp.prompt_revisions: log.debug("Found %d revisions with matching body hash", len(resp.prompt_revisions)) prompt_id = resp.prompt_revisions[0].prompt_definition_id create_new_prompt = False resp_pd = cli.list_definitions(prompt_id=prompt_id, limit=1) if not resp_pd.prompt_definitions: raise PromptProviderError( "Prompt revision has been found but prompt definition was not found. This should not happen" ) prompt_def = resp_pd.prompt_definitions[0] # Check if the provided description is different from existing one and warn if so if prompt.description is not None and prompt.description != prompt_def.description: warnings.warn( f"Prompt description ({prompt.description!r}) differs from the existing one " f"({prompt_def.description!r}). The description won't be updated." 
) new_metadata_cmp = json.dumps(prompt.metadata, sort_keys=True) for rev in resp.prompt_revisions: metadata_cmp = json.dumps(rev.metadata, sort_keys=True) if new_metadata_cmp == metadata_cmp: log.debug("Found existing revision with matching metadata, returning revision %d", rev.revision) return self._api_provider._create_loaded_prompt( prompt_revision=rev, prompt_def=prompt_def, engine=resolved_engine, ) # For existing prompt, don't need name/project prompt_name = patronus_api.NOT_GIVEN project_name = patronus_api.NOT_GIVEN else: # No matching revisions found, will create new prompt log.debug("No revisions with matching body hash found, creating new prompt and revision") # Create a new revision with appropriate parameters log.debug( "Creating new revision (new_prompt=%s, prompt_id=%s, prompt_name=%s)", create_new_prompt, prompt_id if prompt_id != patronus_api.NOT_GIVEN else "NOT_GIVEN", prompt_name if prompt_name != patronus_api.NOT_GIVEN else "NOT_GIVEN", ) resp = cli.create_revision( body=prompt.body, prompt_id=prompt_id, prompt_name=prompt_name, project_name=project_name if create_new_prompt else patronus_api.NOT_GIVEN, prompt_description=prompt.description, metadata=prompt.metadata, ) prompt_revision = resp.prompt_revision # If we created a new prompt, we need to fetch the definition if create_new_prompt: resp_pd = cli.list_definitions(prompt_id=prompt_revision.prompt_definition_id, limit=1) if not resp_pd.prompt_definitions: raise PromptProviderError( "Prompt revision has been created but prompt definition was not found. This should not happen" ) prompt_def = resp_pd.prompt_definitions[0] return self._api_provider._create_loaded_prompt(prompt_revision, prompt_def, resolved_engine) ``` #### AsyncPromptClient ```python AsyncPromptClient( provider_factory: Optional[ProviderFactory] = None, ) ``` Bases: `PromptClientMixin` Source code in `src/patronus/prompts/clients.py` ```python def __init__(self, provider_factory: Optional[ProviderFactory] = None) -> None: self._cache: AsyncPromptCache = AsyncPromptCache() self._provider_factory: ProviderFactory = provider_factory or { "local": lambda: LocalPromptProvider(), "api": lambda: APIPromptProvider(), } self._api_provider = APIPromptProvider() ``` ##### get ```python get( name: str, revision: Optional[int] = None, label: Optional[str] = None, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, disable_cache: bool = False, provider: Union[ PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN], ] = NOT_GIVEN, engine: Union[ TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN], ] = NOT_GIVEN, ) -> LoadedPrompt ``` Get the prompt asynchronously. If neither revision nor label is specified then the prompt with latest revision is returned. Project is loaded from the config by default. You can specify the project name of the prompt if you want to override the value from the config. By default, once a prompt is retrieved it's cached. You can disable caching. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `name` | `str` | The name of the prompt to retrieve. | *required* | | `revision` | `Optional[int]` | Optional specific revision number to retrieve. If not specified, the latest revision is used. | `None` | | `label` | `Optional[str]` | Optional label to filter by. If specified, only prompts with this label will be returned. | `None` | | `project` | `Union[str, Type[NOT_GIVEN]]` | Optional project name override. If not specified, the project name from config is used. 
| `NOT_GIVEN` | | `disable_cache` | `bool` | If True, bypasses the cache for both reading and writing. | `False` | | `provider` | `Union[PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN]]` | The provider(s) to use for retrieving prompts. Can be a string identifier ('local', 'api'), a PromptProvider instance, or a sequence of these. If not specified, defaults to config setting. | `NOT_GIVEN` | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]]` | The template engine to use for rendering prompts. Can be a string identifier ('f-string', 'mustache', 'jinja2') or a TemplateEngine instance. If not specified, defaults to config setting. | `NOT_GIVEN` | Returns: | Name | Type | Description | | --- | --- | --- | | `LoadedPrompt` | `LoadedPrompt` | The retrieved prompt object. | Raises: | Type | Description | | --- | --- | | `PromptNotFoundError` | If the prompt could not be found with the specified parameters. | | `ValueError` | If the provided provider or engine is invalid. | | `PromptProviderError` | If there was an error communicating with the prompt provider. | Source code in `src/patronus/prompts/clients.py` ```python async def get( self, name: str, revision: Optional[int] = None, label: Optional[str] = None, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, disable_cache: bool = False, provider: Union[ PromptProvider, _DefaultProviders, Sequence[Union[PromptProvider, _DefaultProviders]], Type[NOT_GIVEN] ] = NOT_GIVEN, engine: Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]] = NOT_GIVEN, ) -> LoadedPrompt: """ Get the prompt asynchronously. If neither revision nor label is specified then the prompt with latest revision is returned. Project is loaded from the config by default. You can specify the project name of the prompt if you want to override the value from the config. By default, once a prompt is retrieved it's cached. You can disable caching. Args: name: The name of the prompt to retrieve. revision: Optional specific revision number to retrieve. If not specified, the latest revision is used. label: Optional label to filter by. If specified, only prompts with this label will be returned. project: Optional project name override. If not specified, the project name from config is used. disable_cache: If True, bypasses the cache for both reading and writing. provider: The provider(s) to use for retrieving prompts. Can be a string identifier ('local', 'api'), a PromptProvider instance, or a sequence of these. If not specified, defaults to config setting. engine: The template engine to use for rendering prompts. Can be a string identifier ('f-string', 'mustache', 'jinja2') or a TemplateEngine instance. If not specified, defaults to config setting. Returns: LoadedPrompt: The retrieved prompt object. Raises: PromptNotFoundError: If the prompt could not be found with the specified parameters. ValueError: If the provided provider or engine is invalid. PromptProviderError: If there was an error communicating with the prompt provider. 
""" project_name: str = self._resolve_project(project) resolved_providers: list[PromptProvider] = self._resolve_providers(provider, self._provider_factory) resolved_engine: TemplateEngine = self._resolve_engine(engine) cache_key: _CacheKey = _CacheKey(project_name=project_name, prompt_name=name, revision=revision, label=label) if not disable_cache: cached_prompt: Optional[LoadedPrompt] = await self._cache.get(cache_key) if cached_prompt is not None: return cached_prompt prompt: Optional[LoadedPrompt] = None provider_errors: list[str] = [] for i, prompt_provider in enumerate(resolved_providers): log.debug("Trying prompt provider %d (%s) async", i + 1, prompt_provider.__class__.__name__) try: prompt = await prompt_provider.aget_prompt(name, revision, label, project_name, engine=resolved_engine) if prompt is not None: log.debug("Prompt found using async provider %s", prompt_provider.__class__.__name__) break except PromptProviderConnectionError as e: provider_errors.append(str(e)) continue except PromptProviderAuthenticationError as e: provider_errors.append(str(e)) continue except Exception as e: provider_errors.append(f"Unexpected error from provider {prompt_provider.__class__.__name__}: {str(e)}") continue if prompt is None: if provider_errors: error_msg: str = self._format_provider_errors(provider_errors) raise PromptNotFoundError( name=name, project=project_name, revision=revision, label=label ) from Exception(error_msg) else: raise PromptNotFoundError(name=name, project=project_name, revision=revision, label=label) if not disable_cache: await self._cache.put(cache_key, prompt) return prompt ``` ##### push ```python push( prompt: Prompt, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, engine: Union[ TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN], ] = NOT_GIVEN, ) -> LoadedPrompt ``` Push a prompt to the API asynchronously, creating a new revision only if needed. If a prompt revision with the same normalized body and metadata already exists, the existing revision will be returned. If the metadata differs, a new revision will be created. The engine parameter is only used to set property on output LoadedPrompt object. It is not persisted in any way and doesn't affect how the prompt is stored in Patronus AI Platform. Note that when a new prompt definition is created, the description is used as provided. However, when creating a new revision for an existing prompt definition, the description parameter doesn't update the existing prompt definition's description. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `prompt` | `Prompt` | The prompt to push | *required* | | `project` | `Union[str, Type[NOT_GIVEN]]` | Optional project name override. If not specified, the project name from config is used. | `NOT_GIVEN` | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]]` | The template engine to use for rendering the returned prompt. If not specified, defaults to config setting. | `NOT_GIVEN` | Returns: | Name | Type | Description | | --- | --- | --- | | `LoadedPrompt` | `LoadedPrompt` | The created or existing prompt revision | Raises: | Type | Description | | --- | --- | | `PromptProviderError` | If there was an error communicating with the prompt provider. 
| Source code in `src/patronus/prompts/clients.py` ```python async def push( self, prompt: Prompt, project: Union[str, Type[NOT_GIVEN]] = NOT_GIVEN, engine: Union[TemplateEngine, DefaultTemplateEngines, Type[NOT_GIVEN]] = NOT_GIVEN, ) -> LoadedPrompt: """ Push a prompt to the API asynchronously, creating a new revision only if needed. If a prompt revision with the same normalized body and metadata already exists, the existing revision will be returned. If the metadata differs, a new revision will be created. The engine parameter is only used to set property on output LoadedPrompt object. It is not persisted in any way and doesn't affect how the prompt is stored in Patronus AI Platform. Note that when a new prompt definition is created, the description is used as provided. However, when creating a new revision for an existing prompt definition, the description parameter doesn't update the existing prompt definition's description. Args: prompt: The prompt to push project: Optional project name override. If not specified, the project name from config is used. engine: The template engine to use for rendering the returned prompt. If not specified, defaults to config setting. Returns: LoadedPrompt: The created or existing prompt revision Raises: PromptProviderError: If there was an error communicating with the prompt provider. """ project_name: str = self._resolve_project(project) resolved_engine: TemplateEngine = self._resolve_engine(engine) normalized_body_sha256 = calculate_normalized_body_hash(prompt.body) cli = context.get_async_api_client().prompts # Try to find existing revision with same hash resp = await cli.list_revisions( prompt_name=prompt.name, project_name=project_name, normalized_body_sha256=normalized_body_sha256, ) # Variables for create_revision parameters prompt_id = patronus_api.NOT_GIVEN prompt_name = prompt.name create_new_prompt = True prompt_def = None # If we found a matching revision, check if metadata is the same if resp.prompt_revisions: log.debug("Found %d revisions with matching body hash", len(resp.prompt_revisions)) prompt_id = resp.prompt_revisions[0].prompt_definition_id create_new_prompt = False resp_pd = await cli.list_definitions(prompt_id=prompt_id, limit=1) if not resp_pd.prompt_definitions: raise PromptProviderError( "Prompt revision has been found but prompt definition was not found. This should not happen" ) prompt_def = resp_pd.prompt_definitions[0] # Check if the provided description is different from existing one and warn if so if prompt.description is not None and prompt.description != prompt_def.description: warnings.warn( f"Prompt description ({prompt.description!r}) differs from the existing one " f"({prompt_def.description!r}). The description won't be updated." 
) new_metadata_cmp = json.dumps(prompt.metadata, sort_keys=True) for rev in resp.prompt_revisions: metadata_cmp = json.dumps(rev.metadata, sort_keys=True) if new_metadata_cmp == metadata_cmp: log.debug("Found existing revision with matching metadata, returning revision %d", rev.revision) return self._api_provider._create_loaded_prompt( prompt_revision=rev, prompt_def=prompt_def, engine=resolved_engine, ) # For existing prompt, don't need name/project prompt_name = patronus_api.NOT_GIVEN project_name = patronus_api.NOT_GIVEN else: # No matching revisions found, will create new prompt log.debug("No revisions with matching body hash found, creating new prompt and revision") # Create a new revision with appropriate parameters log.debug( "Creating new revision (new_prompt=%s, prompt_id=%s, prompt_name=%s)", create_new_prompt, prompt_id if prompt_id != patronus_api.NOT_GIVEN else "NOT_GIVEN", prompt_name if prompt_name != patronus_api.NOT_GIVEN else "NOT_GIVEN", ) resp = await cli.create_revision( body=prompt.body, prompt_id=prompt_id, prompt_name=prompt_name, project_name=project_name if create_new_prompt else patronus_api.NOT_GIVEN, prompt_description=prompt.description, metadata=prompt.metadata, ) prompt_revision = resp.prompt_revision # If we created a new prompt, we need to fetch the definition if create_new_prompt: resp_pd = await cli.list_definitions(prompt_id=prompt_revision.prompt_definition_id, limit=1) if not resp_pd.prompt_definitions: raise PromptProviderError( "Prompt revision has been created but prompt definition was not found. This should not happen" ) prompt_def = resp_pd.prompt_definitions[0] return self._api_provider._create_loaded_prompt(prompt_revision, prompt_def, resolved_engine) ``` ### models #### BasePrompt ##### with_engine ```python with_engine( engine: Union[TemplateEngine, DefaultTemplateEngines], ) -> typing.Self ``` Create a new prompt with the specified template engine. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines]` | Either a TemplateEngine instance or a string identifier ('f-string', 'mustache', 'jinja2') | *required* | Returns: | Type | Description | | --- | --- | | `Self` | A new prompt instance with the specified engine | Source code in `src/patronus/prompts/models.py` ```python def with_engine(self, engine: Union[TemplateEngine, DefaultTemplateEngines]) -> typing.Self: """ Create a new prompt with the specified template engine. Args: engine: Either a TemplateEngine instance or a string identifier ('f-string', 'mustache', 'jinja2') Returns: A new prompt instance with the specified engine """ resolved_engine = get_template_engine(engine) return dataclasses.replace(self, _engine=resolved_engine) ``` ##### render ```python render(**kwargs: Any) -> str ``` Render the prompt template with the provided arguments. If no engine is set on the prompt, the default engine from context/config will be used. If no arguments are provided, the template body is returned as-is. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `**kwargs` | `Any` | Template arguments to be rendered in the prompt body | `{}` | Returns: | Type | Description | | --- | --- | | `str` | The rendered prompt | Source code in `src/patronus/prompts/models.py` ```python def render(self, **kwargs: Any) -> str: """ Render the prompt template with the provided arguments. If no engine is set on the prompt, the default engine from context/config will be used. 
If no arguments are provided, the template body is returned as-is. Args: **kwargs: Template arguments to be rendered in the prompt body Returns: The rendered prompt """ if not kwargs: return self.body engine = self._engine if engine is None: # Get default engine from context engine_name = context.get_prompts_config().templating_engine engine = get_template_engine(engine_name) return engine.render(self.body, **kwargs) ``` #### calculate_normalized_body_hash ```python calculate_normalized_body_hash(body: str) -> str ``` Calculate the SHA-256 hash of normalized prompt body. Normalization is done by stripping whitespace from the start and end of the body. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `body` | `str` | The prompt body | *required* | Returns: | Type | Description | | --- | --- | | `str` | SHA-256 hash of the normalized body | Source code in `src/patronus/prompts/models.py` ```python def calculate_normalized_body_hash(body: str) -> str: """Calculate the SHA-256 hash of normalized prompt body. Normalization is done by stripping whitespace from the start and end of the body. Args: body: The prompt body Returns: SHA-256 hash of the normalized body """ normalized_body = body.strip() return hashlib.sha256(normalized_body.encode()).hexdigest() ``` ### templating #### TemplateEngine Bases: `ABC` ##### render ```python render(template: str, **kwargs) -> str ``` Render the template with the given arguments. Source code in `src/patronus/prompts/templating.py` ```python @abc.abstractmethod def render(self, template: str, **kwargs) -> str: """Render the template with the given arguments.""" ``` #### get_template_engine ```python get_template_engine( engine: Union[TemplateEngine, DefaultTemplateEngines], ) -> TemplateEngine ``` Convert a template engine name to an actual engine instance. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `engine` | `Union[TemplateEngine, DefaultTemplateEngines]` | Either a template engine instance or a string identifier ('f-string', 'mustache', 'jinja2') | *required* | Returns: | Type | Description | | --- | --- | | `TemplateEngine` | A template engine instance | Raises: | Type | Description | | --- | --- | | `ValueError` | If the provided engine string is not recognized | Source code in `src/patronus/prompts/templating.py` ```python def get_template_engine(engine: Union[TemplateEngine, DefaultTemplateEngines]) -> TemplateEngine: """ Convert a template engine name to an actual engine instance. Args: engine: Either a template engine instance or a string identifier ('f-string', 'mustache', 'jinja2') Returns: A template engine instance Raises: ValueError: If the provided engine string is not recognized """ if isinstance(engine, TemplateEngine): return engine if engine == "f-string": return FStringTemplateEngine() elif engine == "mustache": return MustacheTemplateEngine() elif engine == "jinja2": return Jinja2TemplateEngine() raise ValueError( "Provided engine must be an instance of TemplateEngine or " "one of the default engines ('f-string', 'mustache', 'jinja2'). " f"Instead got {engine!r}" ) ``` # Tracing ## patronus.tracing ### decorators #### start_span ```python start_span( name: str, *, record_exception: bool = True, attributes: Optional[Attributes] = None, ) -> Iterator[Optional[typing.Any]] ``` Context manager for creating and managing a trace span. 
This function is used to create a span within the current context using the tracer, allowing you to track execution timing or events within a specific block of code. The context is set by `patronus.init()` function. If SDK was not initialized, yielded value will be None. Example: ```python import patronus patronus.init() # Use context manager for finer-grained tracing def complex_operation(): with patronus.start_span("Data preparation"): # Prepare data pass ``` Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `name` | `str` | The name of the span. | *required* | | `record_exception` | `bool` | Whether to record exceptions that occur within the span. Default is True. | `True` | | `attributes` | `Optional[Attributes]` | Attributes to associate with the span, providing additional metadata. | `None` | Source code in `src/patronus/tracing/decorators.py` ````python @contextlib.contextmanager def start_span( name: str, *, record_exception: bool = True, attributes: Optional[Attributes] = None ) -> Iterator[Optional[typing.Any]]: """ Context manager for creating and managing a trace span. This function is used to create a span within the current context using the tracer, allowing you to track execution timing or events within a specific block of code. The context is set by `patronus.init()` function. If SDK was not initialized, yielded value will be None. Example: ```python import patronus patronus.init() # Use context manager for finer-grained tracing def complex_operation(): with patronus.start_span("Data preparation"): # Prepare data pass ``` Args: name (str): The name of the span. record_exception (bool): Whether to record exceptions that occur within the span. Default is True. attributes (Optional[Attributes]): Attributes to associate with the span, providing additional metadata. """ tracer = context.get_tracer_or_none() if tracer is None: yield return with tracer.start_as_current_span( name, record_exception=record_exception, attributes=attributes, ) as span: yield span ```` #### traced ```python traced( span_name: Optional[str] = None, *, log_args: bool = True, log_results: bool = True, log_exceptions: bool = True, disable_log: bool = False, attributes: Attributes = None, **kwargs: Any, ) ``` A decorator to trace function execution by recording a span for the traced function. Example: ```python import patronus patronus.init() # Trace a function with the @traced decorator @patronus.traced() def process_input(user_query): # Process the input ``` Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `span_name` | `Optional[str]` | The name of the traced span. Defaults to the function name if not provided. | `None` | | `log_args` | `bool` | Whether to log the arguments passed to the function. Default is True. | `True` | | `log_results` | `bool` | Whether to log the function's return value. Default is True. | `True` | | `log_exceptions` | `bool` | Whether to log any exceptions raised while executing the function. Default is True. | `True` | | `disable_log` | `bool` | Whether to disable logging the trace information. Default is False. | `False` | | `attributes` | `Attributes` | Attributes to attach to the traced span. Default is None. | `None` | | `**kwargs` | `Any` | Additional arguments for the decorator. | `{}` | Source code in `src/patronus/tracing/decorators.py` ````python def traced( # Give name for the traced span. Defaults to a function name if not provided. span_name: Optional[str] = None, *, # Whether to log function arguments. 
log_args: bool = True, # Whether to log function output. log_results: bool = True, # Whether to log an exception if one was raised. log_exceptions: bool = True, # Whether to prevent a log message to be created. disable_log: bool = False, attributes: Attributes = None, **kwargs: typing.Any, ): """ A decorator to trace function execution by recording a span for the traced function. Example: ```python import patronus patronus.init() # Trace a function with the @traced decorator @patronus.traced() def process_input(user_query): # Process the input ``` Args: span_name (Optional[str]): The name of the traced span. Defaults to the function name if not provided. log_args (bool): Whether to log the arguments passed to the function. Default is True. log_results (bool): Whether to log the function's return value. Default is True. log_exceptions (bool): Whether to log any exceptions raised while executing the function. Default is True. disable_log (bool): Whether to disable logging the trace information. Default is False. attributes (Attributes): Attributes to attach to the traced span. Default is None. **kwargs: Additional arguments for the decorator. """ def decorator(func): name = span_name or func.__qualname__ sig = inspect.signature(func) record_exception = not disable_log and log_exceptions def log_call(fn_args: typing.Any, fn_kwargs: typing.Any, ret: typing.Any, exc: Exception): if disable_log: return logger = context.get_pat_logger() severity = SeverityNumber.INFO body = {"function.name": name} if log_args: bound_args = sig.bind(*fn_args, **fn_kwargs) body["function.arguments"] = {**bound_args.arguments, **bound_args.arguments} if log_results is not None and exc is None: body["function.output"] = ret if log_exceptions and exc is not None: module = type(exc).__module__ qualname = type(exc).__qualname__ exception_type = f"{module}.{qualname}" if module and module != "builtins" else qualname body["exception.type"] = exception_type body["exception.message"] = str(exc) severity = SeverityNumber.ERROR logger.log(body, log_type=LogTypes.trace, severity=severity) @functools.wraps(func) def wrapper_sync(*f_args, **f_kwargs): tracer = context.get_tracer_or_none() if tracer is None: return func(*f_args, **f_kwargs) exc = None ret = None with tracer.start_as_current_span(name, record_exception=record_exception, attributes=attributes): try: ret = func(*f_args, **f_kwargs) except Exception as e: exc = e raise exc finally: log_call(f_args, f_kwargs, ret, exc) return ret @functools.wraps(func) async def wrapper_async(*f_args, **f_kwargs): tracer = context.get_tracer_or_none() if tracer is None: return await func(*f_args, **f_kwargs) exc = None ret = None with tracer.start_as_current_span(name, record_exception=record_exception, attributes=attributes): try: ret = await func(*f_args, **f_kwargs) except Exception as e: exc = e raise exc finally: log_call(f_args, f_kwargs, ret, exc) return ret if inspect.iscoroutinefunction(func): wrapper_async._pat_traced = True return wrapper_async else: wrapper_async._pat_traced = True return wrapper_sync return decorator ```` ### exporters This module provides exporter selection functionality for OpenTelemetry traces and logs. It handles protocol resolution based on Patronus configuration and standard OTEL environment variables. #### create_trace_exporter ```python create_trace_exporter( endpoint: str, api_key: str, protocol: Optional[str] = None, ) -> SpanExporter ``` Create a configured trace exporter instance. 
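Example: a minimal sketch of wiring the returned exporter into a standard OpenTelemetry pipeline by hand (normally `patronus.init()` does this for you). The endpoint and API key below are placeholders, and the import path is assumed from the source location shown below.

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

from patronus.tracing.exporters import create_trace_exporter

# Placeholder endpoint and key. "http/protobuf" selects the HTTP exporter;
# other resolved protocols fall back to the gRPC exporter (see the source below).
exporter = create_trace_exporter(
    endpoint="https://otel.example.com",
    api_key="your-api-key",
    protocol="http/protobuf",
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
```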
Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `endpoint` | `str` | The OTLP endpoint URL | *required* | | `api_key` | `str` | Authentication key for Patronus services | *required* | | `protocol` | `Optional[str]` | OTLP protocol override from Patronus configuration | `None` | Returns: | Type | Description | | --- | --- | | `SpanExporter` | Configured trace exporter instance | Source code in `src/patronus/tracing/exporters.py` ```python def create_trace_exporter(endpoint: str, api_key: str, protocol: Optional[str] = None) -> SpanExporter: """ Create a configured trace exporter instance. Args: endpoint: The OTLP endpoint URL api_key: Authentication key for Patronus services protocol: OTLP protocol override from Patronus configuration Returns: Configured trace exporter instance """ resolved_protocol = _resolve_otlp_protocol(protocol) if resolved_protocol == "http/protobuf": # For HTTP exporter, ensure endpoint has the correct path if not endpoint.endswith("/v1/traces"): endpoint = endpoint.rstrip("/") + "/v1/traces" return OTLPSpanExporterHTTP(endpoint=endpoint, headers={"x-api-key": api_key}) else: # For gRPC exporter, determine if connection should be insecure based on URL scheme is_insecure = endpoint.startswith("http://") return OTLPSpanExporterGRPC(endpoint=endpoint, headers={"x-api-key": api_key}, insecure=is_insecure) ``` #### create_log_exporter ```python create_log_exporter( endpoint: str, api_key: str, protocol: Optional[str] = None, ) -> LogExporter ``` Create a configured log exporter instance. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `endpoint` | `str` | The OTLP endpoint URL | *required* | | `api_key` | `str` | Authentication key for Patronus services | *required* | | `protocol` | `Optional[str]` | OTLP protocol override from Patronus configuration | `None` | Returns: | Type | Description | | --- | --- | | `LogExporter` | Configured log exporter instance | Source code in `src/patronus/tracing/exporters.py` ```python def create_log_exporter(endpoint: str, api_key: str, protocol: Optional[str] = None) -> LogExporter: """ Create a configured log exporter instance. Args: endpoint: The OTLP endpoint URL api_key: Authentication key for Patronus services protocol: OTLP protocol override from Patronus configuration Returns: Configured log exporter instance """ resolved_protocol = _resolve_otlp_protocol(protocol) if resolved_protocol == "http/protobuf": # For HTTP exporter, ensure endpoint has the correct path if not endpoint.endswith("/v1/logs"): endpoint = endpoint.rstrip("/") + "/v1/logs" return OTLPLogExporterHTTP(endpoint=endpoint, headers={"x-api-key": api_key}) else: # For gRPC exporter, determine if connection should be insecure based on URL scheme is_insecure = endpoint.startswith("http://") return OTLPLogExporterGRPC(endpoint=endpoint, headers={"x-api-key": api_key}, insecure=is_insecure) ``` ### tracer This module provides the implementation for tracing support using the OpenTelemetry SDK. #### PatronusAttributesSpanProcessor ```python PatronusAttributesSpanProcessor( project_name: str, app: Optional[str] = None, experiment_id: Optional[str] = None, ) ``` Bases: `SpanProcessor` Processor that adds Patronus-specific attributes to all spans. This processor ensures that each span includes the mandatory attributes: `project_name`, and optionally adds `app` or `experiment_id` attributes if they are provided during initialization. 
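Example: a usage sketch, assuming the processor is imported from the source module shown below and attached to your own `TracerProvider`; the project and app names are placeholders.

```python
from opentelemetry.sdk.trace import TracerProvider

from patronus.tracing.tracer import PatronusAttributesSpanProcessor

provider = TracerProvider()
# Placeholder values. Note that app is ignored when experiment_id is provided
# (see __init__ below), so pass one or the other.
provider.add_span_processor(
    PatronusAttributesSpanProcessor(project_name="Global", app="default")
)

tracer = provider.get_tracer("my-app")
with tracer.start_as_current_span("demo"):
    pass  # spans created here carry the project_name/app attributes
```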
Source code in `src/patronus/tracing/tracer.py` ```python def __init__(self, project_name: str, app: Optional[str] = None, experiment_id: Optional[str] = None): self.project_name = project_name self.experiment_id = None self.app = None if experiment_id is not None: self.experiment_id = experiment_id else: self.app = app ``` #### create_tracer_provider ```python create_tracer_provider( exporter_endpoint: str, api_key: str, scope: PatronusScope, protocol: Optional[str] = None, ) -> TracerProvider ``` Creates and returns a cached TracerProvider configured with the specified exporter. The function utilizes an OpenTelemetry BatchSpanProcessor and an OTLPSpanExporter to initialize the tracer. The configuration is cached for reuse. Source code in `src/patronus/tracing/tracer.py` ```python @functools.lru_cache() def create_tracer_provider( exporter_endpoint: str, api_key: str, scope: context.PatronusScope, protocol: Optional[str] = None, ) -> TracerProvider: """ Creates and returns a cached TracerProvider configured with the specified exporter. The function utilizes an OpenTelemetry BatchSpanProcessor and an OTLPSpanExporter to initialize the tracer. The configuration is cached for reuse. """ resource = None if scope.service is not None: resource = Resource.create({"service.name": scope.service}) provider = TracerProvider(resource=resource) provider.add_span_processor( PatronusAttributesSpanProcessor( project_name=scope.project_name, app=scope.app, experiment_id=scope.experiment_id, ) ) provider.add_span_processor( BatchSpanProcessor(_create_exporter(endpoint=exporter_endpoint, api_key=api_key, protocol=protocol)) ) return provider ``` #### create_tracer ```python create_tracer( scope: PatronusScope, exporter_endpoint: str, api_key: str, protocol: Optional[str] = None, ) -> trace.Tracer ``` Creates an OpenTelemetry (OTeL) tracer tied to the specified scope. Source code in `src/patronus/tracing/tracer.py` ```python def create_tracer( scope: context.PatronusScope, exporter_endpoint: str, api_key: str, protocol: Optional[str] = None, ) -> trace.Tracer: """ Creates an OpenTelemetry (OTeL) tracer tied to the specified scope. """ provider = create_tracer_provider( exporter_endpoint=exporter_endpoint, api_key=api_key, scope=scope, protocol=protocol, ) return provider.get_tracer("patronus.sdk") ```
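For reference, a usage sketch of calling `create_tracer` directly (normally `patronus.init()` performs this wiring). The endpoint and API key below are placeholders, and reusing `ctx.scope` assumes the scope attribute on the initialized context is the `PatronusScope` expected here.

```python
import patronus
from patronus.tracing.tracer import create_tracer

ctx = patronus.init()

# Placeholder endpoint and key; ctx.scope is assumed to be a PatronusScope.
tracer = create_tracer(
    scope=ctx.scope,
    exporter_endpoint="https://otel.example.com",
    api_key="your-api-key",
    protocol="http/protobuf",
)

with tracer.start_as_current_span("manual-span"):
    ...  # spans emitted through the configured OTLP exporter
```

Because `create_tracer_provider` is wrapped in `functools.lru_cache`, repeated calls with the same endpoint, API key, scope, and protocol reuse the same provider rather than creating a new export pipeline each time.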