
Types

InferenceResult

Bases: TypedDict

Result returned from TLM inference.

Attributes:

response (str | dict[str, Any]):
    Either a response string or a dictionary representation of an OpenAI chat completion.

trustworthiness_score (float):
    Score indicating the trustworthiness of the response, between 0 and 1.

usage (dict[str, Any]):
    Token usage information for the inference, including prompt and completion tokens.

metadata (dict[str, Any] | None):
    Optional metadata, e.g. per-field scores for structured outputs.

evals (dict[str, float] | None):
    Optional dictionary of Eval scores, keyed by evaluation name.

explanation (str | None):
    Explanation for the trustworthiness score.

Source code in tlm/inference.py
class InferenceResult(TypedDict):
    """Result returned from TLM inference.

    Attributes:
        response: Either a response string or dictionary representation of an OpenAI chat completion.
        trustworthiness_score: Score indicating the trustworthiness of the response, between 0 and 1.
        usage: Token usage information for the inference, including prompt and completion tokens.
        metadata: Optional metadata, e.g. per-field scores for structured outputs.
        evals: Optional dictionary of Eval scores, keyed by evaluation name.
        explanation: Explanation for the trustworthiness score.
    """

    response: str | dict[str, Any]
    trustworthiness_score: float
    usage: dict[str, Any]
    metadata: dict[str, Any] | None
    evals: dict[str, float] | None
    explanation: str | None

Eval

Bases: BaseModel

Criteria for performing a semantic evaluation of the query, context, and/or response. At least one of query_identifier, context_identifier, and response_identifier must be provided.

Attributes:

name (str):
    The name of the evaluation.

criteria (str):
    Semantic description of the criteria to assess.

query_identifier (str | None):
    Identifier for the user query to be provided in the prompt passed to the LLM, e.g. "User Query". Should be None if the evaluation does not require the query.

context_identifier (str | None):
    Identifier for the context to be provided in the prompt passed to the LLM, e.g. "Context". Should be None if the evaluation does not require the context.

response_identifier (str | None):
    Identifier for the response to be provided in the prompt passed to the LLM, e.g. "Response". Should be None if the evaluation does not require the response.

Source code in tlm/types/base.py
class Eval(BaseModel):
    """Criteria for performing a semantic evaluation of the query, context, and/or response.
    At least one of query_identifier, context_identifier, and response_identifier must be provided.

    Attributes:
        name: The name of the evaluation.
        criteria: Semantic description of the criteria to assess.
        query_identifier: Identifier for the user query to be provided in the prompt passed to the LLM, e.g. "User Query". Should be `None` if the evaluation does not require the query.
        context_identifier: Identifier for the context to be provided in the prompt passed to the LLM, e.g. "Context". Should be `None` if the evaluation does not require the context.
        response_identifier: Identifier for the response to be provided in the prompt passed to the LLM, e.g. "Response". Should be `None` if the evaluation does not require the response.
    """

    name: str
    criteria: str
    query_identifier: str | None = None
    context_identifier: str | None = None
    response_identifier: str | None = None

QualityPreset

Bases: str, Enum

Quality presets that control the trade-off between speed and accuracy.

Higher quality presets generate more completions and use more advanced techniques, resulting in higher trustworthiness scores but slower inference and higher costs.

Values

BASE, LOW, MEDIUM (default), HIGH, BEST

Source code in tlm/config/presets.py
class QualityPreset(str, Enum):
    """Quality presets that control the trade-off between speed and accuracy.

    Higher quality presets generate more completions and use more advanced techniques,
    resulting in higher trustworthiness scores but slower inference and higher costs.

    Values:
        `BASE`, `LOW`, `MEDIUM` (default), `HIGH`, `BEST`
    """

    BASE = "base"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    BEST = "best"
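Because the enum mixes in str, its members compare equal to their raw string values and can be passed anywhere a plain string is accepted. A minimal sketch, redeclaring the enum from the listing above:

```python
from enum import Enum

class QualityPreset(str, Enum):
    BASE = "base"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    BEST = "best"

# str-mixin members compare equal to their raw values:
assert QualityPreset.MEDIUM == "medium"

# Round-trip from a raw string (e.g. a config-file value) to a member:
preset = QualityPreset("high")
print(preset.name)  # HIGH
```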

ReasoningEffort

Bases: str, Enum

Reasoning effort levels that control explanation generation for trustworthiness scores.

Higher reasoning effort generates longer explanations that provide more detailed reasoning about why a particular trustworthiness score was assigned.

Values

NONE (default), LOW, MEDIUM, HIGH

Source code in tlm/config/presets.py
class ReasoningEffort(str, Enum):
    """Reasoning effort levels that control explanation generation for trustworthiness scores.

    Higher reasoning effort generates longer explanations that provide more detailed
    reasoning about why a particular trustworthiness score was assigned.

    Values:
        `NONE` (default), `LOW`, `MEDIUM`, `HIGH`
    """

    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
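Enum members iterate in definition order, which is handy for listing or validating the accepted effort levels. A short sketch, redeclaring the enum from the listing above:

```python
from enum import Enum

class ReasoningEffort(str, Enum):
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Enumerate the accepted values, e.g. to validate a user-supplied setting:
valid_values = [effort.value for effort in ReasoningEffort]
print(valid_values)  # ['none', 'low', 'medium', 'high']
```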

SimilarityMeasure

Bases: str, Enum

Strategies for scoring the similarity of two generated responses.

Values

JACCARD, EMBEDDING_SMALL, EMBEDDING_LARGE, CODE, STATEMENT

Source code in tlm/types/base.py
class SimilarityMeasure(str, Enum):
    """Strategies for scoring the similarity of two generated responses.

    Values:
        `JACCARD`, `EMBEDDING_SMALL`, `EMBEDDING_LARGE`, `CODE`, `STATEMENT`
    """

    JACCARD = "jaccard"  # formerly STRING
    EMBEDDING_SMALL = "embedding_small"
    EMBEDDING_LARGE = "embedding_large"
    CODE = "code"
    STATEMENT = "statement"  # formerly DISCREPANCY

    @classmethod
    def for_workflow(cls, workflow_type: WorkflowType) -> "SimilarityMeasure":
        if workflow_type == WorkflowType.QA:
            return cls.STATEMENT
        elif workflow_type == WorkflowType.CLASSIFICATION:
            return cls.EMBEDDING_SMALL
        elif workflow_type == WorkflowType.BINARY_CLASSIFICATION:
            return cls.EMBEDDING_LARGE
        elif workflow_type == WorkflowType.RAG:
            return cls.CODE
        elif workflow_type == WorkflowType.STRUCTURED_OUTPUT_SCORING:
            return cls.JACCARD

        return cls.STATEMENT  # default
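The for_workflow dispatch above can be exercised as follows. WorkflowType is defined elsewhere in tlm, so the stand-in below (and its member values) is an assumption for illustration; the classmethod here expresses the same mapping as the if/elif chain, as a dict lookup:

```python
from enum import Enum

class WorkflowType(str, Enum):
    # Hypothetical stand-in: WorkflowType lives elsewhere in tlm and these
    # member values are assumptions, not taken from the source.
    QA = "qa"
    CLASSIFICATION = "classification"
    BINARY_CLASSIFICATION = "binary_classification"
    RAG = "rag"
    STRUCTURED_OUTPUT_SCORING = "structured_output_scoring"

class SimilarityMeasure(str, Enum):
    JACCARD = "jaccard"
    EMBEDDING_SMALL = "embedding_small"
    EMBEDDING_LARGE = "embedding_large"
    CODE = "code"
    STATEMENT = "statement"

    @classmethod
    def for_workflow(cls, workflow_type: WorkflowType) -> "SimilarityMeasure":
        # Same workflow-to-measure mapping as the if/elif chain above,
        # with STATEMENT as the fallback default.
        mapping = {
            WorkflowType.QA: cls.STATEMENT,
            WorkflowType.CLASSIFICATION: cls.EMBEDDING_SMALL,
            WorkflowType.BINARY_CLASSIFICATION: cls.EMBEDDING_LARGE,
            WorkflowType.RAG: cls.CODE,
            WorkflowType.STRUCTURED_OUTPUT_SCORING: cls.JACCARD,
        }
        return mapping.get(workflow_type, cls.STATEMENT)

print(SimilarityMeasure.for_workflow(WorkflowType.RAG).value)  # code
```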