Types
Bases: TypedDict
Result returned from TLM inference.
Attributes:
| Name | Type | Description |
|---|---|---|
| `response` | `str \| dict[str, Any]` | Either a response string or dictionary representation of an OpenAI chat completion. |
| `trustworthiness_score` | `float` | Score indicating the trustworthiness of the response, between 0 and 1. |
| `usage` | `dict[str, Any]` | Token usage information for the inference, including prompt and completion tokens. |
| `metadata` | `dict[str, Any] \| None` | Optional metadata, e.g. per-field scores for structured outputs. |
| `evals` | `dict[str, float] \| None` | Optional dictionary of Eval scores, keyed by evaluation name. |
| `explanation` | `str \| None` | Explanation for the trustworthiness score. |
Source code in tlm/inference.py
Bases: BaseModel
Criteria for performing a semantic evaluation of the query, context, and/or response. At least one of query_identifier, context_identifier, and response_identifier must be provided.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the evaluation. |
| `criteria` | `str` | Semantic description of the criteria to assess. |
| `query_identifier` | `str \| None` | Identifier for the user query to be provided in the prompt passed to the LLM, e.g. "User Query". Should be |
| `context_identifier` | `str \| None` | Identifier for the context to be provided in the prompt passed to the LLM, e.g. "Context". Should be |
| `response_identifier` | `str \| None` | Identifier for the response to be provided in the prompt passed to the LLM, e.g. "Response". Should be |
Source code in tlm/types/base.py
Quality presets that control the trade-off between speed and accuracy.
Higher quality presets generate more completions and use more advanced techniques, resulting in more accurate trustworthiness scores but slower inference and higher costs.
Values
BASE, LOW, MEDIUM (default), HIGH, BEST
Source code in tlm/config/presets.py
Reasoning effort levels that control explanation generation for trustworthiness scores.
Higher reasoning effort generates longer explanations that provide more detailed reasoning about why a particular trustworthiness score was assigned.
Values
NONE (default), LOW, MEDIUM, HIGH
Source code in tlm/config/presets.py
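The two sets of values above, together with their documented defaults, can be modelled as enums. This is a sketch only: the class names, the lowercase string values, and the `ConfigSketch` container are hypothetical; only the member names and the defaults (`MEDIUM` and `NONE`) come from the lists above.

```python
from dataclasses import dataclass
from enum import Enum


class QualityPresetSketch(str, Enum):
    """Hypothetical enum; the string values are assumed."""

    BASE = "base"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    BEST = "best"


class ReasoningEffortSketch(str, Enum):
    """Hypothetical enum; the string values are assumed."""

    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical container that applies the documented defaults."""

    quality_preset: QualityPresetSketch = QualityPresetSketch.MEDIUM
    reasoning_effort: ReasoningEffortSketch = ReasoningEffortSketch.NONE


default_config = ConfigSketch()
thorough_config = ConfigSketch(
    quality_preset=QualityPresetSketch.BEST,
    reasoning_effort=ReasoningEffortSketch.HIGH,
)
```

Using enums instead of free-form strings means that a misspelled preset fails immediately on lookup, for example `QualityPresetSketch("ultra")` raises `ValueError`.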
Strategies for scoring the similarity of two generated responses.
Values
JACCARD, EMBEDDING_SMALL, EMBEDDING_LARGE, CODE, STATEMENT
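As an illustration of the simplest strategy listed, here is a minimal sketch of token-level Jaccard similarity between two responses. The function name, the lowercase whitespace tokenization, and the handling of two empty strings are assumptions; the library's own implementation may differ.

```python
def jaccard_similarity(response_a: str, response_b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over lowercase whitespace-separated tokens."""
    tokens_a = set(response_a.lower().split())
    tokens_b = set(response_b.lower().split())
    if not tokens_a and not tokens_b:
        # Treating two empty responses as identical is a choice made for this sketch.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Two of the four distinct tokens are shared, so the score is 0.5.
score = jaccard_similarity("the cat sat", "the cat ran")
```

Jaccard similarity only measures word overlap, so paraphrases with different wording score low; the embedding-based strategies in the list presumably address that case at a higher cost.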