List of Metrics
Metrics Detail & Threshold Information
| # | Phase | Type of Guardrails | Metric | Score Range / Output | Threshold** | Description |
|---|---|---|---|---|---|---|
| 1 | Request Moderation | Model Based Guardrails | Prompt Injection Score | 0 to 1 | 0.7 | A score of 0.7 or more indicates the model is highly confident that the prompt is 'injected'; a score near 0 (e.g., 0 or 0.1) indicates the model is confident the prompt is 'legit'. |
| 2 | Request Moderation | Model Based Guardrails | Jailbreak Score | 0 to 1 | 0.7 | A score of 0.7 or more indicates that a jailbreak attack is present in the prompt. |
| 3 | Request Moderation | Model Based Guardrails | Privacy Check | None / List of entities | 1 PII entity | The check returns a list of detected PII entities; if even one entity is detected, the prompt is blocked. |
| 4 | Request Moderation | Model Based Guardrails | Profanity Check | None / List of entities | 1 word | If one or more profane words are detected in the prompt, it is blocked. |
| 5 | Request Moderation | Model Based Guardrails | Restricted Topic Check | 0 to 1 | 0.7 | The check returns a restricted-topic score for each restricted-topic label passed in the input; if the score for even one label is 0.7 or more, the model is confident that the text falls under that category and the prompt is blocked. |
| 6 | Request Moderation | Model Based Guardrails | Toxicity Check | 0 to 1 | 0.7 | If the final toxicity score is 0.7 or more, the model has classified the prompt as highly toxic and the prompt is blocked. |
| 7 | Request Moderation | Model Based Guardrails | Sentiment Check | -1 to 1 | -0.01 | A score of -0.01 or less means the sentiment of the sentence is negative; the higher the score, the more positive the sentiment. |
| 8 | Request Moderation | Model Based Guardrails | Invisible Text Check | None / List of entities | 1 entity | The check returns a list of all entities in the prompt that fall under the invisible text categories; if even one such entity is detected, the prompt is blocked. |
| 9 | Request Moderation | Model Based Guardrails | Gibberish Check | 0 to 1 | 0.7 | If the text in the prompt is labelled "word salad", "noise", or "mild gibberish" with a score of 0.7 or more, the prompt is blocked. |
| 10 | Request Moderation | Model Based Guardrails | Ban Code Check | 0 to 1 | 0.7 | If the model detects code in the prompt and gives a score of 0.7 or more, the prompt is blocked. |
| 11 | Request Moderation | Template Based Guardrails | Prompt Injection Score | 0 to 1 | 0.6 | If the LLM detects prompt injection in the prompt and gives a score of 0.6 or more, the prompt is blocked. |
| 12 | Request Moderation | Template Based Guardrails | Jailbreak Score | 0 to 1 | 0.6 | If the LLM detects a jailbreak in the prompt and gives a score of 0.6 or more, the prompt is blocked. |
| 13 | Request Moderation | Template Based Guardrails | Fairness & Bias Check | Low/Medium/High/Neutral | | Categories are determined by a fairness score generated by the LLM, ranging from 0 to 100. 0: Neutral; 1-30: Low; 31-70: Medium; >70: High. |
| 14 | Request Moderation | Template Based Guardrails | Language Critique Coherence Check | 0 to 1 | 0.6 | If the LLM detects coherence issues in the prompt and gives a score of 0.6 or more, the prompt is blocked. |
| 15 | Request Moderation | Template Based Guardrails | Language Critique Fluency Check | 0 to 1 | 0.6 | If the LLM detects fluency issues in the prompt and gives a score of 0.6 or more, the prompt is blocked. |
| 16 | Request Moderation | Template Based Guardrails | Language Critique Grammar Check | 0 to 1 | 0.6 | If the LLM detects grammar issues in the prompt and gives a score of 0.6 or more, the prompt is blocked. |
| 17 | Request Moderation | Template Based Guardrails | Language Critique Politeness Check | 0 to 1 | 0.6 | If the LLM detects politeness issues in the prompt and gives a score of 0.6 or more, the prompt is blocked. |
| 18 | Response Moderation | Model Based Guardrails | Privacy Check | None / List of entities | 1 PII entity | The check returns a list of detected PII entities; if even one entity is detected, the response is blocked. |
| 19 | Response Moderation | Model Based Guardrails | Restricted Topic Check | 0 to 1 | 0.7 | The check returns a restricted-topic score for each restricted-topic label passed in the payload; if the score for even one label is 0.7 or more, the model is confident that the response falls under that category and the response is blocked. |
| 20 | Response Moderation | Model Based Guardrails | Toxicity Check | 0 to 1 | 0.7 | If the final toxicity score is 0.7 or more, the model has classified the response as highly toxic and the response is blocked. |
| 21 | Response Moderation | Model Based Guardrails | Text Relevance Check (Similarity Score) | 0 to 100 | NA | This check measures the similarity between the given prompt and the generated response. |
| 22 | Response Moderation | Model Based Guardrails | Refusal Check | 0 to 1 | 0.7 | If the score is 0.7 or more, the response is blocked because it is highly similar to the foundation model's refusal responses. |
| 23 | Response Moderation | Model Based Guardrails | Text Quality | Grades | NA | The Text Quality check assesses the readability and grade level of the generated response. For scoring and grading information, refer to https://infosys.atlassian.net/wiki/spaces/IF/pages/975077603/Moderation Layer - Features#[inlineExtension]Text-Quality |
| 24 | Response Moderation | Model Based Guardrails | Profanity Check | None / List of entities | 1 word | If one or more profane words are detected in the response, it is blocked. |
| 25 | Response Moderation | Model Based Guardrails | Sentiment Check | -1 to 1 | -0.01 | A score of -0.01 or less means the sentiment of the response is negative; the higher the score, the more positive the sentiment. |
| 26 | Response Moderation | Model Based Guardrails | Invisible Text Check | None / List of entities | 1 entity | The check returns a list of all entities in the response that fall under the invisible text categories; if even one such entity is detected, the response is blocked. |
| 27 | Response Moderation | Model Based Guardrails | Gibberish Check | 0 to 1 | 0.7 | If the text in the response is labelled "word salad", "noise", or "mild gibberish" with a score of 0.7 or more, the response is blocked. |
| 28 | Response Moderation | Model Based Guardrails | Ban Code Check | 0 to 1 | 0.7 | If the model detects code in the response and gives a score of 0.7 or more, the response is blocked. |
| 29 | Response Moderation | Template Based Guardrails | Completeness Check | 0 to 1 | 0.6 | If the LLM detects completeness issues in the response and gives a score of 0.6 or more, the response is blocked. |
| 30 | Response Moderation | Template Based Guardrails | Conciseness Check | 0 to 1 | 0.6 | If the LLM detects conciseness issues in the response and gives a score of 0.6 or more, the response is blocked. |
| 31 | Response Moderation | Template Based Guardrails | Language Critique Coherence Check | 0 to 1 | 0.6 | If the LLM detects coherence issues in the response and gives a score of 0.6 or more, the response is blocked. |
| 32 | Response Moderation | Template Based Guardrails | Language Critique Fluency Check | 0 to 1 | 0.6 | If the LLM detects fluency issues in the response and gives a score of 0.6 or more, the response is blocked. |
| 33 | Response Moderation | Template Based Guardrails | Language Critique Grammar Check | 0 to 1 | 0.6 | If the LLM detects grammar issues in the response and gives a score of 0.6 or more, the response is blocked. |
| 34 | Response Moderation | Template Based Guardrails | Language Critique Politeness Check | 0 to 1 | 0.6 | If the LLM detects an impolite tone in the response and gives a score of 0.6 or more, the response is blocked. |
| 35 | Response Comparison | Template Based Guardrails | Hallucination Score (ThoT) | 0 to 100 | | Values closer to 0 indicate less hallucination. 0-30: Low; 30-50: Medium; >50: High. |
| 36 | Response Comparison | Template Based Guardrails | Uncertainty Score (ThoT, CoT, RE2, LoT, GoT#) | 0 to 100 | | Highly Certain: >= 0 and <= 30 (low uncertainty). |
| 37 | Response Comparison | Template Based Guardrails | Coherence Score (ThoT, CoT, RE2, LoT, GoT#) | 0 to 100 | | Less Coherent: >= 0 and <= 30. |
**Please note that the thresholds outlined are general recommendations derived from industry best practices. These values are configurable and may be adjusted to align with the unique demands of each application.
# ThoT - Thread of Thought, CoT - Chain of Thought, RE2 - Re-Reading, LoT - Logic of Thought, GoT - Graph of Thoughts
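
The decision rules in the table follow three recurring patterns: score-based checks block when the score reaches the threshold, entity-based checks (Privacy, Profanity, Invisible Text) block on even a single detection, and the Sentiment check flags scores of -0.01 or less. The sketch below is a hypothetical Python illustration of how these patterns combine into one block/allow decision; all names (`evaluate_guardrails`, `ModerationResult`, the metric keys) are invented for this example and are not the moderation layer's actual API.

```python
from dataclasses import dataclass

# Recommended thresholds from the table (model-based checks use 0.7;
# template-based variants use 0.6). Per the ** note, these are configurable.
SCORE_THRESHOLDS = {
    "prompt_injection": 0.7,
    "jailbreak": 0.7,
    "restricted_topic": 0.7,
    "toxicity": 0.7,
    "gibberish": 0.7,
    "ban_code": 0.7,
}
SENTIMENT_THRESHOLD = -0.01  # scores <= -0.01 are treated as negative sentiment

@dataclass
class ModerationResult:
    blocked: bool
    reasons: list

def evaluate_guardrails(scores: dict, entities: dict, sentiment: float) -> ModerationResult:
    """Combine the three decision patterns documented in the table.

    scores    -- metric name -> score in [0, 1] (score-based checks)
    entities  -- check name -> list of detected entities (PII, profanity,
                 invisible text); a single detection blocks
    sentiment -- score in [-1, 1]
    """
    reasons = []

    # Pattern 1: score-based checks block when score >= threshold.
    for metric, score in scores.items():
        threshold = SCORE_THRESHOLDS.get(metric)
        if threshold is not None and score >= threshold:
            reasons.append(f"{metric}: {score:.2f} >= {threshold}")

    # Pattern 2: entity-based checks block on even one detected entity.
    for check, detected in entities.items():
        if detected:
            reasons.append(f"{check}: detected {detected}")

    # Pattern 3: the sentiment check flags negative text; whether negative
    # sentiment alone should block is an application-level decision.
    if sentiment <= SENTIMENT_THRESHOLD:
        reasons.append(f"sentiment: {sentiment:.2f} <= {SENTIMENT_THRESHOLD}")

    return ModerationResult(blocked=bool(reasons), reasons=reasons)

# Example: an injected prompt that also contains a PII entity.
result = evaluate_guardrails(
    scores={"prompt_injection": 0.82, "toxicity": 0.12},
    entities={"pii": ["EMAIL_ADDRESS"], "profanity": []},
    sentiment=0.3,
)
print(result.blocked)  # True
print(result.reasons)  # the injection score and the PII entity
```

Swapping the 0.7 values for 0.6 reproduces the template-based variants of the same checks; per the ** note, both sets are starting points rather than fixed requirements.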
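
The categorical metrics (rows 13 and 35) map a 0 to 100 score onto bands rather than applying a single cut-off. A minimal sketch of that banding follows, with hypothetical function names; note that the hallucination bands in the table share the boundary value 30, which this sketch assigns to the lower band. Row 36 documents only the Highly Certain band (0 to 30), so no complete mapping is sketched for the uncertainty score.

```python
def fairness_category(score: int) -> str:
    """Fairness & Bias bands from row 13: 0 Neutral, 1-30 Low, 31-70 Medium, >70 High."""
    if score == 0:
        return "Neutral"
    if score <= 30:
        return "Low"
    if score <= 70:
        return "Medium"
    return "High"

def hallucination_category(score: float) -> str:
    """Hallucination (ThoT) bands from row 35: 0-30 Low, 30-50 Medium, >50 High."""
    if score <= 30:
        return "Low"
    if score <= 50:
        return "Medium"
    return "High"

print(fairness_category(45))       # Medium
print(hallucination_category(62))  # High
```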