List of Metrics

Metrics Detail & Threshold information

| # | Phase | Type of Guardrails | Metric | Score Range / Output | Threshold** | Description |
|---|-------|--------------------|--------|----------------------|-------------|-------------|
| 1 | Request Moderation | Model Based Guardrails | Prompt Injection Score | 0 to 1 | 0.7 | A score of 0.7 or more indicates the model is highly confident the prompt is injected; a low score (e.g., 0 or 0.1) indicates the model is confident the prompt is legitimate. |
| 2 | Request Moderation | Model Based Guardrails | Jailbreak Score | 0 to 1 | 0.7 | A score of 0.7 or more indicates that the prompt contains a jailbreak attack. |
| 3 | Request Moderation | Model Based Guardrails | Privacy Check | None / List of entities | 1 PII entity | The response is a list of detected PII entities; if even one entity is detected, the prompt is blocked. |
| 4 | Request Moderation | Model Based Guardrails | Profanity Check | None / List of entities | 1 word | If one or more profane words are detected in the prompt, it is blocked. |
| 5 | Request Moderation | Model Based Guardrails | Restricted Topic Check | 0 to 1 | 0.7 | The response contains a score for each restricted-topic label passed in the input; if the score for even one label is 0.7 or more, the model is confident the text falls under that category and the prompt is blocked. |
| 6 | Request Moderation | Model Based Guardrails | Toxicity Check | 0 to 1 | 0.7 | If the final toxicity score is 0.7 or more, the model has classified the prompt as highly toxic and the prompt is blocked. |
| 7 | Request Moderation | Model Based Guardrails | Sentiment Check | -1 to 1 | -0.01 | A score of -0.01 or less means the sentiment of the sentence is negative; the higher the score, the more positive the sentiment. |
| 8 | Request Moderation | Model Based Guardrails | Invisible Text Check | None / List of entities | 1 entity | The response is a list of all entities in the prompt that fall under invisible-text categories; if even one such entity is detected, the prompt is blocked. |
| 9 | Request Moderation | Model Based Guardrails | Gibberish Check | 0 to 1 | 0.7 | If the text in the prompt is labelled "word salad", "noise", or "mild gibberish" with a score of 0.7 or more, the prompt is blocked. |
| 10 | Request Moderation | Model Based Guardrails | Ban Code Check | 0 to 1 | 0.7 | If the model detects code in the prompt with a score of 0.7 or above, the prompt is blocked. |
| 11 | Request Moderation | Template Based Guardrails | Prompt Injection Score | 0 to 1 | 0.6 | If the LLM detects prompt injection in the prompt with a score of 0.6 or more, the prompt is blocked. |
| 12 | Request Moderation | Template Based Guardrails | Jailbreak Score | 0 to 1 | 0.6 | If the LLM detects a jailbreak in the prompt with a score of 0.6 or more, the prompt is blocked. |
| 13 | Request Moderation | Template Based Guardrails | Fairness & Bias Check | Low/Medium/High/Neutral | | Categories are determined by the Fairness Score generated by the LLM, which ranges from 0 to 100. 0: Neutral, 1-30: Low, 31-70: Medium, >70: High. |
| 14 | Request Moderation | Template Based Guardrails | Language Critique Coherence Check | 0 to 1 | 0.6 | If the LLM detects coherence issues in the prompt with a score of 0.6 or more, the prompt is blocked. |
| 15 | Request Moderation | Template Based Guardrails | Language Critique Fluency Check | 0 to 1 | 0.6 | If the LLM detects fluency issues in the prompt with a score of 0.6 or more, the prompt is blocked. |
| 16 | Request Moderation | Template Based Guardrails | Language Critique Grammar Check | 0 to 1 | 0.6 | If the LLM detects grammar issues in the prompt with a score of 0.6 or more, the prompt is blocked. |
| 17 | Request Moderation | Template Based Guardrails | Language Critique Politeness Check | 0 to 1 | 0.6 | If the LLM detects politeness issues in the prompt with a score of 0.6 or more, the prompt is blocked. |
| 18 | Response Moderation | Model Based Guardrails | Privacy Check | None / List of entities | 1 PII entity | The response is a list of detected PII entities; if even one entity is detected, the generated response is blocked. |
| 19 | Response Moderation | Model Based Guardrails | Restricted Topic Check | 0 to 1 | 0.7 | The response contains a score for each restricted-topic label passed in the payload; if the score for even one label is 0.7 or more, the model is confident the response falls under that category and the response is blocked. |
| 20 | Response Moderation | Model Based Guardrails | Toxicity Check | 0 to 1 | 0.7 | If the final toxicity score is 0.7 or more, the model has classified the response as highly toxic and the response is blocked. |
| 21 | Response Moderation | Model Based Guardrails | Text Relevance Check (Similarity Score) | 0 to 100 | NA | Checks the similarity between the given prompt and the generated response. |
| 22 | Response Moderation | Model Based Guardrails | Refusal Check | 0 to 1 | 0.7 | If the score is 0.7 or more, the response is blocked because it is highly similar to the foundation model's refusal responses. |
| 23 | Response Moderation | Model Based Guardrails | Text Quality | Grades | NA | Assesses the readability and grade level of the generated response. For scoring and grading information refer here - https://infosys.atlassian.net/wiki/spaces/IF/pages/975077603/Moderation Layer - Features#[inlineExtension]Text-Quality |
| 24 | Response Moderation | Model Based Guardrails | Profanity Check | None / List of entities | 1 word | If one or more profane words are detected in the response, it is blocked. |
| 25 | Response Moderation | Model Based Guardrails | Sentiment Check | -1 to 1 | -0.01 | A score of -0.01 or less means the sentiment of the response is negative; the higher the score, the more positive the sentiment. |
| 26 | Response Moderation | Model Based Guardrails | Invisible Text Check | None / List of entities | 1 entity | The response is a list of all entities in the generated text that fall under invisible-text categories; if even one such entity is detected, the response is blocked. |
| 27 | Response Moderation | Model Based Guardrails | Gibberish Check | 0 to 1 | 0.7 | If the text in the response is labelled "word salad", "noise", or "mild gibberish" with a score of 0.7 or more, the response is blocked. |
| 28 | Response Moderation | Model Based Guardrails | Ban Code Check | 0 to 1 | 0.7 | If the model detects code in the response with a score of 0.7 or above, the response is blocked. |
| 29 | Response Moderation | Template Based Guardrails | Completeness Check | 0 to 1 | 0.6 | If the LLM detects completeness issues in the response with a score of 0.6 or more, the response is blocked. |
| 30 | Response Moderation | Template Based Guardrails | Conciseness Check | 0 to 1 | 0.6 | If the LLM detects conciseness issues in the response with a score of 0.6 or more, the response is blocked. |
| 31 | Response Moderation | Template Based Guardrails | Language Critique Coherence Check | 0 to 1 | 0.6 | If the LLM detects coherence issues in the response with a score of 0.6 or more, the response is blocked. |
| 32 | Response Moderation | Template Based Guardrails | Language Critique Fluency Check | 0 to 1 | 0.6 | If the LLM detects fluency issues in the response with a score of 0.6 or more, the response is blocked. |
| 33 | Response Moderation | Template Based Guardrails | Language Critique Grammar Check | 0 to 1 | 0.6 | If the LLM detects grammar issues in the response with a score of 0.6 or more, the response is blocked. |
| 34 | Response Moderation | Template Based Guardrails | Language Critique Politeness Check | 0 to 1 | 0.6 | If the LLM detects an impolite tone in the response with a score of 0.6 or more, the response is blocked. |
| 35 | Response Comparison | Template Based Guardrails | Hallucination Score (ThoT) | 0 to 100 | | Values closer to 0 indicate less hallucination. 0-30: Low, 30-50: Medium, >50: High. |
| 36 | Response Comparison | Template Based Guardrails | Uncertainty Score (ThoT, CoT, RE2, LoT, GoT#) | 0 to 100 | | Highly Certain: >=0 and <=30 (low uncertainty); Moderately Certain: >30 and <=70; Less Certain: >70 and <=100 (high uncertainty). |
| 37 | Response Comparison | Template Based Guardrails | Coherence Score (ThoT, CoT, RE2, LoT, GoT#) | 0 to 100 | | Less Coherent: >=0 and <=30; Moderately Coherent: >30 and <=70; Highly Coherent: >70 and <=100. |

**Please note that the thresholds outlined are general recommendations derived from industry best practices. These values are configurable and may be adjusted to align with the unique demands of each application.
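Since the thresholds are configurable, the blocking logic for request moderation can be sketched as a simple comparison against them. The sketch below is illustrative only: the function name, metric keys, and dictionary shapes are assumptions, not the moderation layer's actual API. Entity-based checks (Privacy, Profanity, Invisible Text) block on a single detection, while the score-based model guardrails block at 0.7 or above; the sentiment check (blocked when score <= -0.01) uses a different scale and is omitted for brevity.

```python
# Hypothetical sketch of applying the guardrail thresholds above.
# Names and keys are illustrative assumptions, not the real API.

MODEL_THRESHOLD = 0.7  # model-based guardrails; configurable per application


def moderate_prompt(scores, pii_entities, profane_words):
    """Return a block/allow verdict for a prompt.

    scores: metric name -> 0-to-1 score from the model-based checks
            (prompt injection, jailbreak, toxicity, gibberish, ...).
    pii_entities / profane_words: detected entity lists; even a single
            hit blocks the prompt.
    """
    # Score-based checks: block when any score reaches the threshold.
    reasons = [metric for metric, score in scores.items()
               if score >= MODEL_THRESHOLD]
    # Entity-based checks: one detected entity is enough to block.
    if pii_entities:
        reasons.append("privacy")
    if profane_words:
        reasons.append("profanity")
    return {"blocked": bool(reasons), "reasons": reasons}
```

For example, `moderate_prompt({"prompt_injection": 0.82, "toxicity": 0.1}, [], [])` would block with reason `"prompt_injection"`, while the same scores below 0.7 and empty entity lists would pass.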

# ThoT - Thread of Thoughts, CoT - Chain of Thoughts, RE2 - Re-Read Reasoning, LoT - Logic of Thoughts, GoT - Graph of Thoughts
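The banded 0-to-100 metrics (Fairness & Bias, Hallucination, Uncertainty, Coherence) map raw scores to categories rather than a single block threshold. A minimal sketch of that bucketing, using the band edges from the table (function names are assumptions):

```python
# Illustrative bucketing of 0-100 scores into the bands listed above.
# Band edges follow the table; function names are hypothetical.

def fairness_band(score):
    """Fairness Score: 0 Neutral, 1-30 Low, 31-70 Medium, >70 High."""
    if score == 0:
        return "Neutral"
    if score <= 30:
        return "Low"
    if score <= 70:
        return "Medium"
    return "High"


def uncertainty_band(score):
    """Uncertainty Score: <=30 Highly Certain, <=70 Moderately Certain,
    otherwise Less Certain (higher score means more uncertainty)."""
    if score <= 30:
        return "Highly Certain"
    if score <= 70:
        return "Moderately Certain"
    return "Less Certain"
```

The same pattern applies to the Hallucination and Coherence bands with their respective edges.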