Test Cases in ClaudIA: What They Are, How to Use Them, and Why They Make a Difference

Last updated on Aug 12, 2025

What are Test Cases?

Test Cases are a feature that allows you to test changes in ClaudIA quickly, safely, and reproducibly.

They simulate real interactions based on historical tickets and check if ClaudIA responds as expected after changes to sessions or prompts.


What are they for?

You can use Test Cases to:

  • Correct inappropriate behaviors (e.g., incomplete responses);

  • Ensure that changes in sessions do not break other responses;

  • Measure the impact of an adjustment before deploying it to production;

  • Test new flows, responses, or instructions more quickly.


Example of Use

Problem:

ClaudIA is using the correct session, but omitting part of the content in the response to the customer.

Solution with Test Case:

  1. Tag the tickets with this problem.

  2. Create a Test Case with these tickets.

  3. Make adjustments to the sessions (e.g., instruction to always send the complete content).

  4. Run the Test Case and verify if the problem was resolved.

  5. Ensure that no other desired behavior was affected.


How to Create a Test Case (Step by Step)

  1. Access a ticket with an inappropriate response.

  2. Click on the 🧪 icon next to ClaudIA's message.

  3. Click on “New” to create a new Test Case.

    • Give it a descriptive name (e.g., “Incomplete Response”).

  4. Configure:

    • Checkbox “Reuse returned sections”:

      • Check if you want to use the same sections from the time of the original ticket.

      • Uncheck if you want to test with new sections or recent changes.

    • Number of executions per ticket (e.g., 5x) to ensure consistency.

  5. Choose the type of verification:

    • LLM: Uses a prompt to evaluate the response.

    • Embeddings: Compares the original response with the current one by vector distance.

    • Regex: Checks whether certain words appear, or do not appear, in the response.


Available Test Types

LLM

  • Main Use: Evaluate if the new response meets defined rules via prompt

  • Example: “Does the response contain 100% of the text from the session?”

Embeddings

  • Main Use: Compare similarity between the original response and the new one

  • Example: Vector distance less than 0.2
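To make the 0.2 threshold concrete, here is a minimal sketch of an embeddings-style check. It assumes cosine distance as the metric and uses short toy vectors in place of real embeddings; the article does not say which metric or embedding model ClaudIA uses, and the function names `cosine_distance` and `embeddings_check` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def embeddings_check(original_vec, new_vec, max_distance=0.2):
    """Pass when the new response stays close to the original one."""
    return cosine_distance(original_vec, new_vec) < max_distance


# Toy vectors standing in for real embeddings (assumption for illustration).
original = [0.9, 0.1, 0.3]
similar = [0.8, 0.2, 0.3]
different = [0.0, 1.0, 0.0]

print(embeddings_check(original, similar))    # True
print(embeddings_check(original, different))  # False
```

A lower threshold makes the check stricter, since the new response must stay closer to the original to pass.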

Regex

  • Main Use: Ensure that ClaudIA uses (or avoids) certain words

  • Example: cannot contain “sorry”
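The “cannot contain” rule above can be sketched with Python's standard `re` module. This is an illustration of the idea only, not ClaudIA's implementation; the helper `regex_check` and its parameters are hypothetical, and the case-insensitive matching is an assumption.

```python
import re


def regex_check(response, must_match=(), must_not_match=()):
    """Pass only if every required pattern appears and no forbidden one does."""
    flags = re.IGNORECASE  # assumption: matching ignores letter case
    has_required = all(re.search(p, response, flags) for p in must_match)
    has_forbidden = any(re.search(p, response, flags) for p in must_not_match)
    return has_required and not has_forbidden


# \b marks word boundaries, so "sorry" matches only as a whole word.
forbidden = [r"\bsorry\b"]

print(regex_check("Your refund was issued today.", must_not_match=forbidden))   # True
print(regex_check("Sorry, I can't help with that.", must_not_match=forbidden))  # False
```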


How to Add More Tickets to a Test Case

  1. Access the second (or third, fourth…) ticket with the same problem.

  2. Click on the đź§Ş icon.

  3. Go to the “Existing” tab and select the already created Test Case.

  4. Save.


How to Interpret the Results

After running the Test Case:

  • ✅ Passed: ClaudIA's response complies with the rule.

  • ❌ Failed: The response is still incorrect or incomplete.

You can see:

  • Which executions passed or failed.

  • Complete history of the Test Case performance.
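Because each ticket can be executed several times (e.g., 5x), it helps to look at results per ticket instead of as one overall number. The sketch below assumes the results are available as hypothetical `(ticket_id, passed)` pairs; this is not a documented ClaudIA export format, and the ticket IDs are invented for illustration.

```python
from collections import defaultdict


def summarize(executions):
    """Group (ticket_id, passed) pairs into per-ticket pass counts."""
    counts = defaultdict(lambda: {"passed": 0, "total": 0})
    for ticket_id, passed in executions:
        counts[ticket_id]["total"] += 1
        if passed:
            counts[ticket_id]["passed"] += 1
    return dict(counts)


def inconsistent_tickets(summary):
    """Return tickets that passed on some executions but not on all."""
    return sorted(t for t, c in summary.items() if 0 < c["passed"] < c["total"])


# Hypothetical results: two tickets, five executions each.
runs = [("T-101", True)] * 5 + [
    ("T-102", True),
    ("T-102", False),
    ("T-102", True),
    ("T-102", True),
    ("T-102", True),
]

summary = summarize(runs)
print(summary["T-102"])               # {'passed': 4, 'total': 5}
print(inconsistent_tickets(summary))  # ['T-102']
```

A ticket that passes only some of its executions suggests the adjustment works inconsistently, which a single run could easily miss.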


Ensuring No Other Responses Were Affected

After running a specific test, click on “Run All” at the top of the Test Cases page.

This executes all existing cases and checks if any changes you made broke other flows.


How to Schedule Periodic Executions

You can schedule your Test Cases to run automatically:

  • Every 6 hours

  • Daily

  • Weekly

This helps to detect breaks caused by unforeseen changes.


And what about the A/B Test Function?

If you want to test without impacting production, use the A/B Test mode.

In it, you can run Test Cases with a new version of the prompt or session without changing ClaudIA's behavior for end customers.

To enable it, just contact the Cloud Humans team.


TL;DR (Final Summary)

  • Test Cases are like automated tests to ensure ClaudIA is responding correctly.

  • They help fix problems faster and more safely.

  • You can create, edit, rerun, and schedule tests based on real tickets.

  • You can choose among different types of validation (LLM, embeddings, regex).

  • Use the “Run All” button to ensure nothing was broken after adjustments.