This FAQ explains how to register, edit, activate or deactivate, and synchronize your project's content sources, as well as how to interpret statuses and configure sync recurrence.

What are "Content Sources"?

They are settings that specify where ClaudIA should search for knowledge (e.g., your Help Center).
Each source has an Initial URL, path filters, a synchronization schedule, and a default tag.

Where do I find the screen?

In the project, go to Content → Sources. Below is an example of the screen in a project with no active synchronization yet.

If the project already has active syncs, you'll see a list with the following columns:

Initial URL
Status of the last execution (e.g.: Synchronized, Error, Cancelled)
Last synchronization
State (Activated/Deactivated)
Actions: Synchronize, Edit, View URLs

You can sort by clicking on the column headers.

How do I add or edit a source?

Click on Add Source (or Edit an existing one).
In the modal, fill out:
- Scheduling (Recurrence & Local Time)
  Select Daily / Weekly / Monthly and a time.
  - The interface displays the next execution (relative and with local date/time).
  - The system automatically converts the schedule to a UTC cron expression.
- Initial URL
  Must be a valid URL with protocol http/https.
- Path Filter (mandatory)
  Synchronizes only pages whose path starts with one of these prefixes (one per line).
  → Each value must start with "/" and have more than 1 character.
  Example: /docs, /hc/en
- Exclude Path Filter (optional)
  Paths (same format as the filter) to be ignored.
  Example: /internal-portal, /private
- Default Tag (optional)
  Tag applied to the content originating from this source (helps in filtering/searching).
Save with Update Source (Save).

Important Validations

An invalid URL, or one with a protocol other than http/https, causes an error.
Each item in the filter must start with "/"; / alone is not accepted.
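
The validations above can be sketched in a few lines of Python. This is only an illustration of the rules described in this FAQ, not the product's actual implementation: the function names and error messages are hypothetical, and the real checks may be stricter.

```python
from urllib.parse import urlparse


def validate_initial_url(url: str) -> list[str]:
    """Return the problems found in the Initial URL (empty list if valid)."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return ["Invalid protocol"]
    if not parsed.netloc:
        return ["Invalid URL"]
    return []


def validate_path_filters(raw: str) -> list[str]:
    """Check a multi-line Path Filter field (one prefix per line)."""
    prefixes = [line.strip() for line in raw.splitlines() if line.strip()]
    if not prefixes:
        # The Path Filter is mandatory, so an empty field is an error.
        return ["At least one path filter is required"]
    errors = []
    for prefix in prefixes:
        if not prefix.startswith("/"):
            errors.append(f"Path must start with '/': {prefix!r}")
        elif len(prefix) < 2:
            errors.append("'/' alone is not accepted")
    return errors
```

For example, `validate_path_filters("/docs\n/hc/en")` returns an empty list, while `validate_path_filters("docs\n/")` reports one error per line.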

How does the recurrence (scheduling) work?

You choose Daily / Weekly / Monthly.
The system calculates the schedule considering the local time zone and shows the next estimated run.

Examples:

Daily at 06:00 (local) → runs every day at this time.
Weekly (Monday at 07:00) → runs every Monday at the chosen time.
Monthly (on the 15th at 05:00) → runs every 15th of the month at this time.

Tip: you can change the cadence at any time; the next execution already considers the new setting.
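
To make the local-time-to-UTC conversion concrete, here is a minimal Python sketch. It assumes a fixed UTC offset (no daylight saving time) and standard 5-field cron with 0 = Sunday; the function name and parameters are hypothetical, and the system's real conversion may handle more cases.

```python
def to_utc_cron(recurrence, hour, minute, utc_offset_hours, weekday=1, day=1):
    """Convert a local schedule to a 5-field UTC cron expression.

    weekday uses cron numbering (0 = Sunday ... 6 = Saturday).
    utc_offset_hours is the local offset from UTC, e.g. -3 for UTC-3.
    """
    offset_minutes = round(utc_offset_hours * 60)
    # UTC = local - offset; day_shift is -1, 0 or +1 when the time crosses midnight.
    day_shift, minute_of_day = divmod(hour * 60 + minute - offset_minutes, 24 * 60)
    utc_hour, utc_minute = divmod(minute_of_day, 60)

    if recurrence == "daily":
        return f"{utc_minute} {utc_hour} * * *"
    if recurrence == "weekly":
        return f"{utc_minute} {utc_hour} * * {(weekday + day_shift) % 7}"
    if recurrence == "monthly":
        utc_day = day + day_shift
        # Shifts near month boundaries cannot be expressed safely in plain cron,
        # so this sketch rejects them instead of guessing.
        if day_shift and not (1 <= day <= 28 and 1 <= utc_day <= 28):
            raise ValueError("day shift across a month boundary is not handled")
        return f"{utc_minute} {utc_hour} {utc_day} * *"
    raise ValueError(f"unknown recurrence: {recurrence!r}")
```

With a UTC-3 offset, the three examples above become `0 9 * * *` (daily at 06:00), `0 10 * * 1` (Monday at 07:00), and `0 8 15 * *` (the 15th at 05:00). A late local time can move the UTC day: Monday at 22:30 local is Tuesday at 01:30 UTC.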

How do I turn on, off, start, or stop synchronization?

Activate/Deactivate: use the toggle in the State column.
- Activated: the source participates in executions (scheduled or manual).
- Deactivated: the source is paused and runs neither manually nor automatically.
Synchronize (manual): click Synchronize.
The status changes to Starting/Running.
Stop: while Running, the Stop button appears; click to cancel the current execution.

If I deactivate a source, does ClaudIA stop using that content?

It depends on the type of deactivation:

🔄 Disable synchronization: ClaudIA stops updating the content from this source, but still uses what was previously synchronized.
❌ Remove source: ClaudIA completely stops using that content.

If your goal is to control what ClaudIA uses, it's best to remove the source (rather than just disable synchronization) or use source filters on the content screen to isolate and exclude materials from a specific source.

How can I see the extracted URLs?

Click View URLs. A modal lists all URLs collected for that source.
You can open each in a new tab to check the content.

What do the statuses mean?

Status “badges” appear in the Status column:

Syncing
Data collection is in progress.
Synchronized
The last execution finished successfully.
Error
Something failed. Hover over the badge to see error details. If nothing appears, contact our support team.
Cancelled
The execution was interrupted (manually or by the system).
Pending
No synchronization has been run yet.

The page updates the status automatically: faster when there are executions, and at longer intervals when everything is stable.
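
The refresh behavior described above amounts to choosing a shorter delay while any source is running. A rough Python sketch follows; the 5-second and 60-second values and the function name are assumptions for illustration, since this FAQ does not state the actual intervals.

```python
# Statuses that indicate an execution is in progress.
ACTIVE_STATUSES = {"Syncing"}


def next_poll_delay(statuses, fast_seconds=5.0, slow_seconds=60.0):
    """Pick the delay before the next status refresh.

    The default intervals are illustrative assumptions, not documented values.
    """
    if any(status in ACTIVE_STATUSES for status in statuses):
        return fast_seconds
    return slow_seconds
```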

Best practices for path filters

Be specific: use prefixes that represent relevant sections.
Example: /hc/en, /help/, /product-x/docs
Avoid generic “/”: this tries to crawl the entire site.
Use the Exclude Path Filter (“Ignore routes in synchronization”) to skip internal or sensitive areas.
Example: /admin, /account, /internal-portal
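
To preview how a set of filters would treat a given page, the prefix rules can be sketched in Python. This assumes plain string-prefix matching on the URL path and that exclusions take precedence over inclusions; both are assumptions, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def should_sync(url, include_prefixes, exclude_prefixes=()):
    """Return True if the URL's path passes the include and exclude filters."""
    path = urlparse(url).path or "/"
    # Assumption: an excluded prefix wins even if an include prefix also matches.
    if any(path.startswith(prefix) for prefix in exclude_prefixes):
        return False
    return any(path.startswith(prefix) for prefix in include_prefixes)
```

Under plain prefix matching, a filter such as /docs would also match /docs-old, which is one more reason to keep prefixes specific.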

Common errors and how to resolve them

“Invalid URL” / “Invalid protocol”
Confirm that the URL starts with http:// or https:// and is correct.
“Path must start with ‘/’”
Adjust filter items to the format /your-prefix.
Status “Error”
Hover over the badge to read the reason (e.g., step failed).
Review filters/URL and try to Synchronize again.
If the error persists, contact support.
No URLs listed under “View URLs”
Check if the path filters include the desired pages.

Can I organize or search what has been collected afterward?

Yes. The collected content appears in the Content tab, where you can search by URL to find it. If you need to bulk edit something that was incorrectly set, here's how to do it.

Can I flag content with different labels/tags within the same source?

Yes.
You can manually fill in tags (labels) on each synchronized content.
When you do this manually, these tags are not overwritten in subsequent synchronizations — only the textual content is updated.
This allows organizing a single source into multiple categories or filters.
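
The tag behavior described above can be pictured with a small Python sketch. The data model and the `tags_edited_manually` flag are hypothetical, and what happens to tags that were never edited manually is an assumption (here they are reset to the source's default tag).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentItem:
    url: str
    text: str
    tags: list = field(default_factory=list)
    tags_edited_manually: bool = False  # hypothetical flag for this sketch


def apply_sync(existing: Optional[ContentItem], url: str, fresh_text: str,
               default_tag: Optional[str] = None) -> ContentItem:
    """Merge freshly synchronized text into an existing content item."""
    default_tags = [default_tag] if default_tag else []
    if existing is None:
        # First synchronization: the item starts with the source's default tag.
        return ContentItem(url, fresh_text, default_tags)
    existing.text = fresh_text  # the textual content is always refreshed
    if not existing.tags_edited_manually:
        # Assumption: untouched tags follow the source's default tag.
        existing.tags = default_tags
    return existing
```

In this sketch, an item whose tags were changed by hand keeps them after every later run, while its text follows the source page.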

Final tips

Start with one source per area (e.g., separate sources for the PT and EN Help Centers), each with clear filters.
Schedule for times of low traffic on your site.
Review “View URLs” after the first run to ensure only the desired material is being fetched.
Use a Default Tag to make searches and reports by source easier.

How you will see it in Conversation

Just like in the Content tab, in the Conversation tab you will also be able to see whether the content is from an external source or not.

📌 List of HTML tags ignored by default

During the refinement process, the extraction agent automatically discards various elements that are usually not useful for the knowledge base.
These selectors were set to reduce visual and structural noise (menus, ads, comments, forms, etc.), keeping only the main content.

Page Structure

#footer
#header
#nav
nav
footer

Scripts and styles

script
style
noscript

Media

svg
img
audio
video

Navigation and menus

.sidebar
.menu
.navigation
.breadcrumb
.breadcrumbs
.pagination
.pager
.page-navigation

Ads and banners

.advertisement
.ads
.ad-banner
.cookie-banner
.cookie-notice
.gdpr-notice

Social and sharing

.social-share
.social-buttons
.share-buttons

Forms and subscriptions

.newsletter
.subscription
.signup-form
.search-box
.search-form
.search-bar

Comments and discussions

.comments
.comment-section
.discussion

Metadata and authorship

.tags
.tag-list
.categories
.author-bio
.author-info
.byline
.meta
.metadata
.post-meta

Widgets and sidebars

.widget
.widgets
.sidebar-widget

Popups and overlays

.popup
.modal
.overlay
.lightbox

Accessibility and hidden navigation

.skip-link
.screen-reader-text
.sr-only
.print-only
.no-print

Accessibility attributes

[role='alert']
[role='banner']
[role='navigation']
[role='complementary']
[role='dialog']
[role='alertdialog']
[role="region"][aria-label*="skip" i]
[aria-hidden='true']
[aria-modal='true']

Invisible elements

.hidden
.invisible
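
To illustrate the effect of this list, the following Python sketch drops ignored elements and keeps the remaining text, using only the standard library. It is not the extraction agent's code: it covers only a small subset of the selectors above (tag names, ids, classes, and aria-hidden), assumes well-formed HTML with matching closing tags, and all names in it are hypothetical.

```python
from html.parser import HTMLParser

# A small subset of the ignored selectors, for illustration only.
IGNORED_TAGS = {"nav", "footer", "script", "style", "noscript",
                "svg", "img", "audio", "video"}
IGNORED_IDS = {"footer", "header", "nav"}
IGNORED_CLASSES = {"sidebar", "menu", "breadcrumb", "cookie-banner",
                   "comments", "hidden", "sr-only"}
# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class MainContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an ignored element
        self.chunks = []

    def _is_ignored(self, tag, attrs):
        attr = dict(attrs)
        classes = set((attr.get("class") or "").split())
        return (tag in IGNORED_TAGS
                or attr.get("id") in IGNORED_IDS
                or bool(classes & IGNORED_CLASSES)
                or attr.get("aria-hidden") == "true")

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # carries no text and has no closing tag
        if self.skip_depth or self._is_ignored(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    """Return the text outside ignored elements, joined by spaces."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Given a page with a nav menu, a sidebar, a script, and a footer around a main article, `extract_main_text` returns only the article text. A production extractor would more likely rely on a full CSS selector engine to support attribute selectors such as [role='dialog'].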

Synchronize External Content for AI (External Source)