This FAQ explains, in practical terms, how to register, edit, activate or deactivate, and synchronize your project's content sources, as well as how to interpret statuses and configure collection recurrence.
What are "Content Sources"?
They are settings that specify where ClaudIA should search for knowledge (e.g., your Help Center).
Each source has an Initial URL, path filters, a synchronization schedule, and a default tag.
Where do I find the screen?
In the project, go to Content → Sources. Below is an example of the screen in a project with no active synchronization yet.

If the project already has active syncs, you'll see a list with the following columns:
- Initial URL
- Status of the last execution (e.g., Synchronized, Error, Canceled)
- Last synchronization
- State (Activated/Deactivated)
- Actions: Synchronize, Edit, View URLs
You can sort by clicking on the column headers.
How do I add or edit a source?

- Click on Add Source (or Edit an existing one).
- In the modal, fill out:
  - Scheduling (Recurrence & Local Time)
    Select Daily / Weekly / Monthly and a time. The interface displays:
    - The next execution (relative and with local date/time).
    - The system automatically converts everything to UTC cron.
  - Initial URL
    Must be a valid URL with protocol http/https.
  - Path Filter (mandatory)
    Synchronizes only pages whose path starts with one of these prefixes (one per line).
    → Each value must start with "/" and have more than 1 character.
    Example: /docs, /hc/en
  - Exclude Path Filter (optional)
    Paths (same format as the filter) to be ignored.
    Example: /internal-portal, /private
  - Default Tag (optional)
    Tag applied to the content originating from this source (helps in filtering/searching).
- Save with Update Source (Save).

Important Validations
- An invalid URL, or one with a protocol other than http/https, causes an error.
- Each item in the filter must start with "/"; "/" alone is not accepted.
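The validations above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this FAQ, not the product's actual code: the function name `validate_source` and the error messages are hypothetical.

```python
from urllib.parse import urlparse


def validate_source(initial_url: str, path_filters: list[str]) -> list[str]:
    """Return a list of validation errors (empty when the input looks valid)."""
    errors = []

    # The Initial URL must use http or https and include a host.
    parsed = urlparse(initial_url)
    if parsed.scheme not in ("http", "https"):
        errors.append("Invalid protocol")
    elif not parsed.netloc:
        errors.append("Invalid URL")

    # The Path Filter is mandatory, so at least one prefix is expected.
    if not path_filters:
        errors.append("Path Filter is mandatory")

    for item in path_filters:
        # Each prefix must start with "/" and be longer than one character.
        if not item.startswith("/"):
            errors.append(f"Path must start with '/': {item!r}")
        elif len(item) < 2:
            errors.append(f"'/' alone is not accepted: {item!r}")

    return errors
```

For example, `validate_source("https://example.com/hc", ["/docs", "/hc/en"])` returns an empty list, while a filter item such as `docs` or `/` produces an error.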
How does the recurrence (scheduling) work?
- You choose Daily / Weekly / Monthly.
- The system calculates the schedule considering the local time zone and shows the next estimated run.
Examples:
- Daily at 06:00 (local) → runs every day at this time.
- Weekly (Monday at 07:00) → runs every Monday at the chosen time.
- Monthly (on the 15th at 05:00) → runs every 15th of the month at this time.
Tip: you can change the cadence at any time; the next execution already considers the new setting.
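To make the local-time-to-UTC conversion more concrete, here is a minimal Python sketch for the Daily and Weekly cases. It is an assumption about how such a conversion could work, not the system's real implementation: the function `local_to_utc_cron` is hypothetical, it takes a fixed UTC offset (so it ignores daylight saving time), and Monthly is left out because shifting a day of the month across month boundaries needs extra care.

```python
def local_to_utc_cron(hour, minute, utc_offset_hours, weekday=None):
    """Build a 5-field cron expression in UTC from a local time.

    weekday uses cron numbering (0 = Sunday ... 6 = Saturday);
    None means the schedule runs every day.
    """
    # Local time minus the UTC offset gives the time in UTC, in minutes.
    total_minutes = hour * 60 + minute - int(utc_offset_hours * 60)

    # day_shift is -1, 0, or +1 when the conversion crosses midnight.
    day_shift, minutes_of_day = divmod(total_minutes, 24 * 60)
    utc_hour, utc_minute = divmod(minutes_of_day, 60)

    if weekday is None:
        day_of_week = "*"
    else:
        day_of_week = str((weekday + day_shift) % 7)

    return f"{utc_minute} {utc_hour} * * {day_of_week}"


# Daily at 06:00 in a UTC-3 time zone
print(local_to_utc_cron(6, 0, -3))             # 0 9 * * *
# Weekly, Monday at 01:00 in a UTC+2 time zone (Sunday 23:00 in UTC)
print(local_to_utc_cron(1, 0, 2, weekday=1))   # 0 23 * * 0
```

The second call shows why the day of the week may differ between the local schedule and the UTC cron expression.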
How do I turn on, off, start, or stop synchronization?


- Activate/Deactivate: use the toggle in the State column.
  - Activated: the source participates in executions (scheduled or manual).
  - Deactivated: the source is paused and runs neither manually nor automatically.
- Synchronize (manual): click Synchronize.
  The status changes to Starting/Running.
- Stop: while Running, the Stop button appears; click it to cancel the current execution.
If I deactivate a source, does ClaudIA stop using that content?
It depends on the type of deactivation:
- 🔄 Disable synchronization: ClaudIA stops updating the content from this source, but still uses what was previously synchronized.
- ❌ Remove source: ClaudIA completely stops using that content.
If your goal is to control what ClaudIA uses, it's best to remove the source (rather than just disable synchronization) or use source filters on the content screen to isolate and exclude materials from a specific source.
How can I see the extracted URLs?
Click View URLs. A modal lists all URLs collected for that source.
You can open each in a new tab to check the content.


What do the statuses mean?
Status “badges” appear in the Status column:
- Syncing
  Data collection is in progress.
- Synchronized
  The last execution finished successfully.
- Error
  Something failed. Hover over the badge to see error details. If nothing appears, contact our support team.
- Canceled
  The execution was interrupted (manually or by the system).
- Pending
  No synchronization has been run yet.
The page updates the status automatically: faster when there are executions, and at longer intervals when everything is stable.
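The adaptive refresh described above can be pictured as a simple rule that picks a shorter polling interval while any source is executing. The sketch below is purely illustrative: the function name and the interval values (3 and 30 seconds) are assumptions, since this FAQ does not state the real ones.

```python
# Statuses that indicate an execution is in progress (names taken from this FAQ).
ACTIVE_STATUSES = {"Starting", "Running", "Syncing"}


def next_poll_interval(statuses, fast_seconds=3.0, slow_seconds=30.0):
    """Return how long to wait before refreshing the status list again.

    fast_seconds and slow_seconds are hypothetical defaults.
    """
    if any(status in ACTIVE_STATUSES for status in statuses):
        return fast_seconds  # something is running: refresh more often
    return slow_seconds      # everything is stable: refresh less often
```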
Best practices for path filters
- Be specific: use prefixes that represent relevant sections.
  Example: /hc/en, /help/, /product-x/docs
- Avoid generic “/”: this tries to crawl the entire site.
- Use “Ignore routes in synchronization” to skip internal or sensitive areas.
  Example: /admin, /account, /internal-portal
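As a rough illustration of how include and exclude prefixes interact, the Python sketch below decides whether a URL would be synchronized. It assumes plain string-prefix matching on the URL path and that exclusions take priority; the function name `should_sync` is hypothetical and the product's real matching rules may differ.

```python
from urllib.parse import urlparse


def should_sync(url, include_prefixes, exclude_prefixes=()):
    """Return True when the URL path matches an include prefix and no exclude prefix."""
    path = urlparse(url).path

    # Assumption: an excluded prefix wins over an included one.
    if any(path.startswith(prefix) for prefix in exclude_prefixes):
        return False

    return any(path.startswith(prefix) for prefix in include_prefixes)
```

Note that with plain prefix matching, `/docs` also matches `/docs-old`, which is one more reason to keep prefixes specific and to check View URLs after the first run.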
Common errors and how to resolve them
- “Invalid URL” / “Invalid protocol”
  Confirm that the URL starts with http:// or https:// and is correct.
- “Path must start with ‘/’”
  Adjust filter items to the format /your-prefix.
- Status “Error”
  Hover over the badge to read the reason (e.g., step failed).
  Review filters/URL and try to Synchronize again.
  If the error persists, contact support.
- No URLs listed under “View URLs”
  Check if the path filters include the desired pages.
Can I organize or search what has been collected afterward?
Yes. The collected content appears in the Content tab, where you can search by URL to find it. If you need to bulk edit something that was incorrectly set, here's how to do it.
Can I flag content with different labels/tags within the same source?
Yes.
You can manually fill in tags (labels) on each synchronized content.
When you do this manually, these tags are not overwritten in subsequent synchronizations — only the textual content is updated.
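The behavior described above (manual tags survive, text is refreshed) can be modeled as follows. This is a sketch under assumptions: the `ContentItem` fields, the `tags_edited_manually` flag, and the handling of items without manual tags are hypothetical and only illustrate the rule stated in this FAQ.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    url: str
    text: str
    tags: list[str] = field(default_factory=list)
    tags_edited_manually: bool = False  # hypothetical flag


def apply_resync(item, fresh_text, default_tag=None):
    """Update an item after a new synchronization."""
    # The textual content is always refreshed.
    item.text = fresh_text

    # Manually set tags are never overwritten. Assumption: items without
    # manual tags simply receive the source's Default Tag, if there is one.
    if not item.tags_edited_manually and default_tag:
        item.tags = [default_tag]

    return item
```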
This allows organizing a single source into multiple categories or filters.
Final tips
- Start with one source per area (e.g., separate Help Center sources for PT and EN) and clear filters.
- Schedule for times of low traffic on your site.
- Review “View URLs” after the first run to ensure only the desired material is being fetched.
- Use a Default Tag to make searches and reports by source easier.
How you will see it in Conversation
Just like in the Content tab, in the Conversation tab you will also be able to see whether the content is from an external source or not.

📌 List of HTML tags ignored by default
During the refinement process, the extraction agent automatically discards various elements that are usually not useful for the knowledge base.
These selectors were set to reduce visual and structural noise (menus, ads, comments, forms, etc.), keeping only the main content.
Page Structure
#footer
#header
#nav
nav
footer
Scripts and styles
script
style
noscript
Media
svg
img
audio
video
Navigation and menus
.sidebar
.menu
.navigation
.breadcrumb
.breadcrumbs
.pagination
.pager
.page-navigation
Ads and banners
.advertisement
.ads
.ad-banner
.cookie-banner
.cookie-notice
.gdpr-notice
Social and sharing
.social-share
.social-buttons
.share-buttons
Forms and subscriptions
.newsletter
.subscription
.signup-form
.search-box
.search-form
.search-bar
Related content
.related-posts
.recommended
.suggestions
Comments and discussions
.comments
.comment-section
.discussion
Metadata and authorship
.tags
.tag-list
.categories
.author-bio
.author-info
.byline
.meta
.metadata
.post-meta
Widgets and sidebars
.widget
.widgets
.sidebar-widget
Popups and overlays
.popup
.modal
.overlay
.lightbox
Accessibility and hidden navigation
.skip-link
.screen-reader-text
.sr-only
.print-only
.no-print
Accessibility attributes
[role='alert']
[role='banner']
[role='navigation']
[role='complementary']
[role='dialog']
[role='alertdialog']
[role="region"][aria-label*="skip" i]
[aria-hidden='true']
[aria-modal='true']
Invisible elements
.hidden
.invisible
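To give an idea of what discarding these elements means in practice, here is a small Python sketch that extracts text while skipping a few of the tag names listed above. It is not the extraction agent's code: it uses only the standard library, covers tag names only (class, id, and attribute selectors are left out for brevity), and the names `MainTextExtractor` and `extract_main_text` are hypothetical.

```python
from html.parser import HTMLParser

# A subset of the tag names listed above. Void elements such as img are
# omitted because they carry no text and have no closing tag.
IGNORED_TAGS = {"nav", "footer", "script", "style", "noscript", "svg", "audio", "video"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside an ignored element."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # how many ignored elements are currently open
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in IGNORED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when no ignored element is open.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_main_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<nav>Menu</nav><main><h1>Title</h1><p>Body text.</p>"
    "<script>var x = 1;</script></main><footer>Footer</footer>"
)
print(extract_main_text(page))  # Title Body text.
```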