03f. Unstructured Data Source Wizard

Last modified by Ross Beck on 06/11/2025, 14:23

To create an Index from an unstructured data source, click Wizards, then Unstructured.

Click the New button to begin the process.

Creation Process

Details

Enter the name that will be given to the Data Source, Data Source Group and Index in the Name text box.

Use the Tags text box to add associated search terms to the Index structure, so that its components can also be found by entering these alternative strings in the search bar.

To automatically restrict access to the Index once built, enable the Sensitive option. Only users with explicit permission will be able to access the Index. 

The Index Method drop-down list specifies the build method to be used when building the Index.

With the Complete option, every item in the data source is indexed each time a refresh takes place, so each run rebuilds the Index in full. Once the refresh completes, a new version number is applied to the Index.

Building an Incremental Index optimises the refresh process. Rather than rebuilding the entire Index, only data that has been added or modified since the previous build is processed. New and changed records are added as new rows, so the history of a record that changes over time is preserved; no de-duplication is applied. Selecting this option reveals the Incremental Identifier drop-down list, which must be set. The selected field is used to detect changes: a record is processed when its value in this field is greater than the highest value recorded by the previous build.
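The difference between the two Index Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the product's implementation: the `HypotheticalIndex` class, the `complete_refresh` and `incremental_refresh` functions, and the `modified` field used as the Incremental Identifier are all hypothetical. Rows are plain dictionaries, and the incremental refresh keeps the highest identifier value seen so far as its comparison point.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class HypotheticalIndex:
    """Toy stand-in for an Index: stored rows plus a version number."""
    rows: List[Dict[str, Any]] = field(default_factory=list)
    version: int = 0
    high_water_mark: Optional[Any] = None  # highest Incremental Identifier value so far


def complete_refresh(index: HypotheticalIndex, source: List[Dict[str, Any]], identifier: str) -> None:
    """Complete: replace every row with the current source and bump the version."""
    index.rows = [dict(row) for row in source]
    index.version += 1
    index.high_water_mark = max((row[identifier] for row in source), default=None)


def incremental_refresh(index: HypotheticalIndex, source: List[Dict[str, Any]], identifier: str) -> int:
    """Incremental: append rows whose identifier exceeds the previous highest value."""
    previous = index.high_water_mark
    new_rows = [dict(row) for row in source
                if previous is None or row[identifier] > previous]
    # New and changed records become new rows; earlier rows are kept
    # and no de-duplication is applied.
    index.rows.extend(new_rows)
    if new_rows:
        index.high_water_mark = max(row[identifier] for row in new_rows)
    return len(new_rows)


# Usage: "modified" is a hypothetical Incremental Identifier field.
source = [{"id": 1, "modified": 100}, {"id": 2, "modified": 105}]
index = HypotheticalIndex()
incremental_refresh(index, source, "modified")  # first run indexes both rows
source[1] = {"id": 2, "modified": 110}          # record 2 is edited at the source
incremental_refresh(index, source, "modified")  # only the edited row is appended
print(len(index.rows))  # 3: record 2 now appears twice, preserving its history
```

Because the comparison is a simple "greater than the previous highest value", the chosen field needs to increase whenever a record is added or changed (for example a modification timestamp); otherwise changes would be missed.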

Use the Index Size drop-down list to specify the expected number of records the resulting Index will contain. This is an indicative value and does not need to be exact.

The system uses this value to work out how many folders to split the Index into. The higher the specified Index size, the fewer folders are created. Because a single thread is allocated per folder, fewer folders means fewer threads are used to query the Index. This makes individual queries slower, but reduces the performance impact when multiple users query the system concurrently.
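The folder-and-thread trade-off can be pictured with a small Python sketch. It assumes each folder is simply an in-memory list of document strings and that a query is a substring match; `search_folder` and `query_index` are hypothetical names, and the real engine's folder layout and matching are not described here. The sketch only shows that a query fans out one worker thread per folder, so an Index split into fewer folders uses fewer threads per query.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def search_folder(folder: List[str], term: str) -> List[str]:
    """Search one folder (here an in-memory list of documents) for a term."""
    return [doc for doc in folder if term in doc]


def query_index(folders: List[List[str]], term: str) -> List[str]:
    """Query every folder in parallel, one worker thread per folder."""
    # More folders means more threads per query; fewer folders means fewer
    # threads, leaving more capacity for other users' queries.
    with ThreadPoolExecutor(max_workers=max(1, len(folders))) as pool:
        per_folder = pool.map(lambda folder: search_folder(folder, term), folders)
        # Merge the partial results in folder order.
        return [doc for hits in per_folder for doc in hits]


folders = [["alpha report", "beta memo"], ["gamma report"], ["delta notes"]]
print(query_index(folders, "report"))  # ['alpha report', 'gamma report']
```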

Use the Add to Search Engine drop-down list to select the search engine to which the Index will be added on completion. Use the Add to Collection drop-down list to select an existing Collection to which the Index will be added.

Enable the Build Now option to build the Index as soon as the creation process is complete. If disabled, the settings are saved for the Index to be built at a later time.

From the Type options, select Directory to index the contents of one or more folders, or URL to index a web page.

Directory

To open and index the contents of supported compressed archives, enable the Explode Archives option.
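As a rough illustration of what exploding an archive involves, the Python sketch below treats each file inside a ZIP archive as its own document. It assumes ZIP is one of the supported formats (the list of supported archives is not given here), and `explode_zip` and the `archive/path` naming scheme are hypothetical.

```python
import io
import zipfile
from typing import Iterator, Tuple


def explode_zip(archive_name: str, data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (document_name, content) for each file stored in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries hold no content to index
            # Name each document after the archive plus its path inside it.
            yield f"{archive_name}/{info.filename}", archive.read(info)


# Usage: build a small archive in memory, then explode it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("docs/a.txt", "hello")
    zf.writestr("b.txt", "world")

for name, content in explode_zip("bundle.zip", buffer.getvalue()):
    print(name, content)
```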

Enable the Only Index Structured Documents option to expand Access or XML documents into individual Index documents. Any other non-expandable documents in the same folder will not be indexed when this option is enabled.

To create a separate document for each PDF page, enable the Split PDFs by Page option.

If indexing web pages directly on the server, enable the Website Filesystem option.

Use the No Suffix is Type drop-down list to select the file type to apply to any files that have no file extension.

Use the Location option to specify where the unstructured files are held. If Type is set to Directory, click the icon and select the required folder. If Type is set to URL, enter the address of the required website.

Different operating systems use particular prefixes for certain hidden files. To exclude these files, enable the appropriate option: Exclude Files Prefixed With Tilde (~) or Exclude Files Prefixed With Dot Underscore (._).

File extensions can also be excluded from the Index using the Excluded Extensions list: move each extension to exclude from the available list to the selected list.
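The Directory filtering options can be thought of as one file-selection step. The Python sketch below works on a list of path strings rather than walking the Location folder, and the function `select_files`, its parameter names and the sample paths are hypothetical. It only mirrors the rules described above: skip `~` and `._` files when those options are enabled, skip excluded extensions, and assign the No Suffix is Type value to files without an extension.

```python
from pathlib import PurePath
from typing import Iterable, List, Tuple


def select_files(paths: Iterable[str], *, exclude_tilde: bool,
                 exclude_dot_underscore: bool,
                 excluded_extensions: Iterable[str],
                 no_suffix_type: str) -> List[Tuple[str, str]]:
    """Return (path, file_type) for each path that passes the Directory filters."""
    excluded = {ext.lower().lstrip(".") for ext in excluded_extensions}
    selected = []
    for path in paths:
        name = PurePath(path).name
        if exclude_tilde and name.startswith("~"):
            continue  # e.g. temporary or lock files
        if exclude_dot_underscore and name.startswith("._"):
            continue  # e.g. hidden metadata files
        extension = PurePath(path).suffix.lower().lstrip(".")
        if extension and extension in excluded:
            continue  # extension is in the Excluded Extensions selected list
        # Files without an extension take the No Suffix is Type value.
        selected.append((path, extension or no_suffix_type))
    return selected


files = ["reports/Q1.pdf", "reports/~$Q1.docx", "reports/._Q1.pdf",
         "notes/README", "logs/app.log"]
print(select_files(files, exclude_tilde=True, exclude_dot_underscore=True,
                   excluded_extensions=[".log"], no_suffix_type="txt"))
# [('reports/Q1.pdf', 'pdf'), ('notes/README', 'txt')]
```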

URL

Setting Type to URL reveals additional options. If the target website requires a login, enable the Use Login option and enter the required Username, Password and Domain. If a proxy server is required, enable the Use Proxy option and enter the Host, Port, User, Password and Domain details.
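For a sense of what the Use Login and Use Proxy settings correspond to at the HTTP level, here is a sketch using Python's standard `urllib.request`. It assumes the target site uses HTTP Basic authentication and that the proxy accepts Basic credentials; `make_crawler_opener` and the example hosts are hypothetical. The Domain fields are not modelled, because domain-based schemes such as NTLM are not supported by `urllib` and would need a third-party library.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote


def make_crawler_opener(url: str, *, username: Optional[str] = None,
                        password: Optional[str] = None,
                        proxy_host: Optional[str] = None,
                        proxy_port: Optional[int] = None,
                        proxy_user: Optional[str] = None,
                        proxy_password: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build an opener that applies optional site login and proxy settings."""
    handlers = []
    if username is not None:
        # Use Login: answer the site's Basic auth challenge with these credentials.
        manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        manager.add_password(None, url, username, password or "")
        handlers.append(urllib.request.HTTPBasicAuthHandler(manager))
    if proxy_host is not None:
        # Use Proxy: route HTTP and HTTPS traffic through host:port,
        # embedding percent-encoded credentials when a proxy user is given.
        credentials = ""
        if proxy_user:
            credentials = f"{quote(proxy_user, safe='')}:{quote(proxy_password or '', safe='')}@"
        proxy_url = f"http://{credentials}{proxy_host}:{proxy_port}"
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    return urllib.request.build_opener(*handlers)


opener = make_crawler_opener("https://intranet.example.com/",
                             username="indexer", password="secret",
                             proxy_host="proxy.example.com", proxy_port=8080,
                             proxy_user="svc", proxy_password="p@ss")
# opener.open("https://intranet.example.com/") would fetch the page using these settings.
```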