03a. Database Data Source Wizard

Last modified by Ross Beck on 06/11/2025, 14:23

The Database Data Source Wizard contains the options to create Index structures from a number of database types.

To create an index from a database, click Wizards then Database.

Click the New button to begin the process.

Creation Process

Details

Enter the name that will be given to the Data Source, Data Source Group and Index in the Name text box.

Use the Tags text box to add associated search terms to the Index structure. This allows the components to be searched using alternative strings when using the search bar.

To automatically restrict access to the Index once built, enable the Sensitive option. Only users with explicit permission will be able to access the Index. 

The Index Method drop-down list specifies the build method to be used when building the Index.

The Complete option will result in all items in a data source being indexed when a refresh takes place. Once refreshed, a new version number is applied to the Index. Each run completely refreshes the Index.

Building an Incremental Index enables an optimised refresh process when rebuilding Indexes to account for any changes. Rather than rebuilding the entire Index, only data that has been modified or added will be processed. New and changed records are added as new rows. This maintains the history of a record that may have changed over time, and no de-duplication is applied. When selected, the Incremental Identifier drop-down list is revealed. This is required to identify changes, using the selected field to detect when new data is greater than the previous highest value.

The Timeline option allows the building of ‘point in time’ indexes. Using an Effective From date, comparisons of data between set dates can be made. When this build option is selected, a Primary Key is required. Index values will be created for each Primary Key value with an Effective To and Effective From value.
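
As a conceptual sketch of the Timeline approach (illustrative field names, not CXAIR's internal storage format), each Primary Key value accumulates versioned rows bounded by Effective From and Effective To dates, allowing the state of a record at any given date to be recovered:

```python
# Conceptual sketch of 'point in time' rows for one Primary Key value
# (customer_id). Field names and values are illustrative only.
timeline = [
    {"customer_id": 42, "status": "Bronze",
     "effective_from": "2023-01-01", "effective_to": "2024-06-30"},
    {"customer_id": 42, "status": "Silver",
     "effective_from": "2024-07-01", "effective_to": "9999-01-01"},
]

def as_of(rows, date):
    """Return the row versions that were current on the given ISO date."""
    return [r for r in rows
            if r["effective_from"] <= date <= r["effective_to"]]
```

With this data, as_of(timeline, "2024-01-15") returns the Bronze version of the record, while as_of(timeline, "2025-01-01") returns the Silver version.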

The Cumulative option works the same as an Incremental build, but replaces duplicates as well as adding new data. Duplicate records are removed from the result set based on the Incremental Identifier and Primary Key. When cumulatively building from CSV files, only a Primary Key is required. For other database types, an Incremental Identifier and a Primary Key are required.

Use the Index Size drop-down list to specify the expected number of records the resulting Index will contain. This is an indicative value and does not need to be exact.

Using this drop-down list allows the system to work out how many folders to split the Index into. The higher the specified Index size, the fewer folders are created. A lower number of folders results in fewer threads being used to query the Index, as a single thread is allocated per folder. This decreases the speed of individual queries, but reduces the performance impact of multiple users querying the system concurrently.

Use the Add to Search Engine drop-down list to select the relevant search engine the Index will be added to upon completion and the Add to Collection drop-down list to select an existing Collection to which the Index will be added.

Enable the Build Now option to build the Index as soon as the creation process is complete. If disabled, the settings are saved for the Index to be built at a later time.

Setup

Select the source database from the Database Type drop-down list to reveal the relevant connection fields below. Previously saved Database Connections are displayed at the top of the list. Enable the Show Driver and URL option to display the underlying details used to provide the database compatibility.

The following options are displayed depending on the choice of Database Type:

Option

Applicable To

Description

Server

4D, CXAIR, DB2, HSqlDB, JDBC4OLAP, MySQL, Oracle 8i, Oracle 9i/10g/11g, Oracle 12c, PostgreSQL, SQL Server 2000 – 2005, SQL Server 2008 and above, Sybase

The name or IP address of the database server.

System

DB2 AS/400

The name of the database system.

Advanced Connection Details

Oracle 12c Advanced

See Extra Parameters.

Database Name

DB2, HSqlDB, MySQL, Oracle 8i, Oracle 9i/10g/11g, Oracle 12c, PostgreSQL, SQL Server 2000 – 2005, SQL Server 2008 and above, Sybase

The name of the target database.

Extra Parameters

SQL Server 2000 – 2005, SQL Server 2008 and above, Sybase

Specify the SQL Server instance or domain if connecting via Windows Authentication. Separate multiple parameters using a semicolon.

Enter the name of an instance in the following format:

instance=<instance name>

If using a Windows SQL Server account*, enter:

domain=<domain name>

Enter the following to send LMv2/NTLMv2 responses when connecting via a domain:

useNTLMv2=true
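
For example, a hypothetical connection to a named instance using a domain account could combine the parameters as follows (the instance and domain names here are placeholders):

instance=SQLEXPRESS;domain=EXAMPLE;useNTLMv2=true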

*Due to driver limitations, Windows Authentication is only compatible with the SQL Server 2000-2005 driver.

Context

CXAIR

See Extra Parameters.

Port

CXAIR, HSqlDB, JDBC4OLAP, MySQL, Oracle 8i, Oracle 9i/10g/11g, Oracle 12c, PostgreSQL, SQL Server 2000 – 2005, SQL Server 2008 and above, Sybase

The port number of the database server.

Account Name

4D, CXAIR, DB2, DB2 AS/400, HSqlDB, JDBC4OLAP, MySQL, Oracle 8i, Oracle 9i/10g/11g, Oracle 12c, Oracle 12c Advanced, PostgreSQL, SQL Server 2000 – 2005, SQL Server 2008 and above, Sybase, U Can Access

The account name of a database user that can execute the underlying SQL.

Account Password

4D, CXAIR, DB2, DB2 AS/400, HSqlDB, JDBC4OLAP, MySQL, Oracle 8i, Oracle 9i/10g/11g, Oracle 12c, Oracle 12c Advanced, PostgreSQL, SQL Server 2000 – 2005, SQL Server 2008 and above, Sybase, U Can Access

The account password for the account detailed above.

SQL

4D, CXAIR, DB2, DB2 AS/400, HSqlDB, JDBC4OLAP, MySQL, Oracle 8i, Oracle 9i/10g/11g, Oracle 12c, Oracle 12c Advanced, PostgreSQL, SQL Server 2000 – 2005, SQL Server 2008 and above, Sybase, Fixed, U Can Access

The SQL statement that will retrieve the data from the data source.

With the Fixed option selected, the following syntax is used to select the required columns:

select
column(11),
column(21),
column(8)
from <tablename>

Delimiter

CSV

The character that splits the field values. By default, this is a comma (,).

Text Qualifier

CSV

The character that qualifies text within a value. By default, this is a quotation mark (").

First Row is Header

CSV, Excel, Fixed

Sets the contents of the first row to be used as the column headings in the resulting output.

Sheet

Excel

Specify the required Excel sheet.

Location

CSV, Excel, U Can Access

Click the icon to open the file browser, where the relevant file can be selected.

Character Set

CSV

The encoding of the CSV file. The default option is UTF-8, but this will differ depending on the source file.

Project

BigQuery

The name of the BigQuery project.

Credential File

BigQuery

The path to the Credentials File.

Column Mapping

Clicking the Column Mapping tab returns the metadata from the data source and allows you to configure how the database fields will be formatted. A loading icon denotes that the data is being processed.

If the First Row is Header option on the Setup tab is enabled, the column names will be displayed under the Column heading. If it is not enabled when using a CSV source file, the columns are numbered instead.

Data Type Conversion

While the type of data in each field is automatically recognised, the Type drop-down list allows the data format to be changed as it is indexed.

CXAIR groups data formats into four distinct types: Timestamp, Date, Number and Text. To convert the data format, select the required type from below the Data Type Conversion heading. If converting a field to a numeric or date value, use the subsequently revealed format text box to specify the format of the incoming values. Multiple date formats can be assigned to a column by pipe (|) separating entries. For example, dd/MM/yyyy | MM/dd/yyyy.
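
The pipe-separated behaviour can be pictured as trying each format in order until one parses, sketched below using Python's strptime (an illustration of the concept, not CXAIR's parser):

```python
from datetime import datetime

def parse_with_formats(value, formats):
    """Try each date format in turn, mirroring pipe-separated entries."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"{value!r} matches none of {formats}")

# Equivalent of specifying dd/MM/yyyy | MM/dd/yyyy
formats = ["%d/%m/%Y", "%m/%d/%Y"]
```

Here "25/12/2024" is parsed by the first pattern, while "12/25/2024" fails it (month 25 is invalid) and falls through to the second. Note that an ambiguous value such as "12/05/2024" will always be taken by the first matching pattern.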

Obfuscation

Using the Obfuscation options, selected fields can be obscured to prevent individual records from being identifiable. The following options are available:

Option

Description

Anonymise

The field contents are replaced with the field name and a sequential integer value, completely removing the field contents.

Encrypted

Encrypts the field with the Encryption Key set in the System section of the System Settings.

MD5 / Dual MD5

Produces a 128-bit hexadecimal hash value in place of the field contents.
Selecting Dual MD5 creates the hashed value alongside the original value, with administrators able to select the field that is made available to users.

SHA-2 / Dual SHA-2

Produces a hash value using the Secure Hash Algorithm 2 (SHA-2), with a 256-bit size. Structurally, SHA-2 uses a Merkle–Damgård construction with a Davies–Meyer compression function built from a block cipher.
Selecting Dual SHA-2 creates the hashed value alongside the original value, with administrators able to select the field that is made available to users.

SHA-3 / Dual SHA-3

Produces a hash value using the Secure Hash Algorithm 3 (SHA-3), with a 256-bit size. Structurally, SHA-3 uses a sponge construction based on the Keccak permutation.
Selecting Dual SHA-3 creates the hashed value alongside the original value, with administrators able to select the field that is made available to users.
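
The shape of the hashed output can be illustrated with Python's hashlib (a conceptual sketch of what each algorithm produces, not CXAIR's implementation; the field value is invented):

```python
import hashlib

value = "jane.doe@example.com"  # illustrative field value

md5_hash = hashlib.md5(value.encode()).hexdigest()        # 128-bit -> 32 hex chars
sha2_hash = hashlib.sha256(value.encode()).hexdigest()    # 256-bit -> 64 hex chars
sha3_hash = hashlib.sha3_256(value.encode()).hexdigest()  # 256-bit -> 64 hex chars
```

The same input always yields the same hash, which is what allows hashed fields to remain joinable and filterable while hiding the original value; the Dual variants additionally retain the original alongside the hash.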

For URL values, selecting URL will add a navigation button into the Query screen that end-users can use to follow the link in a new browser tab.

Use the Null Values text box to enter any values that will be converted to NULL within the Index for each column.

Index Mappings

To further configure the resulting fields, there are three configurable lists that will impact their visibility: Store, Index and Free Text. When new columns are added to the data source and the Index is refreshed, the new columns are not automatically added to these lists. Enable the Automatically add new columns option to ensure all new columns are added automatically.

For a field to be displayed and fully searchable, it must be added to all three lists.

Add fields to these lists by using the left and right arrows to move the fields from the Available to the Selected boxes.

The values specified under the Store heading will be physically stored within the Index. By default, all values are selected based on the SQL statement, if provided. This provides the option of removing any unwanted fields when selecting all values from a large database table.

The values specified under the Index heading are the fields that will be accessed when a data query is performed. By default, fields with a large number of unique values will be excluded from this list. The limit is controlled by the Maximum Drop List Items option in the System Settings.

The values specified under the Free Text heading are the fields that will be accessed when a free text query is performed. Numeric and date fields are automatically excluded. By default, fields with a large number of unique values will be excluded from this list. The limit is controlled by the Maximum Drop List Items option in the System Settings.

If a field is contained in the Store list but not within the Index list, the field will be displayed in the table and list view. When running a search, however, the field will be omitted if it is being used in a filter.

If a field has been added to the Store and Index lists, but not the Free Text list, any filters used to search data will take the field into account, but any free text searches using the query bar will ignore it.

The Analysed list has been designed to accommodate fields containing multiple words, such as a ‘Comments’ field, to allow for case insensitive searches in the Query screen. When added to the list, each word in the field is stored as an individual entity to facilitate field-specific searches, in contrast to regular fields that are stored as a single string value. This allows individual words to be more accurately returned from a field query and is especially useful when creating Word Clouds that require the Stop Words functionality to display an accurate view of the frequency. Please note that any fields added to this list cannot be used as a row or column when creating a Crosstab.
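
The distinction between a regular field and an Analysed field can be pictured as follows (an illustrative sketch, not CXAIR's storage format):

```python
comment = "Customer reported a billing error"

# Regular field: stored as a single string value, matched as a whole
regular = [comment]

# Analysed field: each word stored as an individual entity,
# lower-cased here to suggest case-insensitive matching
analysed = comment.lower().split()
```

A field query against the Analysed form can match the individual word "billing", whereas the regular form only holds the full string.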

If creating a timeline Index, specify the fields that will uniquely identify the required change in the data by moving them into the Selected box for the Timeline Columns option.

URL Template

The URL Template options allow a website to be linked to search results. The address is constructed from the entered URL along with the fields specified as the Primary Key (required for the Timeline and Cumulative build methods).

Select the columns that will be used as the Primary Key by using the left and right arrows to move the fields from the Available to the Selected boxes.

Enter the required address in the URL text box in the following format:

http://<URL>/%<column name>%

<URL> is the website address and <column name> is the name of a Primary Key column.
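
Conceptually, each %column name% placeholder is replaced with the corresponding value from the record, as in this sketch (the column name and address are hypothetical):

```python
def build_url(template, record):
    """Replace %column% placeholders with values from the record."""
    url = template
    for column, value in record.items():
        url = url.replace(f"%{column}%", str(value))
    return url

template = "http://example.com/orders/%order_id%"
```

For a record with order_id 1234, the constructed address would be http://example.com/orders/1234.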

Advanced

Enter the name of a stored procedure to execute against the data source in the Pre Procedure text box (run before the Index starts building), the Post Procedure text box (run after the Index has finished building) or the Error Procedure text box (run if an error occurs). Check the Commit checkbox to commit the output of the stored procedure.

Enable the Initial Capital option to capitalise the first letter of each column heading.

To ensure unexpected changes to the source data do not impact the resulting Index, the Error on Column Changes and Error on Column Format Changes options can be enabled to force the Index build to fail if changes are detected.

Specify the Time Zone if the source data is derived from a different time zone to the server. This option can affect derived date/time number fields.

Specify the Time Out option, in minutes, with the amount of time a row retrieval process will run for before being terminated. To disable this option, enter zero.

Specify the minimum number of rows that an Index can receive before completing in the Minimum Rows text box. To disable this option, enter zero.

Specify the maximum number of rows that an Index can receive before completing in the Maximum Rows text box. To disable this option, enter zero.

Specify the maximum time, in seconds, that an Index process can run for before being terminated in the Maximum Time text box. To disable this option, enter zero. This only applies to incremental indexes.

Derived Dates

Using the date options allows the individual components of a date/time field to be separately indexed.

Use the left and right arrows to move the required fields from the Available to the Selected boxes for the following options: Day, Week, Month, Quarter, Year, Fiscal, Hour, Minute, Second, Time.

These fields can also be output in a number of different formats, as described in the System Settings chapter. Enabling the Fiscal checkbox will override any of these selections, however.

Index Method Details

These options will change depending on the Index Method set in the Details tab.

Incremental Identifier

This will appear when using an incremental or cumulative build method. You will need to specify the field that will identify whether a record is new or updated. This must be a sequential value, such as a date or number, that increases over time. CXAIR will only bring in records where the incremental identifier is greater than the previously stored value.
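
A refresh can therefore be pictured as filtering the source on the identifier, as in this sketch (illustrative field names, not the query CXAIR actually generates):

```python
# Rows currently in the source; row_id acts as the incremental identifier.
source_rows = [
    {"row_id": 101, "value": "a"},
    {"row_id": 102, "value": "b"},
    {"row_id": 103, "value": "c"},
]
highest_indexed = 102  # highest identifier stored by the previous build

# Only rows beyond the previous high-water mark are brought in.
new_rows = [r for r in source_rows if r["row_id"] > highest_indexed]
```

Here only the row with identifier 103 would be processed by the refresh.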

Effective Date

This will appear when using timeline Indexes and does not have to be set. By default, the effective date for each currently effective record is 01/01/9999. This can be overridden by using a date field in the Index.

Partition Key

Using a Partition Key will split the Index into partitions based on the unique values found in that field. Once the Index has been built for the first time, you can specify which partitions are available to the end user by choosing the relevant value in the Offline Scheme drop-down list in the Data Source Group. Any subsequent builds will store the full set of data but only publish the specified partition for front-end reporting.
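
The effect of a Partition Key can be sketched as grouping rows by that field's unique values (the field and values here are illustrative):

```python
from collections import defaultdict

rows = [
    {"region": "North", "sales": 10},
    {"region": "South", "sales": 7},
    {"region": "North", "sales": 3},
]

# One partition per unique Partition Key value.
partitions = defaultdict(list)
for row in rows:
    partitions[row["region"]].append(row)
# All partitions are stored; only the one named in the Offline Scheme
# would be published for front-end reporting.
```

With this data, the Index would hold a "North" partition of two rows and a "South" partition of one.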

Partition Key Format

This will only appear once a Partition Key has been selected. Specify the format based on the data type coming into the system, i.e. the expected format of the source data.