Last modified by Ross Beck on 06/11/2025, 14:23

1 {{box cssClass="floatinginfobox" title="**Contents**"}}
2 {{toc/}}
3 {{/box}}
4
5 Navigate to the Data Source Groups screen by clicking **Search Engine**, then **Data Source Groups**. All currently loaded Data Source Groups will be displayed, and clicking **New** will allow you to add a new group.
6
7 == {{id name="Details"/}}Details ==
8
9 Enter a name for the Data Source Group in the **Name** text box.
10
11 Use the **Tags** text box to add associated search terms to the Data Source Group. This allows the components to be accessed using customised strings when using the search bar.
12
13 The **Directory** option will automatically populate with the path specified in the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome]], suffixed with the text entered in the **Name** text box. Click the **…** icon to specify a different directory.
14
15 {{id name="Index Method"/}}The **Index Method** drop-down list specifies the build method to be used when building the Index.
16
17 The **Complete** option will result in all items in a data source being Indexed. Once complete, a new version number is applied to the Index. Each run completely refreshes the Index.
18
19 Building an **Incremental** Index enables an optimised refresh process when rebuilding Indexes to account for any changes. Rather than rebuilding the entire Index, only data that has been modified or added will be processed. When selected, the **Incremental Identifier** drop-down list is revealed. This is required to identify changes, using the selected field to detect when new data is greater than the previous highest value.
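
The role of the **Incremental Identifier** can be illustrated with a minimal Python sketch. The function name `incremental_rows`, the row layout and the sample values are assumptions made for illustration only; they do not reflect how CXAIR implements the build internally.

```python
def incremental_rows(rows, identifier, previous_high):
    """Return rows whose identifier exceeds the previous highest value,
    together with the new highest value to remember for the next run."""
    new_rows = [row for row in rows if row[identifier] > previous_high]
    new_high = max((row[identifier] for row in new_rows), default=previous_high)
    return new_rows, new_high


# Hypothetical data: 'updated' plays the role of the Incremental Identifier.
rows = [
    {"id": 1, "updated": 100},
    {"id": 2, "updated": 205},
    {"id": 3, "updated": 310},
]
new_rows, high = incremental_rows(rows, "updated", previous_high=200)
```

Only the rows with `updated` values above 200 are selected here, and the returned highest value would be used as the threshold for the following refresh.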
20
21 The **Timeline** option allows the building of ‘point in time’ Indexes. Using an **Effective From** date, comparisons of data between set dates can be made. A **Primary Key** is required.
22
23 Selecting **Cumulative** will only Index items that have changed in a data source. Duplicate records are removed from the result set based on the **Incremental Identifier** and **Primary Key**. When cumulatively building from CSV files, only a **Primary Key** is required. For other files, an **Incremental Identifier** and a **Primary Key** are required.
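
As a rough illustration of removing duplicates by **Primary Key** and **Incremental Identifier**, the Python sketch below keeps one record per key. It assumes that the record with the highest identifier value is the one retained; that rule, the function name `deduplicate` and the sample data are assumptions rather than documented CXAIR behaviour.

```python
def deduplicate(records, primary_key, incremental_identifier):
    """Keep a single record per primary key, preferring the record with the
    highest incremental identifier (assumed rule, for illustration)."""
    latest = {}
    for record in records:
        key = record[primary_key]
        current = latest.get(key)
        if current is None or record[incremental_identifier] >= current[incremental_identifier]:
            latest[key] = record
    return list(latest.values())


# Hypothetical data: 'id' is the Primary Key, 'version' the Incremental Identifier.
records = [
    {"id": 1, "version": 1, "name": "a"},
    {"id": 2, "version": 1, "name": "b"},
    {"id": 1, "version": 2, "name": "a2"},
]
unique = deduplicate(records, "id", "version")
```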
24
25 Use the **Index Size** drop-down list to specify the expected number of records the resulting Index will contain. This is an indicative value and does not need to be exact. The system uses it to work out how many folders to split the Index into: the higher the specified Index size, the fewer folders are created. Because a single thread is allocated per folder, fewer folders means fewer threads are used to query the Index. This decreases the speed of individual queries, but reduces the performance impact of multiple users querying the system concurrently.
26
27 Select the Data Sources that will be used to build the group from the **Data Sources Added to Group** drop-down list. One or more Data Sources can be selected.
28
29 Enable the **Build Now** option to build the Data Source Group as soon as the creation process is complete. If disabled, the settings are saved for the build to initiate at a later time.
30
31 Once all the options have been completed, click **Create Data Source Group** to complete the process. To build the Data Source Group and navigate directly to the [[Indexes>>doc:Technical Documentation.CXAIR.Administration Guide.4\. Manual Index Creation.c\. Creating an Index.WebHome]] or [[Scheduling>>doc:Technical Documentation.CXAIR.User Guide.3\. Report Administration.WebHome||anchor="Scheduling"]] screen, click the up arrow next to this text and click **Create Data Source Group and View Indexes** or **Create Data Source Group and View Schedules**.
32
33 If the **Build Now** option has been enabled, the Data Source Group will now start building from the specified Data Sources.
34
35 == {{id name="Advanced"/}}Advanced ==
36
37 Specify the number of threads used to build the Data Source Group using the **Number of Parallel Writers** option. Specifying more threads will allocate a greater amount of system resources to the build process, potentially increasing performance at the expense of other system tasks.
38
39 Use the **Offline Scheme** drop-down list to specify which segments of the Index will be available for users to query. Select a constraint from the list and quantify it with a number in the subsequent text box.
40
41 To change the text analyser used when processing fields, select from the available options in the **Analyser** drop-down list. This should only be changed if encountering problems with the automatically detected output.
42
43 If multiple Data Sources are used to build the Data Source Group, enable the **Ignore Errors** option to complete the build if one of the Data Sources fails. If errors are found, it is recommended that the system logs are checked to locate and rectify the issue. Please refer to the [[Logging>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.Status Monitoring.WebHome||anchor="Logging"]] chapter for more detailed information.
44
45 To prevent changes to the Data Source Group when the resulting Index is sent to another CXAIR instance, enable the **Locked** option.
46
47 Use the **Email when Index is Updated** or **has Failed** options to be notified when either of these events occurs.
48
49 **Retain Data Source Order** will ensure that the data is stored in the same order as it is in the Data Source. It will also be reflected in the Query screen. For ETL Data Sources, the order will be from the Data Source of the Index used in the ETL Data Source; the Order options in the ETL stages are purely for processing and will have no impact on this setting.
50
51 **Lock Index Field Types** is used to maintain the data type of all fields in the Index. This is mainly used for Excel and CSV Data Sources, where there is no defined data type provided to CXAIR (it is derived by parsing through the first 100 records). This option prevents scenarios where a date or numeric field is empty when the data refreshes, which causes CXAIR to interpret that field as a string (the default setting). This option is not needed if you have used the force data type options in the Data Source.
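
The fallback to a string type can be illustrated with a minimal Python sketch of sample-based type detection. The function name `infer_type`, the single accepted date format and the order of the checks are assumptions for illustration; CXAIR's actual detection logic is not documented here.

```python
from datetime import datetime


def infer_type(values, sample_size=100):
    """Guess a field type from the first `sample_size` values.
    An empty sample falls back to 'string', mirroring the default setting."""
    sample = [value for value in values[:sample_size] if value != ""]
    if not sample:
        return "string"

    def all_parse(parser):
        # True only if every sampled value can be parsed by `parser`.
        for value in sample:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    if all_parse(float):
        return "number"
    if all_parse(lambda value: datetime.strptime(value, "%Y-%m-%d")):
        return "date"
    return "string"
```

In this sketch, a column that is numeric in one refresh but entirely empty in the next would switch from `number` to `string`, which is the scenario the locking option is intended to prevent.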
52
53 == {{id name="Extra Fields"/}}Extra Fields ==
54
55 Once a Data Source Group has been built, new options become available that allow extra fields to be added.
56
57 On the Data Source Groups screen, click the **Edit** icon next to the relevant Data Source Group. Click the **>** icon next to the **Extra Fields** option to reveal the configuration settings.
58
59 Select an extra field from the drop-down list and click **Add** to reveal the relevant options. Use the **Label** option to name the field. Once the relevant options have been set, click **Save** to add the field.
60
61 The following fields are available:
62
63 === Additional Index Fields ===
64
65 ==== Calculation ====
66
67 Adding a calculated field allows new fields to be derived using calculations.
68
69 Enable the **Overwrite** option to replace any fields with the same name as the calculated field.
70
71 Click the **…** icon to open the Calculation Builder. Please refer to the [[Calculation Builder>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2c\. Crosstabs.2ci\. Calculation Builder.WebHome]] chapter for more detailed information on the functions that can be used.
72
73 ==== Dates ====
74
75 Date fields allow the creation of date aggregations, using the original date fields as the source. Specify the source date fields using the **Index Fields** drop-down list and select an interval from the **Date** options below.
76
77 ==== {{id name="Distance"/}}Distance ====
78
79 Using the Distance option, an extra field measuring the distance between two points can be built into the Index.
80
81 Select the fields that denote the relevant **Latitude** and **Longitude** for the two points and specify the unit of measurement from the **Radius** radio button.
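
A distance between two latitude and longitude pairs is commonly calculated with the haversine formula, where the chosen Earth radius determines the unit of the result. The Python sketch below assumes this formula and approximate mean Earth radii; the source does not state which formula CXAIR uses, and the names `haversine` and `EARTH_RADIUS` are illustrative.

```python
import math

# Approximate mean Earth radius per unit (assumed values).
EARTH_RADIUS = {"km": 6371.0, "miles": 3958.8}


def haversine(lat1, lon1, lat2, lon2, unit="km"):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS[unit] * math.asin(math.sqrt(a))


# One degree of longitude along the equator, in kilometres.
one_degree_km = haversine(0.0, 0.0, 0.0, 1.0, unit="km")
```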
82
83 ==== Link Fields ====
84
85 Linked fields allow fields to be joined from another Index based on a join key. This performs the SQL equivalent of a LEFT OUTER JOIN.
86
87 Select the join key for the source Data Source from the **Index Field** drop-down list and select the Index that will be joined to from the **Index** drop-down list.
88
89 For the Index that is being joined to, select the join key from the **Join Index Field** drop-down list. Use the **Extra Fields** drop-down list to select any fields that will be added to the resulting Index.
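
The effect of a LEFT OUTER JOIN on the selected fields can be sketched in Python as follows. The function `link_fields`, its parameter names (which mirror the option labels) and the sample rows are hypothetical. For brevity, the sketch assumes the join key is unique in the joined Index, whereas a full LEFT OUTER JOIN would return one row per matching record.

```python
def link_fields(source_rows, join_rows, index_field, join_index_field, extra_fields):
    """Add `extra_fields` from the joined rows to every source row.
    Source rows without a match are kept, with the extra fields set to None."""
    lookup = {}
    for row in join_rows:
        # Assumes a unique join key; the first occurrence wins otherwise.
        lookup.setdefault(row[join_index_field], row)

    result = []
    for row in source_rows:
        match = lookup.get(row[index_field])
        merged = dict(row)
        for field in extra_fields:
            merged[field] = match[field] if match else None
        result.append(merged)
    return result


# Hypothetical data: orders joined to customers on the customer code.
orders = [{"order": 1, "cust": "A"}, {"order": 2, "cust": "Z"}]
customers = [{"code": "A", "region": "North"}]
linked = link_fields(orders, customers, "cust", "code", ["region"])
```

The second order has no matching customer, so it is retained with an empty `region` value rather than being dropped.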
90
91 ==== Link Date Range Fields ====
92
93 This option uses the same principle as linked fields, but only returns fields when the specified date range is matched in the source data.
94
95 Select the join key for the source Data Source from the **Index Field** drop-down list and select the date field for the current Data Source Group from the **Date Index Field** drop-down list. Specify the Index that will be joined to from the **Index** drop-down list.
96
97 For the Index that is being joined to, select the join key from the **Join Index Field** drop-down list. Specify the date range using the **Join Start Date Index Field** and **Join End Date Index Field** drop-down lists. Use the **Extra Fields** drop-down list to select any fields that will be added to the resulting Index.
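
The date range condition can be sketched in Python as an additional test on top of the join key. The function `link_date_range`, the inclusive treatment of the start and end dates, and the sample data are assumptions for illustration only.

```python
from datetime import date


def link_date_range(source_rows, join_rows, index_field, date_index_field,
                    join_index_field, start_field, end_field, extra_fields):
    """Join on the key only where the source date falls within the joined
    row's start and end dates (bounds assumed to be inclusive)."""
    result = []
    for row in source_rows:
        match = next(
            (joined for joined in join_rows
             if joined[join_index_field] == row[index_field]
             and joined[start_field] <= row[date_index_field] <= joined[end_field]),
            None,
        )
        merged = dict(row)
        for field in extra_fields:
            merged[field] = match[field] if match else None
        result.append(merged)
    return result


# Hypothetical data: pick the price that was valid on the sale date.
sales = [{"item": "X", "sold": date(2025, 3, 10)}, {"item": "X", "sold": date(2025, 9, 1)}]
prices = [{"item": "X", "start": date(2025, 1, 1), "end": date(2025, 6, 30), "price": 5}]
priced = link_date_range(sales, prices, "item", "sold", "item", "start", "end", ["price"])
```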
98
99 ==== Link Reverse Date Range Fields ====
100
101 This option uses the same principle as linked fields, but only returns fields when the source data falls outside the specified date range. Please see the above option, **Link Date Range Fields**, for more information concerning the configuration options.
102
103 === Remove Index Fields ===
104
105 ==== Remove ====
106
107 Select the fields from the **Index Fields** drop-down list that will be removed at Data Source Group level.
108
109 === Strings ===
110
111 ==== Analyzed ====
112
113 Specify the Index Fields that will be classed as **Analyzed** when the build process is complete.
114
115 The {{id name="Analysed"/}}**Analyzed** functionality has been designed to accommodate fields containing multiple words, such as a ‘Comments’ field, to allow for case-insensitive searches in the [[Query>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome]] screen. When specified as **Analyzed**, each word in the field is stored as an individual entity to facilitate field-specific searches, in contrast to regular fields that are stored as a single string value. This makes it easier to search for individual words in a field and is especially useful when creating [[Word Clouds>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome||anchor="Word Cloud"]] that can be used with the [[Stop Words>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome||anchor="Stop Words"]] functionality to display the frequency of key words. Please note that any fields added to this list cannot be used as a row or column when creating a [[Crosstab>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2c\. Crosstabs.WebHome]].
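
The difference between a field stored as individual words and a field stored as a single string can be illustrated with a short Python sketch. The tokenisation rule (lower-cased runs of word characters) and the names `analyze` and `matches` are assumptions; the analyser used by CXAIR may split text differently.

```python
import re


def analyze(text):
    """Split text into lower-cased words (assumed tokenisation rule)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def matches(field_value, term, analyzed):
    """Analyzed fields match on any individual word, ignoring case;
    regular fields only match the whole stored string exactly."""
    if analyzed:
        return term.lower() in analyze(field_value)
    return field_value == term


comment = "Delivery was LATE again"
found_when_analyzed = matches(comment, "late", analyzed=True)
found_when_regular = matches(comment, "late", analyzed=False)
```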
116
117 ==== Lower Case ====
118
119 Select the fields from the **Index Fields** drop-down list that will be converted into lower case.
120
121 ==== Proper Case ====
122
123 Select the fields from the **Index Fields** drop-down list that will be converted into proper case, where the first letter of every word is capitalised.
124
125 ==== Regular Expression ====
126
127 Using regular expressions (RegEx), string patterns can be matched within fields and updated, replaced or extracted accordingly.
128
129 Select the fields that the expression will be applied to from the **Index Fields** drop-down list.
130
131 The **RegEx Mapping** section below has two options: **RegEx** and **Replacement**. In the **RegEx** text box, enter the regular expression to match a pattern in the selected Index fields.
132
133 Use the **Replacement** text box to specify the string that will replace the matches found. Matched groups can be referenced in the replacement by number, using the following format:
134
135 **$<number>$**
136
137 The current functionality does not support named groups or start and end of line anchors.
138
139 Enable the **Stop on First Match** option to only return the first match. The **Blank on No Match** option, if enabled, returns a blank field if no matches are found. Otherwise, the entire field is returned in its original format.
140
141 Use the **Test** section to check that the expression is working correctly. This accesses the underlying data in real-time. Due to the different methods of storing text, the displayed text may be in a different format to the source file.
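
The interaction of these options can be approximated in Python with the standard `re` module. This is a sketch under several assumptions: the function `apply_mapping` is hypothetical, the `$<number>$` references are translated to Python's `\g<number>` syntax, and **Stop on First Match** is interpreted as replacing only the first occurrence. The regular expression engine used by CXAIR may behave differently.

```python
import re


def apply_mapping(value, pattern, replacement,
                  stop_on_first_match=False, blank_on_no_match=False):
    """Apply a RegEx and Replacement pair to a single field value."""
    regex = re.compile(pattern)
    if regex.search(value) is None:
        # No match: blank the field or return it unchanged.
        return "" if blank_on_no_match else value
    # Translate $1$ style group references to Python's \g<1> form.
    python_replacement = re.sub(r"\$(\d+)\$", r"\\g<\1>", replacement)
    count = 1 if stop_on_first_match else 0  # 0 means replace every match
    return regex.sub(python_replacement, value, count=count)


# Hypothetical mapping: swap a year-month pair into month/year.
swapped = apply_mapping("2025-11", r"(\d{4})-(\d{2})", "$2$/$1$")
```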
142
143 ==== Simple Regular Expression ====
144
145 Select the fields that the expression will be applied to from the **Index Fields** drop-down list.
146
147 Specify the sub-section of text to be searched using the **Block Start** and **Block End** options. The string that is located between the specified start and end text will be searched.
148
149 To further specify the string location, use the **Starts With** text box to indicate the starting point for the search and specify any characters to omit from the search, such as punctuation, in the **Skip Text** text box.
150
151 Once a value has been entered in the **Starts With** text box, the **White Space** drop-down list will appear. Specify an option that will best allow the text to be detected over multiple lines, if required.
152
153 Use the **Data Type** drop-down list to specify the format of the matched data. Select **String** for a lazy match or **Long String** for a greedy match. A lazy match will stop as soon as the condition is satisfied, while a greedy match will stop once the condition has been satisfied as many times as possible. Selecting **Number** or **Date** will reveal the **Format** option, where the date and number formatting can be specified.
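
The difference between a lazy and a greedy match can be seen with Python's `re` module. The sample text and patterns are illustrative only and are not the expressions CXAIR generates for the **String** and **Long String** options.

```python
import re

text = "ref: <A123> and <B456>"

# Lazy: stops at the first closing bracket that satisfies the pattern.
lazy = re.search(r"<(.+?)>", text).group(1)

# Greedy: extends to the last closing bracket that satisfies the pattern.
greedy = re.search(r"<(.+)>", text).group(1)

print(lazy)    # A123
print(greedy)  # A123> and <B456
```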
154
155 Specify where the search will terminate with the **Ends With** drop-down list. Select **User Defined** and enter the required string in the **User Defined Ends With** text box below to further customise the search. If entering an expression rather than a text string, enable the **User Defined is RegEx** option to activate it. If the expression is not case sensitive, enable the **Case Insensitive** option.
156
157 To set the expression to locate multiple values within the same field, enable the **Multiple Values** option. Otherwise, only the first value is returned. When outputting multiple values, enabling the **Single Line** option will constrain the expression to a single line. Enabling the **Truncate** option will use the previous match as the starting point for the next search, rather than starting from the beginning of the field, which avoids repeat results and shortens the search time.
158
159 To include the string specified in the **Starts With** text box in the output, enable the **Include Starts With** option, and to include the string specified in the **Ends With** text box in the output, enable the **Include Ends With** option.
160
161 Use the **Test** section to check that the expression is working correctly. This accesses the underlying data in real-time to provide accurate results. Due to the different methods of storing text, the displayed text may be in a different format to the source file.
162
163 ==== Upper Case ====
164
165 Select the fields from the **Index Fields** drop-down list that will be converted into upper case.
166
167 === Modelling ===
168
169 The [[Modelling>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome]] options allow the results of previously created [[Bayesian networks>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome||anchor="Bayesian Network"]] and [[decision trees>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome||anchor="Decision Tree"]] to be added at Data Source Group Level to effectively predict future outcomes.
170
171 For each option, click **Decision Tree** or **Bayesian Network** to open the Saved Reports window, where the results will be filtered to only show created models. Click the checkbox next to the relevant model and click the **Selected Reports** tab. Ensure the correct model is selected and click **Add to Data Source**.
172
173 The fields from the Data Source Group will then be displayed alongside those in the model. All of the fields from the model should match those in the Data Source Group, and will be automatically detected. Use the relevant drop-down list to manually match entries if not automatically detected.
174
175 ==== Decision Tree ====
176
177 When saved, two columns are added for the classifier used when the model was built. The **Outcome** column will display the predicted result and the **Percentage** column will display the predicted percentage likelihood of the outcome.
178
179 ==== Predictive Analytics ====
180
181 To predict the outcome for a field, change the relevant drop-down list to **[Predict this value]**. For every field with this option specified, two columns are added. The **Outcome** column will display the predicted result and the **Percentage** column will display the predicted percentage likelihood of the outcome.
182
183 ==== Prescriptive Analytics ====
184
185 Select the field of interest from the **Classifier** drop-down list and choose the outcome that will be measured from the **Outcome of Interest** drop-down list.
186
187 Adjust the **Threshold Percentage** using the slider or by typing a value below. From this setting, two columns are created: **Threshold Outcomes** and **Predicted New Probabilities**.
188
189 The **Threshold Outcomes** column displays the combinations of outcomes that would increase the probability of the specified **Outcome of Interest**. The **Predicted New Probabilities** column displays the predicted probability of the **Outcome of Interest** from the calculated combination of outcomes.
190
191 Select a value from the **Search Depth** drop-down list to restrict the number of possible combinations searched before an outcome is reached. While increasing this value may provide more accurate results, the processing time and system load will increase exponentially.
192
193 To ensure the process runs as efficiently as possible, select values from the **Ignore Outcomes Already Above Threshold** option to manually exclude them from the analysis. Reducing the number of fields will decrease system load and the amount of time required to generate a result.
194
195 === Obfuscation ===
196
197 The [[Obfuscation>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome||anchor="Obfuscation"]] options allow selected fields to be obscured, preventing individual records from being identifiable.
198
199 Please refer to the [[Obfuscation>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome||anchor="Obfuscation"]] section of the [[Database Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]] chapter for more information regarding the available options.
200
201 === Third-Party ===
202
203 ==== Base64 Encoded Word Document ====
204
205 Select the fields that will be encoded to base64 strings.
206
207 ==== JD Edwards Date CYYDDD ====
208
209 Select the date fields that will be converted to the J.D. Edwards format (Century, Year, Day of Year).
210
211 ==== JD Edwards Date CYYMMDD ====
212
213 Select the date fields that will be converted to the J.D. Edwards format (Century, Year, Month, Day of Month).
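
The two J.D. Edwards layouts described above can be sketched in Python. In both, the century digit is 0 for years 1900 to 1999 and 1 for years 2000 to 2099. The function names are illustrative, and returning zero-padded strings is an assumption about the output form.

```python
from datetime import date


def to_cyyddd(value):
    """Century flag, two-digit year, day of year (for example 125310)."""
    number = (value.year - 1900) * 1000 + value.timetuple().tm_yday
    return f"{number:06d}"


def to_cyymmdd(value):
    """Century flag, two-digit year, month, day of month (for example 1251106)."""
    number = (value.year - 1900) * 10000 + value.month * 100 + value.day
    return f"{number:07d}"


sample = date(2025, 11, 6)
print(to_cyyddd(sample))   # 125310
print(to_cyymmdd(sample))  # 1251106
```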
214
215 ==== Modulus 11 Check ====
216
217 Calculates whether a numeric field passes the Modulus 11 check and outputs a True or False flag. For example, '399038' will result in a 'False' flag, while '399027' will result in a 'True' flag.
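
The two examples above are consistent with a common Modulus 11 scheme in which each digit is multiplied by its position counted from the right, and the value passes when the weighted sum is divisible by 11. The Python sketch below assumes that weighting; the exact variant used by CXAIR is not stated, and the function name `passes_modulus_11` is illustrative.

```python
def passes_modulus_11(value):
    """Weight each digit by its position from the right (1, 2, 3, ...)
    and pass when the weighted sum is divisible by 11 (assumed scheme)."""
    digits = [int(character) for character in str(value)]
    total = sum(weight * digit
                for weight, digit in enumerate(reversed(digits), start=1))
    return total % 11 == 0


print(passes_modulus_11("399027"))  # True  (weighted sum 110)
print(passes_modulus_11("399038"))  # False (weighted sum 113)
```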
218
219 ==== Soundex ====
220
221 Outputs the Soundex code for the selected fields. For example, 'Washington' is coded 'W-252'.
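
For reference, the standard American Soundex rules can be sketched in Python as follows. The hyphenated output mirrors the example above; whether CXAIR applies exactly these rules, including the treatment of H and W, is an assumption, and the names `CODES` and `soundex` are illustrative.

```python
# Letter groups and their Soundex digits; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the first letter followed by three digits, such as 'W-252'."""
    letters = [character for character in name.upper() if character.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for character in letters[1:]:
        if character in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(character, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, allowing a repeated code
    return first + "-" + "".join(digits)[:3].ljust(3, "0")


print(soundex("Washington"))  # W-252
```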