Navigate to the Data Source Groups screen by clicking **Search Engine**, then **Data Source Groups**. All currently loaded Data Source Groups will be displayed, and clicking **New** will allow you to add a new group.

== {{id name="Details"/}}Details ==

Enter a name for the Data Source Group in the **Name** text box.

Use the **Tags** text box to add associated search terms to the Data Source Group. This allows the components to be accessed using customised strings when using the search bar.

The **Directory** option will automatically populate with the path specified in the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome]], suffixed with the text entered in the **Name** text box. Click the **…** icon to specify a different directory.

{{id name="Index Method"/}}The **Index Method** drop-down list specifies the build method to be used when building the Index.

The **Complete** option will result in all items in a data source being Indexed. Once complete, a new version number is applied to the Index. Each run completely refreshes the Index.

Building an **Incremental** Index enables an optimised refresh process when rebuilding Indexes to account for any changes. Rather than rebuilding the entire Index, only data that has been modified or added is processed. When selected, the **Incremental Identifier** drop-down list is revealed. This field is required to identify changes: data is treated as new when its value in the selected field is greater than the highest value recorded by the previous build.
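
The high-water-mark idea behind the **Incremental Identifier** can be sketched in Python as follows, assuming the identifier is a value that only ever increases, such as a sequence number or a last-modified timestamp. The function ##select_incremental## and the field names are hypothetical; this illustrates the principle, not CXAIR's internal implementation.

```python
from typing import Any, Iterable

# Hypothetical illustration of an incremental build; not CXAIR's code.
def select_incremental(rows: Iterable[dict[str, Any]], identifier: str,
                       last_max: Any) -> tuple[list[dict[str, Any]], Any]:
    """Return rows whose identifier exceeds the previous build's highest
    value, plus the new highest value to remember for the next build."""
    # Only rows strictly greater than the previous maximum are processed.
    new_rows = [row for row in rows if row[identifier] > last_max]
    # If nothing is new, the previous maximum is kept.
    new_max = max((row[identifier] for row in new_rows), default=last_max)
    return new_rows, new_max

source = [{"id": 101, "value": "a"}, {"id": 102, "value": "b"},
          {"id": 103, "value": "c"}]
rows, high_water = select_incremental(source, "id", last_max=101)
print([r["id"] for r in rows], high_water)  # [102, 103] 103
```

Storing the returned maximum between builds is what lets each run skip everything already indexed.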

The **Timeline** option allows the building of ‘point in time’ Indexes. Using an **Effective From** date, comparisons of data between set dates can be made. A **Primary Key** is required.
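
A minimal sketch of a ‘point in time’ lookup, assuming that each record carries a **Primary Key** and an **Effective From** date, and that the version in force on a given date is the latest one whose Effective From date is on or before that date. The function ##as_of## and the field names are hypothetical.

```python
from datetime import date
from typing import Any

# Hypothetical helper; assumes Effective From semantics described above.
def as_of(rows: list[dict[str, Any]], key: str, effective_from: str,
          when: date) -> dict[Any, dict[str, Any]]:
    """Return, for each primary key, the record version in force on `when`."""
    current: dict[Any, dict[str, Any]] = {}
    for row in rows:
        if row[effective_from] > when:
            continue  # this version had not yet taken effect
        best = current.get(row[key])
        # Keep the most recent version that is already effective.
        if best is None or row[effective_from] > best[effective_from]:
            current[row[key]] = row
    return current

history = [
    {"pk": 1, "from": date(2023, 1, 1), "status": "Open"},
    {"pk": 1, "from": date(2023, 6, 1), "status": "Closed"},
    {"pk": 2, "from": date(2023, 3, 1), "status": "Open"},
]
march = as_of(history, "pk", "from", date(2023, 3, 31))
july = as_of(history, "pk", "from", date(2023, 7, 1))
print(march[1]["status"], july[1]["status"])  # Open Closed
```

Comparing the two snapshots is the kind of between-dates comparison the Timeline option is intended for.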

Selecting **Cumulative** will only Index items that have changed in a data source. Duplicate records are removed from the result set based on the **Incremental Identifier** and **Primary Key**. When cumulatively building from CSV files, only a **Primary Key** is required. For other files, both an **Incremental Identifier** and a **Primary Key** are required.
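
The de-duplication step can be sketched as follows, assuming that when several records share a **Primary Key**, the one with the highest **Incremental Identifier** is kept. The function ##merge_cumulative## and the field names are hypothetical, and the CSV case (Primary Key only) is not modelled.

```python
from typing import Any, Iterable

# Hypothetical helper; assumes the highest identifier wins per primary key.
def merge_cumulative(existing: Iterable[dict[str, Any]],
                     changes: Iterable[dict[str, Any]],
                     key: str, identifier: str) -> list[dict[str, Any]]:
    """Combine previously indexed rows with changed rows, keeping one row
    per primary key: the one with the highest incremental identifier."""
    latest: dict[Any, dict[str, Any]] = {}
    for row in [*existing, *changes]:
        kept = latest.get(row[key])
        if kept is None or row[identifier] > kept[identifier]:
            latest[row[key]] = row
    return list(latest.values())

old = [{"pk": 1, "ver": 1, "qty": 5}, {"pk": 2, "ver": 1, "qty": 7}]
new = [{"pk": 1, "ver": 2, "qty": 9}, {"pk": 3, "ver": 2, "qty": 1}]
result = merge_cumulative(old, new, key="pk", identifier="ver")
print(sorted((r["pk"], r["qty"]) for r in result))  # [(1, 9), (2, 7), (3, 1)]
```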

Use the **Index Size** drop-down list to specify the expected number of records the resulting Index will contain. This is an indicative value and does not need to be exact. The system uses this value to work out how many folders to split the Index into: the higher the specified Index size, the fewer folders are created. Because a single thread is allocated per folder, fewer folders means fewer threads are used to query the Index. This decreases the speed of individual queries, but reduces the performance impact when multiple users query the system concurrently.

Select the Data Sources that will be used to build the group from the **Data Sources Added to Group** drop-down list. One or more Data Sources can be selected.

Enable the **Build Now** option to build the Data Source Group as soon as the creation process is complete. If disabled, the settings are saved for the build to initiate at a later time.

Once all the options have been completed, click **Create Data Source Group** to complete the process. To build the Data Source Group and navigate directly to the [[Indexes>>doc:Technical Documentation.CXAIR.Administration Guide.4\. Manual Index Creation.c\. Creating an Index.WebHome]] or [[Scheduling>>doc:Technical Documentation.CXAIR.User Guide.3\. Report Administration.WebHome||anchor="Scheduling"]] screen, click the up arrow next to this button and click **Create Data Source Group and View Indexes** or **Create Data Source Group and View Schedules**.

If the **Build Now** option has been enabled, the Data Source Group will now start building from the specified Data Sources.

== {{id name="Advanced"/}}Advanced ==

Specify the number of threads used to build the Data Source Group using the **Number of Parallel Writers** option. Specifying more threads will allocate a greater amount of system resources to the build process, potentially increasing performance at the expense of other system tasks.

Use the **Offline Scheme** drop-down list to specify which segments of the Index will be available for users to query. Select a constraint from the list and quantify it with a number in the subsequent text box.

To change the text analyser used when processing fields, select from the available options in the **Analyser** drop-down list. Only change this setting if problems are encountered with the automatically detected output.

If multiple Data Sources are used to build the Data Source Group, enable the **Ignore Errors** option to allow the build to complete even if one of the Data Sources fails. If errors are found, check the system logs to locate and rectify the issue. Please refer to the [[Logging>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.Status Monitoring.WebHome||anchor="Logging"]] chapter for more detailed information.

To prevent changes to the Data Source Group when the resulting Index is sent to another CXAIR instance, enable the **Locked** option.

Use the **Email when Index is Updated** or **has Failed** options to be notified when either of these events occurs.

**Retain Data Source Order** ensures that the data is stored in the same order as it is in the Data Source, and this order is also reflected in the Query screen. For ETL Data Sources, the order is taken from the Data Source of the Index used in the ETL Data Source; the Order options in the ETL stages are used purely for processing and have no impact on this setting.

**Lock Index Field Types** is used to maintain the data type of all fields in the Index. This is mainly used for Excel and CSV Data Sources, where no data type is defined for CXAIR (it is derived by parsing the first 100 records). Without this option, if a date or numeric field is empty when the data refreshes, CXAIR interprets that field as a string (the default type). This option is not needed if you have used the force data type options in the Data Source.
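
Why locking matters can be illustrated with a simplified type inference that inspects the first 100 records, as described above, and falls back to a string when no other type fits. The parsing rules (plain numbers, ISO 8601 dates) and the function ##infer_type## are assumptions for illustration; CXAIR's actual detection logic is not documented here.

```python
from datetime import date
from typing import Optional

SAMPLE_SIZE = 100  # the text states types are derived from the first 100 records

def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False

def _is_iso_date(value: str) -> bool:
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

# Hypothetical, simplified inference; a locked type always wins.
def infer_type(values: list[str], locked: Optional[str] = None) -> str:
    if locked is not None:
        return locked
    sample = [v for v in values[:SAMPLE_SIZE] if v.strip()]
    if sample and all(_is_number(v) for v in sample):
        return "number"
    if sample and all(_is_iso_date(v) for v in sample):
        return "date"
    return "string"  # default when nothing else fits, including all-empty data

print(infer_type(["2024-01-05", "2024-02-11"]))  # date
print(infer_type(["", ""]))                      # string (empty after a refresh)
print(infer_type(["", ""], locked="date"))       # date (type is locked)
```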

== {{id name="Extra Fields"/}}Extra Fields ==

Once a Data Source Group has been built, new options become available that allow extra fields to be added.

On the Data Source Groups screen, click the **Edit** icon next to the relevant Data Source Group. Click the **>** icon next to the **Extra Fields** option to reveal the configuration settings.

Select an extra field from the drop-down list and click **Add** to reveal the relevant options. Use the **Label** option to name the field. Once the relevant options have been set, click **Save** to add the field.

The following fields are available:

=== Additional Index Fields ===

==== Calculation ====

Adding a calculated field allows new fields to be derived using calculations.

Enable the **Overwrite** option to replace any fields with the same name as the calculated field.

Click the **…** icon to open the Calculation Builder. Please refer to the [[Calculation Builder>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2c\. Crosstabs.2ci\. Calculation Builder.WebHome]] chapter for more detailed information on the functions that can be used.

==== Dates ====

Date fields allow the creation of date aggregations, using the original date fields as the source. Specify the source date fields using the **Index Fields** drop-down list and select an interval from the **Date** options below.
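
As an illustration of what a date aggregation derives, the sketch below maps one source date to several coarser intervals. The interval names, their formats and the function ##date_aggregations## are assumptions; the intervals actually offered under **Date** may differ.

```python
from datetime import date

# Hypothetical helper; interval set and formats are illustrative only.
def date_aggregations(d: date) -> dict[str, str]:
    """Derive coarser date buckets from a single source date."""
    quarter = (d.month - 1) // 3 + 1
    # ISO weeks can belong to the neighbouring year, so use the ISO year too.
    iso_year, iso_week, _ = d.isocalendar()
    return {
        "year": f"{d.year}",
        "quarter": f"{d.year}-Q{quarter}",
        "month": f"{d.year}-{d.month:02d}",
        "week": f"{iso_year}-W{iso_week:02d}",
    }

print(date_aggregations(date(2024, 5, 17)))
print(date_aggregations(date(2024, 12, 30))["week"])  # 2025-W01
```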

==== {{id name="Distance"/}}Distance ====

Using the Distance option, an extra field measuring the distance between two points can be built into the Index.

Select the fields that denote the relevant **Latitude** and **Longitude** for the two points and specify the unit of measurement using the **Radius** radio buttons.
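
The formula CXAIR uses is not stated here. A common choice for the distance between two latitude/longitude points is the haversine great-circle formula, sketched below under that assumption; the two mean Earth radius constants stand in for the unit selected with the **Radius** option, and the function ##haversine## is hypothetical.

```python
import math

# Mean Earth radius per unit; assumed stand-ins for the Radius option.
EARTH_RADIUS = {"km": 6371.0, "miles": 3958.8}

def haversine(lat1: float, lon1: float, lat2: float, lon2: float,
              unit: str = "km") -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS[unit] * math.asin(math.sqrt(a))

# London to Paris: about 344 km
print(round(haversine(51.5074, -0.1278, 48.8566, 2.3522)))
```

The haversine form treats the Earth as a sphere, which is usually accurate to within about 0.5%.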

==== Link Fields ====

Linked fields allow fields to be joined from another Index based on a join key. This performs the SQL equivalent of a LEFT OUTER JOIN.

Select the join key for the source Data Source from the **Index Field** drop-down list and select the Index that will be joined to from the **Index** drop-down list.

For the Index that is being joined to, select the join key from the **Join Index Field** drop-down list. Use the **Extra Fields** drop-down list to select any fields that will be added to the resulting Index.
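
The LEFT OUTER JOIN behaviour can be sketched as follows: every source row is kept, matching rows from the joined Index contribute the selected extra fields, and unmatched rows receive empty (##None##) values. As in SQL, a key that matches several joined rows produces one output row per match; whether CXAIR handles duplicate join keys the same way is an assumption. The function ##link_fields## and its parameters are hypothetical and simply mirror the option labels.

```python
from collections import defaultdict
from typing import Any

# Hypothetical helper mirroring the Link Fields options.
def link_fields(source: list[dict[str, Any]], index_field: str,
                joined: list[dict[str, Any]], join_index_field: str,
                extra_fields: list[str]) -> list[dict[str, Any]]:
    """LEFT OUTER JOIN: keep every source row, adding extra fields from
    matching joined rows, or None where no match exists."""
    by_key: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in joined:
        by_key[row[join_index_field]].append(row)
    result = []
    for row in source:
        # .get avoids creating empty entries for unmatched keys.
        matches = by_key.get(row[index_field]) or [None]
        for match in matches:
            extras = {f: (match[f] if match is not None else None)
                      for f in extra_fields}
            result.append({**row, **extras})
    return result

orders = [{"order": 1, "cust": "A"}, {"order": 2, "cust": "Z"}]
customers = [{"id": "A", "name": "Acme"}]
print(link_fields(orders, "cust", customers, "id", ["name"]))
```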

==== Link Date Range Fields ====

Uses the same principle as linked fields, but only returns fields when the date in the source data falls within the specified date range.

Select the join key for the source Data Source from the **Index Field** drop-down list and select the date field for the current Data Source Group from the **Date Index Field** drop-down list. Specify the Index that will be joined to from the **Index** drop-down list.

For the Index that is being joined to, select the join key from the **Join Index Field** drop-down list. Specify the date range using the **Join Start Date Index Field** and **Join End Date Index Field** drop-down lists. Use the **Extra Fields** drop-down list to select any fields that will be added to the resulting Index.

==== Link Reverse Date Range Fields ====

Uses the same principle as linked fields, but only returns fields when the date in the source data falls outside the specified date range. Please see the option above, **Link Date Range Fields**, for more information about the configuration options.
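
Both date range variants can be sketched as the same left join with an extra date condition. The sketch assumes the range is inclusive at both ends and that, for the reverse variant, a joined row qualifies when the source date falls before its start date or after its end date; neither detail is stated in the text. The function ##link_date_range## and its parameters are hypothetical and mirror the option labels.

```python
from datetime import date
from typing import Any

# Hypothetical helper; inclusive range bounds are an assumption.
def link_date_range(source: list[dict[str, Any]], index_field: str,
                    date_index_field: str, joined: list[dict[str, Any]],
                    join_index_field: str, start_field: str, end_field: str,
                    extra_fields: list[str],
                    reverse: bool = False) -> list[dict[str, Any]]:
    """Left join on key, keeping only joined rows whose date range contains
    the source date (or, with reverse=True, does not contain it)."""
    result = []
    for row in source:
        matches = []
        for other in joined:
            if other[join_index_field] != row[index_field]:
                continue
            inside = other[start_field] <= row[date_index_field] <= other[end_field]
            if inside != reverse:  # keep inside rows, or outside rows if reversed
                matches.append(other)
        for match in matches or [None]:
            extras = {f: (match[f] if match is not None else None)
                      for f in extra_fields}
            result.append({**row, **extras})
    return result

visits = [{"patient": 7, "visit_date": date(2024, 3, 10)}]
wards = [
    {"patient": 7, "from": date(2024, 3, 1), "to": date(2024, 3, 15), "ward": "A"},
    {"patient": 7, "from": date(2024, 4, 1), "to": date(2024, 4, 9), "ward": "B"},
]
args = (visits, "patient", "visit_date", wards, "patient", "from", "to", ["ward"])
print([r["ward"] for r in link_date_range(*args)])                # ['A']
print([r["ward"] for r in link_date_range(*args, reverse=True)])  # ['B']
```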

=== Remove Index Fields ===

==== Remove ====

Select the fields from the **Index Fields** drop-down list that will be removed at Data Source Group level.

=== Strings ===

==== Analyzed ====

Specify the Index Fields that will be classed as **Analyzed** when the build process is complete.

The {{id name="Analysed"/}}**Analyzed** functionality has been designed to accommodate fields containing multiple words, such as a ‘Comments’ field, to allow for case-insensitive searches in the [[Query>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome]] screen. When specified as **Analyzed**, each word in the field is stored as an individual entity to facilitate field-specific searches, in contrast to regular fields that are stored as a single string value. This makes it easier to search for individual words in a field and is especially useful when creating [[Word Clouds>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome||anchor="Word Cloud"]] that can be used with the [[Stop Words>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome||anchor="Stop Words"]] functionality to display the frequency of keywords. Please note that any fields added to this list cannot be used as a row or column when creating a [[Crosstab>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2c\. Crosstabs.WebHome]].
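
The difference between a regular field and an Analyzed field can be illustrated as follows: the regular value is matched as a whole, while the analysed value is split into lower-cased words that can be searched and counted individually, for example for a word cloud with stop words removed. The tokenisation rule, the ##analyse## function and the stop-word list are assumptions; CXAIR's analyser and configured Stop Words may behave differently.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "was", "and", "a", "to"}  # illustrative list only

# Hypothetical tokeniser: lower-case, then split into runs of letters/digits.
def analyse(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

comments = ["The delivery was late", "Late delivery and damaged box"]

# A regular field only matches the whole stored value...
print("late" in comments)  # False
# ...whereas each word of an analysed field can be searched on its own.
print(any("late" in analyse(c) for c in comments))  # True

# Word frequencies for a word cloud, with stop words removed.
counts = Counter(w for c in comments for w in analyse(c) if w not in STOP_WORDS)
print(counts.most_common(2))  # [('delivery', 2), ('late', 2)]
```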

==== Lower Case ====

Select the fields from the **Index Fields** drop-down list that will be converted into lower case.

==== Proper Case ====

Select the fields from the **Index Fields** drop-down list that will be converted into proper case, where the first letter of every word is capitalised.

==== Regular Expression ====

Using regular expressions (RegEx), string patterns can be matched within fields and updated, replaced or extracted accordingly.

Select the fields that the expression will be applied to from the **Index Fields** drop-down list.

The **RegEx Mapping** section below has two options: **RegEx** and **Replacement**. In the **RegEx** text box, enter the regular expression to match a pattern in the selected Index fields.

Use the **Replacement** text box to specify the string that will replace the matches found. Numbered groups captured by the expression can be referenced in the replacement using the following format:

**$<number>$**

The current functionality does not support named groups or start-of-line and end-of-line anchors.

Enable the **Stop on First Match** option to only return the first match. If the **Blank on No Match** option is enabled, a blank field is returned when no matches are found; otherwise, the entire field is returned in its original format.
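
The **RegEx Mapping** options can be approximated with Python's ##re## module. The sketch assumes that ##$<number>$## references (for example ##$1$##) correspond to numbered capture groups and translates them to Python's ##\g<1>## syntax; CXAIR's own regular expression engine may differ in detail. The function ##regex_map## is hypothetical.

```python
import re

# Hypothetical approximation of one RegEx Mapping; not CXAIR's engine.
def regex_map(value: str, pattern: str, replacement: str,
              stop_on_first_match: bool = False,
              blank_on_no_match: bool = False) -> str:
    # Translate $<number>$ group references (e.g. $1$) into Python's \g<1>.
    py_replacement = re.sub(r"\$(\d+)\$", r"\\g<\1>", replacement)
    count = 1 if stop_on_first_match else 0  # 0 replaces every match
    result, n = re.subn(pattern, py_replacement, value, count=count)
    if n == 0 and blank_on_no_match:
        return ""
    return result  # the original value is returned unchanged if nothing matched

print(regex_map("Smith, John", r"(\w+), (\w+)", "$2$ $1$"))  # John Smith
print(regex_map("a1b2", r"\d", "#", stop_on_first_match=True))  # a#b2
```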

Use the **Test** section to check that the expression is working correctly. This accesses the underlying data in real-time. Due to the different methods of storing text, the displayed text may be in a different format to the source file.

==== Simple Regular Expression ====

Select the fields that the expression will be applied to from the **Index Fields** drop-down list.

Specify the sub-section of text to be searched using the **Block Start** and **Block End** options. The string that is located between the specified start and end text will be searched.

To further specify the string location, use the **Starts With** text box to indicate the starting point for the search and specify any characters to omit from the search, such as punctuation, in the **Skip Text** text box.

Once a value has been entered in the **Starts With** text box, the **White Space** drop-down list will appear. Specify an option that will best allow the text to be detected over multiple lines, if required.

Use the **Data Type** drop-down list to specify the format of the matched data. Select **String** for a lazy match or **Long String** for a greedy match. A lazy match stops at the first point where the end condition is satisfied, while a greedy match extends to the last point where it is satisfied. Selecting **Number** or **Date** will reveal the **Format** option, where the date and number formatting can be specified.
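
The **String** (lazy) and **Long String** (greedy) options appear to correspond to lazy and greedy quantifiers in regular expressions, illustrated here with Python's ##re## module. That CXAIR implements these options with equivalent quantifiers is an assumption.

```python
import re

text = "Ref: A12; Ref: B34; end"

# Lazy (.*?): stop at the first point where the end text ';' matches.
lazy = re.search(r"Ref: (.*?);", text).group(1)
# Greedy (.*): extend to the last point where ';' still matches.
greedy = re.search(r"Ref: (.*);", text).group(1)

print(lazy)    # A12
print(greedy)  # A12; Ref: B34
```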

Specify where the search will terminate with the **Ends With** drop-down list. Select **User Defined** and enter the required string in the **User Defined Ends With** text box below to further customise the search. If entering an expression rather than a text string, enable the **User Defined is RegEx** option to activate it. To ignore case when matching, enable the **Case Insensitive** option.

To set the expression to locate multiple values within the same field, enable the **Multiple Values** option. Otherwise, only the first value is returned. When outputting multiple values, the **Single Line** option, when enabled, will constrain the expression to a single line and the **Truncate** option will shorten the retrieval process by using the previous match as the starting point for the next search rather than starting from the beginning of the field. This will avoid repeat results and shorten the search time.

To include the string specified in the **Starts With** text box in the output, enable the **Include Starts With** option, and to include the string specified in the **Ends With** text box in the output, enable the **Include Ends With** option.
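
How **Starts With**, **Ends With**, **Multiple Values** and the two **Include** options might combine can be sketched by building a regular expression from the literal start and end text. This is a simplified, hypothetical model (the function ##simple_extract## ignores **Block Start**/**Block End**, **Skip Text**, **White Space** and **Data Type**), not CXAIR's implementation. Each search resumes after the previous match, in the spirit of the **Truncate** option.

```python
import re

# Hypothetical, simplified model of the Simple Regular Expression options.
def simple_extract(text: str, starts_with: str, ends_with: str,
                   multiple_values: bool = False,
                   include_starts_with: bool = False,
                   include_ends_with: bool = False) -> list[str]:
    # Escape the markers so they match literally; lazy match in between.
    pattern = re.compile(
        f"({re.escape(starts_with)})(.*?)({re.escape(ends_with)})", re.DOTALL)
    values = []
    for m in pattern.finditer(text):  # each search starts after the last match
        value = m.group(2)
        if include_starts_with:
            value = m.group(1) + value
        if include_ends_with:
            value += m.group(3)
        values.append(value)
        if not multiple_values:
            break  # only the first value is returned
    return values

note = "BP: 120/80. Pulse: 72. BP: 135/85."
print(simple_extract(note, "BP: ", "."))                        # ['120/80']
print(simple_extract(note, "BP: ", ".", multiple_values=True))  # ['120/80', '135/85']
```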

Use the **Test** section to check that the expression is working correctly. This accesses the underlying data in real-time to provide accurate results. Due to the different methods of storing text, the displayed text may be in a different format to the source file.

==== Upper Case ====

Select the fields from the **Index Fields** drop-down list that will be converted into upper case.

=== Modelling ===

The [[Modelling>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome]] options allow the results of previously created [[Bayesian networks>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome||anchor="Bayesian Network"]] and [[decision trees>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome||anchor="Decision Tree"]] to be added at Data Source Group level to effectively predict future outcomes.

For each option, click **Decision Tree** or **Bayesian Network** to open the Saved Reports window, where the results will be filtered to only show created models. Click the checkbox next to the relevant model and click the **Selected Reports** tab. Ensure the correct model is selected and click **Add to Data Source**.

The fields from the Data Source Group will then be displayed alongside those in the model. All of the fields from the model should match those in the Data Source Group, and will be automatically detected. Use the relevant drop-down list to manually match entries if not automatically detected.

==== Decision Tree ====

When saved, two columns are added for the classifier used when the model was built. The **Outcome** column will display the predicted result and the **Percentage** column will display the predicted percentage likelihood of the outcome.

==== Predictive Analytics ====

To predict the outcome for a field, change the relevant drop-down list to **[Predict this value]**. For every field with this option specified, two columns are added. The **Outcome** column will display the predicted result and the **Percentage** column will display the predicted percentage likelihood of the outcome.

==== Prescriptive Analytics ====

Select the field of interest from the **Classifier** drop-down list and choose the outcome that will be measured from the **Outcome of Interest** drop-down list.

Adjust the **Threshold Percentage** using the slider or by typing a value below. From this setting, two columns are created: **Threshold Outcomes** and **Predicted New Probabilities**.

The **Threshold Outcomes** column displays the combinations of outcomes that would increase the probability of the specified **Outcome of Interest**. The **Predicted New Probabilities** column displays the predicted probability of the **Outcome of Interest** from the calculated combination of outcomes.

Select a value from the **Search Depth** drop-down list to restrict the number of possible combinations searched before an outcome is reached. While increasing this value may provide more accurate results, the processing time and system load will increase exponentially.

To ensure the process runs as efficiently as possible, select values in the **Ignore Outcomes Already Above Threshold** option to exclude them from the analysis manually. Reducing the number of fields will decrease system load and the amount of time required to generate a result.

=== Obfuscation ===

The [[Obfuscation>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome||anchor="Obfuscation"]] options allow selected fields to be obscured, preventing individual records from being identifiable.

Please refer to the [[Obfuscation>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome||anchor="Obfuscation"]] section of the [[Database Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]] chapter for more information regarding the available options.

=== Third-Party ===

==== Base64 Encoded Word Document ====

Select the fields that will be encoded to base64 strings.

==== JD Edwards Date CYYDDD ====

Select the date fields that will be converted to the J.D. Edwards format (Century, Year, Day of Year).

==== JD Edwards Date CYYMMDD ====

Select the date fields that will be converted to the J.D. Edwards format (Century, Year, Month, Day of Month).
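
In both J.D. Edwards formats, the century digit is an offset from 1900 (0 for the 1900s, 1 for the 2000s), followed by a two-digit year and then either the day of the year (CYYDDD) or the month and day of month (CYYMMDD). A minimal sketch of the conversion in both directions follows; the function names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical helpers illustrating the two J.D. Edwards date layouts.
def to_cyyddd(d: date) -> int:
    """Century offset from 1900, two-digit year, day of the year."""
    return (d.year - 1900) * 1000 + d.timetuple().tm_yday

def from_cyyddd(value: int) -> date:
    year = 1900 + value // 1000
    return date(year, 1, 1) + timedelta(days=value % 1000 - 1)

def to_cyymmdd(d: date) -> int:
    """Century offset from 1900, two-digit year, month, day of month."""
    return (d.year - 1900) * 10000 + d.month * 100 + d.day

def from_cyymmdd(value: int) -> date:
    return date(1900 + value // 10000, value // 100 % 100, value % 100)

d = date(2023, 3, 15)
print(to_cyyddd(d), to_cyymmdd(d))  # 123074 1230315
```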

==== Modulus 11 Check ====

Calculates whether a numeric field passes the Modulus 11 check and outputs a True or False flag. For example, '399038' will result in a 'False' flag, while '399027' will result in a 'True' flag.
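
Modulus 11 has several variants, and the one CXAIR uses is not stated. The sketch below assumes a common variant that reproduces both examples above: the last digit is the check digit, digits are weighted 1, 2, 3 and so on from the right, and the number passes when the weighted sum is divisible by 11. The function ##modulus11_check## is hypothetical.

```python
# Hypothetical Modulus 11 variant chosen to match the documented examples.
def modulus11_check(value: str) -> bool:
    """True when the weighted digit sum (weights 1, 2, 3, ... from the
    rightmost digit, which is the check digit) is a multiple of 11."""
    if len(value) < 2 or not value.isdecimal():
        return False
    total = sum(int(digit) * weight
                for weight, digit in enumerate(reversed(value), start=1))
    return total % 11 == 0

print(modulus11_check("399038"))  # False
print(modulus11_check("399027"))  # True
```

For ‘399027’, the weighted sum is 7×1 + 2×2 + 0×3 + 9×4 + 9×5 + 3×6 = 110, which is divisible by 11.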

==== Soundex ====

Outputs the Soundex code for the selected fields. For example, 'Washington' is coded 'W-252'.
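
Soundex is a standard phonetic algorithm. The sketch below implements the common American Soundex rules and reproduces the example above, returning ##W252##; the hyphen in ‘W-252’ is treated here as display formatting. That CXAIR applies exactly these rules (for example, the H and W rule) is an assumption, and the function ##soundex## is hypothetical.

```python
# Soundex digit for each consonant group (American Soundex).
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}

# Hypothetical implementation of the standard rules; illustrative only.
def soundex(name: str) -> str:
    """Return the four-character American Soundex code, e.g. 'W252'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code counts for adjacency
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # Vowels (and y) separate equal codes; h and w do not.
        if ch not in "hw":
            prev = code
    return (first.upper() + "".join(digits) + "000")[:4]

print(soundex("Washington"))  # W252
```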

version	line-number	content
1.1	1	{{box cssClass="floatinginfobox" title="Contents"}}
	2	{{toc/}}
	3	{{/box}}
	4
	5	Navigate to the Data Source Groups screen by clicking Search Engine, then Data Source Groups. All currently loaded Data Source Groups will be displayed, and clicking New will allow you to add a new group.
	6
	7	== {{id name="Details"/}}Details ==
	8
	9	Enter a name for the Data Source Group in the Name text box.
	10
	11	Use the Tags text box to add associated search terms to the Data Source Group. This allows the components to be accessed using customised strings when using the search bar.
	12
	13	The Directory option will automatically populate with the path set specified in the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome]], suffixed with the text entered in the Name text box. Click the … icon to specify a different directory.
	14
	15	{{id name="Index Method"/}}The Index Method drop-down list specifies the build method to be used when building the Index.
	16
	17	The Complete option will result in all items in a data source being Indexed. Once complete, a new version number is applied to the Index. Each run completely refreshes the Index.
	18
	19	Building an Incremental Index enables an optimised refresh process when rebuilding Indexes to account for any changes. Rather than rebuilding the entire Index, only data that has been modified or added will be processed. When selected, the Incremental Identifier drop-down list is revealed. This is required to identify changes, using the selected field to detect when new data is greater than the previous highest value.
	20
	21	The Timeline option allows the building of ‘point in time’ Indexes. Using an Effective From date, comparisons of data between set dates can be made. A Primary Key is required.
	22
	23	Selecting Cumulative will only Index items that have changed in a data source. Duplicate records are removed from the result set based on the Increment Identifier and Primary Key. When cumulatively building from CSV files, only a Primary Key is required. For other files, an Incremental Identifier and a Primary Key are required.
	24
	25	Use the Index Size drop-down list to specify the expected number of records the resulting Index will contain. This is an indicative value and does not need to be exact. Using this drop-down list allows the system to work out how many folders to split the Index into. The higher the specified Index size, the less folders created. A lower number of folders results in less threads used to query the Index, as a single thread is allocated per folder. This decreases the speed of individual queries, but reduces the performance impact of multiple concurrent users querying the system simultaneously.
	26
	27	Select the Data Sources that will be used to build the group from the Data Sources Added to Group drop-down list. One or more Data Sources can be selected.
	28
	29	Enable the Build Now option to build the Data Source Group as soon as the creation process is complete. If disabled, the settings are saved for the build to initiate at a later time.
	30
	31	Once all the options have been completed, click Create Data Source Group to complete the process. To build the Data Source Group and navigate directly to the [[Indexes>>doc:Technical Documentation.CXAIR.Administration Guide.4\. Manual Index Creation.c\. Creating an Index.WebHome]] or [[Scheduling>>doc:Technical Documentation.CXAIR.User Guide.3\. Report Administration.WebHome\|\|anchor="Scheduling"]] screen, click the up arrow next to this text and click Create Data Source Group and View Indexes or Create Data Source Group and View Schedules.
	32
	33	If the Build Now option has been enabled, the Data Source Group will now start building from the specified Data Sources.
	34
	35	== {{id name="Advanced"/}}Advanced ==
	36
	37	Specify the number of threads used to build the Data Source Group using the Number of Parallel Writers option. Specifying more threads will allocate a greater amount of system resources to the build process, potentially increasing performance at the expense of other system tasks.
	38
	39	Use the Offline Scheme drop-down list to specify which segments of the Index will be available for users to query. Select a constraint from the list and quantify it with a number in the subsequent text box.
	40
	41	To change the text analyser used when processing fields, select from the available options in the Analyser drop-down list. This should only be changed if encountering problems with the automatically detected output.
	42
	43	If multiple Data Sources are used to build the Data Source Group, enable the Ignore Errors option to complete the build if one of the Data Sources fail. If errors are found, it is recommended that the system logs are checked to locate and rectify the issue. Please refer to the [[Logging>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.Status Monitoring.WebHome\|\|anchor="Logging"]] chapter for more detailed information.
	44
	45	To prevent changes to the Data Source Group when the resulting Index is sent to another CXAIR instance, enable the Locked option.
	46
	47	Use the Email when Index is Updated or has Failed to got notified when any of these events occur.
	48
	49	Retain Data Source Order will ensure that the data is stored in the same order as it is in the Data Source. It will also be reflected in the Query screen. For ETL Data Sources, the order will be from the Data Source of the Index used in the ETL Data Source - the Order options in the ETL stages are purely for processing and will have no impact on this setting.
	50
	51	Lock Index Field Types is used to maintain the data type of all fields in the Index. This is mainly used for Excel and CSV Data Source where there is no defined data type provided to CXAIR (it is derived by parsing through the first 100 records). This option prevents scenarios whereby a date or numeric field is empty when the data refreshes which causes CXAIR to then interpret that field as a string (the default setting). This option is not needed if you have used the force data type options in the Data Source.
	52
	53	== {{id name="Extra Fields"/}}Extra Fields ==
	54
	55	Once a Data Source Group has been built, new options become available that allow extra fields to be added.
	56
	57	On the Data Source Groups screen, click the Edit icon next to the relevant Data Source Group. Click the > icon next to the Extra Fields option to reveal the configuration settings.
	58
	59	Select an extra field from the drop-down list and click Add to reveal the relevant options. Use the Label option to name the field. Once the relevant options have been set, click Save to add the field.
	60
	61	The following fields are available:
	62
	63	=== Additional Index Fields ===
	64
	65	==== Calculation ====
	66
	67	Adding a calculated field allows new fields to be derived using calculations.
	68
	69	Enable the Overwrite option to replace any fields with the same name as the calculated field.
	70
	71	Click the … icon to open the Calculation Builder. Please refer to the[[ Calculation Builder>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2c\. Crosstabs.2ci\. Calculation Builder.WebHome]] chapter for more detailed information on the functions that can be used.
	72
	73	==== Dates ====
	74
	75	Date fields allow the creation of a date aggregations, using the original date fields as the source. Specify the source date fields using the Index Fields drop-down list and select an interval from the Date options below.
	76
	77	==== {{id name="Distance"/}}Distance ====
	78
	79	Using the Distance option, an extra field measuring the distance between two points can be built into the Index.
	80
	81	Select the fields that denote the relevant Latitude and Longitude for the two points and specify the unit of measurement from the Radius radio button.
	82
	83	==== Link Fields ====
	84
	85	Using linked fields allow fields to be joined from another Index based on a join key. This will perform the SQL equivalent of a LEFT OUTER JOIN.
	86
	87	Select the join key for the source Data Source from the Index Field drop-down list and select the Index that will be joined to from the Index drop-down list.
	88
	89	For the Index that is being joined to, select the join key from the Join Index Field drop-down list. Use the Extra Fields drop-down list to select any fields that will be added to the resulting Index.
	90
	91	==== Link Date Range Fields ====
	92
	93	Uses the same principle as linked fields, but only returns fields when the specified date range is matched in the source data.
	94
	95	Select the join key for the source Data Source from the Index Field drop-down list and select the date field for the current Data Source Group from the Date Index Field drop-down list. Specify the Index that will be joined to from the Index drop-down list.
	96
	97	For the Index that is being joined to, select the join key from the Join Index Field drop-down list. Specify the date range using the Join Start Date Index Field and Join End Date Index Field drop-down lists. Use the Extra Fields drop-down list to select any fields that will be added to the resulting Index.
	98
	99	==== Link Reverse Date Range Fields ====
	100
	101	Uses the same principle as linked fields, but only returns fields when outside of the specified date range matched in the source data. Please see the above option, Link Date Range Fields, for more information concerning the configuration options.
	102
	103	=== Remove Index Fields ===
	104
	105	==== Remove ====
	106
	107	Select the fields from the Index Fields drop-down list that will be removed at Data Source Group level.
	108
	109	=== Strings ===
	110
	111	==== Analyzed ====
	112
	113	Specify the Index Fields that will be classed as Analyzed when the build process is complete.
	114
	115	The {{id name="Analysed"/}}Analysed functionality has been designed to accommodate fields containing multiple words, such as a ‘Comments’ field, to allow for case insensitive searches in the [[Query>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome]] screen. When specified as Analyzed, each word in the field is stored as an individual entity to facilitate field-specific searches, in contrast to regular fields that are stored as a single string value. This makes it easier to search for individual words in a field and is especially useful when creating [[Word Clouds>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome\|\|anchor="Word Cloud"]] that can be used with the [[Stop Words>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome\|\|anchor="Stop Words"]] functionality to display the frequency of key words. Please note that any fields added to this list cannot be used as a row or column when creating a [[Crosstab>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2c\. Crosstabs.WebHome]].
	116
	117	==== Lower Case ====
	118
	119	Select the fields from the Index Fields drop-down list that will be converted into lower case.
	120
	121	==== Proper Case ====
	122
	123	Select the fields from the Index Fields drop-down list that will be converted into proper case, where the first letter of every word is capitalised.

==== Regular Expression ====

Regular expressions (RegEx) allow string patterns to be matched within fields and then updated, replaced or extracted.

Select the fields that the expression will be applied to from the **Index Fields** drop-down list.

The **RegEx Mapping** section below has two options: **RegEx** and **Replacement**. In the **RegEx** text box, enter the regular expression that matches a pattern in the selected Index fields.

Use the **Replacement** text box to specify the string that will replace the matches found. Numbered groups captured by the expression can be referenced in the replacement using the following format:

$<number>$

The current functionality does not support named groups or start-of-line (^) and end-of-line ($) anchors.

Enable the **Stop on First Match** option to return only the first match. If the **Blank on No Match** option is enabled, a blank field is returned when no matches are found; otherwise, the entire field is returned in its original format.
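
The following Python sketch models how the RegEx, Replacement, Stop on First Match and Blank on No Match settings could combine. It assumes that ##$<number>$## means a group number wrapped in dollar signs (for example ##$1$##), that Stop on First Match replaces only the first match, and that matches are substituted within the field; the ##apply_regex## and ##to_python_replacement## helpers are hypothetical and use Python's ##re## module rather than CXAIR's own engine.

```python
import re

# Assumed CXAIR replacement syntax: "$1$" refers to numbered group 1.
GROUP_REF = re.compile(r"\$(\d+)\$")


def to_python_replacement(template: str) -> str:
    # Escape literal backslashes, then rewrite "$n$" as Python's "\g<n>" group reference.
    escaped = template.replace("\\", "\\\\")
    return GROUP_REF.sub(lambda m: "\\g<" + m.group(1) + ">", escaped)


def apply_regex(value: str, pattern: str, template: str,
                stop_on_first: bool = False, blank_on_no_match: bool = False) -> str:
    regex = re.compile(pattern)
    if regex.search(value) is None:
        # No match: return a blank field, or the original value unchanged.
        return "" if blank_on_no_match else value
    count = 1 if stop_on_first else 0  # count=0 tells re.sub to replace every match
    return regex.sub(to_python_replacement(template), value, count=count)


# Reorder ISO dates into day/month/year using numbered groups 1 to 3.
print(apply_regex("Admitted 2023-04-17", r"(\d{4})-(\d{2})-(\d{2})", "$3$/$2$/$1$"))
print(apply_regex("a1 b2", r"(\d)", "#$1$", stop_on_first=True))
print(apply_regex("no digits", r"(\d)", "#$1$", blank_on_no_match=True) == "")
```

The sketch avoids anchors and named groups, in line with the limitations described above.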

Use the **Test** section to check that the expression is working correctly. This accesses the underlying data in real time. Due to the different methods of storing text, the displayed text may be in a different format to the source file.

==== Simple Regular Expression ====

Select the fields that the expression will be applied to from the **Index Fields** drop-down list.

Specify the sub-section of text to be searched using the **Block Start** and **Block End** options. Only the text located between the specified start and end strings will be searched.

To further specify the string location, use the **Starts With** text box to indicate the starting point for the search, and enter any characters to omit from the search, such as punctuation, in the **Skip Text** text box.

Once a value has been entered in the **Starts With** text box, the **White Space** drop-down list will appear. If the text may span multiple lines, select the option that will best allow it to be detected.

Use the **Data Type** drop-down list to specify the format of the matched data. Select **String** for a lazy match or **Long String** for a greedy match. A lazy match stops at the first point where the end condition is satisfied, returning the shortest possible string, while a greedy match extends to the last point where the end condition is satisfied, returning the longest possible string. Selecting **Number** or **Date** will reveal the **Format** option, where the number or date format can be specified.

Specify where the search will terminate using the **Ends With** drop-down list. To further customise the search, select **User Defined** and enter the required string in the **User Defined Ends With** text box below. If entering an expression rather than a text string, enable the **User Defined is RegEx** option so that it is treated as a regular expression. To ignore case when matching, enable the **Case Insensitive** option.

To locate multiple values within the same field, enable the **Multiple Values** option; otherwise, only the first value is returned. When outputting multiple values, enabling the **Single Line** option constrains the expression to a single line, and enabling the **Truncate** option uses the previous match, rather than the beginning of the field, as the starting point for the next search. This avoids repeated results and shortens the search time.

To include the string specified in the **Starts With** text box in the output, enable the **Include Starts With** option; to include the string specified by the **Ends With** option in the output, enable the **Include Ends With** option.
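
One way to picture these options is as a regular expression assembled from the settings. The Python sketch below is a hypothetical model, not CXAIR's implementation: the ##simple_regex## function covers only Starts With, a literal Ends With string, Data Type (String as a lazy ##.*?##, Long String as a greedy ##.*##), Multiple Values and the two Include options, and it always resumes searching after the previous match, similar to Truncate.

```python
import re


def simple_regex(text: str, starts_with: str, ends_with: str, greedy: bool = False,
                 multiple: bool = False, include_starts: bool = False,
                 include_ends: bool = False) -> list[str]:
    body = "(.*)" if greedy else "(.*?)"  # Long String = greedy, String = lazy
    pattern = re.compile(
        "(" + re.escape(starts_with) + ")" + body + "(" + re.escape(ends_with) + ")"
    )
    results = []
    # finditer resumes after the previous match, so values are not repeated.
    for match in pattern.finditer(text):
        value = match.group(2)
        if include_starts:
            value = match.group(1) + value
        if include_ends:
            value = value + match.group(3)
        results.append(value)
        if not multiple:
            break  # only the first value is returned
    return results


text = "Ref: A1; Ref: B2; end"
print(simple_regex(text, "Ref: ", ";"))                 # ['A1']
print(simple_regex(text, "Ref: ", ";", greedy=True))    # ['A1; Ref: B2']
print(simple_regex(text, "Ref: ", ";", multiple=True))  # ['A1', 'B2']
print(simple_regex(text, "Ref: ", ";", include_starts=True, include_ends=True))  # ['Ref: A1;']
```

The first two calls show the lazy and greedy behaviour described for String and Long String: the lazy match stops at the first ##;##, while the greedy match runs on to the last one.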

Use the **Test** section to check that the expression is working correctly. This accesses the underlying data in real time to provide accurate results. Due to the different methods of storing text, the displayed text may be in a different format to the source file.

==== Upper Case ====

Select the fields from the **Index Fields** drop-down list that will be converted into upper case.

=== Modelling ===

The [[Modelling>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome]] options allow the results of previously created [[Bayesian networks>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome||anchor="Bayesian Network"]] and [[decision trees>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2i\. Modelling.WebHome||anchor="Decision Tree"]] to be added at Data Source Group level to predict future outcomes.

For each option, click **Decision Tree** or **Bayesian Network** to open the **Saved Reports** window, where the results will be filtered to show only created models. Click the checkbox next to the relevant model, then click the **Selected Reports** tab. Ensure the correct model is selected and click **Add to Data Source**.

The fields from the Data Source Group will then be displayed alongside those in the model. Each field in the model should match a field in the Data Source Group, and matching fields are detected automatically. Use the relevant drop-down list to manually match any fields that are not detected automatically.

==== Decision Tree ====

When saved, two columns are added for the classifier used when the model was built. The **Outcome** column displays the predicted result and the **Percentage** column displays the predicted percentage likelihood of that outcome.

==== Predictive Analytics ====

To predict the outcome for a field, change the relevant drop-down list to **[Predict this value]**. For every field with this option specified, two columns are added. The **Outcome** column displays the predicted result and the **Percentage** column displays the predicted percentage likelihood of that outcome.
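
As an illustration of the two added columns, the Python sketch below assumes that a model returns a probability for each possible outcome, that the Outcome column holds the most likely outcome, and that the Percentage column holds its probability multiplied by 100. The ##add_prediction_columns## helper, the column names and the example probabilities are all hypothetical.

```python
def add_prediction_columns(row: dict, field: str, probabilities: dict) -> dict:
    """Return a copy of row with hypothetical '<field> Outcome' and '<field> Percentage' columns."""
    # The predicted outcome is the one with the highest probability.
    outcome, probability = max(probabilities.items(), key=lambda item: item[1])
    result = dict(row)
    result[f"{field} Outcome"] = outcome
    result[f"{field} Percentage"] = round(probability * 100, 1)
    return result


record = {"Patient ID": 1042, "Age Band": "60-69"}
# Hypothetical class probabilities returned by a model for the field "Readmitted".
predicted = {"Yes": 0.27, "No": 0.73}
print(add_prediction_columns(record, "Readmitted", predicted))
```

With several fields set to [Predict this value], the helper would simply be applied once per field, adding two columns each time.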

==== Prescriptive Analytics ====

Select the field of interest from the **Classifier** drop-down list and choose the outcome that will be measured from the **Outcome of Interest** drop-down list.

Adjust the **Threshold Percentage** using the slider or by typing a value in the box below it. Based on this setting, two columns are created: **Threshold Outcomes** and **Predicted New Probabilities**.

The **Threshold Outcomes** column displays the combinations of outcomes that would increase the probability of the specified **Outcome of Interest**. The **Predicted New Probabilities** column displays the predicted probability of the **Outcome of Interest** for the calculated combination of outcomes.

Select a value from the **Search Depth** drop-down list to restrict the number of possible combinations searched before an outcome is reached. While increasing this value may provide more accurate results, the processing time and system load will increase exponentially.

To make the process run as efficiently as possible, use the **Ignore Outcomes Already Above Threshold** option to select values that should be manually excluded from the analysis. Reducing the number of fields analysed will decrease the system load and the time required to generate a result.
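
The cost of a higher Search Depth can be seen in a brute-force sketch. The Python code below is a hypothetical model of a threshold search of this kind, not CXAIR's algorithm: it tries every combination of up to ##depth## changed values, keeps the combinations whose predicted probability of the outcome of interest reaches the threshold, and skips any values listed in ##ignore##. The ##threshold_search## function and the toy ##predict## model are assumptions made for illustration.

```python
from itertools import combinations


def threshold_search(record, candidates, predict, outcome, threshold, depth, ignore=()):
    """Return (changes, probability) pairs whose predicted probability reaches threshold."""
    options = [c for c in candidates if c not in ignore]  # manually excluded values are skipped
    results = []
    for size in range(1, depth + 1):  # depth caps how many values change at once
        for combo in combinations(options, size):
            if len({field for field, _ in combo}) < size:
                continue  # change each field at most once per combination
            changed = dict(record)
            changed.update(combo)
            probability = predict(changed)[outcome]
            if probability >= threshold:
                results.append((combo, probability))
    return results


def predict(row):
    # Toy stand-in for a saved model: favourable values raise the chance of renewal.
    p = 0.3 + 0.2 * (row["contact"] == "phone") + 0.3 * (row["tier"] == "premium")
    return {"renewed": p, "lapsed": 1 - p}


record = {"contact": "email", "tier": "basic"}
candidates = [("contact", "phone"), ("tier", "premium")]
for depth in (1, 2):
    print(depth, threshold_search(record, candidates, predict, "renewed", 0.55, depth))
```

Each extra level of depth adds every combination of that size, so the number of model evaluations grows rapidly with the number of candidate values; excluding values up front shrinks the search before it starts.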

=== Obfuscation ===

The [[Obfuscation>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome||anchor="Obfuscation"]] options allow selected fields to be obscured, preventing individual records from being identified.

Please refer to the [[Obfuscation>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome||anchor="Obfuscation"]] section of the [[Database Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]] chapter for more information about the available options.

=== Third-Party ===

==== Base64 Encoded Word Document ====

Select the fields that will be encoded to Base64 strings.

==== JD Edwards Date CYYDDD ====

Select the date fields that will be converted to the JD Edwards CYYDDD format (Century, Year, Day of Year).

==== JD Edwards Date CYYMMDD ====

Select the date fields that will be converted to the JD Edwards CYYMMDD format (Century, Year, Month, Day of Month).
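
Both JD Edwards formats encode the century as a single digit counted from 1900 (0 for 19xx, 1 for 20xx), followed by a two-digit year. A Python sketch of the conversion from a calendar date is shown below; the function names are hypothetical, and returning zero-padded strings (rather than numbers, where a leading 0 would be dropped) is an assumption about the output format.

```python
from datetime import date


def century_and_year(d: date) -> tuple[int, int]:
    # C counts centuries after 1900: 0 for 1900-1999, 1 for 2000-2099.
    century, year = divmod(d.year, 100)
    if century < 19:
        raise ValueError("this sketch only handles years from 1900 onwards")
    return century - 19, year


def to_cyyddd(d: date) -> str:
    c, yy = century_and_year(d)
    day_of_year = d.timetuple().tm_yday  # 1 to 366
    return f"{c}{yy:02d}{day_of_year:03d}"


def to_cyymmdd(d: date) -> str:
    c, yy = century_and_year(d)
    return f"{c}{yy:02d}{d.month:02d}{d.day:02d}"


print(to_cyyddd(date(2024, 2, 15)))   # 124046
print(to_cyymmdd(date(2024, 2, 15)))  # 1240215
print(to_cyyddd(date(1999, 12, 31)))  # 099365
```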

==== Modulus 11 Check ====

Calculates whether a numeric field passes the Modulus 11 check and outputs a True or False flag. For example, '399038' will result in a 'False' flag, while '399027' will result in a 'True' flag.
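
The weighting scheme is not specified here, but the examples above are consistent with the common weighted Modulus 11 algorithm: multiply the digits before the final check digit by weights 2, 3, 4 and so on, starting from the right; take the remainder of the total divided by 11; and subtract it from 11 (a result of 11 becomes 0, and 10 is invalid). For '399027', the weighted total is 3×6 + 9×5 + 9×4 + 0×3 + 2×2 = 103, the remainder is 4, and 11 - 4 = 7, which matches the final digit. A Python sketch with a hypothetical ##modulus11_check## function is shown below.

```python
def modulus11_check(value: str) -> bool:
    """Weighted Modulus 11 check, assuming weights 2, 3, 4, ... from the right of the payload."""
    if not (value.isascii() and value.isdigit()) or len(value) < 2:
        return False
    *payload, check = (int(ch) for ch in value)
    weights = range(2, len(payload) + 2)
    total = sum(digit * weight for digit, weight in zip(reversed(payload), weights))
    expected = 11 - total % 11
    if expected == 11:
        expected = 0
    # expected == 10 has no single-digit check value, so it can never match.
    return expected == check


print(modulus11_check("399038"))  # False
print(modulus11_check("399027"))  # True
```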

==== Soundex ====

Outputs the Soundex code for the selected fields. For example, 'Washington' is coded 'W-252'.
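
The standard American Soundex algorithm keeps the first letter, codes the remaining consonants as digits (B, F, P, V = 1; C, G, J, K, Q, S, X, Z = 2; D, T = 3; L = 4; M, N = 5; R = 6), drops vowels, H, W and Y, codes letters with the same digit only once when they are adjacent or separated only by H or W, and pads or truncates the result to one letter and three digits. The Python sketch below implements those rules; the ##soundex## function is hypothetical, and the hyphen is added only to match the 'W-252' example above, since Soundex codes are more often written without it (W252).

```python
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(name: str) -> str:
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    digits = []
    previous = SOUNDEX_CODES.get(letters[0], "")  # a repeat of the first letter's code is dropped
    for ch in letters[1:]:
        if ch in "hw":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")  # vowels and Y have no code
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets previous, so a repeated code after it is kept
    padded = ("".join(digits) + "000")[:3]
    return f"{letters[0].upper()}-{padded}"


print(soundex("Washington"))  # W-252
print(soundex("Ashcraft"))    # A-261
```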
