XWiki - Technical Documentation.CXAIR.Administration Guide.4\. Manual Index Creation.a\. Creating a Data Source

Navigate to the Data Sources screen by clicking **Search Engine**, then **Data Sources**. All currently loaded Data Sources will be displayed, and clicking **New** will allow the configuration of a new source.

6

7

Enter the desired name in the **Name** textbox. This name must be unique.

8

9

Use the **Tags** textbox to add associated search terms to the Data Source. This allows the components to be accessed using customised strings when using the search bar.

10

11

Once all the options have been completed, click **Create Data Source** to complete the process. To build the data source and navigate directly to the [[Data Source Groups>>doc:Technical Documentation.CXAIR.Administration Guide.4\. Manual Index Creation.b\. Creating a Data Source Group.WebHome]] screen, click the up arrow next to this text and click **Create Data Source and View Data Source Groups**.

12

13

= Data Source Options =

14

15

The data source options are found in the **Type** drop-down list, with the subsequent options updating once a source is selected.

== Data ==

=== {{id name="Crosstab"/}}Crosstab ===

20

21

Allows a saved Crosstab to be stored as an Index.

22

23

This data source is initially built with no data attached, with the specified Crosstab loaded after the initial build process.

=== Database ===

Enables users to build a data source from a database. This can be used to capture live data and can then be scheduled to build on a regular basis to ensure the data used for reporting is up-to-date.

28

29

Please refer to the [[Database Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]] chapter for more detailed information concerning the available configuration options.

=== Mongo DB ===

This source allows users to link to a Mongo DB database.

=== Simple XML ===

Uses an uploaded Simple XML file as a data source.

38

39

Click the **…** icon next to the **XML File** box to load the relevant file.

40

41

=== SPSS Export File ===

42

43

Enables the use of an IBM SPSS Statistics export file as a data source.

44

45

Click the **…** icon next to the **File** box to load the relevant file.

=== XML File ===

Uses an uploaded XML file as a data source.

50

51

Click the **…** icon next to the **XML File** and **XSD File** boxes to load the relevant files.

== Data Warehouse ==

=== Amalgamated ===

Allows users to create a single data source from various separate data sources that are joined based on join keys. The process works in a similar way to joining tables in SQL where data is joined based on shared data characteristics and results in fast access to data that may have previously existed across multiple sources.

58

59

Please refer to the [[Amalgamated Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.b\. Amalgamated Data Source Wizard.WebHome]] chapter for more detailed information concerning the available configuration options.

60

61

=== Calendar Assistant ===

62

63

Adds rows to the provided Index created from intersections in the specified date ranges.

64

65

Select a source value from the **Index** drop-down list to populate the **Start Date** and **End Date** drop-down lists with the required date fields. The **Use Today If End Date Is Null **option, when checked, will replace the **End Date** with the current system date if the **End Date** is NULL.

66

67

Enable the **Build Daily Calendar** option if an automatic daily refresh the data source is required, or the **Months** option to instead add values on a monthly basis.

68

69

=== Copy Data Source Group ===

70

71

This option gives users the ability to copy an existing Data Source Group and build the relevant Data source and Index via a single process rather than in each individual screen.

72

73

Use the **Data Source Group** drop-down list to select the Data Source Group be copied, and select the fields that will be present in the copy from the **Index Fields** drop-down list.

74

75

To restrict the number of rows copied, enter a value in the **Number To Index** textbox. Leaving this option blank will result in all rows being copied.

=== Data Mart ===

Allows users to extract a portion of an existing Index and incorporate the data into a standalone data source.

80

81

By saving a query within an Index that restricts the data at the required level, this query can then be selected to create an independent data source with a view to increasing performance and removing any irrelevant entries for the end user.

82

83

A Data Mart cannot be built from an unstructured Index.

84

85

Click the **Select Query** button to open the Load screen, where a previously saved Query can be selected, the results of which will drive the Index records.

=== Delete ===

Allows the deletion of records from an Index using a saved query.

90

91

By saving a query within an Index that contains all of the records to be deleted, this query can then be ran as a Delete data source to remove the contained records.

92

93

=== Editable Data ===

94

95

This allows users to create an Index where values can be manually overridden and new rows of data can be added. When a change is made, the field is highlighted green and is only applied after the next Index build.

96

97

Please refer to the [[Editable Data>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome||anchor="Editable Data"]] section of the Data Warehouse chapter for more information.

=== ETL Package ===

This source provides the ability to loop through data to locate entries with specific characteristics. Additional columns are added at Index level against each affected row based on the keys set.

102

103

Multiple stages can be configured to add values based on previously created filter values, allowing the inclusion of numerous additional fields to effectively loop through large data sets to locate specific entries.

104

105

Please refer to the [[ETL Package>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome||anchor="ETL"]] section of the [[Data Warehouse>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome]] chapter for more information.

=== Ranking ===

The Ranking Data Source will result in a Ranking Index, an extension of the aggregation functionality. This Index allows the creation of automatically ranked groups of information by Day, Week, Month or Year based on a series of calculations performed on specific fields. This results in the discovery of the top and bottom ‘n’ values and associated percentages.

110

111

Enter the name that will be given to the associated Data Source, Data Source Group and Index in the **Name** textbox. This name must be unique.

112

113

Specify the source Index using the **Index** dropdown list, and select a field from the **Primary Key** dropdown list to specify the field that will be used to calculate the groupings within the resulting Index.

114

115

Select the field to be fields to be aggregated within each grouping from the **Aggregations** dropdown list, and specify the type of aggregation by selecting from the **Aggregation Methods** options, where **Average**, **Count**, **Count All**, **Maximum**, **Minimum**, **Standard Deviation** and **Sum** are available.

116

117

The **Timestamp** option specifies the data field that will be used to create the different grouping fields. This must be a full date or date/time field. Use the **Months** option to specify the number of months that groupings are created from.

118

119

Use the **Frequency** options to specify the period types the groupings will adhere to. Select **Custom** to determine the frequency using the data field selected for the **Timestamp** option, or select a predefined interval from the following options: **Days**, **Months**, **Months Summary**, **Rolling**, **Weeks** and **Years**.

120

121

Specify the level of the produced ranking using the **Grouping** drop-down list. This is especially useful when a certain level of analysis is required, such as ranking store performance rather than region performance in a retail Index.

=== Reg Ex File ===

Allows users to correct errors in an input file using regular expression (Regex) to write changes to an output file.

126

127

The **RegEx Mapping** options allow the specification of the changes and the **Stop at First Match** option halts any further changes to a file once a single match has been found.

128

129

=== Reverse Transpose ===

130

131

Allows users to transpose values within rows into columns.

132

133

Use the **Index** drop-down list to specify the source of the values to be reverse transposed.

134

135

Specify what will be used to identify the rows to be transposed using the **Repeating Fields** drop-down list. From the **Distinct Fields** drop-down list, select the original column values.

136

137

Enter a column name for the values specified as **Distinct Fields** in the **Field Name** textbox, and a column name for the associated values in the **Value Name** textbox.

138

139

By default, blank fields are not included as part of the reverse transpose process. Enable the **Include Blank Fields** option to include them in the output.

=== Rules ===

A Rules data source will use pre-set rules to validate or update data.

144

145

Rules can be applied to individual fields and the output will inform the user of the number of rules that have passed or failed. Rules are categorised as either major or minor to allow the output to be split based on priority.

146

147

Please refer to the [[Rules>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.a\. Rules.WebHome||anchor="Rules"]] section of the [[Data Warehouse>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome]] for chapter more information.

148

149

=== Rules Promotion ===

150

151

The Rules Promotion Data Source allows an administrator to specify the maximum number of errors before an Index is prevented from being made available to end users. If the maximum number of minor or major errors is breached, the new version of the data will not be available in the front-end and the previous version will remain in place until the errors are corrected and the data refreshed.

152

153

Requires a Rules Index to be built against.

154

155

=== Rules Summary ===

156

157

This source displays all of the Rules data sources within the CXAIR instance. The passed and failed rules are counted across the different sources.

158

159

Please refer to the [[Rules Summary>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome||anchor="Rules Summary"]] section of the [[Data Warehouse>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome]] chapter for more information.

160

161

=== {{id name="System Calendar"/}}System Calendar ===

162

163

After setting the [[Working Days>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome||anchor="Working Days"]] and [[Holidays>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome||anchor="Holidays"]] options in the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome]], a calendar Index detailing all dates and their working day status can be created the validate and cross-compare values.

164

165

Once built, five fields are created in the Index: **Date**, **Working Day Number**, **Working Day String**, **Count Working Days Of Month** and **Count Working Days of Year**. Use these columns against a date to validate the options set in the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome||anchor="Working Days"]].

=== Transpose ===

Allows users to transpose values within columns into rows.

170

171

Use the **Index** drop-down list to specify the source of the values to be transposed. To restrict the number of transposed entries, specify a **Saved Query** containing the display fields to be processed.

172

173

Use the **Primary Key** dropdown list to specify what will be used to identify the rows to be transposed.

174

175

Specify the fields in the **Field 1 (Column Headers)** dropdown list that will be transposed into column headers once the Index has been created. The fields specified in the **Field 2 (Values)** dropdown list will populate these new columns based on the primary key value. To include additional fields in the new Index, select them from the **Extra Fields (Distincts)** dropdown list.

176

177

If the fields specified in the **Field 2 (Values)** dropdown list contain multiple values, enabling the **Flatten** option will split these values into separate columns.

== Meta Data ==

The following three stages result in the discovery of foreign primary keys between Indexes.

182

183

=== Meta Values Data Source Stage 1 ===

184

185

In this stage, the underlying field names and values from multiple Indexes grouped in a collection can be bought together and organised.

186

187

Specify a collection from the **Collections** drop-down list and enable the **Create Index** option to automatically create an Index from the data source.

188

189

=== Meta Values Data Source Stage 2 ===

190

191

Using the Index created in Stage 1, the values taken from the various tables can now be matched.

192

193

Specify the Index from the **Index** drop-down list and enable the **Create Index** option to automatically create and Index from the data source.

194

195

=== Meta Values Data Source Stage 3 ===

196

197

Now the table values have been matched, the Index created in Stage 2 can now be processed to discover the foreign primary keys.

198

199

Specify the Index from the **Index** drop-down list and enable the **Create Index** option to automatically create and Index from the data source.

=== Metadata ===

Enables an Index to be built from database schemas to analyse and manipulate the associated metadata.

204

205

Please refer to the [[Metadata Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.d\. Metadata Wizard.WebHome]] chapter for more detailed information concerning the available configuration options.

== Solutions ==

=== Basket Analysis Wizard ===

210

211

[[Basket Analysis>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Basket Analysis.WebHome]] is a premium licence option that groups, ranks and transforms transactional data into a format that can be easily used to analyse spending and usage patterns across large data sets.

212

213

Items relating to a single transaction or event are grouped together and stored as a ‘basket’. Multiple baskets can be analysed and compared to identify frequently recurring items.

214

215

Without [[Basket Analysis>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Basket Analysis.WebHome]] enabled, a single basket is comprised of multiple items or instances which require individual rows. Using [[Basket Analysis>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Basket Analysis.WebHome]], multiple instances appear on a single row and individual customer spending can be tracked by grouping multiple items in a single basket or transaction. This provides insight into what products are being purchased, when, in what quantity and in what combination.

216

217

Please refer to the [[Basket Analysis Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Basket Analysis.WebHome]] for more information regarding the available configuration options.

218

219

=== Benford’s Distribution ===

220

221

Applies Benford’s Law to an Index, creating extra columns to denote

222

223

Benford’s Law states that if selecting a random number from base 10 statistical data, the probability that the first digit is ‘1’ is around 0.301 rather than the 0.1 expected if all digits were equally likely.

224

225

This is especially useful in fraud detection to quickly determine if figures do not match a known frequency.

=== Drive Time ===

This source uses Google Maps to calculate the amount of time it would take to drive between two points on a map.

230

231

New fields are added to the Index which display the distance and time taken to travel between the two points. The start and end point are fully configurable and the Mode options allow the choice between driving, walking and cycling. The Avoid options allow further configuration of the route and the Units options change how the distance is displayed.

232

233

To create a Drive Time data source, the Index requires at least two plottable points on a map.

=== HL7 V2 ===

Allows users to connect to databases adhering to the HL7 V2 standard.

=== Ping ===

Using the Ping data source, an Index can be created to monitor network connectivity across a server.

242

243

Specify the length of time, in milliseconds, that the process will wait for a response before determining that it has timed out in the **Timeout** textbox. Increase this number to account for larger servers that may take longer to receive the ping request.

244

245

Increase the value entered in the **Threads** textbox to allocate more concurrent processes to the network task. Any potential improvements are dependent on the multi-core performance of the system’s processor. Please refer to the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome]] chapter for more performance tuning options.

246

247

Finally, select the device used to connect to the host network from the **Network** drop-down list.

=== Target ===

Allows the creation of a Targets Index. Once built, reports can be created to show the hierarchy of Target values before the data is joined to other Indexes. Please refer to the Targets section of the Extensions chapter for more information.

== Third-Party ==

=== Google Analytics ===

256

257

Allows users to create a data source from collected Google Analytics data.

258

259

The correct API key has to be configured using the [[Google Analytics Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.e\. Google Analytics Wizard.WebHome]] for the data to be loaded.

260

261

=== {{id name="TwitterDS"/}}Twitter ===

262

263

Before any Twitter Indexes can be created, a number of options must be configured. Please refer to the [[Twitter Configuration>>doc:Technical Documentation.CXAIR.Administration Guide.6\. Advanced Configuration.6e\. Twitter.WebHome]] chapter for more information.

264

265

The **Twitter** option allows the selection of a specific feed of data that will be Indexed.

266

267

Using the **Datafeed** drop-down list, **Followers** or **Twitter Query** can be specified as the source data.

268

269

=== Twitter Mashup ===

270

271

Allows users to merge the data from a **Followers Data Source Group** and a **Tweet Data Source Group** into a single Data Source Group.

272

273

=== Twitter Subject ===

274

275

Used to track the activity of particular Twitter Users. Select the relevant Twitter Index from the **Index** drop-down list.

== Unstructured ==

=== File System ===

Creates Indexes from a folder location on the server. This file system can include both structured and unstructured data, with each file Indexed as a new row of data.

282

283

Please refer to the [[Unstructured Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.i\. Unstructured Data Source Wizard.WebHome]] for more information concerning the available configuration options.

=== IMAP ===

Allows an email account to be used as a data source through Internet Message Access Protocol (IMAP) compatibility.

288

289

Use the **Messages to Index** drop-down list to restrict the number of messages, if required, and enable the **Include Attachments** option to Index received attachments.

=== POP3 ===

Allows an email account to be used as a data source through Post Office Protocol 3 (POP3) compatibility.

294

295

Enable the **Include Attachments** option to Index received attachments.

=== Tika ===

Extends the data source compatibility to include the Apache Tika toolkit, capable of extracting metadata and text from over a thousand file types.

300

301

Specify the relevant **Directory** using the **…** icon.

=== Web Site ===

Allows users to build an Index from a website.

306

307

The URL must be provided for CXAIR to locate the files to Index.

== Custom ==

The following Data Sources are not included with the installation of CXAIR and are imported via the [[Extensions>>doc:Technical Documentation.CXAIR.Administration Guide.6\. Advanced Configuration.06i\. Extension Management.WebHome]] screen. For more information, please contact support@connexica.com.

=== Stamford NLP ===

The Stamford NLP Data Source leverages sentiment analysis to process text and output values relating to positive and negative words and phrases. This can be used to automate the analysis of text fields without the manual process usually required to glean insight from data sources containing large amounts of text.

316

317

Select a source Index from **Index** drop-down list the fields that will be analysed using the **Sentiment Fields** drop-down list.

318

319

If the source Index has been built from Twitter data, enable the **Twitter** checkbox to reveal a number of options that facilitate the exclusion of elements such as hashtags or emoticons from the analysis.

320

321

Once the Index has been built, a number of key fields are created:

322

323

* <Field Name> Sentiment Positive Score

324

* <Field Name> Sentiment Negative Score

325

* <Field Name> Sentiment Positive Count

326

* <Field Name> Sentiment Negative Count

327

* <Field Name> Sentiment None Count

328

* <Field Name> Sentiment Difference Score

329

* <Field Name> Sentiment Total Score

330

331

Fields with a higher total score are classed as having a 'Positive' sentiment, with the lower scores representing more 'Negative' sentiment. Users can reference these values when reporting to automate the analysis of the selected sentiment fields.

Wiki source code of 04a. Creating a Data Source

Navigation