Wiki source code of 04a. Creating a Data Source

Last modified by Ross Beck on 06/11/2025, 14:23

Hide last authors
Ross Beck 1.1 1 {{box cssClass="floatinginfobox" title="**Contents**"}}
2 {{toc/}}
3 {{/box}}
4
5 Navigate to the Data Sources screen by clicking **Search Engine**, then **Data Sources**. All currently loaded Data Sources will be displayed, and clicking **New** will allow the configuration of a new source.
6
7 Enter the desired name in the **Name** textbox. This name must be unique.
8
9 Use the **Tags** textbox to add associated search terms to the Data Source. This allows the components to be accessed using customised strings when using the search bar.
10
11 Once all the options have been completed, click **Create Data Source** to complete the process. To build the data source and navigate directly to the [[Data Source Groups>>doc:Technical Documentation.CXAIR.Administration Guide.4\. Manual Index Creation.b\. Creating a Data Source Group.WebHome]] screen, click the up arrow next to this text and click **Create Data Source and View Data Source Groups**.
12
13 = Data Source Options =
14
15 The data source options are found in the **Type** drop-down list, with the subsequent options updating once a source is selected.
16
17 == Data ==
18
19 === {{id name="Crosstab"/}}Crosstab ===
20
21 Allows a saved Crosstab to be stored as an Index.
22
23 This data source is initially built with no data attached, with the specified Crosstab loaded after the initial build process.
24
25 === Database ===
26
27 Enables users to build a data source from a database. This can be used to capture live data and can then be scheduled to build on a regular basis to ensure the data used for reporting is up-to-date.
28
29 Please refer to the [[Database Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]] chapter for more detailed information concerning the available configuration options.
30
31 === Mongo DB ===
32
33 This source allows users to link to a Mongo DB database.
34
35 === Simple XML ===
36
37 Uses an uploaded Simple XML file as a data source.
38
39 Click the **…** icon next to the **XML File** box to load the relevant file.
40
41 === SPSS Export File ===
42
43 Enables the use of an IBM SPSS Statistics export file as a data source.
44
45 Click the **…** icon next to the **File** box to load the relevant file.
46
47 === XML File ===
48
49 Uses an uploaded XML file as a data source.
50
51 Click the **…** icon next to the **XML File** and **XSD File** boxes to load the relevant files.
52
53 == Data Warehouse ==
54
55 === Amalgamated ===
56
57 Allows users to create a single data source from various separate data sources that are joined based on join keys. The process works in a similar way to joining tables in SQL where data is joined based on shared data characteristics and results in fast access to data that may have previously existed across multiple sources.
58
59 Please refer to the [[Amalgamated Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.b\. Amalgamated Data Source Wizard.WebHome]] chapter for more detailed information concerning the available configuration options.
60
61 === Calendar Assistant ===
62
63 Adds rows to the provided Index created from intersections in the specified date ranges.
64
65 Select a source value from the **Index** drop-down list to populate the **Start Date** and **End Date** drop-down lists with the required date fields. The **Use Today If End Date Is Null **option, when checked, will replace the **End Date** with the current system date if the **End Date** is NULL.
66
67 Enable the **Build Daily Calendar** option if an automatic daily refresh the data source is required, or the **Months** option to instead add values on a monthly basis.
68
69 === Copy Data Source Group ===
70
71 This option gives users the ability to copy an existing Data Source Group and build the relevant Data source and Index via a single process rather than in each individual screen.
72
73 Use the **Data Source Group** drop-down list to select the Data Source Group be copied, and select the fields that will be present in the copy from the **Index Fields** drop-down list.
74
75 To restrict the number of rows copied, enter a value in the **Number To Index** textbox. Leaving this option blank will result in all rows being copied.
76
77 === Data Mart ===
78
79 Allows users to extract a portion of an existing Index and incorporate the data into a standalone data source.
80
81 By saving a query within an Index that restricts the data at the required level, this query can then be selected to create an independent data source with a view to increasing performance and removing any irrelevant entries for the end user.
82
83 A Data Mart cannot be built from an unstructured Index.
84
85 Click the **Select Query** button to open the Load screen, where a previously saved Query can be selected, the results of which will drive the Index records.
86
87 === Delete ===
88
89 Allows the deletion of records from an Index using a saved query.
90
91 By saving a query within an Index that contains all of the records to be deleted, this query can then be ran as a Delete data source to remove the contained records.
92
93 === Editable Data ===
94
95 This allows users to create an Index where values can be manually overridden and new rows of data can be added. When a change is made, the field is highlighted green and is only applied after the next Index build.
96
97 Please refer to the [[Editable Data>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome||anchor="Editable Data"]] section of the Data Warehouse chapter for more information.
98
99 === ETL Package ===
100
101 This source provides the ability to loop through data to locate entries with specific characteristics. Additional columns are added at Index level against each affected row based on the keys set.
102
103 Multiple stages can be configured to add values based on previously created filter values, allowing the inclusion of numerous additional fields to effectively loop through large data sets to locate specific entries.
104
105 Please refer to the [[ETL Package>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome||anchor="ETL"]] section of the [[Data Warehouse>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome]] chapter for more information.
106
107 === Ranking ===
108
109 The Ranking Data Source will result in a Ranking Index, an extension of the aggregation functionality. This Index allows the creation of automatically ranked groups of information by Day, Week, Month or Year based on a series of calculations performed on specific fields. This results in the discovery of the top and bottom ‘n’ values and associated percentages.
110
111 Enter the name that will be given to the associated Data Source, Data Source Group and Index in the **Name** textbox. This name must be unique.
112
113 Specify the source Index using the **Index** dropdown list, and select a field from the **Primary Key** dropdown list to specify the field that will be used to calculate the groupings within the resulting Index.
114
115 Select the field to be fields to be aggregated within each grouping from the **Aggregations** dropdown list, and specify the type of aggregation by selecting from the **Aggregation Methods** options, where **Average**, **Count**, **Count All**, **Maximum**, **Minimum**, **Standard Deviation** and **Sum** are available.
116
117 The **Timestamp** option specifies the data field that will be used to create the different grouping fields. This must be a full date or date/time field. Use the **Months** option to specify the number of months that groupings are created from.
118
119 Use the **Frequency** options to specify the period types the groupings will adhere to. Select **Custom** to determine the frequency using the data field selected for the **Timestamp** option, or select a predefined interval from the following options: **Days**, **Months**, **Months Summary**, **Rolling**, **Weeks** and **Years**.
120
121 Specify the level of the produced ranking using the **Grouping** drop-down list. This is especially useful when a certain level of analysis is required, such as ranking store performance rather than region performance in a retail Index.
122
123 === Reg Ex File ===
124
125 Allows users to correct errors in an input file using regular expression (Regex) to write changes to an output file.
126
127 The **RegEx Mapping** options allow the specification of the changes and the **Stop at First Match** option halts any further changes to a file once a single match has been found.
128
129 === Reverse Transpose ===
130
131 Allows users to transpose values within rows into columns.
132
133 Use the **Index** drop-down list to specify the source of the values to be reverse transposed.
134
135 Specify what will be used to identify the rows to be transposed using the **Repeating Fields** drop-down list. From the **Distinct Fields** drop-down list, select the original column values.
136
137 Enter a column name for the values specified as **Distinct Fields** in the **Field Name** textbox, and a column name for the associated values in the **Value Name** textbox.
138
139 By default, blank fields are not included as part of the reverse transpose process. Enable the **Include Blank Fields** option to include them in the output.
140
141 === Rules ===
142
143 A Rules data source will use pre-set rules to validate or update data.
144
145 Rules can be applied to individual fields and the output will inform the user of the number of rules that have passed or failed. Rules are categorised as either major or minor to allow the output to be split based on priority.
146
147 Please refer to the [[Rules>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.a\. Rules.WebHome||anchor="Rules"]] section of the [[Data Warehouse>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome]] for chapter more information.
148
149 === Rules Promotion ===
150
151 The Rules Promotion Data Source allows an administrator to specify the maximum number of errors before an Index is prevented from being made available to end users. If the maximum number of minor or major errors is breached, the new version of the data will not be available in the front-end and the previous version will remain in place until the errors are corrected and the data refreshed.
152
153 Requires a Rules Index to be built against.
154
155 === Rules Summary ===
156
157 This source displays all of the Rules data sources within the CXAIR instance. The passed and failed rules are counted across the different sources.
158
159 Please refer to the [[Rules Summary>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome||anchor="Rules Summary"]] section of the [[Data Warehouse>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.WebHome]] chapter for more information.
160
161 === {{id name="System Calendar"/}}System Calendar ===
162
163 After setting the [[Working Days>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome||anchor="Working Days"]] and [[Holidays>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome||anchor="Holidays"]] options in the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome]], a calendar Index detailing all dates and their working day status can be created the validate and cross-compare values.
164
165 Once built, five fields are created in the Index: **Date**, **Working Day Number**, **Working Day String**, **Count Working Days Of Month** and **Count Working Days of Year**. Use these columns against a date to validate the options set in the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome||anchor="Working Days"]].
166
167 === Transpose ===
168
169 Allows users to transpose values within columns into rows.
170
171 Use the **Index** drop-down list to specify the source of the values to be transposed. To restrict the number of transposed entries, specify a **Saved Query** containing the display fields to be processed.
172
173 Use the **Primary Key** dropdown list to specify what will be used to identify the rows to be transposed.
174
175 Specify the fields in the **Field 1 (Column Headers)** dropdown list that will be transposed into column headers once the Index has been created. The fields specified in the **Field 2 (Values)** dropdown list will populate these new columns based on the primary key value. To include additional fields in the new Index, select them from the **Extra Fields (Distincts)** dropdown list.
176
177 If the fields specified in the **Field 2 (Values)** dropdown list contain multiple values, enabling the **Flatten** option will split these values into separate columns.
178
179 == Meta Data ==
180
181 The following three stages result in the discovery of foreign primary keys between Indexes.
182
183 === Meta Values Data Source Stage 1 ===
184
185 In this stage, the underlying field names and values from multiple Indexes grouped in a collection can be bought together and organised.
186
187 Specify a collection from the **Collections** drop-down list and enable the **Create Index** option to automatically create an Index from the data source.
188
189 === Meta Values Data Source Stage 2 ===
190
191 Using the Index created in Stage 1, the values taken from the various tables can now be matched.
192
193 Specify the Index from the **Index** drop-down list and enable the **Create Index** option to automatically create and Index from the data source.
194
195 === Meta Values Data Source Stage 3 ===
196
197 Now the table values have been matched, the Index created in Stage 2 can now be processed to discover the foreign primary keys.
198
199 Specify the Index from the **Index** drop-down list and enable the **Create Index** option to automatically create and Index from the data source.
200
201 === Metadata ===
202
203 Enables an Index to be built from database schemas to analyse and manipulate the associated metadata.
204
205 Please refer to the [[Metadata Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.d\. Metadata Wizard.WebHome]] chapter for more detailed information concerning the available configuration options.
206
207 == Solutions ==
208
209 === Basket Analysis Wizard ===
210
211 [[Basket Analysis>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Basket Analysis.WebHome]] is a premium licence option that groups, ranks and transforms transactional data into a format that can be easily used to analyse spending and usage patterns across large data sets.
212
213 Items relating to a single transaction or event are grouped together and stored as a ‘basket’. Multiple baskets can be analysed and compared to identify frequently recurring items.
214
215 Without [[Basket Analysis>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Basket Analysis.WebHome]] enabled, a single basket is comprised of multiple items or instances which require individual rows. Using [[Basket Analysis>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Basket Analysis.WebHome]], multiple instances appear on a single row and individual customer spending can be tracked by grouping multiple items in a single basket or transaction. This provides insight into what products are being purchased, when, in what quantity and in what combination.
216
217 Please refer to the [[Basket Analysis Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Basket Analysis.WebHome]] for more information regarding the available configuration options.
218
219 === Benford’s Distribution ===
220
221 Applies Benford’s Law to an Index, creating extra columns to denote
222
223 Benford’s Law states that if selecting a random number from base 10 statistical data, the probability that the first digit is ‘1’ is around 0.301 rather than the 0.1 expected if all digits were equally likely.
224
225 This is especially useful in fraud detection to quickly determine if figures do not match a known frequency.
226
227 === Drive Time ===
228
229 This source uses Google Maps to calculate the amount of time it would take to drive between two points on a map.
230
231 New fields are added to the Index which display the distance and time taken to travel between the two points. The start and end point are fully configurable and the Mode options allow the choice between driving, walking and cycling. The Avoid options allow further configuration of the route and the Units options change how the distance is displayed.
232
233 To create a Drive Time data source, the Index requires at least two plottable points on a map.
234
235 === HL7 V2 ===
236
237 Allows users to connect to databases adhering to the HL7 V2 standard.
238
239 === Ping ===
240
241 Using the Ping data source, an Index can be created to monitor network connectivity across a server.
242
243 Specify the length of time, in milliseconds, that the process will wait for a response before determining that it has timed out in the **Timeout** textbox. Increase this number to account for larger servers that may take longer to receive the ping request.
244
245 Increase the value entered in the **Threads** textbox to allocate more concurrent processes to the network task. Any potential improvements are dependent on the multi-core performance of the system’s processor. Please refer to the [[System Settings>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Settings.WebHome]] chapter for more performance tuning options.
246
247 Finally, select the device used to connect to the host network from the **Network** drop-down list.
248
249 === Target ===
250
251 Allows the creation of a Targets Index. Once built, reports can be created to show the hierarchy of Target values before the data is joined to other Indexes. Please refer to the Targets section of the Extensions chapter for more information.
252
253 == Third-Party ==
254
255 === Google Analytics ===
256
257 Allows users to create a data source from collected Google Analytics data.
258
259 The correct API key has to be configured using the [[Google Analytics Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.e\. Google Analytics Wizard.WebHome]] for the data to be loaded.
260
261 === {{id name="TwitterDS"/}}Twitter ===
262
263 Before any Twitter Indexes can be created, a number of options must be configured. Please refer to the [[Twitter Configuration>>doc:Technical Documentation.CXAIR.Administration Guide.6\. Advanced Configuration.6e\. Twitter.WebHome]] chapter for more information.
264
265 The **Twitter** option allows the selection of a specific feed of data that will be Indexed.
266
267 Using the **Datafeed** drop-down list, **Followers** or **Twitter Query** can be specified as the source data.
268
269 === Twitter Mashup ===
270
271 Allows users to merge the data from a **Followers Data Source Group** and a **Tweet Data Source Group** into a single Data Source Group.
272
273 === Twitter Subject ===
274
275 Used to track the activity of particular Twitter Users. Select the relevant Twitter Index from the **Index** drop-down list.
276
277 == Unstructured ==
278
279 === File System ===
280
281 Creates Indexes from a folder location on the server. This file system can include both structured and unstructured data, with each file Indexed as a new row of data.
282
283 Please refer to the [[Unstructured Data Source Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.i\. Unstructured Data Source Wizard.WebHome]] for more information concerning the available configuration options.
284
285 === IMAP ===
286
287 Allows an email account to be used as a data source through Internet Message Access Protocol (IMAP) compatibility.
288
289 Use the **Messages to Index** drop-down list to restrict the number of messages, if required, and enable the **Include Attachments** option to Index received attachments.
290
291 === POP3 ===
292
293 Allows an email account to be used as a data source through Post Office Protocol 3 (POP3) compatibility.
294
295 Enable the **Include Attachments** option to Index received attachments.
296
297 === Tika ===
298
299 Extends the data source compatibility to include the Apache Tika toolkit, capable of extracting metadata and text from over a thousand file types.
300
301 Specify the relevant **Directory** using the **…** icon.
302
303 === Web Site ===
304
305 Allows users to build an Index from a website.
306
307 The URL must be provided for CXAIR to locate the files to Index.
308
309 == Custom ==
310
311 The following Data Sources are not included with the installation of CXAIR and are imported via the [[Extensions>>doc:Technical Documentation.CXAIR.Administration Guide.6\. Advanced Configuration.06i\. Extension Management.WebHome]] screen. For more information, please contact support@connexica.com.
312
313 === Stamford NLP ===
314
315 The Stamford NLP Data Source leverages sentiment analysis to process text and output values relating to positive and negative words and phrases. This can be used to automate the analysis of text fields without the manual process usually required to glean insight from data sources containing large amounts of text.
316
317 Select a source Index from **Index** drop-down list the fields that will be analysed using the **Sentiment Fields** drop-down list.
318
319 If the source Index has been built from Twitter data, enable the **Twitter** checkbox to reveal a number of options that facilitate the exclusion of elements such as hashtags or emoticons from the analysis.
320
321 Once the Index has been built, a number of key fields are created:
322
323 * <Field Name> Sentiment Positive Score
324 * <Field Name> Sentiment Negative Score
325 * <Field Name> Sentiment Positive Count
326 * <Field Name> Sentiment Negative Count
327 * <Field Name> Sentiment None Count
328 * <Field Name> Sentiment Difference Score
329 * <Field Name> Sentiment Total Score
330
331 Fields with a higher total score are classed as having a 'Positive' sentiment, with the lower scores representing more 'Negative' sentiment. Users can reference these values when reporting to automate the analysis of the selected sentiment fields.