XWiki - Technical Documentation.CXAIR.Reporting Guides.ETL Data Processing Guide

This guide has been written to outline the steps required to Index, join, shape and flatten data using the [[Database Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]], [[Amalgamated Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.b\. Amalgamated Data Source Wizard.WebHome]] and [[ETL Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]] to automate the creation of a customer master record, with the final output summarising customer activity across the business.

6

7

To facilitate this level of data processing, the [[Extract, Transform & Load (ETL)>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]][[ >>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]]functionality is a vital component. This is due to how [[CXAIR>>doc:Technical Documentation.CXAIR.WebHome]] processes data when Indexed on the system.

8

9

When building an Index, [[CXAIR>>doc:Technical Documentation.CXAIR.WebHome]] handles the retrieved data row-by-row. Therefore, there is no grouping at this point and no way to relate rows to each other. For example, the same Customer ID appearing on each row is inconsequential – [[CXAIR>>doc:Technical Documentation.CXAIR.WebHome]] simply stores the rows in the order they are retrieved.

10

11

To discover and specify these relationships, these Indexes can then be used in the [[Crosstab>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2c\. Crosstabs.WebHome]] functionality where fields can be grouped at row or column level before aggregated values are derived at run-time. For example, using a date field as a column value will order the values in the report and update the output accordingly.

12

13

With this in mind, the following questions pose a problem when simply viewing the raw data:

14

15

//When was a customer’s first transaction? What was the value?

16

What is the total balance value for a single customer across multiple accounts?

17

What is the most recent transaction value?//

18

19

The answers to these questions cannot be determined without prior grouping at customer level, as each transaction is stored as a single row.

20

21

This is where the [[ETL>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]] functionality and its fundamentally different approach to data processing is crucial, as it is possible to loop through data based on a set of keys to group items together. Using the wizard, a number of grouped aggregations can be written out to the Index that negates the requirement for report-level relationship discovery.

22

23

For the purpose of this guide, demonstration data has been used that contains three separate customer, account and transaction tables. For a copy of this sample dataset, please contact Connexica.

24

25

= Creating the Source Indexes =

26

27

Three Indexes will be created that each serves a specific purpose, providing key information for an area of the business.

28

29

Using the [[Database Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]], select Excel XLSX as the **Database Type** and use the **File** option to select the supplied Excel file.

30

31

For the first Index, enter ‘Accounts’ into the **Sheet** textbox. For the second Index, repeat the above process and instead enter ‘Customers’ into the **Sheet** textbox and for the third Index, enter ‘Transactions’ into the **Sheet** textbox:

32

33

[[image:Screenshot 1.png]]

34

35

Ensure the three Indexes are added into the same Search Engine and the Home screen should resemble the following screenshot:

36

37

[[image:Screenshot 2.png||height="330" width="950"]]

38

39

The Accounts Index contains multiple account information for a single customer:

40

41

[[image:Screenshot 3.png||height="309" width="950"]]

42

43

The Customers Index contains a record per customer, detailing personal information:

44

45

[[image:Screenshot 4.png||height="207" width="950"]]

46

47

The Transactions Index contains transactional-level information for a single customer across multiple accounts:

48

49

[[image:Screenshot 5.png||height="417" width="950"]]

50

51

= Amalgamating the Data =

52

53

With three separate Indexes now created, the [[Amalgamated Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.b\. Amalgamated Data Source Wizard.WebHome]] can be used to join the data into a single Amalgamated Index.

54

55

This is facilitated by two joins. First, join the Customers and Accounts Indexes using a Left Outer join and select ‘Customer_ID’ as the common field. This will result in account information being added for every customer where the ‘Customer_ID’ field matches.

56

57

Leave the **Display Fields** checkboxes blank to include every column in the resulting output:

58

59

[[image:Screenshot 6.png||height="497" width="950"]]

60

61

Next, create a second join that combines the Accounts and Transactions Indexes using a Left Outer join with ‘Account_Number' used as the first common field. For this join, a second common field is also needed. Click **Add Link** and select ‘Account_Type’ and ‘Acc_Type’ to meet this requirement. Establishing this second condition ensures that a unique account number is not matched to the wrong type of account.

62

63

From the Accounts Index **Display Fields**, leave the checkboxes empty to bring through every column, while from the Transactions Index, select the ‘Account Name’, ‘Trans ID’, ‘Transaction Date’, ‘Transaction Type’ and ‘Transaction Value’ fields to ensure duplicate fields are not in the output:

64

65

[[image:1697452418525-512.png||height="734" width="1310"]]

66

67

Use the **Add to Search Engine** drop-down list to specify the Search Engine the resulting Index will be added to, enable the **Build Now** option and click **Create** to complete the process.

68

69

The resulting Index now contains a row for every transaction with the additional customer and account-level columns present:

70

71

[[image:Screenshot 8.png||height="530" width="950"]]

72

73

= Extra Field Calculations =

74

75

With an Amalgamated Index now created from the three source Indexes, the output can be further supplemented by utilising the [[Extra Fields>>doc:Technical Documentation.CXAIR.Administration Guide.4\. Manual Index Creation.b\. Creating a Data Source Group.WebHome||anchor="Extra Fields"]] functionality. Written at Data Source Group level, calculations can be added that derive additional columns at build-time.

76

77

The first extra field that can be added to the Amalgamated Index concatenates the Account Number and Account Name fields to create a composite key called Product Key. This will be used in later stages to denote the number of unique accounts opened by a customer and uses the following logic:

78

79

**AccountNumber + '-' + AccountName**

80

81

[[image:Screenshot 9.png]]

82

83

The second extra field, Months as Customer, dynamically compares the Inception Date value for every row and compares it to the current system date. The difference is then written out in months. This uses the following logic:

84

85

**DateBetween(InceptionDate , TODAY , MONTHS)**

86

87

[[image:Screenshot 10.png]]

88

89

Once the two calculated fields have been saved, rebuild the Index and validate the results in the [[Query>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome]] screen.

= Saved Queries =

With rows from multiple accounts joined in a single Index, the following queries can be used to access key areas of interest:

94

95

**+Account_Status:"Active"**

96

**+Account_Status:"Closed"**

97

**+Account_Status:"Offer"**

98

99

These queries provide a cohort of data for active and closed accounts, along with any accounts that have been offered to a customer. Save these queries to utilise this dynamic filtering when creating aggregations in the [[ETL Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]].

= ETL Processing =

In this example, the data processing is split into five distinct stages in order to shape the data into the final output.

104

105

Using the [[ETL Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]], select the previously created Amalgamated Index from the **Index** drop-down list to reveal the necessary processing options. (The [[ETL Wizard >>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]]allows additional steps to be added directly within the ETL to create these Database & Amalgamated Indexes. Clicking the '+' plus button lists these optional steps.)

== Stage 1 ==

The first stage will flag the first and last transaction for each customer and will count their number of accounts, products and transactions.

110

111

When using the [[ETL Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]], each stage requires a Primary Key value entered in the **Key** drop-down list, and a field that will be used to order the output entered in the **Order** drop-down list.

112

113

As the output will be based at a customer-level, select ‘Customer ID’ from the **Key** drop-down list. Select ‘Transaction Date’ from the **Order** drop-down list to specify that the data will be ordered by the date of each transaction:

114

115

[[image:Screenshot 11.png]]

116

117

From the **Flags** tab, select the ‘Transaction Date’ field from the **Is First Minimum** and **Is Last Maximum** drop-down lists. This will identify the first and last transaction date for each customer:

118

119

[[image:Screenshot 12.png||height="620" width="720"]]

120

121

In the **Aggregations** tab, use the **Count Unique** option to count the number of accounts, products and transactions for each customer. This is achieved by selecting a key field that denotes each area of interest. Select ‘Account Number’, ‘Product Key’ and ‘Trans ID’.

122

123

Additionally, select the ‘Transaction Value’ field from the **Sum** drop-down list to sum the values from multiple rows:

124

125

[[image:Screenshot 13.png]]

126

127

As the **Is First Minimum** and **Is Last Maximum** flags have been set, the **Additional** tab provides further options to tailor the output when the flag is true. Set the ‘Account Type’ and ‘Transaction Date’ fields for the first transaction, and the ‘Transaction Date’, ‘Transaction Type’ and ‘Transaction Value’ fields for the last transaction:

128

129

[[image:Screenshot 14.png]]

130

131

By setting these options in the **Additional** tab, each customer’s first transaction is flagged and the account type is added to the row. Additionally, each customer’s last transaction is also supplemented with the date, type of transaction and the value of the transaction.

== Stage 2 ==

The next three stages all perform the same actions against different subsets of data, each pertaining to a different account type.

136

137

The second stage will count closed products per customer and does not need the data to be output in a particular order. Therefore, select ‘Customer ID’ for both the **Key** and **Order** drop-down lists:

138

139

[[image:Screenshot 15.png]]

140

141

In the **General** section, select the closed accounts query saved previously and enable the **Others** option:

142

143

[[image:Screenshot 16.png]]

144

145

Using a saved query at this point will filter down this stage to rows of interest. Without the **Others** option enabled, any rows not returned by the saved query are discarded. When enabled, these results are amended to the bottom of the stage output for use in future stages. This means that stages three and four can still utilise other cohorts of data not used in this stage.

146

147

From the **Aggregations** tab, select the ‘Product Key’ field from the **Count Unique** drop-down list. Due to the added query, this will count the number of closed accounts for each customer:

148

149

[[image:Screenshot 17.png||height="362" width="720"]]

== Stage 3 ==

Repeat the options specified for Stage 2, instead selecting the Active Accounts query. This will count each active account per customer.

== Stage 4 ==

Repeat the options specified for Stage 2, instead selecting the Offered Accounts query. This will count each account offer per customer.

== Stage 5 ==

The final stage deduplicates records and squashes the data into a single row per customer. This is achieved by selecting ‘Customer ID’ in both the **Key** and **Order** drop-down lists and selecting the Squash option:

162

163

[[image:Screenshot 2019-09-12 at 16.50.10.png]]

164

165

When the **Squash** option has been enabled, the fields that will be present in the output now need to be specified in the **Copy** tab.

166

167

By default, every field is added to the **Copy** drop-down list. However, as the output relies on unique values, remove all entries from the **Copy** drop-down list and instead populate the **Distinct** drop-down list with the required fields:

168

169

[[image:Screenshot 2019-10-22 at 14.39.56.png]]

170

171

If preferred, these fields can instead be added to the **Unique** drop-down list at this point, which will perform the same function but order the values. As ordering the values at this point will not provide any meaningful insight, this is not required.

172

173

With the all five stages configured, use the **Add to Search Engine** drop-down list to specify the Search Engine the resulting Index will be added to, enable the **Build Now** option and click **Create** to complete the process.

174

175

= Renaming Key Fields =

176

177

To enhance the end-user experience and make the fields easier to understand, navigate to the Index level of the created ETL Index and open the **Index Fields** options.

178

179

From here change the text in the **Display Field** text to better represent the field contents. For example, change ‘Account_Number_COUNTUNIQUE’ to ‘No. of Accounts’.

180

181

For a full list of recommended field names, please see the below screenshot:

182

183

[[image:Screenshot 20.png||height="740" width="950"]]

184

185

In the [[Query>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome]] screen the Index should now resemble the following screenshot:

186

187

[[image:Screenshot 21.png||height="199" width="950"]]

188

189

This ETL output contains a row per customer, outlining a number of key metrics to provide a view of activity across numerous areas of the business.

190

191

= Automating the Output =

192

193

With the data processing now complete, schedules can now be established to automate the entire process and account for new and changed records.

194

195

Rather than creating individual schedules for each component, this will be achieved using three Index schedules. The first will rebuild the Indexes from the source Excel sheet, the second will rebuild the subsequent Amalgamated Index and the third will rebuild the ETL Index.

196

197

First, navigate to the [[System Overview>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Overview.WebHome]] screen and select the three source Indexes:

198

199

[[image:Screenshot 22.png||height="265" width="950"]]

200

201

Once selected, click [[Collections>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Overview.WebHome||anchor="Collections"]] then **Save** and name the Collection ‘Source Data’. By grouping these three Indexes into one Collection, a single Index schedule can be created that will queue all contained Indexes at build-time, negating the requirement for many Index schedules.

202

203

Navigate to the [[Index Schedules>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.e\. Index Schedules.WebHome]] screen and click **New**. Enter ‘Source Indexes’ into the Name textbox and select the ‘Source Data’ collection from the Items drop-down list:

204

205

[[image:Screenshot 24.png]]

206

207

Use the **Frequency of Execution** drop-down list to specify how often the schedule will run and click **Create Schedule** to complete the process.

208

209

Next, create a new schedule called ‘Amalgamated Indexes’ and add the previously created Amalgamated Index using the **Items** drop-down list. From the **Frequency of Execution** drop-down list, select **Dependent**. This allows the schedule to only run once another schedule has completed, ensuring that the Index does not build until all of the source data has been refreshed successfully. Select the ‘Source Data’ schedule from the **Dependent Index Schedules** drop-down list and the screen should resemble the following screenshot:

210

211

[[image:Screenshot 25.png]]

212

213

Finally, create a third schedule titled ‘ETL Indexes’ and select the previously created ETL Index selected from the **Items** drop-down list and **Dependent** from the **Frequency of Execution** drop-down list. This time select the ‘Amalgamated Indexes’ schedule. This will instruct the system to only perform the ETL processing once the Amalgamated Index has built. The screen should resemble the following screenshot:

214

215

[[image:Screenshot 26.png]]

216

217

With all three items created, the following schedules should be visible:

218

219

[[image:Screenshot 27.png]]

220

221

Creating these dependent schedules will guarantee that the Indexes automatically build in the correct order, accounting for new information and changed records over time.

version	line-number	content
1.1	1	{{box cssClass="floatinginfobox" title="Contents"}}
	2	{{toc/}}
	3	{{/box}}
	4
	5	This guide has been written to outline the steps required to Index, join, shape and flatten data using the [[Database Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]], [[Amalgamated Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.b\. Amalgamated Data Source Wizard.WebHome]] and [[ETL Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]] to automate the creation of a customer master record, with the final output summarising customer activity across the business.
	6
	7	To facilitate this level of data processing, the [[Extract, Transform & Load (ETL)>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]][[ >>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]]functionality is a vital component. This is due to how [[CXAIR>>doc:Technical Documentation.CXAIR.WebHome]] processes data when Indexed on the system.
	8
	9	When building an Index, [[CXAIR>>doc:Technical Documentation.CXAIR.WebHome]] handles the retrieved data row-by-row. Therefore, there is no grouping at this point and no way to relate rows to each other. For example, the same Customer ID appearing on each row is inconsequential – [[CXAIR>>doc:Technical Documentation.CXAIR.WebHome]] simply stores the rows in the order they are retrieved.
	10
	11	To discover and specify these relationships, these Indexes can then be used in the [[Crosstab>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2c\. Crosstabs.WebHome]] functionality where fields can be grouped at row or column level before aggregated values are derived at run-time. For example, using a date field as a column value will order the values in the report and update the output accordingly.
	12
	13	With this in mind, the following questions pose a problem when simply viewing the raw data:
	14
	15	//When was a customer’s first transaction? What was the value?
	16	What is the total balance value for a single customer across multiple accounts?
	17	What is the most recent transaction value?//
	18
	19	The answers to these questions cannot be determined without prior grouping at customer level, as each transaction is stored as a single row.
	20
	21	This is where the [[ETL>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]] functionality and its fundamentally different approach to data processing is crucial, as it is possible to loop through data based on a set of keys to group items together. Using the wizard, a number of grouped aggregations can be written out to the Index that negates the requirement for report-level relationship discovery.
	22
	23	For the purpose of this guide, demonstration data has been used that contains three separate customer, account and transaction tables. For a copy of this sample dataset, please contact Connexica.
	24
	25	= Creating the Source Indexes =
	26
	27	Three Indexes will be created that each serves a specific purpose, providing key information for an area of the business.
	28
	29	Using the [[Database Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.Database Data Source Wizard.WebHome]], select Excel XLSX as the Database Type and use the File option to select the supplied Excel file.
	30
	31	For the first Index, enter ‘Accounts’ into the Sheet textbox. For the second Index, repeat the above process and instead enter ‘Customers’ into the Sheet textbox and for the third Index, enter ‘Transactions’ into the Sheet textbox:
	32
	33	[[image:Screenshot 1.png]]
	34
	35	Ensure the three Indexes are added into the same Search Engine and the Home screen should resemble the following screenshot:
	36
	37	[[image:Screenshot 2.png\|\|height="330" width="950"]]
	38
	39	The Accounts Index contains multiple account information for a single customer:
	40
	41	[[image:Screenshot 3.png\|\|height="309" width="950"]]
	42
	43	The Customers Index contains a record per customer, detailing personal information:
	44
	45	[[image:Screenshot 4.png\|\|height="207" width="950"]]
	46
	47	The Transactions Index contains transactional-level information for a single customer across multiple accounts:
	48
	49	[[image:Screenshot 5.png\|\|height="417" width="950"]]
	50
	51	= Amalgamating the Data =
	52
	53	With three separate Indexes now created, the [[Amalgamated Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Wizards.b\. Amalgamated Data Source Wizard.WebHome]] can be used to join the data into a single Amalgamated Index.
	54
	55	This is facilitated by two joins. First, join the Customers and Accounts Indexes using a Left Outer join and select ‘Customer_ID’ as the common field. This will result in account information being added for every customer where the ‘Customer_ID’ field matches.
	56
	57	Leave the Display Fields checkboxes blank to include every column in the resulting output:
	58
	59	[[image:Screenshot 6.png\|\|height="497" width="950"]]
	60
	61	Next, create a second join that combines the Accounts and Transactions Indexes using a Left Outer join with ‘Account_Number' used as the first common field. For this join, a second common field is also needed. Click Add Link and select ‘Account_Type’ and ‘Acc_Type’ to meet this requirement. Establishing this second condition ensures that a unique account number is not matched to the wrong type of account.
	62
	63	From the Accounts Index Display Fields, leave the checkboxes empty to bring through every column, while from the Transactions Index, select the ‘Account Name’, ‘Trans ID’, ‘Transaction Date’, ‘Transaction Type’ and ‘Transaction Value’ fields to ensure duplicate fields are not in the output:
	64
	65	[[image:1697452418525-512.png\|\|height="734" width="1310"]]
	66
	67	Use the Add to Search Engine drop-down list to specify the Search Engine the resulting Index will be added to, enable the Build Now option and click Create to complete the process.
	68
	69	The resulting Index now contains a row for every transaction with the additional customer and account-level columns present:
	70
	71	[[image:Screenshot 8.png\|\|height="530" width="950"]]
	72
	73	= Extra Field Calculations =
	74
	75	With an Amalgamated Index now created from the three source Indexes, the output can be further supplemented by utilising the [[Extra Fields>>doc:Technical Documentation.CXAIR.Administration Guide.4\. Manual Index Creation.b\. Creating a Data Source Group.WebHome\|\|anchor="Extra Fields"]] functionality. Written at Data Source Group level, calculations can be added that derive additional columns at build-time.
	76
	77	The first extra field that can be added to the Amalgamated Index concatenates the Account Number and Account Name fields to create a composite key called Product Key. This will be used in later stages to denote the number of unique accounts opened by a customer and uses the following logic:
	78
	79	AccountNumber + '-' + AccountName
	80
	81	[[image:Screenshot 9.png]]
	82
	83	The second extra field, Months as Customer, dynamically compares the Inception Date value for every row and compares it to the current system date. The difference is then written out in months. This uses the following logic:
	84
	85	DateBetween(InceptionDate , TODAY , MONTHS)
	86
	87	[[image:Screenshot 10.png]]
	88
	89	Once the two calculated fields have been saved, rebuild the Index and validate the results in the [[Query>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome]] screen.
	90
	91	= Saved Queries =
	92
	93	With rows from multiple accounts joined in a single Index, the following queries can be used to access key areas of interest:
	94
	95	+Account_Status:"Active"
	96	+Account_Status:"Closed"
	97	+Account_Status:"Offer"
	98
	99	These queries provide a cohort of data for active and closed accounts, along with any accounts that have been offered to a customer. Save these queries to utilise this dynamic filtering when creating aggregations in the [[ETL Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]].
	100
	101	= ETL Processing =
	102
	103	In this example, the data processing is split into five distinct stages in order to shape the data into the final output.
	104
	105	Using the [[ETL Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]], select the previously created Amalgamated Index from the Index drop-down list to reveal the necessary processing options. (The [[ETL Wizard >>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]]allows additional steps to be added directly within the ETL to create these Database & Amalgamated Indexes. Clicking the '+' plus button lists these optional steps.)
	106
	107	== Stage 1 ==
	108
	109	The first stage will flag the first and last transaction for each customer and will count their number of accounts, products and transactions.
	110
	111	When using the [[ETL Wizard>>doc:Technical Documentation.CXAIR.Administration Guide.Data Warehouse.c\. ETL.WebHome]], each stage requires a Primary Key value entered in the Key drop-down list, and a field that will be used to order the output entered in the Order drop-down list.
	112
	113	As the output will be based at a customer-level, select ‘Customer ID’ from the Key drop-down list. Select ‘Transaction Date’ from the Order drop-down list to specify that the data will be ordered by the date of each transaction:
	114
	115	[[image:Screenshot 11.png]]
	116
	117	From the Flags tab, select the ‘Transaction Date’ field from the Is First Minimum and Is Last Maximum drop-down lists. This will identify the first and last transaction date for each customer:
	118
	119	[[image:Screenshot 12.png\|\|height="620" width="720"]]
	120
	121	In the Aggregations tab, use the Count Unique option to count the number of accounts, products and transactions for each customer. This is achieved by selecting a key field that denotes each area of interest. Select ‘Account Number’, ‘Product Key’ and ‘Trans ID’.
	122
	123	Additionally, select the ‘Transaction Value’ field from the Sum drop-down list to sum the values from multiple rows:
	124
	125	[[image:Screenshot 13.png]]
	126
	127	As the Is First Minimum and Is Last Maximum flags have been set, the Additional tab provides further options to tailor the output when the flag is true. Set the ‘Account Type’ and ‘Transaction Date’ fields for the first transaction, and the ‘Transaction Date’, ‘Transaction Type’ and ‘Transaction Value’ fields for the last transaction:
	128
	129	[[image:Screenshot 14.png]]
	130
	131	By setting these options in the Additional tab, each customer’s first transaction is flagged and the account type is added to the row. Additionally, each customer’s last transaction is also supplemented with the date, type of transaction and the value of the transaction.
	132
	133	== Stage 2 ==
	134
	135	The next three stages all perform the same actions against different subsets of data, each pertaining to a different account type.
	136
	137	The second stage will count closed products per customer and does not need the data to be output in a particular order. Therefore, select ‘Customer ID’ for both the Key and Order drop-down lists:
	138
	139	[[image:Screenshot 15.png]]
	140
	141	In the General section, select the closed accounts query saved previously and enable the Others option:
	142
	143	[[image:Screenshot 16.png]]
	144
	145	Using a saved query at this point will filter down this stage to rows of interest. Without the Others option enabled, any rows not returned by the saved query are discarded. When enabled, these results are amended to the bottom of the stage output for use in future stages. This means that stages three and four can still utilise other cohorts of data not used in this stage.
	146
	147	From the Aggregations tab, select the ‘Product Key’ field from the Count Unique drop-down list. Due to the added query, this will count the number of closed accounts for each customer:
	148
	149	[[image:Screenshot 17.png\|\|height="362" width="720"]]
	150
	151	== Stage 3 ==
	152
	153	Repeat the options specified for Stage 2, instead selecting the Active Accounts query. This will count each active account per customer.
	154
	155	== Stage 4 ==
	156
	157	Repeat the options specified for Stage 2, instead selecting the Offered Accounts query. This will count each account offer per customer.
	158
	159	== Stage 5 ==
	160
	161	The final stage deduplicates records and squashes the data into a single row per customer. This is achieved by selecting ‘Customer ID’ in both the Key and Order drop-down lists and selecting the Squash option:
	162
	163	[[image:Screenshot 2019-09-12 at 16.50.10.png]]
	164
	165	When the Squash option has been enabled, the fields that will be present in the output now need to be specified in the Copy tab.
	166
	167	By default, every field is added to the Copy drop-down list. However, as the output relies on unique values, remove all entries from the Copy drop-down list and instead populate the Distinct drop-down list with the required fields:
	168
	169	[[image:Screenshot 2019-10-22 at 14.39.56.png]]
	170
	171	If preferred, these fields can instead be added to the Unique drop-down list at this point, which will perform the same function but order the values. As ordering the values at this point will not provide any meaningful insight, this is not required.
	172
	173	With the all five stages configured, use the Add to Search Engine drop-down list to specify the Search Engine the resulting Index will be added to, enable the Build Now option and click Create to complete the process.
	174
	175	= Renaming Key Fields =
	176
	177	To enhance the end-user experience and make the fields easier to understand, navigate to the Index level of the created ETL Index and open the Index Fields options.
	178
	179	From here change the text in the Display Field text to better represent the field contents. For example, change ‘Account_Number_COUNTUNIQUE’ to ‘No. of Accounts’.
	180
	181	For a full list of recommended field names, please see the below screenshot:
	182
	183	[[image:Screenshot 20.png\|\|height="740" width="950"]]
	184
	185	In the [[Query>>doc:Technical Documentation.CXAIR.User Guide.02\. Reporting.2a\. Query.WebHome]] screen the Index should now resemble the following screenshot:
	186
	187	[[image:Screenshot 21.png\|\|height="199" width="950"]]
	188
	189	This ETL output contains a row per customer, outlining a number of key metrics to provide a view of activity across numerous areas of the business.
	190
	191	= Automating the Output =
	192
	193	With the data processing now complete, schedules can now be established to automate the entire process and account for new and changed records.
	194
	195	Rather than creating individual schedules for each component, this will be achieved using three Index schedules. The first will rebuild the Indexes from the source Excel sheet, the second will rebuild the subsequent Amalgamated Index and the third will rebuild the ETL Index.
	196
	197	First, navigate to the [[System Overview>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Overview.WebHome]] screen and select the three source Indexes:
	198
	199	[[image:Screenshot 22.png\|\|height="265" width="950"]]
	200
	201	Once selected, click [[Collections>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.System Overview.WebHome\|\|anchor="Collections"]] then Save and name the Collection ‘Source Data’. By grouping these three Indexes into one Collection, a single Index schedule can be created that will queue all contained Indexes at build-time, negating the requirement for many Index schedules.
	202
	203	Navigate to the [[Index Schedules>>doc:Technical Documentation.CXAIR.Administration Guide.Status Monitoring.e\. Index Schedules.WebHome]] screen and click New. Enter ‘Source Indexes’ into the Name textbox and select the ‘Source Data’ collection from the Items drop-down list:
	204
	205	[[image:Screenshot 24.png]]
	206
	207	Use the Frequency of Execution drop-down list to specify how often the schedule will run and click Create Schedule to complete the process.
	208
	209	Next, create a new schedule called ‘Amalgamated Indexes’ and add the previously created Amalgamated Index using the Items drop-down list. From the Frequency of Execution drop-down list, select Dependent. This allows the schedule to only run once another schedule has completed, ensuring that the Index does not build until all of the source data has been refreshed successfully. Select the ‘Source Data’ schedule from the Dependent Index Schedules drop-down list and the screen should resemble the following screenshot:
	210
	211	[[image:Screenshot 25.png]]
	212
	213	Finally, create a third schedule titled ‘ETL Indexes’ and select the previously created ETL Index selected from the Items drop-down list and Dependent from the Frequency of Execution drop-down list. This time select the ‘Amalgamated Indexes’ schedule. This will instruct the system to only perform the ETL processing once the Amalgamated Index has built. The screen should resemble the following screenshot:
	214
	215	[[image:Screenshot 26.png]]
	216
	217	With all three items created, the following schedules should be visible:
	218
	219	[[image:Screenshot 27.png]]
	220
	221	Creating these dependent schedules will guarantee that the Indexes automatically build in the correct order, accounting for new information and changed records over time.

Wiki source code of 04. ETL Data Processing Guide