07c. ETL Package

Last modified by Ross Beck on 06/11/2025, 14:23

The ETL Package Wizard encompasses a number of Data Sources to facilitate the creation of data structures in a single screen.

Rather than creating separate Indexes and processing them to create more Indexes, the ETL Wizard can be used to connect to and process data resulting in a single Data Source, Data Source Group and Index.

Navigate to the ETL Package Wizard screen by clicking Wizards, then ETL Package.

Click the New button to begin the process.

Details

Enter the name in the Name textbox. This name must be unique.

Use the Tags textbox to add associated search terms to the data source. This allows access via customised strings when using the search bar.

To automatically restrict access to the Index once built, enable the Sensitive option. Only users with explicit permission will be able to access the Index.

Use the Notes textbox to enter additional information for other administrator users to view when editing the structure.

The Index Method drop-down list specifies the build method to be used when building the Index.

The Complete option will result in all items in a data source being indexed when a refresh takes place. Once refreshed, a new version number is applied to the Index. Each run completely refreshes the Index.

Building an Incremental Index enables an optimised refresh process when rebuilding Indexes to account for any changes. Rather than rebuilding the entire Index, only data that has been modified or added will be processed. New and changed records are added as new rows. This maintains the history of a record that may have changed over time, and no de-duplication is applied. When selected, the Incremental Identifier drop-down list is revealed. This is required to identify changes, using the selected field to detect when new data is greater than the previous highest value.

The Cumulative option works the same as an Incremental build, but replaces duplicates as well as adding new data. Duplicate records are removed from the result set based on the Increment Identifier and Primary Key.

Use the Index Size drop-down list to specify the expected number of records the resulting Index will contain. This is an indicative value and does not need to be exact.

Finally, specify the the Search Engine the Index will be added to from the Search Engines drop-down list and the Collection from the Collections drop-down list.

Stages

To add a new stage to the Index, click the + icon. This will present a list of the available Data Sources.

Indexes built as stages in the ETL Wizard will not appear in any other screen, and are treated as 'temporary' Indexes that facilitate the final output from the wizard.

To create a copy of a stage, click the Cog icon and click Copy. To move a stage, click the Cog icon and click the required move option. Please note that stages that reference values from a previous stage will require changes to ensure the structure of the processing is conserved.

To delete a stage, click the Cog icon and click Remove.

Amalgamated

Using the Amalgamated Data Source, two or more existing Indexes can be joined based on a number options to de-duplicate and filter the resulting output.

For more information, please refer to the Amalgamated Data Source Wizard chapter.

Calculations

This stage is used to create additional fields using the Data Source Group calculations functionality.

Please note that any field required for a calculation must be included in a prior stage.

For more information regarding the available calculation functions, please refer to the Calculation Builder chapter.

Calendar Assistant

The Calendar Assistant Data Source adds rows to the provided Index created from intersections in the specified date ranges.

Select a source value from the Index drop-down list to populate the Start Date and End Date drop-down lists with the required date fields. The Use Today If End Date Is Null option, when checked, will replace the End Date with the current system date if the End Date is NULL.

Enable the Build Daily Calendar option if an automatic daily refresh the data source is required, or the Months option to instead add values on a monthly basis.

Data Mart

The Data Mart Data Source allows users to extract a portion of an existing Index and incorporate the data into a standalone Data Source.

By saving a query within an Index that restricts the data at the required level, this query can then be selected to create an independent data source with a view to increasing performance and removing any irrelevant entries for the end user.

A Data Mart cannot be built from an unstructured Index.

Click the Select Query button to open the Load screen, where a previously saved Query can be selected, the results of which will drive the Index records.

Database

The Database Data Source contains the options to create Index structures from a number of database types along with advanced options such as procedural correction and derived dates.

For more information regarding the available options, please refer to the Database Data Source Wizard chapter.

ETL

By creating an Index that contains a number of techniques to loop through the data, flags can be returned that can be used to manipulate the underlying records.

This is achieved using the Extract, Transform and Load (ETL) Data Source to identify and amend target data through a number of configurable stages.

Select the required Index from the Index drop-down list to reveal the options below.

Stage Management

Check the Disable Stage textbox to retain the tab and all settings, but not apply it to the output.

Mandatory

Select fields from the Keys drop-down list that will be used to identify each unique record when looping through the data. Click the X icon next to a selected field remove it.

Select fields from the Order drop-down list that define the ordering of the data when the looping though the records. This also impacts the ordering in the Query screen once the Index has been built.

General

Enable the Squash option to filter the data to a single record for the defined Key. When using Squash, only fields added to the drop-down lists in the Copy tab will appear in the next stage (or final ETL output if there are no subsequent stages). As such, the field(s) selected as the primary key from the Keys drop-down list will be automatically added to the Distinct list found in the Copy tab. This is to prevent zero fields being present in the output. Please note that a transpose stage cannot follow a squash, and this function will need to be performed in a separate Index.

Enable the Others option to copy all documents that do not matched a defined Saved Query and append them to the bottom of the Index.

The Saved Query option can be used to subset the data within an Index. This is applied before the looping techniques.

Display Field Mapping

Each tab in this section enables the user to perform a number of different tasks. Any tab that has had an option set will be highlighted with a star in the top right corner.

Flags

These functions will return a value of ‘True’ or ‘False’ against every specified field for each row in the Index. The data is not filtered, only a flag is set.

Select individual values using the relevant drop-down list, or click All to apply the flag to all fields. Delete individual selections by clicking the X icon next to the relevant field, or clear all selections by clicking None.

The following flags are available:

Function	Description
Has Null	Returns true against all the key if any values of the specified field(s) contain a NULL. Based on the field(s) that have been set to use this option, a new column called HASNULL will be created. It will populate the column with a value of true for all values of the key is any row contains a NULL (for the key).
Is Minimum	Returns true for each minimum value of the specified field(s). Based on the field(s) that have been set to use this option, a new column called ISMIN will be created. It will populate the column with a value of true for the minimum value(s) it finds in the specified field. If it finds duplicate minimum values, it will flag true against each one.
Is First minimum	Returns true for the first minimum value of the specified field(s). Based on the field(s) that have been set to use this option, a new column called ISFIRSTMIN will be created. It will populate the column with a value of true for the first minimum value it finds in the specified field (based on the order of the data).
Is Last Minimum	Returns true for the last minimum value of the specified fields(s). Based on the field(s) that have been set to use this option, a new column called ISLASTMIN will be created. It will populate the column with a value of true for the last minimum value it finds in the specified field (based on the order of the data).
Is Maximum	Returns true for each maximum value of the specified field(s). Based on the field(s) that have been set to use this option, a new column called ISMAX will be created. It will populate the column with a value of true for the maximum value(s) it finds in the specified field. If it finds duplicate maximum values, it will flag true against each one.
Is First Maximum	Returns true for the first maximum value of the specified field(s). Based on the field(s) that have been set to use this option, a new column called ISFIRSTMAX will be created. It will populate the column with a value of true for the first maximum value it finds in the specified field (based on the order of the data).
Is Last Maximum	Returns true for the last maximum value of the specified field(s). Based on the field(s) that have been set to use this option, a new column called ISLASTMAX will be created. It will populate the column with a value of true for the last maximum value it finds in the specified field (based on the order of the data).
Is First	Returns true for the first value found in the specified field(s). Based on the field(s) that have been set to use this option, a new column called ISFIRST will be created. It will populate the column with a value of true for the first value it finds in the specified field (based on the order of the data).
Is Last	Returns true for the last value found in the specified field(s). Based on the field(s) that have been set to use this option, a new column called ISLAST will be created. It will populate the column with a value of true for the last value it finds in the specified field (based on the order of the data).
Is Greater Than Zero	Returns true for each value of the specified field(s) where the value of the field is greater than zero. Based on the field(s) that have been set to use this option, a new column called ISGREATERTHANZERO will be created. It will populate the column with a value of true for each instance it finds where the specified field is greater than zero.
Is Greater Than One	Returns true for each value of the specified field(s) where the value of the field is greater than one. Based on the field(s) that have been set to use this option, a new column called ISGREATERTHANONE will be created. It will populate the column with a value of true for each instance it finds where the specified field is greater than one.
Is Working Day	Returns true for each value of the specified field(s) where the date value falls on a working day, as specified in the System Settings. Based on the field(s) that have been set to use this option, a new column called ISWORKINGDAY will be created. It will populate the column with a value of true for each instance it finds a date matching the specified working days.

Copy

Function	Description
Copy	Copies all values of the specified field(s) into a single record. Based on the field(s) that have been set to use this option, a new column called COPY will be created. It will populate the column with all value(s) it finds in the specified field against every row for that specified key.
Analyzed	Copies all values of the specified field(s), ordered by the Order value. Based on the field(s) that have been set to use this option, a new column called ANALYZED will be created. It will populate the column with all value(s) it finds in the specified field against every row for that specified key. The order of the values will be set based on the Order value. The fields will be analyzed to enable free text searching
Unique	Copies all unique values of the specified field(s), ordered ascendingly, into a single record. Based on the field(s) that have been set to use this option, a new column called UNIQUE will be created. It will populate the column with all unique value(s) it finds in the specified field against every row for that specified key. The order of the values will be in ascending order.
Transpose	Turns every unique value in a column into its own field. Based on the field that has been set to use this option, new columns will be created. It will populate the columns with all the corresponding values from the field specified by "Transpose Values".
Flattened Transpose	For example if a value 'A' in Flattened Transpose Headers corresponds with the values "B" and "C" in Flattened Transpose Values then two columns will be created "A B" and "A C". The fields in the created columns are populated with a "Y" or left blank. Based on the field(s) that have been set to use this option, a new column called DISTINCT will be created. It will populate the column with all distinct value(s) it finds in the specified field against every row for that specified key. The order of the values will be set based on the Order value.
Distinct	Copies all distinct values of the specified field(s), ordered by the Order value. Based on the field(s) that have been set to use this option, a new column called DISTINCT will be created. It will populate the column with all distinct value(s) it finds in the specified field against every row for that specified key. The order of the values will be set based on the Order value.
Concatenate	Concatenates all values (comma separated) from the selected field based on the specified key(s). Based on the field(s) that have been set to use this option, a new column called CONCATENATE will be created. It will populate the column with a concatenated, comma separated string of all values found in the selected field based on the specified key. It displays all values, not all unique values.
Concatenate Literal	Concatenates all values (no spaces) from the selected field based on the specified key(s). Based on the field(s) that have been set to use this option, a new column called CONCATENATE will be created. It will populate the column with a concatenated string of all values found in the selected field based on the specified key. It displays all values, not all unique values.

Positional

Function	Description
First	Writes the maximum value of the specified field(s) based on the selected key. Based on the field(s) that have been set to use this option, a new column called MAX will be created. It will populate the column with the maximum value it finds in the specified field against every row for that specified key.
Last	Writes the last value of the specified field(s) based on the selected key and the order of the data. Based on the field(s) that have been set to use this option, a new column called LAST will be created. It will populate the column with the last value it finds in the specified field against every row for that specified key (based on the order that has been set).
Previous	Writes the previous value of the specified field(s) based on the selected key and the order of the data. Based on the field(s) that have been set to use this option, a new column called PREVIOUS will be created. It will populate the column with the previous value it finds in the specified field against the specified key (based on the order that has been set).
Next	Writes the next value of the specified field(s) based on the selected key and the order of the data. Based on the field(s) that have been set to use this option, a new column called NEXT will be created. It will populate the column with the next value it finds in the specified field against the specified key (based on the order that has been set).
Lower	Writes the lower value of the specified field(s) based on the selected key and the order of the data. Based on the field(s) that have been set to use this option, a new column called LOWER will be created. It will populate the column with the next value it finds in the specified field against the specified key (based on the order that has been set).
Higher	Provides an incremental number against several rows of data based on the specified key(s) and the data containing dates that overlap. Based on the field(s) that have been set to use this option, a new column called OVERLAP will be created. It will populate the column with a numeric value starting from 0 and incrementing by 1 for each subsequent group of records that have dates that overlap.
Overlapping Dates Sequence	Writes the higher value of the specified field(s) based on the selected key and the order of the data. Based on the field(s) that have been set to use this option, a new column called HIGHER will be created. It will populate the column with the next value it finds in the specified field against the specified key (based on the order that has been set).
Sequence Starting from 0	Provides an incremental number against each row of data based on the specified key(s) and the order of the data Based on the field(s) that have been set to use this option, a new column called SEQUENCE0 will be created. It will populate the column with a numeric value starting from 0 and incrementing by 1 for each subsequent record for the specified key.
Sequence Starting From 1 On Value Change	Provides an incremental number against each row of data based on the specified key(s) and the order of the data. Based on the field(s) that have been set to use this option, a new column called SEQUENCEVALUECHANGE will be created. It will populate the column with a numeric value starting from 1 and incrementing by 1 for each subsequent record where the value changes for the specified key.
Sequence Starting from 1	Provides an incremental number against each row of data based on the specified key(s) and the order of the data. Based on the field(s) that have been set to use this option, a new column called SEQUENCE1 will be created. It will populate the column with a numeric value starting from 1 and incrementing by 1 for each subsequent record for the specified key.
Descending Sequence Ending With 0	Provides a descending number against each row of data based on the specified key(s) and the order of the data. Based on the field(s) that have been set to use this option, a new column called REVERSESEQUENCE0 will be created. It will populate the column with a numeric value starting from the number of records minus one and decrementing by 1 for each subsequent record for the specified key.
Descending Sequence Ending With 1	Provides a descending number against each row of data based on the specified key(s) and the order of the data. Based on the field(s) that have been set to use this option, a new column called REVERSESEQUENCE1 will be created. It will populate the column with a numeric value starting from the number of records and decrementing by 1 for each subsequent record for the specified key.

Aggregations

Function	Description
Minimum	Writes the minimum value of the specified field(s) based on the selected key. Based on the field(s) that have been set to use this option, a new column called MIN will be created. It will populate the column with the minimum value it finds in the specified field against every row for that specified key.
Maximum	Writes the maximum value of the specified field(s) based on the selected key. Based on the field(s) that have been set to use this option, a new column called MAX will be created. It will populate the column with the maximum value it finds in the specified field against every row for that specified key.
Count	Provides a count of all non-NULL values in a specified field for a selected key. Based on the field(s) that have been set to use this option, a new column called COUNT will be created. It will populate the column with a count of all non-NULL values it finds in the specified field based on the specified key.
Count All	Provides a count of all (including NULL) values in a specified field for a selected key. Based on the field(s) that have been set to use this option, a new column called COUNT will be created. It will populate the column with a count of all (including NULL) values it finds in the specified field based on the specified key.
Count Unique	Provides a count of unique values in a specified field for a selected key. Based on the field(s) that have been set to use this option, a new column called COUNTUNIQUE will be created. It will populate the column with a count of unique values it finds in the specified field based on the specified key.
Sum	Aggregates the data based on the specified key(s). Based on the field(s) that have been set to use this option, a new column called SUM will be created. It will populate the column with a sum of the field based on the specified key.
Cumulative Sum	Aggregates the data based on the specified key(s). Based on the field(s) that have been set to use this option, a new column called CUMULATIVESUM will be created. It will populate the column with a cumulative sum of the field based on the specified key.
Average	Averages the data based on the specified key(s). Based on the field(s) that have been set to use this option, a new column called AVERAGE will be created. It will populate the column with an average of the field based on the specified key.

Banding and Ranking

Function	Description
Band	Creates bands for the data by diving the range of the numbers by 10. It is advised to use this function only when there are more than 10 unique values. e.g if there are 100 entries, the bands would be 1-10, 11-20, 21-30 etc.
Ranking	Ranks the values from 1 the highest to n the lowest.
Rank Within	Ranks the values from 1 the highest to n the lowest within the current set of records.
Range Band	Bands the values in 10 groups, the banding is calculated by adding 2 standard deviations to the mean value, subtracting the 2 standard deviations from the mean value, and then dividing this range into 10.
Percentile Frequency	Bands the values in 10 Percentile groups depending on the frequency of the values, ideally each group will have 10 percent of the values.
Percentile Value	Bands the values in 10 Percentile groups depending on the sum of the values, ideally each group will have 10 percent of the values.

Calculations

Using the Calculations tab, custom calculations can be added.

Give the calculation a name in the Label textbox, and click the ... icon next to the Calculation box to open the Calculation Builder. Enter the required calculation and click Apply to save the changes made, or Cancel to discard changes. Please refer to the Calculation Builder chapter for more detailed information regarding the functionality and available options.

To add additional calculations, click Add. To remove an added calculation, click the X icon next to the relevant entry.

Please note that any field required for a calculation following a Squash stage must be selected in either the Copy, Distinct or Unique lists as part of that stage.

Additional

Additional Display Field Mapping options will become available if a value has been set for the Is First Minimum, Is Last Minimum, Is First Maximum or Is Last Maximum flags.

Function	Description
<Flag Name> <Field Name>	Based on the flag chosen, this outputs the value(s) in the field(s) selected as they appear where the flag is true, against each row for that key.
<Flag Name> <Field Name> Offset	Allows an offset value to be returned. For example, entering 1 will return the next value in the list, and entering -1 will return the previous value in the list.

Reverse Transpose

Use the Reverse Transpose Data Source to transpose values within rows into columns.

Use the Primary Key dropdown list to specify what will be used to identify the columns to be transposed.

Enter a row name in the Transpose Field Name textbox, and a name for the associated values in the Transpose Value Name textbox.

By default, blank fields are not included as part of the reverse transpose process. Enable the Include Blank Fields option to include them in the output.

Rules

A Rules data source will use pre-set rules to validate or update data.

Rules can be applied to individual fields and the output will inform the user of the number of rules that have passed or failed. Rules are categorised as either major or minor to allow the output to be split based on priority.

Please refer to the Rules section of the Data Warehouse for chapter more information.

Store in Index

Use this stage to specify a subset of data that will be used to create the resulting Index. This can be used at multiple times within a structure to create separate Indexes at a certain point in the order.

Specify the name of this subset Index using the Create as Name textbox, and specify the fields that will be included using the Display Fields drop-down list.

Finally, specify the maximum row count that the subset Index can contain using the Number to Index option.

Transpose

Use the Transpose Data Source to transpose values within columns into rows.

Use the Primary Key dropdown list to specify what will be used to identify the rows to be transposed.

Specify the fields in the Field 1 (Column Headers) dropdown list that will be transposed into column headers once the Index has been created. The fields specified in the Field 2 (Values) dropdown list will populate these new columns based on the primary key value. To include additional fields in the new Index, select them from the Extra Fields (Distincts) dropdown list.

If the fields specified in the Field 2 (Values) dropdown list contain multiple values, enabling the Flatten option will split these values into separate columns.

Configure Index

Once one or more stages are added, the Configure Index tab will be revealed. These options control the Index-level output that can be configured prior to completing the wizard.

For more information regarding the available options, please refer to the Creating an Index chapter.