This next entry in the 'simple ETL generation' series introduces additional scenarios for automating Hub ETL development.
Beyond the relatively straightforward approach of defining a Business Key using a direct attribute-to-attribute key mapping, a common requirement is the ability to combine multiple data elements into a Business Key definition.
This is a typical scenario when it is necessary to ‘uniquefy’ a Business Key. This may occur when certain Business Keys are not ‘strong’ enough, for instance because a technical key has been used (there may not be something better available). In this scenario, the Business Key definition may need to be extended by adding a source system identifier, or through some other means of combining attributes to guarantee uniqueness (including hard-coded values).
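As a simple illustration of this idea (the table and column names here are hypothetical, not from the sample model), a weak technical key can be 'uniquefied' by combining it with the record source:

```sql
-- Illustrative only: CUSTOMER_ID is a technical key that may repeat across
-- source systems, so the Business Key definition adds the record source.
SELECT
  stg.CUSTOMER_ID,
  stg.RECORD_SOURCE,
  -- Concatenated Business Key, with a sanding delimiter ('|') for uniqueness
  CONVERT(NVARCHAR(100), stg.CUSTOMER_ID) + '|' + stg.RECORD_SOURCE AS CUSTOMER_KEY
FROM STG_EXAMPLE_CUSTOMER stg;
```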
I use the term Business Key definition to indicate how the intended (target) Business Key is represented in the feeding / source system. In other words: how the Business Key appears in the source. This definition, which can be a combination of values and attributes, can then be mapped in metadata to the target (standardised) Data Warehouse Business Key attribute(s) to facilitate ETL generation.
When mapping the (source) Business Key definition to the target Business Key attribute(s), the most common scenarios besides the straight-up one-to-one mapping are ‘composition’, ‘concatenation’ and ‘pivoting’. In this post I will focus on the first two, as pivoting can be implemented in different ways that (depending on the solution) do not necessarily require pattern changes.
- Composition is where the target Business Key is defined as a collection of columns (i.e. a composite key). Each part of the Business Key is an individual attribute mapping.
- Concatenation is where the Business Key definition contains multiple attributes that are combined into a single string value (including sanding components / delimiters for uniqueness). A concatenated key consists of more than one source attribute but maps to a single target Business Key attribute.
This is shown in the following diagram as well (sanding elements not displayed):

In all cases, the combined Business Keys map to a single Data Warehouse key (i.e. a Hash Key, Sequence Value or Natural Business Key).
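To make the difference concrete, the sketch below uses the Plan_Code and RECORD_SOURCE attributes from the sample metadata (the staging table name is a placeholder) to show the same two source attributes mapped both ways:

```sql
-- Composition: two source attributes map to two target Business Key attributes.
SELECT
  stg.Plan_Code      AS PLAN_CODE,
  stg.RECORD_SOURCE  AS PLAN_SUFFIX
FROM STG_EXAMPLE stg;

-- Concatenation: the same attributes are combined into a single target
-- attribute, with a sanding element ('|') to keep the combined value unique.
SELECT
  stg.Plan_Code + '|' + stg.RECORD_SOURCE AS PLAN_KEY
FROM STG_EXAMPLE stg;
```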
It is interesting to see to what extent supporting these scenarios complicates the ETL generation code. The more procedural logic is required, the more you may run into development limitations using SQL.
For now, let’s look at how the concatenation and composition scenarios can be supported via SQL-based code generation.
The setup and metadata
Similar to how we generated the regular Hub ETL process using standard SQL, we can use the TEAM interface (meta)data (INTERFACE_SOURCE_HUB_XREF) as a sample. This data is available when you install TEAM and generate the sample data, but I have also scripted these samples and added them to this post.
The sample data can be downloaded here (rename to SQL): Download
This script contains the examples we used in the previous post, as well as the more complex use cases we'll discuss here. The updated code supports all scenarios covered so far.
If we query this metadata, we will see the following results:
SELECT
  [HUB_NAME],
  [HUB_BUSINESS_KEY_DEFINITION],
  [SOURCE_NAME],
  [SOURCE_BUSINESS_KEY_DEFINITION]
FROM [INTERFACE_SOURCE_HUB_XREF]
WHERE [HUB_NAME] IN ('HUB_MEMBERSHIP_PLAN', 'HUB_SEGMENT') -- The complex examples, but simple ones are supported also

We expect five separate ETL processes to be generated from this metadata: one for each source-to-target mapping.
This query is limited to the examples that contain the concatenation and composition scenarios. These can be identified by the COMPOSITE and CONCATENATE keywords in the (source) Business Key definition that have been prepared by TEAM. In these examples, the definition of the Business Key (as it appears in the source) is the composition (or concatenation) of two attributes.
Generating Hubs with complex Business Keys
Before we dive into code, it is worth investigating the process in a more schematic way, using the Activity Diagram UML notation we used earlier. Compared to the regular Hub process, you can see this diagram has been extended quite a bit to accommodate the handling of different Business Key scenarios.

The major change here is the addition of the ‘key parts’ concept, and various local variables to support the code generation.
The diagram can be downloaded and / or edited here: Download
The overarching idea is to be able to break up the Business Key definition into ‘key parts’ and cycle through these key parts to construct the various clauses in the SQL using the local variables.
How can this be done in SQL? We are looking at two ‘loops’ over the five rows of metadata. One ‘outer’ loop to create the individual Hub ETL process using the source-to-target mapping (one ETL process for one row in the metadata) and an ‘inner’ loop to break up the Business Key definition and iterate over the key parts (Business Key components).
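Stripped of the pattern logic, the shape of these two loops can be sketched with nested cursors. This is a simplified outline following the variable names of the full script; the inner SELECT would normally contain the XML-based split of the Business Key definition rather than the raw definition columns:

```sql
DECLARE @targetTable VARCHAR(100), @sourceTable VARCHAR(100);
DECLARE @keyPartSource VARCHAR(100), @keyPartTarget VARCHAR(100);

-- Outer loop: one generated ETL process per source-to-target mapping
DECLARE hub_cursor CURSOR FOR
  SELECT [HUB_NAME], [SOURCE_NAME] FROM [INTERFACE_SOURCE_HUB_XREF];
OPEN hub_cursor;
FETCH NEXT FROM hub_cursor INTO @targetTable, @sourceTable;
WHILE @@FETCH_STATUS = 0
BEGIN
  -- Inner loop: one iteration per key part of this mapping
  DECLARE keypart_cursor CURSOR FOR
    SELECT [SOURCE_BUSINESS_KEY_DEFINITION], [HUB_BUSINESS_KEY_DEFINITION]
    FROM [INTERFACE_SOURCE_HUB_XREF]
    WHERE [HUB_NAME] = @targetTable AND [SOURCE_NAME] = @sourceTable;
  OPEN keypart_cursor;
  FETCH NEXT FROM keypart_cursor INTO @keyPartSource, @keyPartTarget;
  WHILE @@FETCH_STATUS = 0
  BEGIN
    -- Append this key part to the SELECT / JOIN / WHERE clause variables here
    FETCH NEXT FROM keypart_cursor INTO @keyPartSource, @keyPartTarget;
  END
  CLOSE keypart_cursor;
  DEALLOCATE keypart_cursor;

  FETCH NEXT FROM hub_cursor INTO @targetTable, @sourceTable;
END
CLOSE hub_cursor;
DEALLOCATE hub_cursor;
```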
Breaking up the Business Key definition
In the sample data, the Business Key definition is captured in a single value – for example COMPOSITE(Plan_Code;RECORD_SOURCE). There is a neat SQL trick we can use to transpose this data into multiple rows by splitting the value at the semicolon (or any other selected value).
This works by converting the original Business Key definition value to XML, and then splitting the XML across its nodes. You can try this with the sample data using this query, which is a part of the overall process:
SELECT
  [HUB_NAME],
  [SOURCE_NAME],
  LTRIM(Split.a.value('.', 'VARCHAR(100)')) AS [BUSINESS_KEY_PART],
  ROW_NUMBER() OVER (PARTITION BY [HUB_NAME], [SOURCE_NAME] ORDER BY (SELECT 100)) AS [ROW_NR]
FROM
(
  SELECT
    [HUB_NAME],
    [SOURCE_NAME],
    [SOURCE_BUSINESS_KEY_DEFINITION],
    CAST('<M>' + REPLACE([SOURCE_BUSINESS_KEY_DEFINITION], ';', '</M><M>') + '</M>' AS XML) AS [BUSINESS_KEY_SOURCE_XML]
  FROM [INTERFACE_SOURCE_HUB_XREF]
  WHERE [HUB_NAME] = 'HUB_MEMBERSHIP_PLAN' AND [SOURCE_NAME] = 'STG_PROFILER_PLAN'
) sub
CROSS APPLY sub.[BUSINESS_KEY_SOURCE_XML].nodes('/M') AS Split(a)
The inner query from the snippet above returns the XML representation of the Business Key as two separate segments:
<M>COMPOSITE(Plan_Code</M> <M>RECORD_SOURCE)</M>
The CROSS APPLY then returns a row for each XML node, producing the following:

The individual key parts can then be further cleaned up (i.e. removing brackets and keywords) and iterated over to construct the complex Business Key SQL. During this iteration, the available metadata is evaluated to determine whether the key should be treated as a composite key or a concatenated key.
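The clean-up itself is a series of string replacements. As a standalone illustration, using a Business Key definition value from the sample metadata, the keywords and brackets can be stripped like this:

```sql
-- Strips the COMPOSITE( / CONCATENATE( keywords and the closing bracket, and
-- normalises the separator, leaving a plain comma-separated list of key parts.
SELECT REPLACE(REPLACE(REPLACE(REPLACE(
         'COMPOSITE(Plan_Code;RECORD_SOURCE)'
         ,'COMPOSITE(','')
         ,'CONCATENATE(','')
         ,')','')
         ,';',',') AS [CLEANED_KEY_DEFINITION];
-- Returns: Plan_Code,RECORD_SOURCE
```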
The full SQL can be downloaded here, but is also shown below. Download
--USE []

-- Parameters
DECLARE @targetDatabase VARCHAR(100) = '[DVI_200_Integration_Layer]';
DECLARE @targetSchema VARCHAR(100) = 'dbo'
DECLARE @sourceDatabase VARCHAR(100) = '[DVI_100_Staging_Area]';
DECLARE @sourceSchema VARCHAR(100) = 'dbo'
DECLARE @loadDateTimeAttribute VARCHAR(100) = 'LOAD_DATETIME'
DECLARE @etlProcessIdAttribute VARCHAR(100) = 'ETL_INSERT_RUN_ID'
DECLARE @recordSourceAttribute VARCHAR(100) = 'RECORD_SOURCE'

-- Variables / metadata (from the metadata database)
DECLARE @targetTable VARCHAR(100);
DECLARE @sourceTable VARCHAR(100);
DECLARE @targetBusinessKey VARCHAR(MAX);
DECLARE @sourceBusinessKey VARCHAR(MAX);

-- Variables / local
DECLARE @pattern VARCHAR(MAX);           -- The complete selection / generated output
DECLARE @targetHashKeyName VARCHAR(100); -- The derived name of the Hash Key
DECLARE @keyPartSource VARCHAR(100);     -- The key component (source name), in case of composite or concatenated keys
DECLARE @keyPartTarget VARCHAR(100);     -- The key component (target name), in case of composite or concatenated keys
DECLARE @selectPart VARCHAR(4000);
DECLARE @groupByPart VARCHAR(4000);
DECLARE @aliasPart VARCHAR(4000);
DECLARE @joinPart VARCHAR(4000);
DECLARE @wherePart VARCHAR(4000);

DECLARE hub_cursor CURSOR FOR
  SELECT [HUB_NAME], [HUB_BUSINESS_KEY_DEFINITION], [SOURCE_NAME], [SOURCE_BUSINESS_KEY_DEFINITION]
  FROM [INTERFACE_SOURCE_HUB_XREF]
  WHERE [HUB_NAME] IN ('HUB_MEMBERSHIP_PLAN', 'HUB_SEGMENT') -- The complex examples, but simple ones are supported also

OPEN hub_cursor
FETCH NEXT FROM hub_cursor INTO @targetTable, @targetBusinessKey, @sourceTable, @sourceBusinessKey

WHILE @@FETCH_STATUS = 0
BEGIN
  -- Clear local variables before each iteration
  SET @selectPart = '';
  SET @groupByPart = '';
  SET @aliasPart = '';
  SET @joinPart = '';
  SET @wherePart = '';

  -- Create JOIN and WHERE conditions
  DECLARE keypart_cursor CURSOR FOR
  WITH [MAINQUERY] AS
  (
    SELECT
      [HUB_NAME],
      [SOURCE_NAME],
      [BUSINESS_KEY_COMPOSITION],
      CASE
        WHEN [BUSINESS_KEY_COMPOSITION] = 'Concatenate' THEN REPLACE(SOURCE_BUSINESS_KEY_DEFINITION, ',', '+')
        ELSE [SOURCE_BUSINESS_KEY_DEFINITION]
      END AS [SOURCE_BUSINESS_KEY_DEFINITION],
      [HUB_BUSINESS_KEY_DEFINITION]
    FROM
    (
      SELECT
        [HUB_NAME],
        [SOURCE_NAME],
        CASE
          WHEN CHARINDEX('COMPOSITE(', [SOURCE_BUSINESS_KEY_DEFINITION], 1) > 0 THEN 'Composite'
          WHEN CHARINDEX('CONCATENATE(', [SOURCE_BUSINESS_KEY_DEFINITION], 1) > 0 THEN 'Concatenate'
          ELSE 'Regular'
        END AS [BUSINESS_KEY_COMPOSITION],
        -- Strip out any metadata information i.e. classification, commas and brackets
        REPLACE(REPLACE(REPLACE(REPLACE([SOURCE_BUSINESS_KEY_DEFINITION], 'COMPOSITE(', ''), 'CONCATENATE(', ''), ')', ''), ';', ',') AS [SOURCE_BUSINESS_KEY_DEFINITION],
        [HUB_BUSINESS_KEY_DEFINITION]
      FROM [INTERFACE_SOURCE_HUB_XREF]
      WHERE [HUB_NAME] = '' + @targetTable + '' AND [SOURCE_NAME] = '' + @sourceTable + ''
    ) sub
  ),
  -- Define the source business key as XML
  [SOURCEKEY] AS
  (
    SELECT
      [HUB_NAME],
      [SOURCE_NAME],
      [BUSINESS_KEY_COMPOSITION],
      [SOURCE_BUSINESS_KEY_DEFINITION],
      CAST('<M>' + REPLACE([SOURCE_BUSINESS_KEY_DEFINITION], ',', '</M><M>') + '</M>' AS XML) AS [BUSINESS_KEY_SOURCE_XML]
    FROM [MAINQUERY]
  ),
  -- Define the target business key as XML
  [TARGETKEY] AS
  (
    SELECT
      [HUB_NAME],
      [SOURCE_NAME],
      [BUSINESS_KEY_COMPOSITION],
      [HUB_BUSINESS_KEY_DEFINITION],
      CAST('<M>' + REPLACE([HUB_BUSINESS_KEY_DEFINITION], ',', '</M><M>') + '</M>' AS XML) AS [BUSINESS_KEY_TARGET_XML]
    FROM [MAINQUERY]
  ),
  -- Break up the source business key in parts to support composite keys
  [SOURCEKEYPARTS] AS
  (
    SELECT
      [HUB_NAME],
      [SOURCE_NAME],
      [BUSINESS_KEY_COMPOSITION],
      LTRIM(Split.a.value('.', 'VARCHAR(100)')) AS [BUSINESS_KEY_PART],
      ROW_NUMBER() OVER (PARTITION BY [HUB_NAME], [SOURCE_NAME] ORDER BY (SELECT 100)) AS [ROW_NR]
    FROM [SOURCEKEY]
    CROSS APPLY [SOURCEKEY].[BUSINESS_KEY_SOURCE_XML].nodes('/M') AS Split(a)
  ),
  -- Break up the target business key to match the composite keys on ordinal position
  [TARGETKEYPARTS] AS
  (
    SELECT
      [HUB_NAME],
      [SOURCE_NAME],
      [BUSINESS_KEY_COMPOSITION],
      LTRIM(Split.a.value('.', 'VARCHAR(100)')) AS [BUSINESS_KEY_PART],
      ROW_NUMBER() OVER (PARTITION BY [HUB_NAME], [SOURCE_NAME] ORDER BY (SELECT 100)) AS [ROW_NR]
    FROM [TARGETKEY]
    CROSS APPLY [TARGETKEY].[BUSINESS_KEY_TARGET_XML].nodes('/M') AS Split(a)
  )
  SELECT
    SOURCEKEYPARTS.BUSINESS_KEY_PART AS SOURCE_BUSINESS_KEY_PART,
    TARGETKEYPARTS.BUSINESS_KEY_PART AS TARGET_BUSINESS_KEY_PART
  FROM SOURCEKEYPARTS
  JOIN TARGETKEYPARTS
    ON SOURCEKEYPARTS.[HUB_NAME] = TARGETKEYPARTS.[HUB_NAME]
   AND SOURCEKEYPARTS.[SOURCE_NAME] = TARGETKEYPARTS.[SOURCE_NAME]
  WHERE SOURCEKEYPARTS.ROW_NR = TARGETKEYPARTS.ROW_NR

  OPEN keypart_cursor
  FETCH NEXT FROM keypart_cursor INTO @keyPartSource, @keyPartTarget

  WHILE @@FETCH_STATUS = 0
  BEGIN
    -- Also support concatenated keys
    SELECT @keyPartSource = REPLACE(@keyPartSource, '+', ' + stg.')

    -- Evaluate the various pattern snippets
    SET @selectPart = @selectPart + 'ISNULL(RTRIM(CONVERT(NVARCHAR(100), stg.' + @keyPartSource + ')),''NA'')+''|'' +' + CHAR(13);
    SET @groupByPart = @groupByPart + ' stg.' + @keyPartSource + ',' + CHAR(13);
    SET @aliasPart = @aliasPart + ' stg.' + @keyPartSource + ' AS ' + @keyPartTarget + ',' + CHAR(13);
    SET @joinPart = @joinPart + ' stg.' + @keyPartSource + ' = hub.' + @keyPartTarget + ' AND' + CHAR(13);
    SET @wherePart = @wherePart + ' stg.' + @keyPartSource + ' IS NOT NULL AND hub.' + @keyPartTarget + ' IS NULL AND' + CHAR(13);

    FETCH NEXT FROM keypart_cursor INTO @keyPartSource, @keyPartTarget
  END

  CLOSE keypart_cursor;
  DEALLOCATE keypart_cursor; -- End of key part cursor

  -- Remove trailing parts from key parts
  SET @selectPart = LEFT(@selectPart, DATALENGTH(@selectPart) - 2)
  SET @groupByPart = LEFT(@groupByPart, DATALENGTH(@groupByPart) - 2)
  SET @aliasPart = LEFT(@aliasPart, DATALENGTH(@aliasPart) - 1)
  SET @joinPart = LEFT(@joinPart, DATALENGTH(@joinPart) - 4)
  SET @wherePart = LEFT(@wherePart, DATALENGTH(@wherePart) - 4)

  -- Derive the hash key column name
  SET @targetHashKeyName = REPLACE(@targetTable, 'HUB_', '') + '_HSH';

  -- Insert into pattern
  SET @pattern = '-- Working on mapping to ' + @targetTable + ' from source table ' + @sourceTable + CHAR(13) + CHAR(13);
  SET @pattern = @pattern + 'USE ' + @sourceDatabase + CHAR(13) + CHAR(13);
  SET @pattern = @pattern + 'INSERT INTO ' + @targetDatabase + '.' + @targetSchema + '.' + @targetTable + CHAR(13);
  SET @pattern = @pattern + '(' + @targetHashKeyName + ', ' + @targetBusinessKey + ', ' + @loadDateTimeAttribute + ', ' + @etlProcessIdAttribute + ', ' + @recordSourceAttribute + ')' + CHAR(13);
  SET @pattern = @pattern + 'SELECT' + CHAR(13);
  SET @pattern = @pattern + ' HASHBYTES(''MD5'',' + CHAR(13);
  SET @pattern = @pattern + @selectPart + CHAR(13);
  SET @pattern = @pattern + ' ) AS ' + @targetHashKeyName + ',' + CHAR(13);
  SET @pattern = @pattern + @aliasPart + CHAR(13);
  SET @pattern = @pattern + ' MIN(stg.' + @loadDateTimeAttribute + ') AS ' + @loadDateTimeAttribute + ',' + CHAR(13);
  SET @pattern = @pattern + ' -1 AS ' + @etlProcessIdAttribute + ',' + CHAR(13);
  SET @pattern = @pattern + ' stg.' + @recordSourceAttribute + CHAR(13);
  SET @pattern = @pattern + 'FROM ' + @sourceTable + ' stg' + CHAR(13);
  SET @pattern = @pattern + 'LEFT OUTER JOIN ' + @targetDatabase + '.dbo.' + @targetTable + ' hub ON ' + CHAR(13);
  SET @pattern = @pattern + @joinPart + CHAR(13);
  SET @pattern = @pattern + 'WHERE ' + CHAR(13);
  SET @pattern = @pattern + @wherePart + CHAR(13);
  SET @pattern = @pattern + 'GROUP BY' + CHAR(13);
  SET @pattern = @pattern + ' HASHBYTES(''MD5'',' + CHAR(13);
  SET @pattern = @pattern + @selectPart + CHAR(13);
  SET @pattern = @pattern + ' ),' + CHAR(13);
  SET @pattern = @pattern + ' stg.' + @recordSourceAttribute + ',' + CHAR(13);
  SET @pattern = @pattern + @groupByPart

  PRINT @pattern + CHAR(13);

  FETCH NEXT FROM hub_cursor INTO @targetTable, @targetBusinessKey, @sourceTable, @sourceBusinessKey
END

CLOSE hub_cursor;
DEALLOCATE hub_cursor;
Executing this code against the sample data yields the five expected ETL processes, with correct handling of complex Business Keys. An example is shown below. The complex key handling is inserted in various places, such as the hash calculation, the join condition and the WHERE clause.
INSERT INTO [DVI_200_Integration_Layer].dbo.HUB_MEMBERSHIP_PLAN
(MEMBERSHIP_PLAN_HSH, PLAN_CODE, PLAN_SUFFIX, LOAD_DATETIME, ETL_INSERT_RUN_ID, RECORD_SOURCE)
SELECT
  HASHBYTES('MD5',
    ISNULL(RTRIM(CONVERT(NVARCHAR(100), stg.Plan_Code)),'NA')+'|' +
    ISNULL(RTRIM(CONVERT(NVARCHAR(100), stg.RECORD_SOURCE)),'NA')+'|'
  ) AS MEMBERSHIP_PLAN_HSH,
  stg.Plan_Code AS PLAN_CODE,
  stg.RECORD_SOURCE AS PLAN_SUFFIX,
  MIN(stg.LOAD_DATETIME) AS LOAD_DATETIME,
  -1 AS ETL_INSERT_RUN_ID,
  stg.RECORD_SOURCE
FROM STG_PROFILER_PERSONALISED_COSTING stg
LEFT OUTER JOIN [DVI_200_Integration_Layer].dbo.HUB_MEMBERSHIP_PLAN hub ON
  stg.Plan_Code = hub.PLAN_CODE AND
  stg.RECORD_SOURCE = hub.PLAN_SUFFIX
WHERE
  stg.Plan_Code IS NOT NULL AND hub.PLAN_CODE IS NULL AND
  stg.RECORD_SOURCE IS NOT NULL AND hub.PLAN_SUFFIX IS NULL
GROUP BY
  HASHBYTES('MD5',
    ISNULL(RTRIM(CONVERT(NVARCHAR(100), stg.Plan_Code)),'NA')+'|' +
    ISNULL(RTRIM(CONVERT(NVARCHAR(100), stg.RECORD_SOURCE)),'NA')+'|'
  ),
  stg.RECORD_SOURCE,
  stg.Plan_Code,
  stg.RECORD_SOURCE
What’s next?
The code is quite a bit more complex than the original Hub example, and the SQL will get progressively more complex as we continue to support additional scenarios and patterns. The VEDW (code generation) and TEAM (metadata management) applications continue to separate these concerns and are evolving to make managing this complexity a bit easier (hopefully). The more complex the logic becomes, the more attractive it is to adopt a programming language such as C#, which allows greater flexibility in managing code complexity than SQL.
This is the last post before I head to Europe to deliver the Virtual Data Warehouse implementation and generation training a few times (June 17-19 and 24-26 in the Netherlands). I'll pick things up again when I'm back in Australia.
In the next post I plan to look into applying the templating engine approach to generate the scenarios covered in this post, using the generic interface format for ETL automation metadata.
After this, we can probably leave the Hub behind and start focusing on generating the remaining Data Vault components.