SQL Server Integration Services - Toronto Code Camp
I wanted to thank the organizers and sponsors for the first (of many I hope) Toronto Code Camp. There was great content, with tracks on .NET development, ASP.NET, Data/Security and Future Technologies. Several hundred attendees registered and the event was sold out. Early feedback shows that people were very happy with the event as it provided information from some of the best technical minds in the country for a great price – free! There were eleven MVPs in attendance and a couple of regional directors so there were plenty of resources available.
I presented on common transformations in SQL Server Integration Services (SSIS). We started off with simple stuff but quickly moved to more complex issues like normalizing a poor data structure. It’s common to see data structures with repeating values like sales01, sales02…sales0n in a table. The repeating values problem is the cause of programming hacks when it comes to table inserts, deletes and reporting. They are very common and no one knows about them until application modifications have to be made or the data must be used for analytical applications. The Unpivot transform in SSIS makes this it very easy to move the data to the proper structure.
We also talked about importing data like address lists into a table. The problem here is to remove duplicates or to find near matches. Let’s say you have just purchased a mailing list and you want to see how many of the addresses in the purchased list match those in the list you already own. But names might not be spelled exactly the same as the names in your list. The name “John Smith” might be “John Smyth” or “Jonathon Smith” in your list. The fuzzy lookup transform allows you to perform a lookup and then decide whether or not to accept the new row based on a numeric confidence level. You decide how tolerant you want to be and the transform then allows you to keep only the rows that meet a certain threshold.
Customer information is often entered into independent systems for different reasons like accounting, service, sales, and technical support. When you need to join a query across all these systems you often end up with may more customers than you actually have! This one transform is great if you want to consolidate the same information form multiple databases into a single place.
I presented on common transformations in SQL Server Integration Services (SSIS). We started off with simple stuff but quickly moved to more complex issues like normalizing a poor data structure. It’s common to see data structures with repeating values like sales01, sales02…sales0n in a table. The repeating values problem is the cause of programming hacks when it comes to table inserts, deletes and reporting. They are very common and no one knows about them until application modifications have to be made or the data must be used for analytical applications. The Unpivot transform in SSIS makes this it very easy to move the data to the proper structure.
We also talked about importing data like address lists into a table. The problem here is to remove duplicates or to find near matches. Let’s say you have just purchased a mailing list and you want to see how many of the addresses in the purchased list match those in the list you already own. But names might not be spelled exactly the same as the names in your list. The name “John Smith” might be “John Smyth” or “Jonathon Smith” in your list. The fuzzy lookup transform allows you to perform a lookup and then decide whether or not to accept the new row based on a numeric confidence level. You decide how tolerant you want to be and the transform then allows you to keep only the rows that meet a certain threshold.
Customer information is often entered into independent systems for different reasons like accounting, service, sales, and technical support. When you need to join a query across all these systems you often end up with may more customers than you actually have! This one transform is great if you want to consolidate the same information form multiple databases into a single place.