SSIS provides capabilities for ETL operations using a control flow and data flow engine. It allows importing and exporting data, integrating heterogeneous data sources, and supporting BI solutions. Key concepts include packages, control flow, data flow, variables, and event handlers. SSIS can be optimized for scalability through techniques like parallelism, avoiding blocking transformations, and leveraging SQL for aggregations. Performance can be monitored using tools like SQL Server logs, WMI, and MOM. SSIS is interoperable with data sources like Oracle, Excel, and flat files.
Slide Title: Title Slide
Key Message: Title Slide
Slide Builds: 0
Slide Script: Hello and welcome to this Microsoft TechNet session on Advanced SQL Server 2005 Integration Services. My name is {insert name}.
Slide Transition: Let’s start this session by going into more detail on exactly what we will be covering.
Slide Title: What We Will Cover
Key Message: What we will cover
Slide Builds: 2
Slide Script: We’re going to cover three advanced techniques. The first thing we’ll cover is using Web Services with SQL Server Integration Services. We’ve had a number of questions about how to use Integration Services within the Web Services environment. Organizations now are exposing more of their data and business processes through Web Services, so it’s natural to want to use Web Services with the data integration processes.
[BUILD1] We’ll also talk about text mining techniques for Integration Services. Much of the data found in businesses is unstructured, as we’ll discuss during this session. Businesses need the ability to pull key words and phrases from this unstructured, free-text data and build warehouses, reference tables, and other useful data structures from it. We’ll talk about how to use text mining with Integration Services to achieve some of those goals.
[BUILD2] Finally, we’ll talk about how to use data mining within Integration Services. The ability to use data mining within Integration Services is one of the most compelling new features of Integration Services, and we’ll discuss some of the business cases for using it.
Slide Transition: As with most TechNet sessions, some prior experience of Microsoft technologies or similar technologies is always helpful. Here’s a brief overview of what would be helpful, but not essential, for this session.
Slide Title: Helpful Experience
Key Message: Helpful Experience
Slide Builds: 1
Slide Script: As we go through today's session, you will hear various Microsoft acronyms and terminology. While we will explain all new terms related to today's session, there are some general terms from the industry or from other versions of Microsoft products that we may not spend time on. To help you out, we have listed the areas that it may be helpful to be familiar with, either prior to this session or to reference afterwards. A basic knowledge of how to build data flows within SQL Server Integration Services is required.
[BUILD1] Familiarity with scripting using languages such as VBScript is helpful, but not absolutely essential.
Slide Transition: To cover the topics mentioned and keep the session flow going, we have divided the session up into the following agenda items.
Slide Title: Agenda: Using Web Services with SSIS
Key Message: This agenda item discusses how to use Web Services with Integration Services.
Slide Builds: 0
Slide Script: First, we’ll look at how to use Web Services with Integration Services, including using a Web Service as part of the Script component to retrieve and process data through the Web Service.
[BUILD1] After that, we’ll take a look at how to do text mining with Integration Services, and discuss why it’s an important feature of Integration Services.
[BUILD2] Finally, we’ll look at how to use data mining tools directly within an Integration Services package and what the ramifications are of being able to use this interesting new feature of Integration Services.
Slide Transition: Let’s talk about using Web Services with SQL Server Integration Services.
Can use like DTS if you want to…
Compare the old ETL approach to the SSIS approach. Especially mention: flexible sources, flexible transformations, and flexible destinations (especially OLE DB).
Demo 1 – We are going to use a small piece of the Project Real data. For those of you who have not heard of Project Real, it is a sample BI implementation in SQL Server 2005 based on Barnes and Noble. All the schemas, ETL packages, reports and data are published along with a set of white papers and best-practice guidelines for large-scale projects. We are just looking at loading Vendors (Suppliers in non-US speak!).
Scenario 1: We have 250,000 active vendors and we wish to load them from our source database into our data warehouse. Accounts have provided a list of blacklisted suppliers, and we need to match the vendors against this list to add an attribute to each supplier indicating whether it is blacklisted. Show the query plan. Why use Lookup? Show execution and, while executing, talk about pipeline, buffering and caching. More to come later on why to use data flows.
Scenario 2: Our beloved accounts department can only supply the blacklist as an Excel file, and we need to import it.
Scenario 3 (Workflow): Accounts now want you to filter out active vendors who are on the blacklist and insert them into a table in the DWH for later investigation. After the demo, show how to do some of this in SQL (Demo1_SQL). Discuss the limits of SQL (one table scan per task).
Scenario 4: Accounts (who have no concept of SOA or databases) have said they can only supply a spreadsheet in workbook form; each worksheet is filled in by a different department, and departments come and go. They want you to record the reason/department for the blacklisting (the worksheet name) and to email the head of finance a spreadsheet with any active suppliers that are flagged as blacklisted, and the department that flagged them.
Discuss: Sequence Containers, Variables, Scripting, For Loop (data flow 1). Multicast, Conditional Split, Unicode issues (data flow 2), Send Mail task (control flow). Use of control flow for the rest of the tables.
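The lookup from Scenario 1 and the filter from Scenario 3 can be sketched outside SSIS as follows. This is a minimal Python illustration of the idea, not SSIS code: the sample vendor records, the field names (`vendor_id`, `active`, `is_blacklisted`) and both function names are assumptions made up for the example.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def flag_blacklisted(vendors: Iterable[Dict], blacklist_ids: Set[int]) -> Iterator[Dict]:
    """Lookup-style step: check each row against a cached reference set
    and add one attribute, without adding or removing rows."""
    for vendor in vendors:
        yield {**vendor, "is_blacklisted": vendor["vendor_id"] in blacklist_ids}


def conditional_split(rows: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Route active, blacklisted vendors to an 'investigate' output and
    every other row to the normal 'load' output."""
    investigate, load = [], []
    for row in rows:
        if row["active"] and row["is_blacklisted"]:
            investigate.append(row)
        else:
            load.append(row)
    return investigate, load


# Made-up sample data standing in for the vendor source and the blacklist.
vendors = [
    {"vendor_id": 1, "name": "Acme", "active": True},
    {"vendor_id": 2, "name": "Globex", "active": True},
    {"vendor_id": 3, "name": "Initech", "active": False},
]
blacklist_ids = {2, 3}

investigate, load = conditional_split(flag_blacklisted(vendors, blacklist_ids))
```

Because the blacklist is held in memory as a set, each vendor row is checked without a further query against the reference data, which is the caching behaviour the demo is meant to highlight.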
Demo 1 – Use of scripting to infer a dimension. Explain early-arriving facts, e.g. sales arriving before the customer.
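One way to picture an inferred dimension member is the sketch below. It is a hedged Python illustration rather than the demo's actual script: the `CustomerDimension` class, its methods, and the sample keys are all hypothetical names chosen for the example.

```python
from itertools import count


class CustomerDimension:
    """Toy in-memory dimension keyed by business key (hypothetical design)."""

    def __init__(self):
        self._rows = {}            # business key -> dimension row
        self._next_key = count(1)  # surrogate key generator

    def lookup_or_infer(self, business_key):
        """Return the surrogate key; create a placeholder (inferred) row
        when a fact arrives before its customer."""
        row = self._rows.get(business_key)
        if row is None:
            row = {"surrogate_key": next(self._next_key), "name": None, "inferred": True}
            self._rows[business_key] = row
        return row["surrogate_key"]

    def add(self, business_key, name):
        """Load a real customer; if an inferred row already exists,
        complete it in place so existing facts keep their key."""
        row = self._rows.get(business_key)
        if row is None:
            row = {"surrogate_key": next(self._next_key)}
            self._rows[business_key] = row
        row.update(name=name, inferred=False)
        return row["surrogate_key"]

    def is_inferred(self, business_key):
        return self._rows[business_key]["inferred"]


dim = CustomerDimension()
dim.add("C001", "Alice")

# The sale for C999 arrives before customer C999 has been loaded.
sales = [{"customer": "C001", "amount": 10.0}, {"customer": "C999", "amount": 25.0}]
facts = [
    {"customer_key": dim.lookup_or_infer(s["customer"]), "amount": s["amount"]}
    for s in sales
]
inferred_before = dim.is_inferred("C999")

# The customer record turns up later and fills in the placeholder.
late_key = dim.add("C999", "Bob")
```

The point of the pattern is that the fact row is never rejected or delayed: it is stored against a placeholder key, and the placeholder is completed when the real customer arrives.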
Row transformations - Row transformations either manipulate data or create new fields using the data that is available in that row. Examples of SSIS components that perform row transformations include Derived Column, Data Conversion, Multicast, and Lookup. While these components might create new columns, row transformations do not create any additional records. Because each output row has a 1:1 relationship with an input row, row transformations are also known as synchronous transformations. Row transformations have the advantage of reusing existing buffers and do not require data to be copied to a new buffer to complete the transformation.
Partially blocking transformations - Partially blocking transformations are often used to combine datasets. They tend to have multiple data inputs. As a result, their output may have the same, greater, or fewer records than the total number of input records. Since the number of input records will likely not match the number of output records, these transformations are also called asynchronous transformations. Examples of partially blocking transformation components available in SSIS include Merge, Merge Join, and Union All. With partially blocking transformations, the output of the transformation is copied into a new buffer and a new thread may be introduced into the data flow.
Blocking transformations - Blocking transformations must read and process all input records before creating any output records. Of all of the transformation types, these transformations perform the most work and can have the greatest impact on available resources. Example components in SSIS include Aggregate and Sort. Like partially blocking transformations, blocking transformations are also considered to be asynchronous. Similarly, when a blocking transformation is encountered in the data flow, a new buffer is created for its output and a new thread is introduced into the data flow.
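The difference between the three classes can be illustrated with a small Python analogy. This is only a sketch of the streaming behaviour, not of the SSIS buffer architecture; the function names, the `counting` helper and the sample rows are assumptions made up for the example.

```python
import heapq


def derived_column(rows):
    """Row (synchronous) transformation: exactly one output row per
    input row, produced as soon as that row is read."""
    for row in rows:
        yield {**row, "total": row["qty"] * row["price"]}


def merge_sorted(left, right):
    """Partially blocking: combines two inputs that are already sorted
    on 'id', emitting rows as soon as the ordering allows."""
    return heapq.merge(left, right, key=lambda r: r["id"])


def sort_rows(rows):
    """Blocking: every input row must be read before the first output
    row can be produced."""
    return iter(sorted(rows, key=lambda r: r["id"]))


def counting(rows, counter):
    """Helper that records how many upstream rows have been read."""
    for row in rows:
        counter["read"] += 1
        yield row


rows = [
    {"id": 3, "qty": 1, "price": 2.0},
    {"id": 1, "qty": 2, "price": 1.5},
    {"id": 2, "qty": 4, "price": 0.5},
]

# Ask each transformation for its first output row and see how much
# input it had to read to produce it.
stream_counter = {"read": 0}
first_streamed = next(derived_column(counting(rows, stream_counter)))

block_counter = {"read": 0}
first_sorted = next(sort_rows(counting(rows, block_counter)))

merged_ids = [r["id"] for r in merge_sorted([{"id": 1}, {"id": 4}], [{"id": 2}, {"id": 3}])]
```

In this sketch the row transformation has read one row when its first output appears, while the sort has read all three, which is why a blocking component holds up everything downstream of it.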
Parallelism – Packages, Tasks and Transformations can be executed in parallel
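As a rough analogy for independent tasks running side by side, the following Python sketch submits three unrelated loads to a thread pool. It is not SSIS code: `load_table` and the table names are hypothetical stand-ins for tasks that have no precedence constraints between them.

```python
from concurrent.futures import ThreadPoolExecutor


def load_table(name):
    """Hypothetical stand-in for an independent task, such as loading
    one table; a real task would do I/O here."""
    return f"{name}: loaded"


tables = ["Vendor", "Customer", "Product"]

# Tasks that do not depend on each other can be submitted together;
# map() returns results in the order the inputs were given.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(load_table, tables))
```

The same reasoning applies at each level mentioned above: work can only overlap where nothing forces one piece to wait for another.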
The first example has a blocking shape, so there is no parallelism. In the second example, only the destination runs in parallel. In the third example, everything runs in parallel. If SQL is your source, look carefully at aggregating in the SELECT statement.
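The advice about aggregating in the SELECT statement can be shown with a small comparison. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a SQL source; the `sales` table, its columns and the sample values are assumptions made up for the example.

```python
import sqlite3
from collections import defaultdict

# In-memory database standing in for the SQL source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (vendor_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# Option A: pull every detail row and aggregate downstream, which is
# what a blocking aggregate step in the data flow has to do.
totals_in_pipeline = defaultdict(float)
rows_read = 0
for vendor_id, amount in conn.execute("SELECT vendor_id, amount FROM sales"):
    totals_in_pipeline[vendor_id] += amount
    rows_read += 1

# Option B: let the source aggregate, so only one row per vendor
# ever enters the data flow.
aggregated_rows = conn.execute(
    "SELECT vendor_id, SUM(amount) FROM sales GROUP BY vendor_id ORDER BY vendor_id"
).fetchall()
totals_in_sql = dict(aggregated_rows)

conn.close()
```

Both options give the same totals, but the second moves fewer rows and removes a blocking step from the data flow, which is what allows the rest of the pipeline to stay parallel.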