Talend Open Studio 
Fundamentals 
gabrielebaldassarre.com
What is Talend for Data Integration? 
❏ Eclipse-based visual programming IDE for ETL 
applications 
❏ Java code generator 
❏ 600+ connectors for open and proprietary data systems 
❏ Easily embeddable in custom applications 
❏ Cross-platform 
❏ Central metadata repository 
❏ Available in both open source and premium flavours
What does ETL stand for? 
It summarizes every operation that loads, retrieves, 
digests, consumes, transforms and shapes data: 
❏ Extract - get the data from different sources. 
From flat files, RDBMS, Big Data systems, web services, business... 
❏ Transform - convert it into a form suitable for the destination 
data system. 
Aggregate, transform, combine, reshape, clean, filter, improve quality... 
❏ Load - move it to the target destination in a suitable way. 
Write the data in the target format.
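The three stages can be sketched in plain Java. This is only a conceptual illustration, not code generated by Talend: the class name, the semicolon-delimited input and the in-memory "target" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual ETL sketch; every name here is hypothetical.
class EtlSketch {

    // Extract: split raw delimited lines into fields (stand-in for a file or database reader).
    static List<String[]> extract(List<String> rawLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : rawLines) {
            rows.add(line.split(";"));
        }
        return rows;
    }

    // Transform: drop rows with a blank name and normalise the name to upper case.
    static List<String[]> transform(List<String[]> rows) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            if (row.length == 2 && !row[1].trim().isEmpty()) {
                out.add(new String[] { row[0].trim(), row[1].trim().toUpperCase() });
            }
        }
        return out;
    }

    // Load: write each row in the target format (here, a comma-separated string).
    static List<String> load(List<String[]> rows) {
        List<String> target = new ArrayList<>();
        for (String[] row : rows) {
            target.add(String.join(",", row));
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("1;alice", "2; ", "3;bob");
        System.out.println(load(transform(extract(source))));
    }
}
```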
Talend Open Studio 
❏ It’s the open source, free to use, community-supported 
version of Talend for Data Integration; 
❏ Often abbreviated as “TOS”, to distinguish it from the premium 
version (“TIS”); 
❏ Feature-light, but still completely usable: 
❏ Same set of connectors and components as the premium 
version; 
❏ It lacks teamwork and enterprise capabilities such as 
SVN, scheduling, process orchestration and a monitoring 
console.
Hands on! 
❏ Download Talend Open Studio for Data Integration 
❏ https://www.talend.com/download/data-integration 
❏ Download the user manual as well 
❏ Install it! 
❏ Optional: 
❏ Prepare a quick MySQL stack for a ready-to-use 
database and other conveniences 
❏ https://github.com/r8/vagrant-lamp is worth a try
Say hello to TOS!
TOS Interface: Designer 
The Designer is the “canvas” where 
you’re going to “draw” your ETL job, 
graphically connecting components to 
each other using different kinds of 
connections.
TOS Interface: Components Palette 
The Palette on the right hosts the 
complete set of 600+ available 
components, both custom and built-in. 
Use the search field to quickly filter the 
palette views and find the component 
you need at a glance.
TOS Interface: Opened Jobs 
Currently open jobs are 
shown as tabs at the top...
TOS Interface: Repository Pane 
The Repository pane hosts all the 
metadata, like DB connections 
credentials, external delimited file 
schemas, parameters and the whole set 
of ETL jobs themselves.
TOS Interface: Parameters Pane 
The Parameters pane hosts all the 
selected component’s settings, job settings 
and parameters, debug status and the 
diagnostic tab.
TOS Interface: Perspectives 
...and different Perspectives 
are available in the top-right 
corner. 
Both TOS and standard Eclipse 
perspectives are available 
here.
Workspaces 
A Workspace is a container of Projects that share the 
same TOS version and the same components palette. 
Like Eclipse, you can choose which one to use when the 
program starts. 
❏ In TOS, it’s a folder in the local drive.
Projects 
❏ A Project is a set of jobs and involved metadata; 
❏ It lives in a subfolder of the Workspace; 
❏ Both TOS and Eclipse Preferences are Project-based; 
❏ In other words, different projects in the same Workspace 
have different settings; 
❏ Internally, it’s a mix of XML, .item and .properties files 
in classic Eclipse fashion.
Metadata: General Principles 
❏ TOS requires jobs to be defined and 
described up front using metadata. 
The Repository holds this information. 
❏ There are 8 types of metadata, 
although custom components can 
define their own. We’ll look at the most 
important ones in detail: 
❏ Business Models, Job Designs, Contexts, 
Code, Metadata.
Metadata: Business Models 
❏ It stores diagrams used to 
conveniently describe business models 
and to embed them with ETL; 
❏ It offers a small set of drawing 
capabilities in UML fashion; 
❏ It’s not widely used, but it has proven 
useful for quickly sketching out 
transformation goals and for self-documenting 
ETL.
Metadata: Jobs 
❏ It’s the beating heart of the TOS Repository: 
the jobs themselves; 
❏ Here you’ll store all the metadata you 
need for graphically describing the jobs; 
❏ Components used, connectors, signals, 
parameters, colors and presentation 
details are hosted here. 
❏ You can (you should!) organize them in 
a folder tree for better clarity.
Metadata: Contexts 
❏ It stores context groups, which are 
parameter sets that can be used by 
any job in the current Project. 
❏ A group is a set of initialized Java 
variables of one of the allowed types in 
the global scope. 
❏ Groups are for presentation only: there 
are no limits on how many context variables 
you use in jobs, or on how you use them.
Metadata: Code 
❏ It stores routines written in Java; 
❏ These routines are typically a set of 
static methods inside a class. 
❏ If your routine is getting too 
complex, consider writing a custom 
component instead. 
❏ Consider using Maven and Git while 
creating a routine for better reliability. 
❏ https://github.com/theclue/talend-routine-collection
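As a minimal sketch of the "set of static methods inside a class" shape described above, a routine might look like the following. The class and method names are hypothetical, and the way Talend packages routines inside a project is deliberately left out.

```java
// Hypothetical user routine: a class that only exposes static helper methods.
class StringRoutines {

    // Returns the trimmed input, or an empty string when the input is null.
    static String nullSafeTrim(String value) {
        return value == null ? "" : value.trim();
    }

    // Left-pads the value with padChar until it reaches the requested length.
    // Values that are already long enough are returned unchanged.
    static String leftPad(String value, int length, char padChar) {
        String safe = value == null ? "" : value;
        StringBuilder sb = new StringBuilder();
        for (int i = safe.length(); i < length; i++) {
            sb.append(padChar);
        }
        return sb.append(safe).toString();
    }

    public static void main(String[] args) {
        System.out.println(nullSafeTrim("  hello "));
        System.out.println(leftPad("42", 5, '0'));
    }
}
```

Because the methods are static, an expression field in a job could call them directly, for example `StringRoutines.leftPad(row1.code, 5, '0')`, where `row1.code` is an illustrative column name.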
Metadata: ...Metadata? 
❏ It stores a heterogeneous set of 
reusable, atomic elements for jobs. 
❏ They include database parameters and 
credentials, external file schemas, web 
service interfaces, business 
application connections and so on. 
❏ User components often add their own 
metadata types to the list, but this 
often breaks compatibility.
Anatomy of a Job 
❏ A Job is a visual set of components graphically 
connected using different connections; 
❏ From the visual canvas and the connection topology, 
TOS in turn generates Java code; 
❏ This code is procedural by design and not really 
object-oriented: 
❏ It’s fast… 
❏ ...but debugging it is a pain in the neck for the experienced 
programmer.
Anatomy of a job 
❏ Drag and drop components from the Palette to the canvas, 
then visually connect them to each other. 
❏ You cannot make closed paths (loops) in your jobs! 
❏ It’ll become clear later why.
Anatomy of a job: Subjobs 
❏ A set of connected components is part of a subjob if they are 
all enclosed by a light-blue background; 
❏ You can have as many subjobs as you need in a given job.
Anatomy of a job: Starting Point 
❏ The starting point component of a subjob is the one with a 
green background; 
❏ Parallel execution is made using unconnected subjobs, but 
you won’t be able to predict the execution order!
Anatomy of a job: Main Connections 
❏ The Main connections are those that dictate the data flow; 
❏ They carry vectors of data (one vector per row/tuple); 
❏ When you have a split, the order dictates which branch comes first. 
You may change it from the contextual menu.
Anatomy of a job: Lookup Connections 
❏ Lookup connections, as the name suggests, make data 
available for fast lookup (i.e. join or match operations). 
❏ Typically, lookup data vectors are stored in memory during 
job processing. So watch out for memory shortages!
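The idea behind an in-memory lookup can be sketched as a hash join in plain Java. This is an illustration of the concept only, not the code Talend generates; the class, the two-column rows and the inner-join behaviour are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch: the lookup side is fully loaded into a map, the main side is streamed.
class LookupJoinSketch {

    // Joins each main row (customerId, amount) against a lookup of customerId -> name.
    // Rows without a match are dropped, i.e. this behaves like an inner join.
    static List<String> innerJoin(List<String[]> mainRows, Map<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String[] row : mainRows) {
            String name = lookup.get(row[0]); // null when the key has no match
            if (name != null) {
                out.add(name + ":" + row[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The whole lookup table lives in memory, which is why large lookups are costly.
        Map<String, String> customers = new HashMap<>();
        customers.put("C1", "Alice");
        customers.put("C2", "Bob");

        List<String[]> orders = new ArrayList<>();
        orders.add(new String[] { "C1", "100" });
        orders.add(new String[] { "C3", "50" });
        orders.add(new String[] { "C2", "75" });

        System.out.println(innerJoin(orders, customers));
    }
}
```

The memory cost grows with the size of the lookup side, not the main side, so filtering the lookup data before the join is usually the cheaper option.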
Anatomy of a job: Endpoints 
❏ Endpoints are components that have no outgoing 
connections. 
❏ A given subjob can have as many endpoints as needed (think 
about what happens after a split operation like the one above).
Signals and Data Connections 
❏ There are three types of connections in standard TOS: 
❏ Row 
❏ Trigger 
❏ Iterator 
❏ You may select which connection to use from the 
contextual menu of any component instance.
Row 
❏ Rows are connections that carry data, one tuple at a 
time; 
❏ Their content is defined by a Schema; 
❏ They are used to connect components; 
❏ Components connected this way will end up in the same 
subjob; 
❏ Main, Lookup, Filter, Merge are all data connections; 
❏ Custom components can define their own Data 
Connection.
Schema 
❏ Schema is an important core concept in TOS design; 
❏ Each Row connection must have a non-null schema 
declaration, which defines the dimensionality of the 
data vector entering or leaving a given 
component; 
❏ Several primitive Java types are supported.
Triggers 
❏ Triggers, as the name suggests, don’t carry data: 
they are signals. 
❏ They are usually used to connect subjobs. 
❏ They come in two main flavours, depending on their 
scope: Sub Job Triggers and Component Triggers. 
❏ They’re typically Go/No-Go events that trigger the execution 
of one or more subjobs;
Sub Job Triggers 
❏ Sub Job Triggers are the most 
widely used in practice; 
❏ They are used to connect the 
starting points of subjobs; 
❏ When connected this way, 
subjobs will execute sequentially, 
forcing an execution order; 
❏ You’ll end up having only one 
starting point for the whole chain.
Run If Triggers 
❏ A Run If Trigger is a special type of trigger that is fired 
only if the embedded expression evaluates to true. 
❏ The expression must be written in Java and have a 
boolean outcome.
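A Run If condition is just a boolean Java expression. The sketch below wraps one in a method so it can be run on its own; the `Context` class is a hypothetical stand-in for the job's context object, and `env` and `maxRows` are made-up variables.

```java
// Sketch of a Run If style condition; all names are illustrative.
class RunIfSketch {

    // Hypothetical stand-in for the context object of a job.
    static class Context {
        public String env;
        public Integer maxRows;
    }

    // The expression typed in a Run If trigger would be the part after "return".
    // Putting the constant first in equals() and checking for null before
    // unboxing keeps the condition from throwing a NullPointerException.
    static boolean runIf(Context context) {
        return "prod".equals(context.env)
                && context.maxRows != null
                && context.maxRows > 0;
    }

    public static void main(String[] args) {
        Context context = new Context();
        context.env = "prod";
        context.maxRows = 100;
        System.out.println(runIf(context));
    }
}
```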
Iterators 
❏ Iterators stand midway between Data 
Connections and Triggers; 
❏ They don’t carry data like Rows… 
❏ ...but they’re not fired only once like Triggers. 
❏ Think of them as Triggers that are fired once for 
each incoming row. 
❏ They are connected to starting points, like Sub Job 
Triggers, but originate from standard components, like 
Row Connections.
Component Parameters 
❏ When you select a component instance, the parameter 
pane will show the relevant fields for you to fill in; 
❏ Several types of parameters are allowed: dropdowns, 
radio buttons, schemas, text fields... 
❏ Text fields will often end up writing their value into the 
generated Java code as-is, so be sure to write them 
properly: 
❏ Enclose strings in double quotes; 
❏ Be sure to match the expected type, or cast 
explicitly otherwise.
Components and Repository 
❏ Very often, Components allow you to select relevant 
metadata from the Repository; 
❏ Doing so, you will be able to keep parameters between 
jobs and component instances “in sync”; 
❏ However, this is not mandatory and at any time you 
can detach the component from the Repository. 
❏ This puts the component in the “built-in” state, which 
means that its parameters are locally defined and won’t 
be updated when the Repository changes.
The Context 
❏ The Context holds parameters defined at compile time. 
❏ Those parameters are grouped in Context Groups and 
defined in the Repository as primitive Java types. 
❏ Then, they will end up as public attributes of the 
context object inside the code. 
❏ For example, a parameter named “foo” will be referenced 
using the syntax context.foo in code and parameter 
fields. 
❏ Just like parameters, a “built-in” Context can be defined, 
too, to scope it to the local job only.
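To make the `context.foo` syntax concrete, here is a small sketch with a hand-written stand-in for the context object. The `Context` class, its default values and the `outputPath` helper are assumptions for illustration; the real object is generated by TOS.

```java
// Sketch of context variables exposed as public attributes; names and values are made up.
class ContextSketch {

    // Hypothetical stand-in for the generated context object.
    static class Context {
        public String foo = "/tmp/out";
        public Integer batchSize = 500;
    }

    // What a text field such as  context.foo + "/" + fileName  boils down to:
    // literal text is enclosed in double quotes, context variables are not.
    static String outputPath(Context context, String fileName) {
        return context.foo + "/" + fileName;
    }

    public static void main(String[] args) {
        Context context = new Context();
        System.out.println(outputPath(context, "orders.csv"));
        System.out.println(context.batchSize);
    }
}
```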
The Global Map 
❏ The Global Map holds parameters defined at runtime. 
❏ Those parameters live in a pure Java space. 
❏ It’s a key-value Map used to store generic Objects: 
❏ globalMap.put("key", object) to store an object 
❏ globalMap.get("key") to get an Object 
❏ Since it’s a Java Map of Objects, you must explicitly 
cast to the proper type when getting the object back. 
❏ It’s very handy when used in conjunction with 
Iterators, as they cannot carry data on their own.
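The put/get/cast pattern can be tried outside TOS with an ordinary `HashMap`. In this sketch the map is created by hand and the keys `currentFile` and `rowCount` are hypothetical; inside a job the `globalMap` variable is already available.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the store-then-cast pattern on a map of Objects; keys are made up.
class GlobalMapSketch {

    // get() returns Object, so the value must be cast back to its real type.
    static String currentFile(Map<String, Object> globalMap) {
        return (String) globalMap.get("currentFile");
    }

    // A missing key yields null, so guard before unboxing to int.
    static int rowCount(Map<String, Object> globalMap) {
        Integer count = (Integer) globalMap.get("rowCount");
        return count == null ? 0 : count;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("currentFile", "/data/in/orders.csv");
        globalMap.put("rowCount", 42);

        System.out.println(currentFile(globalMap));
        System.out.println(rowCount(globalMap));
    }
}
```

A cast to the wrong type compiles fine and only fails at runtime with a `ClassCastException`, which is worth keeping in mind when several subjobs share the same keys.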
Talend Open Studio 
Common-use Components 
Which component to use…? 
❏ TOS comes with more than 600 general-use items; 
❏ This is because it must ensure connectivity with tons of 
different data sources (i.e. RDBMS, appliances…); 
❏ Clearing away the clutter, you’ll end up with a very 
small subset of life-saving components. We can group 
the most important ones into families and look at them in detail: 
❏ Database, File, Custom Code, Processing, Orchestration
File Components 
❏ These components are used for input and 
output from/to local files; 
❏ Notable features include the archiving 
capabilities and a complete set of file 
system management operations, like copy, delete 
or directory listing; 
❏ Under Linux, you can use named pipes for 
streaming data into TOS directly from a 
calling shell.
Database Components 
❏ These components are used for performing 
operations on RDBMS; 
❏ Notable features include the components 
for SCD and cloud support (e.g. AWS 
Redshift); 
❏ Unfortunately, due to licensing issues, you often 
have to download the JDBC driver from 
the RDBMS vendor yourself in order to 
use it in TOS; quite annoying!
Custom Code Components 
❏ These components allow you to directly 
write Java code into your Job; 
❏ Although quite hard to manage, these are 
real life-savers in lots of different situations; 
❏ A typical use case is when you want to import 
and use an external Java library or method. 
❏ Several components are available for 
different purposes, e.g. generating data flows, 
processing rows, etc.
Processing Components 
❏ These are probably the most important 
components of all; 
❏ They include sort, filter, aggregation, join, 
sampling, XML traversal; 
❏ But the most important component ever is 
the tMap; 
❏ It’s a general purpose multi-input, multi-output 
mapper component. 
❏ We’ll look at it in detail...
tMap in a typical Job 
❏ Basically speaking, 
think of it as a set of 
joins, a set of splits 
and transformations 
set in the middle. 
❏ That’s why it has a 
special user interface.
Say hello to tMap 
Here come the Input Data 
Connections with their own Schemas. 
Only one is the Main connection, the 
others are all Lookup connections. Here 
you’ll set the join conditions. Clicking 
the wrench reveals more options, like the 
join type and how to load the lookup 
tables.
Say hello to tMap 
On the right pane we have the 
Output Data Connections, each of 
them with its own Schema, too. Again, the 
wrench reveals more options, for example 
whether the connection must catch rows where 
the join has failed, and more...
Say hello to tMap 
Each output field is a Java expression. 
This means you can call methods on it, 
use user routines, combine expressions and 
more. Click on it to open the powerful 
Expression Wizard.
Say hello to tMap 
As a convenience, you have the Var pane 
for adding temporary variables. Use it if your 
inner transformations cannot be easily 
handled in a single-line Java expression.
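The kind of expression that goes into an output field or a Var entry can be sketched as plain Java methods. The `Row` class and its columns are hypothetical, and each method body stands for one single-line expression as it might be typed in the tMap editor.

```java
// Sketch of tMap-style expressions; the row structure and column names are made up.
class TMapExpressionSketch {

    // Hypothetical input row coming from the Main connection.
    static class Row {
        public String firstName;
        public String lastName;
        public Double amount;
    }

    // Var-style intermediate value: default a missing amount to zero.
    static double safeAmount(Row row1) {
        return row1.amount == null ? 0.0 : row1.amount;
    }

    // Output column expression: concatenate two input columns and upper-case the result.
    static String fullName(Row row1) {
        return (row1.firstName + " " + row1.lastName).toUpperCase();
    }

    public static void main(String[] args) {
        Row row1 = new Row();
        row1.firstName = "Ada";
        row1.lastName = "Lovelace";
        row1.amount = null;

        System.out.println(fullName(row1));
        System.out.println(safeAmount(row1));
    }
}
```

Splitting the null check into its own intermediate value, as a Var entry would, keeps each output expression short enough to stay on one line.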
Say hello to tMap 
The Schema Editor is for both input 
and output connections. Check and set 
the data type, the length and the 
nullable flag for each field here.
Orchestration Components 
❏ These components, as the name states, are 
used to “bring order” inside and outside the 
jobs; 
❏ They allow you to call a TOS job from 
another, to put a job in a wait state and more. 
❏ Here you will find two components to switch 
between Row and Iterator Connections; 
❏ A typical use case is when you want to trigger an 
event for each row in the incoming connection.
Other useful components 
❏ tPreJob and tPostJob are two special starting 
points that are respectively triggered before 
and after all other subjobs in the current job; 
❏ tLogRow logs the content of a given Row 
connection to the console; 
❏ tHashInput and tHashOutput are useful to 
define reusable buffers of data inside a job; 
❏ tLibraryLoad imports external JARs into 
the classpath of the current job.
Talend Open Studio 
Tips and Tricks 
Tips and Tricks 
❏ Use Repository metadata when possible: 
it’ll make your design more robust. 
❏ Generic Schema metadata, as the name 
suggests, are useful to define schemas that you 
don’t want to be format- and platform-dependent, 
like file schemas or database table 
schemas. 
❏ Always document your jobs: the documentation can 
then be exported to a ready-to-use document!
Tips and Tricks 
❏ Clicking “Sync Schema” will propagate the 
current schema forward, changing any 
schema to “built-in” along the way. 
❏ Built-in Schemas won’t get updated when 
the Repository changes! 
❏ If you have large lookup, sort or aggregate 
operations, you may need to raise the amount 
of RAM devoted to the JVM in Job Parameters. 
❏ You may get a Java heap error otherwise.
Tips and Tricks 
❏ Every transformation is a Java expression 
in Talend! 
❏ Handle null values properly to avoid Java 
NullPointerExceptions; 
❏ Use primitive wrappers when possible (e.g. 
‘Integer’ instead of ‘int’); 
❏ Use methods, not operators (e.g. .equals() and 
.concat()). 
❏ Perform filtering as soon as possible to 
reduce memory consumption.
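The null-handling tips above can be illustrated with three small helpers. This is a sketch with made-up names, showing one reasonable way to apply each tip rather than a prescribed Talend idiom.

```java
// Sketch of null-safe expressions; method and parameter names are illustrative.
class NullSafeSketch {

    // Wrapper type: a missing value arrives as null and is handled before unboxing.
    // With a primitive int parameter, a null upstream value could not be represented at all.
    static int quantityOrZero(Integer quantity) {
        return quantity == null ? 0 : quantity;
    }

    // equals() instead of ==, with the constant first so a null status cannot throw.
    static boolean isActive(String status) {
        return "ACTIVE".equals(status);
    }

    // concat() guarded against null on both sides.
    static String withSuffix(String code, String suffix) {
        if (code == null) {
            return null;
        }
        return code.concat(suffix == null ? "" : suffix);
    }

    public static void main(String[] args) {
        System.out.println(quantityOrZero(null));
        System.out.println(isActive(null));
        System.out.println(withSuffix("AB", "-01"));
    }
}
```

Comparing strings with `==` checks object identity rather than content, so two equal strings read from different rows may still compare as different; `equals()` avoids that.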
Getting Help 
❏ Talend Forge: forum, custom components, tutorials, 
bug trackers, example jobs 
❏ Stack Overflow 
❏ http://stackoverflow.com/questions/tagged/talend 
❏ Books from Packt Publishing 
❏ “Getting Started with Talend Open Studio for Data 
Integration” by Jonathan Bowen; 
❏ “Talend Open Studio Cookbook” by Rick D. Barton.
Contacts 
❏ Tutorials 
❏ Custom components 
❏ Ready-made jobs 
❏ Use Cases 
http://gabrielebaldassarre.com 
Need help? Questions? Consulting needs? 
http://gabrielebaldassarre.com/contacts/ 
@cerealping

 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 

Último (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 

Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Tips & Tricks

  • 1. Talend Open Studio Fundamentals gabrielebaldassarre.com
  • 2. What is Talend for Data Integration? ❏ Eclipse-based visual programming IDE for ETL applications ❏ Java code generator ❏ 600+ connectors for open and proprietary data systems ❏ Easily embeddable in custom applications ❏ Cross-platform ❏ Central metadata repository ❏ Available in both open source and premium flavours
  • 3. What does ETL stand for? It summarizes every operation that loads, retrieves, digests, consumes, transforms and shapes data: ❏ Extract - get the data from different sources. From flat files, RDBMS, Big Data systems, web services, business... ❏ Transform - convert it into a form suitable for the destination data system. Aggregate, transform, combine, reshape, clean, filter, improve quality... ❏ Load - move it to the target destination in a suitable way. Write the data in the target format.
  • 4. Talend Open Studio ❏ It’s the open source, free to use, community-supported version of Talend for Data Integration; ❏ Often abbreviated as “TOS”, to distinguish it from the premium version (“TIS”); ❏ Feature-light, but still completely usable: ❏ Same set of connectors and components as the premium version; ❏ It lacks team collaboration and Enterprise capabilities such as SVN, scheduling, process orchestration and a monitoring console.
  • 5. Hands on! ❏ Download Talend Open Studio for Data Integration ❏ https://www.talend.com/download/data-integration ❏ Download the user manual as well ❏ Install it! ❏ Optional: ❏ Prepare a quick MySQL stack for a ready-to-start database and other commodities ❏ https://github.com/r8/vagrant-lamp is worth a try
  • 7. TOS Interface: Designer The Designer is the “canvas” where you’re going to “draw” your ETL job, graphically connecting components to each other using different kinds of connections.
  • 8. TOS Interface: Components Palette The Palette on the right hosts the complete set of 600+ available components, both custom and built-in. Use the search field to quickly filter the palette views and find the component you need at a glance.
  • 9. TOS Interface: Opened Jobs Currently opened jobs are shown as tabs at the top...
  • 10. TOS Interface: Repository Pane The Repository pane hosts all the metadata, like DB connection credentials, external delimited file schemas, parameters and the whole set of ETL jobs themselves.
  • 11. TOS Interface: Parameters Pane The Parameters pane hosts the settings of the selected component, job settings and parameters, debug status and the diagnostic tab.
  • 12. TOS Interface: Perspectives ...and different Perspectives are available in the top-right corner. Both TOS and standard Eclipse perspectives are available here.
  • 13. Workspaces A Workspace is a container of Projects that share the same TOS version and the same components palette. Like Eclipse, you can choose which one to use when the program starts. ❏ In TOS, it’s a folder on the local drive.
  • 14. Projects ❏ A Project is a set of jobs and the metadata they involve; ❏ It lives in a subfolder of the Workspace; ❏ Both TOS and Eclipse Preferences are Project-based ❏ In other words, different projects in the same Workspace have different settings; ❏ Internally, it’s a mix of XML, .items and .properties files in classic Eclipse fashion.
  • 15. Metadata: General Principles ❏ TOS requires preliminary definition and description of jobs using metadata. The Repository holds this information. ❏ There are 8 types of metadata, although custom components can define their own. We’ll look at the most important ones in detail: ❏ Business Models, Job Projects, Contexts, Code, Metadata.
  • 16. Metadata: Business Models ❏ It stores diagrams used to conveniently describe business models and to embed them with ETL; ❏ It offers a small set of drawing capabilities in a UML-like fashion; ❏ It’s not widely used, but it has proven useful to quickly sketch out transformation goals and for auto-documenting ETL.
  • 17. Metadata: Jobs ❏ It’s the beating heart of the TOS Repository: the jobs themselves; ❏ Here you’ll store all the metadata you need for graphically describing the jobs ❏ Components used, connectors, signals, parameters, colors and presentation details are hosted here. ❏ You can (you should!) organize them in a folder tree for better clarity.
  • 18. Metadata: Contexts ❏ It stores context groups, which are parameter sets that can be used by any job in the current Project. ❏ A group is a set of initialized Java variables of one of the allowed types in the global scope. ❏ Groups are for presentation only: there is no limit on how many context variables you define or on how you use them in jobs.
  • 19. Metadata: Code ❏ It stores routines written in Java; ❏ These routines are typically a set of static methods inside a class. ❏ If your routine is going to be too complex, consider writing a custom component instead. ❏ Consider using Maven and Git while creating a routine for better reliability. ❏ https://github.com/theclue/talend-routine-collection
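As a rough illustration of the routine style described above, the sketch below shows a class made only of static methods. The class name `TextRoutines` and both helpers are hypothetical; in a real TOS project the class would typically be declared `public` inside the `routines` package, which is left out here so that the snippet compiles on its own.

```java
// Hypothetical routine class: a plain holder of static helper methods.
final class TextRoutines {

    private TextRoutines() {
        // Not meant to be instantiated; every helper is static.
    }

    /** Returns the trimmed input, or an empty string when the input is null. */
    static String safeTrim(String value) {
        return value == null ? "" : value.trim();
    }

    /** Returns the first argument unless it is null, in which case the fallback is returned. */
    static String coalesce(String value, String fallback) {
        return value != null ? value : fallback;
    }
}
```

Inside a component field or a tMap expression, such a routine would be called like any other static method, for example `TextRoutines.safeTrim(row1.name)`, where `row1.name` is an assumed column.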
  • 20. Metadata: ...Metadata? ❏ It stores a heterogeneous set of reusable, atomic elements for jobs. ❏ They include database parameters and credentials, external file schemas, web service interfaces, business application connections and so on. ❏ User components often add their own metadata types to the list, but this often breaks compatibility
  • 21. Anatomy of a Job ❏ A Job is a visual set of components graphically connected using different connections; ❏ From the visual canvas and the connection topology, TOS in turn generates Java code; ❏ This code is procedural by design and not really object oriented: ❏ It’s fast… ❏ ...but debugging it is a pain in the neck for the experienced programmer.
  • 22. Anatomy of a job ❏ Drag and drop components from the Palette to the canvas, then visually connect them to each other. ❏ You cannot make closed paths in your jobs! ❏ It’ll become clear later why.
  • 23. Anatomy of a job: Subjobs ❏ A set of connected components is part of a subjob if they are all enclosed by a light-blue background; ❏ You can have as many subjobs as you need in a given job.
  • 24. Anatomy of a job: Starting Point ❏ The starting point component of a subjob is the one with a green background; ❏ Parallel execution is achieved using unconnected subjobs, but you won’t be able to predict the execution order!
  • 25. Anatomy of a job: Main Connections ❏ The Main connections are those that dictate the data flow; ❏ They carry vectors of data (one vector per row/tuple);
  • 26. Anatomy of a job: Main Connections ❏ The Main connections are those that dictate the data flow; ❏ They carry vectors of data (one vector per row/tuple); ❏ When you have a split, the order dictates which branch comes first. You may change it from the contextual menu.
  • 27. Anatomy of a job: Lookup Connections ❏ Lookup connections, as the name suggests, make data available for fast lookup (i.e. join or match operations). ❏ Typically, lookup data vectors are stored in memory during job processing. So watch out for memory shortage!
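The in-memory behaviour of a lookup can be pictured as a hash map that is fully loaded before the main flow is processed. The following is only a sketch of that idea, not the code TOS generates; the class name, the two-column row layout and the `UNKNOWN` default are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LookupJoinSketch {

    /**
     * Joins each main row against an in-memory lookup keyed by country code.
     * Each main row is assumed to be {customerName, countryCode}.
     * Rows without a match are kept and labelled "UNKNOWN".
     */
    static List<String> join(List<String[]> mainRows, Map<String, String> countryLookup) {
        List<String> joined = new ArrayList<>();
        for (String[] row : mainRows) {
            // The whole lookup sits in memory, so each probe is a cheap hash access.
            String countryName = countryLookup.getOrDefault(row[1], "UNKNOWN");
            joined.add(row[0] + ";" + countryName);
        }
        return joined;
    }
}
```

Because the entire lookup map is held in memory, its size grows with the lookup table, which is why large lookups put pressure on the heap.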
  • 28. Anatomy of a job: Endpoints ❏ Endpoints are components that have no outgoing connections. ❏ A given subjob can have as many endpoints as needed (think about what happens after a split operation like the one above).
  • 29. Signals and Data Connections ❏ There are three types of connections in standard TOS: ❏ Row ❏ Trigger ❏ Iterator ❏ You may select which connection to use from the contextual menu of any component instance.
  • 30. Row ❏ Rows are connections that carry data, one tuple at a time; ❏ Their content is defined by a Schema; ❏ They are used to connect components; ❏ Components connected this way will end up in the same subjob; ❏ Main, Lookup, Filter, Merge are all data connections; ❏ Custom components can define their own Data Connection.
  • 31. Schema ❏ Schema is an important inner concept in TOS design; ❏ Each Row connection must have a non-null schema declaration, which defines the dimensionality of the vector of data flowing into and out of a given component; ❏ Several primitive Java types are supported.
  • 32. Triggers ❏ Triggers, as the name suggests, don’t carry data: they are actually signals. ❏ They are usually used to connect subjobs. ❏ They come in two main flavours, depending on their scope: Sub Job Triggers and Component Triggers. ❏ They’re typically Go/No-Go events that trigger the execution of one or more subjobs;
  • 33. Sub Job Triggers ❏ Sub Job Triggers are the most widely used in practice; ❏ They are used to connect the starting points of subjobs; ❏ When connected this way, subjobs will execute sequentially, forcing an execution order; ❏ You’ll end up having only one starting point for the whole chain.
  • 34. Run If Triggers ❏ Run If Trigger is a special type of trigger that is fired only if the embedded expression evaluates to true. ❏ The expression must be written in Java and have a boolean outcome.
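To make the idea of a boolean Run If expression concrete, here is a minimal sketch. The method wrapper, the `rowCount` and `environment` inputs and the `"prod"` value are all hypothetical; in a real job the operands would more likely come from context variables or from values stored at runtime, and only the expression on the `return` line would be typed into the trigger.

```java
final class RunIfSketch {

    /**
     * Boolean condition of the kind a Run If trigger evaluates.
     * Both parameters are illustrative stand-ins for job values.
     */
    static boolean shouldRunNextSubjob(Integer rowCount, String environment) {
        // Null checks come first so that unboxing and equals() never throw.
        return rowCount != null && rowCount > 0 && "prod".equals(environment);
    }
}
```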
  • 35. Iterators ❏ Iterators sit midway between Data Connections and Triggers; ❏ They don’t carry data like Rows… ❏ ...but they’re not fired only once like Triggers. ❏ Think of them as Triggers that are fired once for each incoming row. ❏ They are connected to starting points, like SubJob Triggers, but originate from standard components like Row Connections.
  • 36. Component Parameters ❏ When you select a component instance, the parameter pane will show the relevant fields for you to fill in; ❏ Several types of parameters are allowed: dropdown, radio buttons, schemas, text fields... ❏ Text fields will often end up writing their value into the generated Java code as-is, so be sure to write them properly: ❏ Enclose strings in double quotes; ❏ Be sure to match the expected type, or cast otherwise
  • 37. Components and Repository ❏ Very often, Components allow you to select relevant metadata from the Repository; ❏ Doing so, you will be able to keep parameters between jobs and component instances “in sync”; ❏ However, this is not mandatory and at any time you can detach the component from the Repository. ❏ This puts the component into the “built in” state, which means that its parameters are locally defined and won’t be updated anymore if the Repository is.
  • 38. The Context ❏ The Context holds parameters defined at compile time ❏ Those parameters are grouped in Context Groups and defined in the Repository as primitive Java types. ❏ Then, they will end up as public attributes of the context object inside the code. ❏ For example, a parameter named “foo” will be referenced using the syntax context.foo in code and parameter fields. ❏ Just like parameters, a “built in” Context can be defined, too, scoped to the local job only.
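The `context.foo` syntax can be pictured with a small stand-in for the generated context object. This is a sketch under assumptions, not the actual generated code: the `ContextProperties` class, the `inputDir` and `batchSize` variables and their values are invented for the example.

```java
final class ContextSketch {

    /** Stand-in for the context object: each context variable is a public field. */
    static final class ContextProperties {
        public String inputDir = "/tmp/in"; // hypothetical context variable
        public Integer batchSize = 500;     // hypothetical context variable
    }

    static final ContextProperties context = new ContextProperties();

    /**
     * The kind of expression one would type into a file-name field:
     * a context variable concatenated with a double-quoted string literal.
     */
    static String inputFile() {
        return context.inputDir + "/customers.csv";
    }
}
```

Note how the literal part is enclosed in double quotes while the context variable is not: the field content is a Java expression, not free text.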
  • 39. The Global Map ❏ The Global Map holds parameters defined at runtime ❏ Those parameters live in a pure Java space. ❏ It’s a key-value Map used to store generic Objects: ❏ globalMap.put("key", object) to store an object ❏ globalMap.get("key") to get an Object ❏ Since the values are stored as plain Object, you must explicitly cast to the proper type when getting an object back. ❏ It proves very handy when used in conjunction with Iterators, as they cannot carry data on their own.
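The put/get/cast cycle can be sketched with an ordinary `java.util.Map`. The map is passed in as a parameter here so the snippet is self-contained; the keys `currentFile` and `processedRows` and their values are made up for illustration.

```java
import java.util.Map;

final class GlobalMapSketch {

    /** Stores two values, reads them back with explicit casts, and probes a missing key. */
    static String describe(Map<String, Object> globalMap) {
        globalMap.put("currentFile", "orders_2014.csv"); // hypothetical key and value
        globalMap.put("processedRows", 42);               // autoboxed to Integer

        // get() returns Object, so each value must be cast back to its real type.
        String currentFile = (String) globalMap.get("currentFile");
        Integer processedRows = (Integer) globalMap.get("processedRows");

        // A key that was never stored comes back as null, so guard before using it.
        Object missing = globalMap.get("notThere");

        return currentFile + ":" + processedRows + ":" + (missing == null);
    }
}
```

A wrong cast (for instance reading `processedRows` as a `String`) compiles fine and only fails at runtime with a `ClassCastException`, which is why matching the stored type matters.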
  • 40. Talend Open Studio Common-use Components gabrielebaldassarre.com
  • 41. Which component to use…? ❏ TOS comes with more than 600 general-use items; ❏ This is because it must ensure connectivity with tons of different data sources (e.g. RDBMS, appliances…); ❏ Clearing out the clutter, you’ll end up with a very small subset of life-saving components. We can group the most important ones in families and look at them in detail: ❏ Database, File, Custom Code, Processing, Orchestration
  • 42. File Components ❏ These components are used for input and output from/to local files; ❏ Notable features include the archiving capabilities and a complete set of file system management operations, like copy, delete or directory listing; ❏ Under Linux, you can use a named pipe for streaming data into TOS directly from a caller shell.
  • 43. Database Components ❏ These components are used for performing operations on RDBMS; ❏ Notable features include the components for SCD and cloud support (e.g. AWS Redshift); ❏ Unfortunately, for licensing reasons, you often have to download the JDBC driver from the RDBMS vendor yourself in order to use it in TOS; quite annoying!
  • 44. Custom Code Components ❏ These components allow you to write Java code directly into your Job; ❏ Although quite hard to manage, these are real life-savers in a lot of different situations; ❏ A typical use case is when you want to import and use an external Java library or method. ❏ Several components are available for different scopes, e.g. generating data flows, processing rows, etc...
  • 45. Processing Components ❏ These are probably the most important components of all; ❏ They include sort, filter, aggregation, join, sampling, XML traversing; ❏ But the most important component ever is the tMap; ❏ It’s a general purpose multi-input, multi-output mapper component. ❏ We’ll look at it in detail...
  • 46. tMap in a typical Job ❏ Basically, think of it as a set of joins, a set of splits, and transformations in between. ❏ That’s why it has a special user interface.
  • 47. Say hello to tMap
  • 48. Say hello to tMap Here come the Input Data Connections with their own Schemas. Only one is the Main connection, the others are all Lookup connections. Here you’ll set the join conditions. Clicking the wrench reveals more options, like the join type and how to load the lookup tables.
  • 49. Say hello to tMap On the right pane we have the Output Data Connections, each of them with its own Schema, too. Again, the wrench reveals more options, for example whether the connection must catch rows where the join has failed, and more...
  • 50. Say hello to tMap Each output field is a Java expression. This means you can call methods, use user routines, combine expressions and more. Click on it to open the powerful Expression Wizard.
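As an illustration of "each output field is a Java expression", the sketch below wraps one such expression in a method so that it can run on its own. The `Row1` class is a hypothetical stand-in for an input row with one field per column, and the column names are invented; in the tMap editor only the expression on the `return` line would appear in the output field.

```java
final class TMapExpressionSketch {

    /** Hypothetical stand-in for an input row: one field per schema column. */
    static final class Row1 {
        String firstName;
        String lastName;
    }

    /** An output column built by calling methods on input columns and combining the results. */
    static String displayName(Row1 row1) {
        return row1.lastName.toUpperCase() + ", " + row1.firstName.trim();
    }
}
```

This version assumes both columns are non-null; a nullable column would need a guard before any method is called on it.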
  • 51. Say hello to tMap As a convenience, you have the Var pane for adding temporary variables. Use it if your inner transformations cannot be easily handled in a single-line Java expression.
  • 52. Say hello to tMap The Schema Editor is for both input and output connections. Check and set here the data types, the length, the nullable flag for each field.
  • 53. Orchestration Components ❏ These components, as the name states, are used to “make order” inside and outside the jobs; ❏ They allow you to call one TOS job from another, to put a job in a wait state and more. ❏ Here you will also find two components to switch between Row and Iterator Connections; ❏ A typical use case is when you want to trigger an event for each row in the incoming connection.
  • 54. Other useful components ❏ tPreJob and tPostJob are two special starting points that are respectively triggered before and after all other subjobs in the current job; ❏ tLogRow is to log the content of a given Row connection into the console; ❏ tHashInput and tHashOutput are useful to define reusable buffers of data inside a job; ❏ tLibraryLoad is to import external jars into the classpath of the current job.
  • 55. Talend Open Studio Tips and Tricks gabrielebaldassarre.com
  • 56. Tips and Tricks ❏ Use Repository metadata when possible: it’ll make your design more robust. ❏ Generic Schema metadata, as the name suggests, are useful to define schemas that you don’t want to be format- and platform-dependent, like file schemas or database table schemas. ❏ Always document your jobs: the documentation can then be exported as a ready-to-use document!
  • 57. Tips and Tricks ❏ Clicking “Sync Schema” will propagate the current schema forward, changing any schema to “built in” along the way. ❏ Built in Schemas won’t get updated when the Repository changes! ❏ If you have large lookup, sort or aggregate operations, you may need to raise the amount of RAM allotted to the JVM in Job Parameters. ❏ You may get a Java heap error otherwise.
  • 58. Tips and Tricks ❏ Every transformation is a Java expression in Talend! ❏ Handle null values properly to avoid Java NullPointerExceptions; ❏ Use primitive wrappers when possible (i.e. ‘Integer’ instead of ‘int’); ❏ Use methods, not operators (i.e. .equals() and .concat()). ❏ Perform filtering as soon as possible to reduce memory consumption.
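The null-handling and wrapper-type tips above might look like this in practice. It is a small sketch with invented method and column names, not a prescribed pattern.

```java
final class NullSafeExpressions {

    /** Calling equals() on the literal avoids a NullPointerException when the column is null. */
    static boolean isItaly(String countryCode) {
        return "IT".equals(countryCode);
    }

    /** An Integer column can hold null, so check it before it gets unboxed to int. */
    static int quantityOrZero(Integer quantity) {
        return quantity == null ? 0 : quantity;
    }

    /** Wrapper values are compared with equals(); == would compare object references. */
    static boolean sameId(Long left, Long right) {
        return left != null && left.equals(right);
    }
}
```

A wrapper such as `Integer` lets a column represent a missing value as `null`, whereas a primitive `int` cannot, which is the practical reason behind preferring wrappers in nullable schemas.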
  • 59. Getting Help ❏ Talend Forge: forum, custom components, tutorials, bug trackers, example jobs ❏ http://stackoverflow.com/questions/tagged/talend ❏ Stack Overflow ❏ http://stackoverflow.com/questions/tagged/talend ❏ Books from Packt Publishing ❏ “Getting Started with Talend Open Studio for Data Integration” by Jonathan Bowen; ❏ “Talend Open Studio Cookbook” by Rick D. Barton.
  • 60. Contacts ❏ Tutorials ❏ Custom components ❏ Ready-made jobs ❏ Use Cases http://gabrielebaldassarre.com Need help? Questions? Consulting needs? http://gabrielebaldassarre.com/contacts/ @cerealping